The beginning of the Year of the Rat has been quite difficult for most Chinese people because of the coronavirus outbreak. I have been following the news about its progress. One day I watched an interview with a Chinese health official. She mentioned one patient who had no idea that he had met anyone from Wuhan, the center of the outbreak. However, using big data, they found 3 people from Wuhan who had been in contact with him. Aha-ha! The DATA! I got excited, sat down and built this API so people can get 2019-nCoV related data.

This API is free to use for research, study and unbiased and factual reporting. Otherwise the karma will find you 🙈 🙉 🙊.

The Data Sources [数据来源]:

This is a verified tool that lets people in China check whether they have been on the same train/bus/flight as someone with a confirmed infection.

This is a very popular and trustworthy platform for tracking the real-time count of confirmed cases around the world.

The AWS Setup

I built a small, lightweight ETL Lambda function to parse all the data and write it to DynamoDB tables. It is scheduled by AWS CloudWatch Events and runs every day at 3:30 am. There is also one AWS API Gateway with 2 paths that invoke 2 Lambda functions to retrieve the data from the tables. I will make another post about how to terraform them up in 5 mins. Yep! It only took 5 mins to spin up, but it will take much longer to understand.
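To make the read side of this setup a bit more concrete, here is a rough sketch of what one of the retrieval Lambda functions could look like. It is not the code behind this API: it assumes API Gateway's Lambda proxy integration (query parameters arrive in `event["queryStringParameters"]`), and the DynamoDB lookup is replaced by a hypothetical in-memory list `SAMPLE_ROWS` so the snippet runs anywhere. The names `handler` and `FILTER_KEYS` are made up for illustration.

```python
import json

# Hypothetical stand-in for the DynamoDB table; a real function would query the table.
SAMPLE_ROWS = [
    {"id": "228", "date": "2020-02-08", "country": "中国", "cityName": "通州区"},
    {"id": "24", "date": "2020-02-08", "country": "中国", "cityName": "武汉"},
]

# Assumed subset of filter parameters, used only for this sketch.
FILTER_KEYS = ("date", "country", "cityName")


def handler(event, context):
    """Return the rows matching the first recognised query parameter."""
    # With proxy integration this value is None when the request has no query string.
    params = event.get("queryStringParameters") or {}

    if params.get("all") == "yes":
        rows = SAMPLE_ROWS
    else:
        for key in FILTER_KEYS:
            if key in params:
                rows = [row for row in SAMPLE_ROWS if row.get(key) == params[key]]
                break
        else:
            # No recognised parameter was supplied.
            return {"statusCode": 400, "body": json.dumps({"error": "missing parameter"})}

    return {
        "statusCode": 200,
        "headers": {"Content-Type": "application/json"},
        "body": json.dumps(rows, ensure_ascii=False),
    }
```

Keeping the handler a plain function of `event` means it can be exercised locally with a hand-written dict before anything is deployed.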

Okie Dokie, now let’s look at the API.

Novel coronavirus outbreak crawler data API:

The API Endpoint: https://4mmhkv7z9e.execute-api.eu-west-1.amazonaws.com/v1/

path: /dxy

You can use this path to retrieve coronavirus stats, including id, date, country, provinceName, cityName, confirmedCount, suspectedCount, curedCount, deadCount. Please be aware that the ETL Lambda function has been scheduled since 2020-02-08, so the earliest data you can retrieve is from 2020-02-08.

Required URL parameters are date OR country OR provincName OR cityName OR all.

path /dxy

| URL parameter | Description | Returned result | Example |
|---|---|---|---|
| date | Retrieve the coronavirus stats for a specific date. Accepted format is %Y-%m-%d (e.g. 2020-02-08). | all fields from that date | /dxy?date=2020-02-08 |
| country | Retrieve the coronavirus stats for a specific country. Accepted format is a URL-encoded (urllib-style) string. | all fields from that country | /dxy?country='' |
| provincName | Retrieve the coronavirus stats for a specific province. Accepted format is a URL-encoded (urllib-style) string. | all fields from that province | /dxy?provincName='' |
| cityName | Retrieve the coronavirus stats for a specific city. Accepted format is a URL-encoded (urllib-style) string. | all fields from that city | /dxy?cityName='' |
| all | Set to yes to retrieve all data. | all fields, including id, date, country, provinceName, cityName, confirmedCount, suspectedCount, curedCount, deadCount | /dxy?all=yes |
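If you would rather not URL-encode the Chinese place names by hand, the Python standard library can do it for you. The snippet below is a small sketch; `BASE_URL` is the endpoint listed above, while the helper name `build_url` is made up for illustration.

```python
from urllib.parse import urlencode

BASE_URL = "https://4mmhkv7z9e.execute-api.eu-west-1.amazonaws.com/v1"


def build_url(path, **params):
    """Return the full request URL with percent-encoded query parameters."""
    # urlencode encodes non-ASCII text as UTF-8 percent escapes.
    return f"{BASE_URL}/{path.lstrip('/')}?{urlencode(params)}"


print(build_url("dxy", date="2020-02-08"))
print(build_url("dxy", cityName="武汉"))
```

The second call produces `cityName=%E6%AD%A6%E6%B1%89`, the same encoded form of 武汉 used in the curl example for cityName.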

examples

  • search by the date:
$ curl -s 'https://4mmhkv7z9e.execute-api.eu-west-1.amazonaws.com/v1/dxy?date=2020-02-08' | jq '.[0]'
{
  "id": "228",
  "date": "2020-02-08",
  "country": "中国",
  "provinceName": "北京市",
  "cityName": "通州区",
  "confirmedCount": "15",
  "suspectedCount": "NULL",
  "curedCount": "NULL",
  "deadCount": "NULL",
  "msg": " No Man is an Island  🏝  没有人是一座孤岛 @pingzhou| 平舟 ⛵"
}
  • search by cityName
# you will need to URL-encode the value
# e.g. 武汉 (Wuhan) becomes %E6%AD%A6%E6%B1%89
$ curl -s 'https://4mmhkv7z9e.execute-api.eu-west-1.amazonaws.com/v1/dxy?cityName=%E6%AD%A6%E6%B1%89' | jq .
[
  {
    "id": "24",
    "date": "2020-02-08",
    "country": "中国",
    "provinceName": "湖北省",
    "cityName": "武汉",
    "confirmedCount": "13603",
    "suspectedCount": "NULL",
    "curedCount": "698",
    "deadCount": "545",
    "msg": " No Man is an Island  🏝  没有人是一座孤岛 @pingzhou| 平舟 ⛵"
  }
]
  • to retrieve all data
$ curl -s 'https://4mmhkv7z9e.execute-api.eu-west-1.amazonaws.com/v1/dxy?all=yes' | jq '.[0]'
{
  "id": "228",
  "date": "2020-02-08",
  "country": "中国",
  "provinceName": "北京市",
  "cityName": "通州区",
  "confirmedCount": "15",
  "suspectedCount": "NULL",
  "curedCount": "NULL",
  "deadCount": "NULL",
  "msg": " No Man is an Island  🏝  没有人是一座孤岛 @pingzhou| 平舟 ⛵"
}
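As the responses above show, every count comes back as a string, and missing values are the literal string "NULL". Before doing any arithmetic you will probably want to convert them. A minimal sketch, using a record copied from the cityName example; the helper name `normalise` is my own:

```python
COUNT_FIELDS = ("confirmedCount", "suspectedCount", "curedCount", "deadCount")


def normalise(record):
    """Copy a /dxy record, turning count strings into int and "NULL" into None."""
    cleaned = dict(record)
    for field in COUNT_FIELDS:
        value = cleaned.get(field)
        cleaned[field] = None if value in (None, "NULL") else int(value)
    return cleaned


sample = {
    "id": "24",
    "date": "2020-02-08",
    "cityName": "武汉",
    "confirmedCount": "13603",
    "suspectedCount": "NULL",
    "curedCount": "698",
    "deadCount": "545",
}
print(normalise(sample))
```

Using None rather than 0 for "NULL" keeps "not reported" distinct from "zero cases".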

path: /travel

This path retrieves the travel paths of confirmed cases, including id, date, start, stop, t_type, t_no, t_no_sub.

Required URL parameters are date OR start OR stop OR all

path /travel

| URL parameter | Description | Returned result | Example |
|---|---|---|---|
| date | Retrieve the coronavirus cases’ travel paths for a specific date. Accepted format is %Y-%m-%d (e.g. 2020-02-08). | all fields from that date | /travel?date=2020-02-08 |
| start | Retrieve the coronavirus cases’ travel paths from a specific starting point. Accepted format is a URL-encoded (urllib-style) string. | all fields from that starting point | /travel?start='' |
| end | Retrieve the coronavirus cases’ travel paths to a specific ending point. Accepted format is a URL-encoded (urllib-style) string. | all fields from that end point | /travel?end='' |
| all | Set to yes to retrieve all data. | all fields | /travel?all='yes' |

examples

  • search by date
$ curl -s 'https://4mmhkv7z9e.execute-api.eu-west-1.amazonaws.com/v1/travel?date=2020-02-02' | jq .
[
  {
    "id": "1246",
    "date": "2020-02-02",
    "start": "南极国际小区",
    "stop": "哈尔滨传染病院",
    "type": "6",
    "t_no": "黑AE888Z",
    "t_no_sub": "网约车",
    "msg": " No Man is an Island  🏝  没有人是一座孤岛 @pingzhou| 平舟 ⛵"
  }
]
  • search by start city
$ curl -s 'https://4mmhkv7z9e.execute-api.eu-west-1.amazonaws.com/v1/travel?start=%E6%AD%A6%E6%B1%89' | jq '.[0]'
{
  "id": "475",
  "date": "2020-01-19",
  "start": "武汉",
  "stop": "成都东",
  "type": "2",
  "t_no": "D366",
  "t_no_sub": "03号车厢",
  "msg": " No Man is an Island  🏝  没有人是一座孤岛 @pingzhou| 平舟 ⛵"
}
  • to retrieve all data
$ curl -s 'https://4mmhkv7z9e.execute-api.eu-west-1.amazonaws.com/v1/travel?all=yes' | jq '.[0]'
{
  "id": "228",
  "date": "2020-01-17",
  "start": "海口东",
  "stop": "棋子湾",
  "type": "2",
  "t_no": "C7402",
  "t_no_sub": "NULL",
  "msg": " No Man is an Island  🏝  没有人是一座孤岛 @pingzhou| 平舟 ⛵"
}
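Once you have a batch of /travel records, a handy first step is to group the start and stop points by date so they can be replayed in order. Here is a small sketch using the three records from the examples above, trimmed to the fields it needs; the function name `routes_by_date` is made up for illustration.

```python
from collections import defaultdict


def routes_by_date(records):
    """Group (start, stop) pairs by travel date, in ascending date order."""
    grouped = defaultdict(list)
    for record in records:
        grouped[record["date"]].append((record["start"], record["stop"]))
    # %Y-%m-%d strings sort chronologically when sorted as plain text.
    return dict(sorted(grouped.items()))


records = [
    {"date": "2020-02-02", "start": "南极国际小区", "stop": "哈尔滨传染病院"},
    {"date": "2020-01-19", "start": "武汉", "stop": "成都东"},
    {"date": "2020-01-17", "start": "海口东", "stop": "棋子湾"},
]

for date, routes in routes_by_date(records).items():
    print(date, routes)
```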

After retrieving the travel path data (date, start location and end location), you can call the Google Maps API to add the geographic coordinates and make a video like this:

👉 Click here to see the map of people’s travel paths

👉 Click here to see the heatmap
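For the coordinates step, one option is the Google Maps Geocoding API. The sketch below only builds the request URL and reads the coordinates out of a response; it does not make any network call. The helper names are made up, `YOUR_API_KEY` is a placeholder you would replace with your own key, and the coordinates in `fake_payload` are dummy values that only show the response shape, not real data.

```python
from urllib.parse import urlencode

GEOCODE_URL = "https://maps.googleapis.com/maps/api/geocode/json"


def geocode_url(place, api_key):
    """Build a Geocoding API request URL for a place name."""
    return f"{GEOCODE_URL}?{urlencode({'address': place, 'key': api_key})}"


def extract_lat_lng(payload):
    """Return (lat, lng) from a geocoding response, or None if nothing matched."""
    results = payload.get("results") or []
    if not results:
        return None
    location = results[0]["geometry"]["location"]
    return location["lat"], location["lng"]


print(geocode_url("成都东", "YOUR_API_KEY"))

# Dummy payload with placeholder coordinates, for illustration only.
fake_payload = {"status": "OK", "results": [{"geometry": {"location": {"lat": 1.0, "lng": 2.0}}}]}
print(extract_lat_lng(fake_payload))
```

Caching the coordinates per place name is worth considering, since the same stations show up in many records.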

Please feel free to leave a comment if you have any questions or other interesting use cases. If you have enjoyed reading this post, please feel free to buy me a (r'virtual|physical', 'coffee'). All the coffee from #2019nCov posts will be donated (if there is any).
