Still struggling to get hold of official case data?
The WHO publishes case data for every country here:
https://experience.arcgis.com/experience/685d0ace521648f8a5beeeee1b9125cd
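Our goal is to scrape the data behind the chart on that page:
Inspect element
First, click on any country to open its outbreak details:
Taking China as an example, after clicking it, find the request URL in the browser's developer tools: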
https://services.arcgis.com/5T5nSi527N4F7luB/arcgis/rest/services/Historic_adm0_v3/FeatureServer/0/query?f=json&where=ADM0_NAME%3D%27CHINA%27&returnGeometry=false&spatialRel=esriSpatialRelIntersects&outFields=OBJECTID%2Ccum_conf%2CDateOfDataEntry&orderByFields=DateOfDataEntry%20asc&resultOffset=0&resultRecordCount=2000&cacheHint=true
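In the Preview tab you can see:
This is exactly the data we want, but the time format is unfamiliar. Taking the difference between consecutive values reveals the pattern:
Consecutive dates differ by 864 followed by zeros, that is 86400000, the number of milliseconds in one day. So DateOfDataEntry is a Unix timestamp in milliseconds.
The first URL (with cum_conf in outFields) returns cumulative confirmed cases. The URL for new cases is: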
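As a quick check of that pattern, here is a minimal sketch that converts millisecond timestamps to calendar dates using only the standard library. The sample values and the helper name epoch_ms_to_date are made up for illustration; they are not copied from the WHO response.

```python
from datetime import datetime, timezone

MS_PER_DAY = 24 * 60 * 60 * 1000  # 86,400,000 milliseconds in one day

def epoch_ms_to_date(ms):
    """Convert milliseconds since the Unix epoch to a UTC calendar date."""
    return datetime.fromtimestamp(ms / 1000, tz=timezone.utc).date()

# Hypothetical sample timestamps, each one day after the previous
samples = [1579651200000, 1579737600000, 1579824000000]

# Differences between consecutive timestamps
diffs = [b - a for a, b in zip(samples, samples[1:])]

# The same timestamps as ISO date strings
dates = [epoch_ms_to_date(ms).isoformat() for ms in samples]

print(diffs)
print(dates)
```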
https://services.arcgis.com/5T5nSi527N4F7luB/arcgis/rest/services/Historic_adm0_v3/FeatureServer/0/query?f=json&where=ADM0_NAME%3D%27CHINA%27&returnGeometry=false&spatialRel=esriSpatialRelIntersects&outFields=OBJECTID%2CNewCase%2CDateOfDataEntry&orderByFields=DateOfDataEntry%20asc&resultOffset=0&resultRecordCount=2000&cacheHint=true
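The two URLs differ only in the field listed in outFields (cum_conf versus NewCase). Rather than pasting the full string each time, the query can probably be assembled from its parameters. The sketch below does that with urllib.parse; build_query_url is a hypothetical helper name, and the parameter values are simply the ones seen in the URLs above. Percent-encoding the country name also means multi-word names need no special handling.

```python
from urllib.parse import urlencode, quote

BASE_URL = ("https://services.arcgis.com/5T5nSi527N4F7luB/arcgis/rest/services/"
            "Historic_adm0_v3/FeatureServer/0/query")

def build_query_url(country, field):
    """Build the query URL for one country and one data field.

    Hypothetical helper: the parameters mirror the URLs copied from the browser.
    """
    params = {
        "f": "json",
        "where": f"ADM0_NAME='{country}'",
        "returnGeometry": "false",
        "spatialRel": "esriSpatialRelIntersects",
        "outFields": f"OBJECTID,{field},DateOfDataEntry",
        "orderByFields": "DateOfDataEntry asc",
        "resultOffset": 0,
        "resultRecordCount": 2000,
        "cacheHint": "true",
    }
    # quote_via=quote encodes spaces as %20 (the default, quote_plus, would emit "+")
    return BASE_URL + "?" + urlencode(params, quote_via=quote)

confirmed_url = build_query_url("CHINA", "cum_conf")
new_cases_url = build_query_url("CHINA", "NewCase")
usa_url = build_query_url("United States of America", "cum_conf")
print(confirmed_url)
```

With these parameters in this order, confirmed_url comes out identical to the China URL copied from the browser.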
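Taking a few countries as an example, the code is as follows (for now the loop only covers countries whose names are a single word):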
# coding: utf-8
import json
import urllib.request
import pandas as pd

# Accumulates one column of cumulative confirmed cases per country
res = pd.DataFrame()
def Open(url):
    # Fetch the URL with a browser-like User-Agent and return the body as text
    heads = {'User-Agent':'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/74.0.3729.169 Safari/537.36'}
    req = urllib.request.Request(url, headers=heads)
    # Pass req (not url) so the custom headers are actually sent
    response = urllib.request.urlopen(req)
    html = response.read()
    return html.decode('utf-8')
def conserve(html, name):
    # Add one country's series to the global table, indexed by timestamp
    global res
    time, confirm = [], []
    temp = pd.DataFrame(columns=['time', name])
    for i in html['features']:
        time.append(i['attributes']['DateOfDataEntry'])
        confirm.append(i['attributes']['cum_conf'])
    temp['time'] = time
    temp[name] = confirm
    temp = temp.set_index('time')
    res = pd.concat([res, temp], axis=1)
def main():
    global res
    for name in ['China', 'Italy', 'Spain', 'France', 'Germany', 'Switzerland', 'Netherlands', 'Norway', 'Belgium', 'Sweden', 'Australia', 'Brazil', 'Egypt']:
        print(name)
        url = 'https://services.arcgis.com/5T5nSi527N4F7luB/arcgis/rest/services/Historic_adm0_v3/FeatureServer/0/query?f=json&where=ADM0_NAME%3D%27' + name + '%27&returnGeometry=false&spatialRel=esriSpatialRelIntersects&outFields=OBJECTID%2Ccum_conf%2CDateOfDataEntry&orderByFields=DateOfDataEntry%20asc&resultOffset=0&resultRecordCount=2000&cacheHint=true'
        html = json.loads(Open(url))
        conserve(html, name)
        print('--------------------------------------------------------------------------')
    # The United States is handled separately because its name has several words
    name = 'America'
    url = 'https://services.arcgis.com/5T5nSi527N4F7luB/arcgis/rest/services/Historic_adm0_v3/FeatureServer/0/query?f=json&where=ADM0_NAME%3D%27United%20States%20of%20America%27&returnGeometry=false&spatialRel=esriSpatialRelIntersects&outFields=OBJECTID%2Ccum_conf%2CDateOfDataEntry&orderByFields=DateOfDataEntry%20asc&resultOffset=0&resultRecordCount=2000&cacheHint=true'
    html = json.loads(Open(url))
    conserve(html, name)
    # The date range must have exactly as many days as res has rows
    res['Datetime'] = pd.date_range(start='20200122', end='20200316')
    res.to_csv('conform.csv', encoding='utf_8_sig')

main()
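After some simple data processing, the result looks like this:
Note: if the line res['Datetime'] = pd.date_range(start=..., end=...) raises an error, it is because the end date was hard-coded when I wrote this on March 17. Change the end date to today's date so that the range has as many days as the data has rows.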
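One way to avoid editing the end date by hand might be to derive each date from DateOfDataEntry itself and line the countries up by date. The following is only a sketch of that idea using the standard library. The function names rows_by_date and merge_countries are hypothetical, and the sample payloads are made-up values in the features/attributes shape that the code above reads.

```python
from datetime import datetime, timezone

def rows_by_date(payload, field="cum_conf"):
    """Map each record's UTC date (ISO string) to the requested field."""
    out = {}
    for feature in payload["features"]:
        attrs = feature["attributes"]
        # DateOfDataEntry is assumed to be milliseconds since the Unix epoch
        day = datetime.fromtimestamp(attrs["DateOfDataEntry"] / 1000, tz=timezone.utc).date()
        out[day.isoformat()] = attrs[field]
    return out

def merge_countries(payloads, field="cum_conf"):
    """Combine per-country payloads into rows keyed by date; missing days become None."""
    per_country = {name: rows_by_date(p, field) for name, p in payloads.items()}
    all_days = sorted({day for rows in per_country.values() for day in rows})
    return [
        {"Datetime": day, **{name: rows.get(day) for name, rows in per_country.items()}}
        for day in all_days
    ]

# Made-up payloads for illustration only (not real WHO figures)
sample = {
    "China": {"features": [
        {"attributes": {"DateOfDataEntry": 1579651200000, "cum_conf": 10}},
        {"attributes": {"DateOfDataEntry": 1579737600000, "cum_conf": 20}},
    ]},
    "Italy": {"features": [
        {"attributes": {"DateOfDataEntry": 1579737600000, "cum_conf": 1}},
    ]},
}
table = merge_countries(sample)
for row in table:
    print(row)
```

In the pandas version, pd.to_datetime(res.index, unit='ms') should give the same dates from the existing index, so the hard-coded date range would no longer be needed.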
Updated data source:
https://dashboards-dev.sprinklr.com/data/9043/global-covid19-who-gis.json