Scraping WHO case data by country

Still struggling to get hold of official case data?

The WHO publishes case data for each country here:
https://experience.arcgis.com/experience/685d0ace521648f8a5beeeee1b9125cd

Our goal is to scrape the data behind the chart on that page:
[Image]

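Inspecting the page

First, click on any country to open its case details:

[Image]

Taking China as an example, open it and find the request URL in the browser's developer tools: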
https://services.arcgis.com/5T5nSi527N4F7luB/arcgis/rest/services/Historic_adm0_v3/FeatureServer/0/query?f=json&where=ADM0_NAME%3D%27CHINA%27&returnGeometry=false&spatialRel=esriSpatialRelIntersects&outFields=OBJECTID%2Ccum_conf%2CDateOfDataEntry&orderByFields=DateOfDataEntry%20asc&resultOffset=0&resultRecordCount=2000&cacheHint=true

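In the Preview tab you can see:

[Image]

This is exactly the data we want, but the time format is unfamiliar. Differencing consecutive values reveals the pattern:

Consecutive dates differ by 864 followed by five zeros (86,400,000), the number of milliseconds in one day, so DateOfDataEntry is a Unix timestamp in milliseconds.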
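The day-to-day gap can be checked with a few lines of pandas. This is a minimal sketch: the two timestamps are illustrative values chosen to fall on midnight UTC of 22 and 23 January 2020, not values copied from the service.

```python
import pandas as pd

MS_PER_DAY = 24 * 60 * 60 * 1000  # 86,400,000 milliseconds in one day

# Two consecutive DateOfDataEntry values (illustrative, assumed midnight UTC)
stamps = [1579651200000, 1579737600000]

# The gap between consecutive entries is exactly one day in milliseconds
gap = stamps[1] - stamps[0]

# unit='ms' tells pandas the integers count milliseconds since 1970-01-01 UTC
dates = pd.to_datetime(stamps, unit='ms')
print(gap == MS_PER_DAY, list(dates.strftime('%Y-%m-%d')))
```

Once the unit is known, the raw integers no longer need to be decoded by hand.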

The URL above returns cumulative confirmed cases (cum_conf); the URL for new cases (NewCase) is:
https://services.arcgis.com/5T5nSi527N4F7luB/arcgis/rest/services/Historic_adm0_v3/FeatureServer/0/query?f=json&where=ADM0_NAME%3D%27CHINA%27&returnGeometry=false&spatialRel=esriSpatialRelIntersects&outFields=OBJECTID%2CNewCase%2CDateOfDataEntry&orderByFields=DateOfDataEntry%20asc&resultOffset=0&resultRecordCount=2000&cacheHint=true
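The two URLs differ only in the country name and the value field, so they can be assembled from a parameter dictionary instead of being pasted by hand. The sketch below is one possible way to do that; the helper name build_query_url is made up for illustration, and the parameters are simply the ones visible in the URLs above.

```python
from urllib.parse import urlencode, quote

BASE = ('https://services.arcgis.com/5T5nSi527N4F7luB/arcgis/rest/services/'
        'Historic_adm0_v3/FeatureServer/0/query')


def build_query_url(country, field):
    """Build the query URL for one country and one value field.

    build_query_url is a hypothetical helper; `field` is 'cum_conf' or
    'NewCase', the two field names that appear in the URLs above.
    """
    params = {
        'f': 'json',
        'where': "ADM0_NAME='{}'".format(country),
        'returnGeometry': 'false',
        'spatialRel': 'esriSpatialRelIntersects',
        'outFields': 'OBJECTID,{},DateOfDataEntry'.format(field),
        'orderByFields': 'DateOfDataEntry asc',
        'resultOffset': 0,
        'resultRecordCount': 2000,
        'cacheHint': 'true',
    }
    # quote_via=quote encodes spaces as %20 (the default would emit '+')
    return BASE + '?' + urlencode(params, quote_via=quote)


print(build_query_url('CHINA', 'NewCase'))
```

Because the country name is percent-encoded automatically, multi-word names such as United States of America need no special handling.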

Taking a few countries as examples, the code is as follows (for now the list only covers countries whose names are a single word):

# coding: utf-8
import json
import urllib.request

import pandas as pd

res = pd.DataFrame()
def Open(url):
    """Fetch a URL and return the response body as text."""
    heads = {'User-Agent':'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/74.0.3729.169 Safari/537.36'}
    req = urllib.request.Request(url, headers=heads)
    # Open the Request object (not the bare URL) so the User-Agent header is sent
    response = urllib.request.urlopen(req)
    html = response.read()
    return html.decode('utf-8')

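def conserve(html, name):
    """Add one country's cumulative confirmed counts to res as a new column."""
    global res
    time, confirm = [], []
    temp = pd.DataFrame(columns=['time', name])
    # Each feature holds one day's record in its 'attributes' dict
    for i in html['features']:
        time.append(i['attributes']['DateOfDataEntry'])
        confirm.append(i['attributes']['cum_conf'])
    temp['time'] = time
    temp[name] = confirm
    temp = temp.set_index('time')
    # Align on the timestamp index and append this country as a column
    res = pd.concat([res, temp], axis=1)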


def main():
    global res
    for name in ['China', 'Italy', 'Spain', 'France', 'Germany', 'Switzerland', 'Netherlands', 'Norway', 'Belgium', 'Sweden', 'Australia', 'Brazil', 'Egypt']:
        print(name)
        url = 'https://services.arcgis.com/5T5nSi527N4F7luB/arcgis/rest/services/Historic_adm0_v3/FeatureServer/0/query?f=json&where=ADM0_NAME%3D%27' + name + '%27&returnGeometry=false&spatialRel=esriSpatialRelIntersects&outFields=OBJECTID%2Ccum_conf%2CDateOfDataEntry&orderByFields=DateOfDataEntry%20asc&resultOffset=0&resultRecordCount=2000&cacheHint=true'
        html = json.loads(Open(url))
        conserve(html, name)
        print('--------------------------------------------------------------------------')

    # The United States is handled separately because its name contains spaces
    name = 'America'
    url = 'https://services.arcgis.com/5T5nSi527N4F7luB/arcgis/rest/services/Historic_adm0_v3/FeatureServer/0/query?f=json&where=ADM0_NAME%3D%27United%20States%20of%20America%27&returnGeometry=false&spatialRel=esriSpatialRelIntersects&outFields=OBJECTID%2Ccum_conf%2CDateOfDataEntry&orderByFields=DateOfDataEntry%20asc&resultOffset=0&resultRecordCount=2000&cacheHint=true'
    html = json.loads(Open(url))
    conserve(html, name)


    res['Datetime'] = pd.date_range(start='20200122', end='20200316')
    res.to_csv('conform.csv', encoding='utf_8_sig')
main()
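To see how the per-country columns line up on the timestamp index, here is a small offline sketch that needs no network access. The helper to_column and both payloads are made up for illustration: they only mimic the features/attributes shape of the JSON described above, and the numbers are not real case counts.

```python
import pandas as pd


def to_column(payload, name):
    """Turn one response into a one-column DataFrame indexed by raw timestamp."""
    rows = [feature['attributes'] for feature in payload['features']]
    frame = pd.DataFrame(rows).set_index('DateOfDataEntry')
    return frame[['cum_conf']].rename(columns={'cum_conf': name})


# Made-up payloads in the same shape as the service's JSON (not real data)
sample_a = {'features': [
    {'attributes': {'OBJECTID': 1, 'cum_conf': 10, 'DateOfDataEntry': 1579651200000}},
    {'attributes': {'OBJECTID': 2, 'cum_conf': 15, 'DateOfDataEntry': 1579737600000}},
]}
sample_b = {'features': [
    {'attributes': {'OBJECTID': 3, 'cum_conf': 1, 'DateOfDataEntry': 1579737600000}},
]}

# axis=1 places the countries side by side and aligns rows on the index
table = pd.concat([to_column(sample_a, 'A'), to_column(sample_b, 'B')], axis=1)
print(table)
```

A country whose series starts later gets NaN for the earlier dates, so the number of rows in the combined table equals the number of distinct timestamps across all countries.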

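After some simple data processing, the result looks like this:

[Image]

Note: if the res['Datetime'] = pd.date_range(...) line raises an error, it is because the end date was hard-coded when this post was written, on March 17. The range must contain exactly one date per row of res, so update the end date to match the most recent record (normally today's date).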
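One way to avoid editing the end date by hand is to derive the dates from the index itself, since the index already holds each record's timestamp. This is a sketch under two assumptions: the index holds raw DateOfDataEntry values in epoch milliseconds, and the small res table below (with made-up counts) stands in for the real one.

```python
import pandas as pd

# Stand-in for the res table built by the script; the counts are made up
res = pd.DataFrame({'China': [10, 15, 21]},
                   index=[1579651200000, 1579737600000, 1579824000000])

# Convert each epoch-millisecond index value to a calendar date string
res['Datetime'] = list(pd.to_datetime(res.index, unit='ms').strftime('%Y-%m-%d'))
print(res)
```

The column always has the same length as the table, so there is no date range to keep in sync as new records arrive.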

Updated data source:
https://dashboards-dev.sprinklr.com/data/9043/global-covid19-who-gis.json
