Modeling the 2019-nCoV Outbreak (I): Data Scraping and Heat Map Drawing

Introduction

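Recently, with the epidemic situation so severe, I have been holed up at home as a shut-in. While idly browsing CSDN, I noticed that some experts there were scraping data on the Wuhan pneumonia outbreak and drawing maps from it. My hands got itchy, so I followed their example to see whether I could obtain some practical data for modeling research. This post is the result; corrections are welcome.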

Data scraping

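The data source is Tencent's live epidemic updates.
I am only a novice at web scraping, so rather than embarrass myself with a long written explanation of the process, I will let the code speak.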

Preparation

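Beyond the third-party libraries commonly used for machine learning, the program needs a few file-reading libraries and some third-party libraries for generating the epidemic maps, such as: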

Map drawing library: pyecharts
Map data packages:
echarts-china-cities-pypkg    0.0.9
echarts-china-counties-pypkg  0.0.2
echarts-china-misc-pypkg      0.0.1
echarts-china-provinces-pypkg 0.0.3
echarts-countries-pypkg
Data fetching and file reading libraries:
requests
xlrd
zipp

    

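While installing these libraries, I found that my connection was too slow to download some of the larger packages. My workaround was to download them from a domestic third-party mirror, such as the Douban or Tsinghua University mirror.
For example, to install numpy:
pip install -i http://pypi.douban.com/simple --trusted-host pypi.douban.com numpy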

Previewing the scraped data

Let us first simply fetch the data and look at the structure of what comes back.
This snippet was run in IDLE; everything after it was run in PyCharm.

>>> import time, json, requests
>>> url = 'https://view.inews.qq.com/g2/getOnsInfo?name=wuwei_ww_area_counts&callback=&_=%d'%int(time.time()*1000)
>>>
>>> data = json.loads(requests.get(url=url).json()['data'])
>>> print(type(data))
<class 'dict'>
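Knowing only that the result is a dict does not say much about what is inside it. As a minimal sketch, the helper below prints the keys and value types of a nested structure down to a chosen depth. The function name `describe` and the sample dictionary are made up for illustration; in practice you would pass in the decoded response instead.

```python
def describe(obj, depth=0, max_depth=2):
    """Return lines summarising the shape of nested JSON-like data."""
    pad = "  " * depth
    lines = []
    if isinstance(obj, dict):
        for key, value in obj.items():
            lines.append(f"{pad}{key}: {type(value).__name__}")
            if depth < max_depth:
                lines.extend(describe(value, depth + 1, max_depth))
    elif isinstance(obj, list) and obj:
        # Only the first element is inspected, assuming the list is homogeneous
        lines.append(f"{pad}[{len(obj)} items, first is {type(obj[0]).__name__}]")
        if depth < max_depth:
            lines.extend(describe(obj[0], depth + 1, max_depth))
    return lines

# Hypothetical sample standing in for the decoded response
sample = {
    "lastUpdateTime": "2020-02-03 10:00:00",
    "chinaTotal": {"confirm": 100, "dead": 2},
    "areaTree": [{"name": "中國", "children": []}],
}
summary = describe(sample)
print("\n".join(summary))
```

Raising `max_depth` shows deeper levels, at the cost of longer output.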


Fetching the data and simple processing

Import conventions

import pandas as pd
from pyecharts.charts import Map
from pyecharts import options as opts
import time, json, requests
from pyecharts.globals import ThemeType

The data fetching and processing functions

# Define the data fetching function
def catch_data():
    url = 'https://view.inews.qq.com/g2/getOnsInfo?name=disease_h5'
    response = requests.get(url=url).json()
    # The 'data' field is itself a JSON string; decode it into a dict
    data = json.loads(response['data'])
    return data

data = catch_data()
print(data.keys())
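# The data comes from Tencent, which sources it from the National Health Commission. Question 1: can the data be made more fine-grained? (The current features are already sufficient.)
# The data set includes ["domestic totals", "domestic new cases", "update time", "detailed data", "daily data", "daily new cases"]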
lastUpdateTime = data['lastUpdateTime']
chinaTotal = data['chinaTotal']
chinaAdd = data['chinaAdd']
print(chinaTotal)
print(chinaAdd)
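The fetch step decodes JSON twice: once for the response body and once more for its 'data' field, which arrives as a JSON string. The sketch below reproduces that pattern offline with a made-up payload, so it can be tried without network access. The function name `decode_payload` and the sample values are assumptions for illustration only.

```python
import json

def decode_payload(raw_text):
    """Decode a response whose 'data' field is itself a JSON-encoded string."""
    outer = json.loads(raw_text)
    inner = outer["data"]
    # Decode a second time only if the field is still a string
    return json.loads(inner) if isinstance(inner, str) else inner

# Hypothetical response body imitating the double encoding
inner = {"lastUpdateTime": "2020-02-03 10:00:00", "chinaTotal": {"confirm": 100}}
raw = json.dumps({"ret": 0, "data": json.dumps(inner)})

decoded = decode_payload(raw)
print(decoded["chinaTotal"]["confirm"])
```

The `isinstance` check keeps the helper usable if the endpoint ever returns 'data' as a plain object instead of a string.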

# Define the data processing functions.
# Each record is already a dict, so index it directly instead of calling eval().
def confirm(x):
    return x['confirm']
def suspect(x):
    return x['suspect']
def dead(x):
    return x['dead']
def heal(x):
    return x['heal']
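The four helpers above differ only in the key they read, and each one fails if that key is absent. As a hedged alternative, a single accessor with a default value could cover all four cases. The name `get_count` and the sample record are hypothetical and the numbers are invented.

```python
def get_count(record, field, default=0):
    """Read one counter from a record such as {'confirm': 5, 'dead': 0}."""
    if not isinstance(record, dict):
        return default
    value = record.get(field, default)
    # Treat an explicit null in the source data the same as a missing key
    return default if value is None else value

# Invented record for illustration
total = {"confirm": 5, "suspect": 3, "dead": 0}

print(get_count(total, "confirm"))
print(get_count(total, "heal"))
```

With pandas, this could be applied as `df['total'].map(lambda r: get_count(r, 'confirm'))`; whether silently defaulting to 0 is appropriate depends on how missing values should be treated in the later modeling.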


Building the tables and drawing the maps

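To draw the epidemic maps and continue with modeling later, fetching the data is not enough; it also has to be sliced and merged so that it can be used in later work.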

Building the international data set

# Build the international data set file. Question 2: where does the Chinese-English country name table come from? (Self-made.)
global_data = pd.DataFrame(data['areaTree'])
global_data['confirm'] = global_data['total'].map(confirm)
global_data['suspect'] = global_data['total'].map(suspect)
global_data['dead'] = global_data['total'].map(dead)
global_data['heal'] = global_data['total'].map(heal)
global_data['addconfirm'] = global_data['today'].map(confirm)
global_data['addsuspect'] = global_data['today'].map(suspect)
global_data['adddead'] = global_data['today'].map(dead)
global_data['addheal'] = global_data['today'].map(heal)
world_name = pd.read_excel("世界各國中英文對照.xlsx")
global_data = pd.merge(global_data,world_name,left_on ="name",right_on = "中文",how="inner")
global_data = global_data[["name_y","中文","confirm","suspect","dead","heal","addconfirm","addsuspect","adddead","addheal"]]
#print(global_data.head())
global_data=pd.DataFrame(global_data)
#print(global_data.head())
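# Write the data set to a CSV file (utf_8_sig keeps the Chinese column readable in Excel)
global_data.to_csv("global_data_2020_02_03.csv", encoding='utf_8_sig')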
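The output file name above has the date typed in by hand. One possible refinement is to derive it from the `lastUpdateTime` value returned with the data. This sketch assumes the timestamp has the form `2020-02-03 10:00:00`, which the post does not confirm; the helper name `dated_filename` is made up.

```python
from datetime import datetime

def dated_filename(prefix, last_update, ext="csv"):
    """Build a name like 'global_data_2020_02_03.csv' from a timestamp string."""
    # Assumed timestamp format; adjust the pattern if the source differs
    stamp = datetime.strptime(last_update, "%Y-%m-%d %H:%M:%S")
    return f"{prefix}_{stamp:%Y_%m_%d}.{ext}"

name = dated_filename("global_data", "2020-02-03 10:00:00")
print(name)  # global_data_2020_02_03.csv
```

The result could then be passed to `to_csv` in place of the literal string.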

Drawing the global heat map from the international data set (global_data)

# Draw the world epidemic map from global_data
world_map = Map(init_opts=opts.InitOpts(theme=ThemeType.WESTEROS))
world_map.add("",[list(z) for z in zip(list(global_data["name_y"]), list(global_data["confirm"]))], "world",is_map_symbol_show=False)
world_map.set_global_opts(title_opts=opts.TitleOpts(title="2019_nCoV-世界疫情地圖"),
                          visualmap_opts=opts.VisualMapOpts(is_piecewise=True,
                          pieces = [
                        {"min": 101 , "label": '>100'}, # no max given, so the upper bound is unlimited
                        {"min": 10, "max": 100, "label": '10-100'},
                        {"min": 0, "max": 9, "label": '0-9' }]))
world_map.set_series_opts(label_opts=opts.LabelOpts(is_show=False))
world_map.render('20200203世界疫情地圖.html')

The program generates an .html file that can be opened in a web browser.
[Figure: rendered world epidemic map]

Building the domestic data set

# Build the domestic data set
# First processing step
# The detailed data has a fairly complex structure; print it step by step to understand it first
areaTree = data['areaTree']
china_data = areaTree[0]['children']
china_list = []
for a in range(len(china_data)):
    province = china_data[a]['name']
    province_list = china_data[a]['children']
    for b in range(len(province_list)):
        city = province_list[b]['name']
        total = province_list[b]['total']
        today = province_list[b]['today']
        china_dict = {}
        china_dict['province'] = province
        china_dict['city'] = city
        china_dict['total'] = total
        china_dict['today'] = today
        china_list.append(china_dict)

china_data = pd.DataFrame(china_list)
print(china_data.head())

# Map the helper functions over the columns
china_data['confirm'] = china_data['total'].map(confirm)
china_data['suspect'] = china_data['total'].map(suspect)
china_data['dead'] = china_data['total'].map(dead)
china_data['heal'] = china_data['total'].map(heal)
china_data['addconfirm'] = china_data['today'].map(confirm)
china_data['addsuspect'] = china_data['today'].map(suspect)
china_data['adddead'] = china_data['today'].map(dead)
china_data['addheal'] = china_data['today'].map(heal)
china_data = china_data[["province","city","confirm","suspect","dead","heal","addconfirm","addsuspect","adddead","addheal"]]
print(china_data.head())
# Save as a CSV file
china_data.to_csv("china_data_2020_02_03.csv",encoding='utf_8_sig')
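The nested loops above handle exactly two levels, province and city. As a rough sketch of a more general approach, a recursive generator can walk a tree of any depth and emit one row per leaf. The name `flatten` and the sample tree are illustrative, and the sketch assumes each node has a 'name' and optionally 'children', 'total' and 'today', as the loop above suggests.

```python
def flatten(nodes, path=()):
    """Yield one row per leaf node, keeping the names of its ancestors."""
    for node in nodes:
        names = path + (node["name"],)
        children = node.get("children")
        if children:
            yield from flatten(children, names)
        else:
            yield {
                "path": names,
                "total": node.get("total", {}),
                "today": node.get("today", {}),
            }

# Invented miniature tree for illustration
sample_tree = [
    {"name": "中國", "children": [
        {"name": "江西", "children": [
            {"name": "南昌", "total": {"confirm": 3}, "today": {"confirm": 1}},
            {"name": "宜春", "total": {"confirm": 2}, "today": {"confirm": 0}},
        ]},
    ]},
    {"name": "日本", "total": {"confirm": 1}, "today": {"confirm": 0}},
]

rows = list(flatten(sample_tree))
for row in rows:
    print(row["path"], row["total"])
```

The resulting list of dicts can be handed to `pd.DataFrame` in the same way as `china_list`.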

Drawing the heat map from china_data

# Draw the China epidemic map from china_data
area_data = china_data.groupby("province")["confirm"].sum().reset_index()
area_data.columns = ["province","confirm"]
area_map = Map(init_opts=opts.InitOpts(theme=ThemeType.WESTEROS))
area_map.add("",[list(z) for z in zip(list(area_data["province"]), list(area_data["confirm"]))], "china",is_map_symbol_show=False)
area_map.set_global_opts(title_opts=opts.TitleOpts(title="2019_nCoV中國疫情地圖"),visualmap_opts=opts.VisualMapOpts(is_piecewise=True,
                pieces = [
                        {"min": 1001 , "label": '>1000',"color": "#893448"}, # no max given, so the upper bound is unlimited
                        {"min": 500, "max": 1000, "label": '500-1000',"color": "#ff585e"},
                        {"min": 101, "max": 499, "label": '101-499',"color": "#fb8146"},
                        {"min": 10, "max": 100, "label": '10-100',"color": "#ffb248"},
                        {"min": 0, "max": 9, "label": '0-9',"color" : "#fff2d1" }]))
area_map.render('20200203中國疫情地圖.html')
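With hand-written pieces it is easy to leave a gap between two ranges, in which case some values would match no piece at all. The sketch below checks a list of pieces in plain Python. It assumes both bounds are inclusive and only imitates the lookup; the real assignment happens inside ECharts in the browser. The function name `piece_label` is made up.

```python
def piece_label(value, pieces):
    """Return the label of the first piece containing value, or None."""
    for piece in pieces:
        low = piece.get("min", float("-inf"))
        high = piece.get("max", float("inf"))
        if low <= value <= high:  # bounds assumed inclusive
            return piece["label"]
    return None

# Same ranges as the China map above, colours omitted
pieces = [
    {"min": 1001, "label": ">1000"},
    {"min": 500, "max": 1000, "label": "500-1000"},
    {"min": 101, "max": 499, "label": "101-499"},
    {"min": 10, "max": 100, "label": "10-100"},
    {"min": 0, "max": 9, "label": "0-9"},
]

# Any integer in this range that matches no piece would show up here
uncovered = [v for v in range(0, 1200) if piece_label(v, pieces) is None]
print(uncovered)
```

An empty list means every integer count in the tested range falls into some piece.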

Result:
[Figure: rendered China epidemic map]

Building the data sets for later modeling

# Daily data set file
chinaDayList = pd.DataFrame(data['chinaDayList'])
chinaDayList = chinaDayList[['date','confirm','suspect','dead','heal']]
chinaDayList.head()
chinaDayList.to_csv("china_DailyList_2020_02_03.csv",encoding='utf_8_sig')

# Daily increase data set
chinaDayAddList = pd.DataFrame(data['chinaDayAddList'])
chinaDayAddList = chinaDayAddList[['date','confirm','suspect','dead','heal']]
chinaDayAddList.to_csv("chinaDayAddList.csv",encoding='utf_8_sig')
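Before using these two files for modeling, it may be worth checking that the cumulative series and the daily-increase series tell the same story. The sketch below derives day-over-day increments from a cumulative series so that they can be compared with the published increases. Whether the two sources are meant to agree exactly is an assumption, and the numbers are invented.

```python
def daily_increments(cumulative):
    """Differences between consecutive cumulative counts."""
    return [later - earlier for earlier, later in zip(cumulative, cumulative[1:])]

# Invented cumulative confirmed counts for five consecutive days
confirm = [10, 15, 27, 27, 40]

increments = daily_increments(confirm)
print(increments)  # [5, 12, 0, 13]

# A negative increment would indicate a revision or an error in the cumulative data
suspicious_days = [i + 1 for i, d in enumerate(increments) if d < 0]
print(suspicious_days)  # []
```

With pandas, the same differences are available through `Series.diff()`.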

More detailed heat maps for areas of interest

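Using the data sets at hand together with several official WeChat accounts for the Yichun and Hangzhou areas, I refined the epidemic maps so that I could follow the situation in the areas I care about: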

# Draw more detailed epidemic maps from figures published by official WeChat accounts
# Jiangxi Province epidemic map
province_distribution = {'南昌市':103, '九江市':64, '新餘市': 50,
                         '贛州市':35,  '宜春市':36, '撫州市': 29,
                         '上饒市':32,  '萍鄉市':18, '吉安市': 13,
                         '鷹潭市':8,  '景德鎮市':3
                         }

map = Map()
map.set_global_opts(
    title_opts=opts.TitleOpts(title="20200202江西省疫情地圖"),
    visualmap_opts=opts.VisualMapOpts(max_=391, is_piecewise=True,
                                      pieces=[
                                        {"max": 120, "min": 95, "label": ">95", "color": "#8A0808"},
                                        {"max": 94, "min": 70, "label": "94-70", "color": "#B40404"},
                                        {"max": 69, "min": 45, "label": "45-69", "color": "#DF0101"},
                                        {"max": 44, "min": 20, "label": "20-44", "color": "#F78181"},
                                        {"max": 19, "min": 1, "label": "1-19", "color": "#F5A9A9"},
                                        {"max": 0, "min": 0, "label": "0", "color": "#FFFFFF"},
                                        ], )  # piecewise segments
    )
map.add("20200202江西省疫情地圖", data_pair=province_distribution.items(), maptype="江西", is_roam=True)
map.render('20200202江西省疫情地圖.html')

# City-level epidemic map
city_distribution = {'靖安縣':0, '奉新縣':0, '銅鼓縣':2 ,
                         '宜豐縣':3,  '高安市':2, '萬載縣': 0,
                         '上高縣':0,  '豐城市':25, '樟樹市': 1,
                         '袁州區':3
                         }

# maptype='china' shows only provinces and municipalities
map = Map()
map.set_global_opts(
    title_opts=opts.TitleOpts(title="20200202宜春市疫情地圖"),
    visualmap_opts=opts.VisualMapOpts(max_=34, is_piecewise=True,
                                      pieces=[
                                        {"max": 31, "min": 20, "label": ">20", "color": "#8A0808"},
                                        {"max": 19, "min": 10, "label": "10-19", "color": "#B40404"},
                                        {"max": 9, "min": 5, "label": "5-9", "color": "#DF0101"},
                                        {"max": 4, "min": 2, "label": "2-4", "color": "#F78181"},
                                        {"max": 1, "min": 1, "label": "1", "color": "#F5A9A9"},
                                        {"max": 0, "min": 0, "label": "0", "color": "#FFFFFF"},
                                        ], )  # maximum data range, piecewise segments
    )
map.add("20200202宜春市疫情地圖", data_pair=city_distribution.items(), maptype="宜春", is_roam=True)
map.render('20200202宜春市疫情地圖.html')

# Hangzhou epidemic map
city_distribution = {'餘杭區':28, '蕭山區':16, '桐廬縣':14 ,
                         '西湖區':12,  '拱墅區':10, '江乾區': 9,
                         '上城區':8,  '下城區':5, '臨安市': 5,
                         '濱江區':4,'富陽區':4,'錢塘新區':2,
                         '建德市':1
                         }

# maptype='china' shows only provinces and municipalities
map = Map()
map.set_global_opts(
    title_opts=opts.TitleOpts(title="20200202杭州市疫情地圖"),
    visualmap_opts=opts.VisualMapOpts(max_=118, is_piecewise=True,
                                      pieces=[
                                        {"max": 35, "min": 25, "label": ">25", "color": "#8A0808"},
                                        {"max": 24, "min": 15, "label": "24-15", "color": "#B40404"},
                                        {"max": 14, "min": 10, "label": "14-10", "color": "#DF0101"},
                                        {"max": 9, "min": 5, "label": "9-5", "color": "#F78181"},
                                        {"max":4 , "min": 1, "label": "1-4", "color": "#F5A9A9"},
                                        {"max": 0, "min": 0, "label": "0", "color": "#FFFFFF"},
                                        ], )  # maximum data range, piecewise segments
    )
map.add("20200202杭州市疫情地圖", data_pair=city_distribution.items(), maptype="杭州", is_roam=True)
map.render('20200202杭州市疫情地圖.html')

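The resulting epidemic maps:
1. Jiangxi Province:
[Figure: Jiangxi Province epidemic map]
2. Yichun:
[Figure: Yichun epidemic map]
Fengcheng, where I live, has already become the "little Wuhan" of the Yichun area. It looks like staying home is still the safer choice for the next few days.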

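3. Hangzhou:
[Figure: Hangzhou epidemic map]
When drawing the Hangzhou map, the region names in the map data package imported into Python did not match the districts published by the official Hangzhou WeChat account, so the data for Chun'an County could not be shown on the map.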
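A mismatch like this can be caught before rendering by comparing the keys of the data dictionary with the region names the map knows about. The sketch below is only an illustration: the region list, the data and the alias table all use placeholder names, and how to obtain the actual region names from the map data package is not covered here.

```python
def unmatched_names(data, map_regions):
    """Names present in the data but absent from the map's region list."""
    return sorted(set(data) - set(map_regions))

def apply_aliases(data, aliases):
    """Rename keys according to a hand-maintained alias table."""
    return {aliases.get(name, name): count for name, count in data.items()}

# Placeholder names; not taken from any real map package
map_regions = ["District A", "District B", "County C"]
data = {"District A": 28, "District B": 16, "C County": 1}

missing = unmatched_names(data, map_regions)
print(missing)  # ['C County']

fixed = apply_aliases(data, {"C County": "County C"})
print(unmatched_names(fixed, map_regions))  # []
```

Once the alias table covers every unmatched name, the corrected dictionary can be passed to `Map.add` as before.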

Summary of results

[Figure: summary of results]

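Conclusion:

I used PyCharm for this work, but I noticed that several experts use Jupyter Notebook for data visualization. After this exercise, my personal impression is that PyCharm is slightly weaker than Jupyter Notebook for displaying visualizations but more comfortable for writing code, so in the end I chose PyCharm as my data analysis tool. This is only my own experience; the choice is up to each reader's preference.
Finally, I hope the epidemic ends soon. Stay strong, China! Stay strong, Wuhan!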

Reference links

https://blog.csdn.net/weixin_43130164/article/details/104113559
https://blog.csdn.net/xufive/article/details/104093197
