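I've been playing with Python lately. The data comes from MongoDB, and I process it with pandas.
At first I used a rather cumbersome conversion. Straight to the code:
Method 1: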
import pandas as pd
from pymongo import MongoClient

# Connect to MongoDB and fetch every document in the collection
client = MongoClient('192.168.1.5', 10070)
db = client.dbtest
collection = db.data_table
items = collection.find()

# One list per output column
dateId = []
ai_type = []
ai_name = []
quorum = []
priceUSD = []
ai_disageform = []
country = []
continent = []
company = []
ai_cap_tr = []

n = 0
for i in items:
    n = n + 1
    print("Processing record %s" % n)
    keys = i.keys()
    # For each field, append its value, or '' when the document lacks the key
    if 'ai_disageform' in keys:
        ai_disageform.append(i['ai_disageform'])
    else:
        ai_disageform.append('')
    if 'date' in keys:
        # Keep only the first 10 characters of the date string (YYYY-MM-DD)
        t = str(i['date'])
        dateId.append(t[:10])
    else:
        dateId.append('')
    if 'ai_type' in keys:
        ai_type.append(i['ai_type'])
    else:
        ai_type.append('')
    if 'continent' in keys:
        continent.append(i['continent'])
    else:
        continent.append('')
    if 'quorum' in keys:
        quorum.append(i['quorum'])
    else:
        quorum.append('')
    if 'priceUSD' in keys:
        priceUSD.append(i['priceUSD'])
    else:
        priceUSD.append('')
    if 'country' in keys:
        country.append(i['country'])
    else:
        country.append('')
    if 'ai_name' in keys:
        ai_name.append(i['ai_name'])
    else:
        ai_name.append('')
    if 'company' in keys:
        company.append(i['company'])
    else:
        company.append('')
    if 'ai_cap_tr' in keys:
        ai_cap_tr.append(i['ai_cap_tr'])
    else:
        ai_cap_tr.append('')

# Build the DataFrame from the per-column lists
df = pd.DataFrame({'dateId': dateId,
                   'ai_type': ai_type,
                   'ai_name': ai_name,
                   'quorum': quorum,
                   'priceUSD': priceUSD,
                   'ai_disageform': ai_disageform,
                   'country': country,
                   'continent': continent,
                   'ai_cap_tr': ai_cap_tr,
                   'company': company})
# Write to CSV without the index column
df.to_csv('../ncbdata/b.csv', encoding="utf-8", index=False)
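The idea: testing shows that each record is a dict, so I append the value of each key to its own list (using an empty string when the key is missing) and then build a DataFrame from those lists.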
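The same idea can be written more compactly. This is only a minimal sketch, not the code I actually ran: the `records` list and its values are made up to stand in for `collection.find()`, and `FIELDS` and `records_to_df` are names I introduce here for illustration. `dict.get` supplies the empty-string default for missing keys.

```python
import pandas as pd

# Hypothetical sample records standing in for collection.find()
records = [
    {"date": "2018-01-02 00:00:00", "ai_type": "A", "ai_name": "x", "priceUSD": 1.5},
    {"ai_type": "B", "country": "CN"},
]

# Fields copied as-is; 'date' is handled separately because it is truncated
FIELDS = ["ai_type", "ai_name", "quorum", "priceUSD", "ai_disageform",
          "country", "continent", "ai_cap_tr", "company"]


def records_to_df(records):
    """Collect each key's values into its own list, then build a DataFrame."""
    columns = {"dateId": []}
    columns.update({name: [] for name in FIELDS})
    for rec in records:
        # Keep the first 10 characters of the date, or '' if it is missing
        columns["dateId"].append(str(rec["date"])[:10] if "date" in rec else "")
        for name in FIELDS:
            # dict.get returns '' when the record lacks the key
            columns[name].append(rec.get(name, ""))
    return pd.DataFrame(columns)


df = records_to_df(records)
print(df)
```

With a real collection, passing the cursor returned by `collection.find()` in place of `records` should work the same way, since the function only iterates over dict-like documents.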
Method 2:
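import json
from io import StringIO

import pandas as pd
from pymongo import MongoClient


# Connect to MongoDB and return a cursor over the collection
def connectMongdb():
    client = MongoClient('192.168.1.5', 10070)
    db = client.dbtest
    collection = db.data_table
    items = collection.find()
    return items


# Convert the documents to a DataFrame
def tran_df():
    items = connectMongdb()
    temp = []
    for record in items:  # renamed from 'dict' to avoid shadowing the builtin
        # ObjectId and datetime values are not JSON-serializable,
        # so drop '_id' and turn the date into a string
        del record['_id']
        record['date'] = record['date'].strftime("%Y-%m-%d")
        temp.append(record)
    # Wrapping the JSON text in StringIO avoids the deprecation warning
    # that newer pandas versions emit for literal JSON strings
    data_employee = pd.read_json(StringIO(json.dumps(temp)))
    # Keep only the columns of interest, in this order
    data_employee_ri = data_employee.reindex(columns=['date', 'ai_type', 'ai_name'])
    data_employee_ri.to_csv('data/a.csv')


def main():
    tran_df()


if __name__ == "__main__":
    main()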
The idea: put every dict into a list, then convert the list to a DataFrame with the read_json() method.
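To try this step without a MongoDB instance, here is a minimal sketch using made-up sample records (the `records` list, its values, and the names `df_json` and `df_direct` are assumptions for illustration). It also shows that `pd.DataFrame` accepts a list of dicts directly, which is one way to skip the JSON round trip; I have not compared the two on real data.

```python
import json
from datetime import datetime
from io import StringIO

import pandas as pd

# Hypothetical documents shaped like what collection.find() might return
records = [
    {"_id": 1, "date": datetime(2018, 1, 2), "ai_type": "A", "ai_name": "x"},
    {"_id": 2, "date": datetime(2018, 1, 3), "ai_type": "B", "ai_name": "y"},
]

temp = []
for record in records:
    # Copy everything except '_id', then format the date as a string
    row = {key: value for key, value in record.items() if key != "_id"}
    row["date"] = row["date"].strftime("%Y-%m-%d")
    temp.append(row)

# Route used in Method 2: serialize to JSON, then parse with read_json
df_json = pd.read_json(StringIO(json.dumps(temp)))

# Alternative: build the DataFrame straight from the list of dicts
df_direct = pd.DataFrame(temp)

print(df_json)
print(df_direct)
```

One difference to be aware of: `read_json` by default tries to parse date-like columns, so the `date` column may come back as datetimes, while `pd.DataFrame(temp)` keeps the formatted strings.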