The official documentation for the two DataFrame methods is linked above; here is my code:
import pandas as pd

# file_name and df are defined earlier; df holds the newly fetched rows
df0 = pd.read_csv(file_name)
# DataFrame.append is deprecated since pandas 1.4 (removed in 2.0);
# pd.concat is the equivalent
df0 = pd.concat([df0, df], ignore_index=True)
df0.drop_duplicates(subset="trade_date", inplace=True)
print(df0)
df0.to_csv(file_name, index=False)
df0 = pd.read_csv(file_name)
df0.drop_duplicates(subset="trade_date", inplace=True)
print(df0)
trade_date ggt_ss ggt_sz hgt sgt north_money south_money
0 20190829 1960.0 418.0 567.11 199.79 766.90 2378.0
1 20190828 1687.0 493.0 -112.55 -303.25 -415.80 2180.0
2 20190827 2621.0 762.0 7371.21 4394.13 11765.34 3383.0
3 20190826 4005.0 1599.0 -2107.11 -530.21 -2637.32 5604.0
4 20190823 2041.0 1013.0 91.95 1446.33 1538.28 3054.0
10 20190822 2384.0 554.0 643.87 1268.83 1912.70 2938.0
11 20190821 2089.0 927.0 1432.15 891.26 2323.41 3016.0
12 20190820 1978.0 1007.0 -367.52 -471.31 -838.83 2985.0
13 20190819 2075.0 1395.0 3861.04 4621.52 8482.56 3470.0
14 20190816 3811.0 1726.0 -102.61 253.84 151.23 5537.0
25 20190829 1960.0 418.0 567.11 199.79 766.90 2378.0
26 20190828 1687.0 493.0 -112.55 -303.25 -415.80 2180.0
27 20190827 2621.0 762.0 7371.21 4394.13 11765.34 3383.0
28 20190826 4005.0 1599.0 -2107.11 -530.21 -2637.32 5604.0
29 20190823 2041.0 1013.0 91.95 1446.33 1538.28 3054.0
30 20190822 2384.0 554.0 643.87 1268.83 1912.70 2938.0
31 20190821 2089.0 927.0 1432.15 891.26 2323.41 3016.0
32 20190820 1978.0 1007.0 -367.52 -471.31 -838.83 2985.0
33 20190819 2075.0 1395.0 3861.04 4621.52 8482.56 3470.0
34 20190816 3811.0 1726.0 -102.61 253.84 151.23 5537.0
trade_date ggt_ss ggt_sz hgt sgt north_money south_money
0 20190829 1960.0 418.0 567.11 199.79 766.90 2378.0
1 20190828 1687.0 493.0 -112.55 -303.25 -415.80 2180.0
2 20190827 2621.0 762.0 7371.21 4394.13 11765.34 3383.0
3 20190826 4005.0 1599.0 -2107.11 -530.21 -2637.32 5604.0
4 20190823 2041.0 1013.0 91.95 1446.33 1538.28 3054.0
5 20190822 2384.0 554.0 643.87 1268.83 1912.70 2938.0
6 20190821 2089.0 927.0 1432.15 891.26 2323.41 3016.0
7 20190820 1978.0 1007.0 -367.52 -471.31 -838.83 2985.0
8 20190819 2075.0 1395.0 3861.04 4621.52 8482.56 3470.0
9 20190816 3811.0 1726.0 -102.61 253.84 151.23 5537.0
The data being read is exactly what df.to_csv wrote, and the format looks identical, yet drop_duplicates removes nothing. As a last resort I wrote the concatenated data out with to_csv and read it back with read_csv before calling drop_duplicates, and to my surprise the duplicates were then removed. I had no idea why; evidently pandas considered those rows non-duplicates, but why?
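The puzzling behavior can be reproduced in isolation. This is a minimal sketch using an in-memory buffer instead of a file; the column name is taken from the output above, and the single-column frames are hypothetical stand-ins for the real data:

```python
import io
import pandas as pd

# df0 came from read_csv, so trade_date was parsed as int64;
# df holds freshly fetched rows where every value is still a str
df0 = pd.DataFrame({"trade_date": [20190829, 20190828]})
df = pd.DataFrame({"trade_date": ["20190829"]})
merged = pd.concat([df0, df], ignore_index=True)

# int 20190829 and str "20190829" never compare equal, so nothing is dropped
print(len(merged.drop_duplicates(subset="trade_date")))  # 3

# a to_csv / read_csv round trip re-parses every row with the same dtype
buf = io.StringIO()
merged.to_csv(buf, index=False)
buf.seek(0)
reread = pd.read_csv(buf)
print(len(reread.drop_duplicates(subset="trade_date")))  # 2
```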
On 2020-02-10 I found the cause, so let me answer my own question:
The data types differed. By default pd.read_csv parses columns into int, float, etc., while the newly appended rows were all str, so the "duplicates" never compared equal and could not be dropped. Converting everything to str first, e.g. df1 = df1.astype(str) (my original astype(type("1")) is just a roundabout way of writing the same thing), makes drop_duplicates work. For changing DataFrame dtypes, see https://blog.csdn.net/python_ai_road/article/details/81158376
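The fix can be seen on a tiny example; a minimal sketch with a hypothetical one-column frame:

```python
import pandas as pd

# the same "duplicate" stored once as int and once as str
df1 = pd.DataFrame({"trade_date": [20190829, "20190829"]})
print(len(df1.drop_duplicates(subset="trade_date")))  # 2: int != str, nothing dropped

# cast every column to str, after which the rows compare equal
df1 = df1.astype(str)
print(len(df1.drop_duplicates(subset="trade_date")))  # 1
```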
Of course, you can also specify the column dtypes when reading the CSV; see the pandas documentation: https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.read_csv.html?highlight=read_csv#pandas.read_csv
With too many columns to list individually, simply converting them all to str is the easy way out.
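Passing a single type as dtype applies it to every column, so no per-column mapping is needed. A minimal sketch with an in-memory CSV and column names taken from the output above:

```python
import io
import pandas as pd

csv_text = "trade_date,north_money\n20190829,766.90\n20190828,-415.80\n"

# dtype=str keeps every column as str, matching the freshly fetched rows,
# so no astype() pass is needed before drop_duplicates
df0 = pd.read_csv(io.StringIO(csv_text), dtype=str)
print(df0.dtypes)                    # both columns are object (str)
print(df0["trade_date"].tolist())    # ['20190829', '20190828']
```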