pandas: DataFrame.drop_duplicates fails to deduplicate data combined with DataFrame.append, because the data types are inconsistent

Original

2020-06-23 08:30

https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.drop_duplicates.html#pandas.DataFrame.drop_duplicates

https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.append.html#pandas.DataFrame.append

Above are the official docs for the two DataFrame methods. Here is the code:

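```python
import pandas as pd

# file_name (path of the CSV) and df (the newly fetched rows) are defined elsewhere in the script.
df0 = pd.read_csv(file_name)
# Originally df0 = df0.append(df, ignore_index=True); DataFrame.append was removed in pandas 2.0.
df0 = pd.concat([df0, df], ignore_index=True)
df0.drop_duplicates(subset="trade_date", inplace=True)
print(df0)  # 1st print: the appended rows are NOT removed
df0.to_csv(file_name, index=False)
df0 = pd.read_csv(file_name)
df0.drop_duplicates(subset="trade_date", inplace=True)
print(df0)  # 2nd print: after the CSV round trip, the duplicates are gone
```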

Output of the two print(df0) calls:

```
   trade_date  ggt_ss  ggt_sz      hgt      sgt  north_money  south_money
0    20190829  1960.0   418.0   567.11   199.79       766.90       2378.0
1    20190828  1687.0   493.0  -112.55  -303.25      -415.80       2180.0
2    20190827  2621.0   762.0  7371.21  4394.13     11765.34       3383.0
3    20190826  4005.0  1599.0 -2107.11  -530.21     -2637.32       5604.0
4    20190823  2041.0  1013.0    91.95  1446.33      1538.28       3054.0
10   20190822  2384.0   554.0   643.87  1268.83      1912.70       2938.0
11   20190821  2089.0   927.0  1432.15   891.26      2323.41       3016.0
12   20190820  1978.0  1007.0  -367.52  -471.31      -838.83       2985.0
13   20190819  2075.0  1395.0  3861.04  4621.52      8482.56       3470.0
14   20190816  3811.0  1726.0  -102.61   253.84       151.23       5537.0
25   20190829  1960.0   418.0   567.11   199.79       766.90       2378.0
26   20190828  1687.0   493.0  -112.55  -303.25      -415.80       2180.0
27   20190827  2621.0   762.0  7371.21  4394.13     11765.34       3383.0
28   20190826  4005.0  1599.0 -2107.11  -530.21     -2637.32       5604.0
29   20190823  2041.0  1013.0    91.95  1446.33      1538.28       3054.0
30   20190822  2384.0   554.0   643.87  1268.83      1912.70       2938.0
31   20190821  2089.0   927.0  1432.15   891.26      2323.41       3016.0
32   20190820  1978.0  1007.0  -367.52  -471.31      -838.83       2985.0
33   20190819  2075.0  1395.0  3861.04  4621.52      8482.56       3470.0
34   20190816  3811.0  1726.0  -102.61   253.84       151.23       5537.0

   trade_date  ggt_ss  ggt_sz      hgt      sgt  north_money  south_money
0    20190829  1960.0   418.0   567.11   199.79       766.90       2378.0
1    20190828  1687.0   493.0  -112.55  -303.25      -415.80       2180.0
2    20190827  2621.0   762.0  7371.21  4394.13     11765.34       3383.0
3    20190826  4005.0  1599.0 -2107.11  -530.21     -2637.32       5604.0
4    20190823  2041.0  1013.0    91.95  1446.33      1538.28       3054.0
5    20190822  2384.0   554.0   643.87  1268.83      1912.70       2938.0
6    20190821  2089.0   927.0  1432.15   891.26      2323.41       3016.0
7    20190820  1978.0  1007.0  -367.52  -471.31      -838.83       2985.0
8    20190819  2075.0  1395.0  3861.04  4621.52      8482.56       3470.0
9    20190816  3811.0  1726.0  -102.61   253.84       151.23       5537.0
```

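The CSV being read was itself written by df.to_csv, and the data looks identical, yet deduplication still fails. As a last attempt, I wrote the appended data out with to_csv, read it back with read_csv, and only then called drop_duplicates. Surprisingly, that removed the duplicates. I didn't know why. Fundamentally, pandas must believe there are no duplicates here, but why?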
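The puzzle can be reproduced without any files. The sketch below is a minimal reproduction under stated assumptions: the frame names `old`, `new` and `combined` are hypothetical, the sample values are taken from the output above, the "new" rows are assumed to carry `trade_date` as str, the CSV is kept in memory with `io.StringIO` instead of a file on disk, and `pd.concat` stands in for the removed `DataFrame.append`.

```python
import io

import pandas as pd

# Hypothetical sample: rows as read_csv returns them, trade_date parsed as int64.
old = pd.DataFrame({"trade_date": [20190829, 20190828], "hgt": [567.11, -112.55]})
# Hypothetical sample: newly fetched rows with trade_date stored as str.
new = pd.DataFrame({"trade_date": ["20190829", "20190828"], "hgt": [567.11, -112.55]})

combined = pd.concat([old, new], ignore_index=True)
print(combined["trade_date"].dtype)  # object: ints and strs mixed in one column
# 20190829 (int) != "20190829" (str), so nothing is treated as a duplicate.
print(len(combined.drop_duplicates(subset="trade_date")))  # 4

# Round trip through CSV text: both kinds are written as the same characters,
# and read_csv re-infers the whole column as int64.
buf = io.StringIO()
combined.to_csv(buf, index=False)
buf.seek(0)
reloaded = pd.read_csv(buf)
print(reloaded["trade_date"].dtype)  # int64
print(len(reloaded.drop_duplicates(subset="trade_date")))  # 2
```

This is why the to_csv/read_csv detour "works": the text format erases the difference between the int and str values, and read_csv assigns the column a single inferred type.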

On February 10, 2020 I found the cause, so I'm answering my own question:

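It turned out the data types were different. By default, pd.read_csv converts some columns to types such as int and float, whereas all of my newly added data was str. An int 20190829 and a str "20190829" are not equal, so drop_duplicates treats them as different values and removes nothing. Converting everything to str first with df1 = df1.astype(type("1")) (that is, astype(str)) and then deduplicating works. For more on changing DataFrame dtypes, see https://blog.csdn.net/python_ai_road/article/details/81158376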
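A small sketch of diagnosing and applying this fix, assuming the same hypothetical `old`/`new` sample frames as before (names and values are illustrative, not the author's data). Counting the Python types stored in the key column makes the mismatch visible, and casting to str makes the values comparable.

```python
import pandas as pd

# Hypothetical sample: one int-typed frame (as from read_csv), one str-typed frame.
old = pd.DataFrame({"trade_date": [20190829, 20190828], "hgt": [567.11, -112.55]})
new = pd.DataFrame({"trade_date": ["20190829"], "hgt": [567.11]})
combined = pd.concat([old, new], ignore_index=True)

# Diagnose: how many values of each Python type does the column hold?
print(combined["trade_date"].map(type).value_counts())

# Fix from the post: cast every column to str (type("1") is str), then dedupe.
combined = combined.astype(str)
deduped = combined.drop_duplicates(subset="trade_date")
print(len(deduped))  # 2
```

Note that astype(str) also turns the numeric columns into text. Since drop_duplicates only compares the subset column here, casting just combined["trade_date"] would be enough if you want to keep the other columns numeric.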

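Alternatively, you can specify the column types when reading the CSV, via read_csv's dtype parameter; see the official pandas documentation: https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.read_csv.html?highlight=read_csv#pandas.read_csv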
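A minimal sketch of the per-column variant, assuming the key column is `trade_date` and the new rows hold it as str (both as in the post). The CSV text is inlined via `io.StringIO` purely so the example runs without a file.

```python
import io

import pandas as pd

# Hypothetical CSV content standing in for file_name.
csv_text = "trade_date,hgt\n20190829,567.11\n20190828,-112.55\n"

# Read trade_date as text so it matches the str dates in the new rows;
# other columns keep their inferred numeric types.
old = pd.read_csv(io.StringIO(csv_text), dtype={"trade_date": str})
new = pd.DataFrame({"trade_date": ["20190829"], "hgt": [567.11]})

deduped = pd.concat([old, new], ignore_index=True).drop_duplicates(subset="trade_date")
print(len(deduped))  # 2
```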

If there are too many columns to list individually, just convert them all to str.
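Reading every column as text is a one-argument change to read_csv. The sketch below assumes a small hypothetical CSV with three of the post's columns; with dtype=str no type inference happens, so values come back exactly as written in the file.

```python
import io

import pandas as pd

# Hypothetical CSV content standing in for file_name.
csv_text = "trade_date,ggt_ss,hgt\n20190829,1960.0,567.11\n"

# dtype=str: every column is read as text, nothing is converted to int/float.
df0 = pd.read_csv(io.StringIO(csv_text), dtype=str)
print(df0.iloc[0].tolist())  # ['20190829', '1960.0', '567.11']

# Hypothetical new rows, also all str, now dedupe cleanly against df0.
new = pd.DataFrame({"trade_date": ["20190829"], "ggt_ss": ["1960.0"], "hgt": ["567.11"]})
deduped = pd.concat([df0, new], ignore_index=True).drop_duplicates(subset="trade_date")
print(len(deduped))  # 1
```

If you later need arithmetic on those columns, convert them back, for example with pd.to_numeric.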
