【課程2.18】 去重及替換
1.去重 .duplicated
.duplicated / .replace
s = pd.Series([1,1,1,1,2,2,2,3,4,5,5,5,5])
print(s.duplicated())
print(s[s.duplicated() == False])
print('-----')
# 判斷是否重複
# 通過布爾判斷,得到不重複的值
#通過unique也可以
sr2 = sr1.unique()#不重複的列表
sr = pd.Series(sr2)#以列表生成Series
print(sr)
s_re = s.drop_duplicates()
print(s_re)
print('-----')
# drop.duplicates移除重複
# inplace參數:是否替換原值,默認False
df = pd.DataFrame({'key1':['a','a',3,4,5],
'key2':['a','a','b','b','c']})
print(df.duplicated())#按行判斷
print(df['key2'].duplicated())
# Dataframe中使用duplicated
----------------------------------------------------------------------
0 False
1 True
2 True
3 True
4 False
5 True
6 True
7 False
8 False
9 False
10 True
11 True
12 True
dtype: bool
0 1
4 2
7 3
8 4
9 5
dtype: int64
-----
0 1
4 2
7 3
8 4
9 5
dtype: int64
-----
0 False
1 True
2 False
3 False
4 False
dtype: bool
0 False
1 True
2 False
3 True
4 False
Name: key2, dtype: bool
3.替換 .replace
s = pd.Series(list('ascaazsd'))
print(s.replace('a', np.nan))
print(s.replace(['a','s'] ,np.nan))
print(s.replace({'a':'hello world!','s':123}))
# 可一次性替換一個值或多個值
# 可傳入列表或字典
----------------------------------------------------------------------
0 NaN
1 s
2 c
3 NaN
4 NaN
5 z
6 s
7 d
dtype: object
0 NaN
1 NaN
2 c
3 NaN
4 NaN
5 z
6 NaN
7 d
dtype: object
0 hello world!
1 123
2 c
3 hello world!
4 hello world!
5 z
6 123
7 d
dtype: object