[pandas Notes] map, apply, applymap, and transform in pandas Explained

Original

2020-06-15 03:19

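(1) pandas.Series.map

Series.map(self, arg, na_action=None)
"""
Maps the values of a Series according to an input correspondence; use it to substitute each value in a Series with another value.
map() here is a method of Series. When this post was written, DataFrame had no map() (pandas 2.1 later added DataFrame.map as the element-wise replacement for applymap). Given a function, map() applies it to every element of the Series.

Key points: Series object, mapping, substitution, every value
"""

# Parameters
"""
arg: the mapping correspondence; a function, a collections.abc.Mapping subclass (e.g. a dict), or a Series
na_action: whether to skip NaN, {None, 'ignore'}, default None. With 'ignore', NaN values are propagated as-is without being passed to the mapping.
"""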
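The effect of na_action is easiest to see on a float Series that contains NaN. The following is a minimal sketch; the Series `ages` and its values are made up for illustration.

```python
import numpy as np
import pandas as pd

ages = pd.Series([2.5, np.nan, 5.0])

# Default (na_action=None): the function is also called on NaN,
# so the missing value turns into the string 'nan'.
formatted = ages.map(lambda x: "%.3f" % x)

# na_action='ignore': NaN is skipped and stays missing in the result.
ignored = ages.map(lambda x: "%.3f" % x, na_action='ignore')

print(formatted.tolist())
print(ignored.tolist())
```

If the missing values should remain recognizable as missing after formatting, na_action='ignore' is the safer choice.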

# demo
import numpy as np
import pandas as pd

data = {'animal': ['cat', 'cat', 'snake', 'dog', 'dog', 'cat', 'snake', 'cat', 'dog', 'dog'],
        'age': [2.5, 3, 0.5, np.nan, 5, 2, 4.5, np.nan, 7, 3],
        'visits': [1, 3, 2, 3, 2, 3, 1, 1, 2, 1],
        'priority': ['yes', np.nan, 'no', 'yes', np.nan, 'no', 'no', 'yes', 'no', 'no']}

labels = ['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j']

df = pd.DataFrame(data=data, index=labels)
df['priority'] = df['priority'].map({'no': False, 'yes': True})   # map through a dict
df['age'] = df['age'].map(lambda x: "%.3f" % x)  # map through a function; NaN becomes the string 'nan'
"""
  animal    age  visits priority
a    cat  2.500       1     True
b    cat  3.000       3      NaN
c  snake  0.500       2    False
"""

df['animal'] = df['animal'].replace('cat', 'dog')  # replace rewrites only the matching values and leaves the rest untouched, unlike map
"""
  animal    age  visits priority
a    dog  2.500       1     True
b    dog  3.000       3      NaN
c  snake  0.500       2    False
"""
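The difference between map and replace shows up when the dict does not cover every value. A small sketch, using a made-up three-element Series `animals`:

```python
import pandas as pd

animals = pd.Series(['cat', 'snake', 'dog'])

# map with a dict: every value is looked up; values without a key become NaN.
mapped = animals.map({'cat': 'dog'})

# replace: only the matching value changes; everything else is kept.
replaced = animals.replace('cat', 'dog')

print(mapped.tolist())
print(replaced.tolist())
```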

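(2) pandas.DataFrame.apply

DataFrame.apply(self, func, axis=0, raw=False, result_type=None, args=(), **kwds)
"""
Applies a function along the rows or columns of a DataFrame; Series has its own apply() as well.

Key points: custom function, DataFrame rows or columns, Series
"""

# Parameters
"""
func: the function to apply
axis: the axis along which the function is applied; default axis=0 passes each column to the function, axis=1 passes each row
raw: whether each row or column is passed as a Series or as an ndarray. The default False passes a Series; with raw=True the function receives an ndarray, which suits NumPy reducing functions
result_type: {'expand', 'reduce', 'broadcast', None}, default None; only takes effect when axis=1. With None, a list-like return value yields a Series of those list-likes, while a returned Series is expanded into columns
args: tuple of positional arguments passed to func after the row or column
**kwds: keyword arguments passed to func
"""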

# demo
# compute the interval between two dates (the result is a Timedelta in days)
def get_Interval(row):
    return row['ReceivedDate'] - row['PublishedDate']


def DateInterval(row, before, after):
    return row[before] - row[after]

def get_yearmonth(data):
    return data.strftime('%Y-%m')


published_date = pd.date_range(start='2020-03-01', end='2020-03-31', freq='B')
received_date = pd.date_range(start='2020-04-01', end='2020-04-30', freq='B')
df = pd.DataFrame({'PublishedDate': published_date, 'ReceivedDate': received_date})
# using a lambda
df['DateInterval'] = df.apply(lambda x: x['ReceivedDate'] - x['PublishedDate'], axis=1)
# calling a named function
# df['DateInterval'] = df.apply(get_Interval, axis=1)

# passing arguments via args: apply supplies the row as the first argument, so DateInterval takes three parameters in total
# df['DateInterval'] = df.apply(DateInterval, axis=1, args=('ReceivedDate', 'PublishedDate'))

# passing arguments via **kwds
# df['DateInterval'] = df.apply(DateInterval, axis=1, before='ReceivedDate', after='PublishedDate')

# a Series can call apply too, but it has no axis parameter
# df['ReceivedDate_ym'] = df['ReceivedDate'].apply(lambda x: x.strftime('%Y-%m'))
# df['ReceivedDate_ym'] = df['ReceivedDate'].apply(lambda x: x.strftime('%Y-%m'), axis=1)  # raises an error
# df['ReceivedDate_ym'] = df['ReceivedDate'].apply(get_yearmonth)
# df['ReceivedDate_ym'] = df['ReceivedDate'].apply(get_yearmonth, axis=1)  # raises an error
print(df)
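As a quick check that the args and keyword styles are interchangeable, here is a sketch on a two-row frame. The helper `date_interval` and its parameter names `later` and `earlier` are hypothetical; a plain column subtraction is included for comparison.

```python
import pandas as pd

df = pd.DataFrame({
    'PublishedDate': pd.to_datetime(['2020-03-02', '2020-03-03']),
    'ReceivedDate': pd.to_datetime(['2020-04-01', '2020-04-02']),
})

def date_interval(row, later, earlier):
    # apply passes the row first; the column names follow.
    return row[later] - row[earlier]

by_args = df.apply(date_interval, axis=1, args=('ReceivedDate', 'PublishedDate'))
by_kwargs = df.apply(date_interval, axis=1, later='ReceivedDate', earlier='PublishedDate')

# The same result without apply: subtract the two columns directly.
vectorized = df['ReceivedDate'] - df['PublishedDate']

print(by_args)
```

For simple arithmetic between columns, the direct subtraction is usually the shorter and faster form; apply with axis=1 is mainly useful when the per-row logic cannot be expressed column-wise.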

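(3) pandas.DataFrame.applymap

DataFrame.applymap(self, func)
"""
Applies a function to every element of a DataFrame and returns a new DataFrame.
Key point: it works element by element and returns the corresponding results. (pandas 2.1 deprecated applymap in favor of DataFrame.map.)
"""

# Parameters
"""
func: the function to apply to each element
"""

# demo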
def add_one(x):
    return x + 1

df = pd.DataFrame({
    'a': [1, 2, 3],
    'b': [10, 20, 30],
    'c': [5, 10, 15]
})
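# calling a named function
print(df.applymap(add_one))
print(df + 1)  # the vectorized expression gives the same result

# lambda
print(df.applymap(lambda x: x ** 2))
print(df ** 2)  # same result, vectorized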
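Because newer pandas releases offer DataFrame.map as the successor to applymap, code that has to run on both old and new versions can pick whichever method is available. A minimal sketch, assuming the frame has no column literally named 'map' (attribute lookup would otherwise find that column):

```python
import pandas as pd

df = pd.DataFrame({'a': [1, 2, 3], 'b': [10, 20, 30]})

# Use DataFrame.map where it exists, otherwise fall back to applymap.
elementwise = df.map if hasattr(df, 'map') else df.applymap

squared = elementwise(lambda x: x ** 2)
print(squared)
```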

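(4) pandas.DataFrame.transform

DataFrame.transform(self, func, axis=0, *args, **kwargs)
"""
Calls a function on a DataFrame and produces a DataFrame with the same axis length as the original.
Key point: the length along the axis is unchanged
"""

# Parameters
"""
func: the function used for the transformation; any of the following is accepted:
1. a custom function
2. a function name (string)
3. a list of functions or function names
4. a dict mapping axis labels to functions or function names
axis: the axis along which the function is applied, default axis=0
"""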
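The four accepted forms of func can be sketched side by side. The small frame below is invented for illustration, and 'cumsum' is just one example of a method name that keeps the length unchanged.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'age': [1.0, 4.0, 9.0], 'visits': [1, 2, 3]})

by_callable = df.transform(lambda col: col * 2)      # 1. custom function
by_name = df.transform('cumsum')                     # 2. function name
by_list = df.transform([np.sqrt, np.exp])            # 3. list of functions
by_dict = df.transform({'age': np.sqrt,              # 4. label -> function
                        'visits': np.exp})

print(by_list.columns.tolist())
```

With a list, the result gets one sub-column per function under each original column, so its width grows while the number of rows stays the same.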

# demo
data = {'animal': ['cat', 'cat', 'snake', 'dog', 'dog', 'cat', 'snake', 'cat', 'dog', 'dog'],
        'age': [2.5, 3, 0.5, np.nan, 5, 2, 4.5, np.nan, 7, 3],
        'visits': [1, 3, 2, 3, 2, 3, 1, 1, 2, 1]}

labels = ['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j']
df = pd.DataFrame(data=data, index=labels)


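print(df.groupby('animal')[['age', 'visits']].apply(lambda x: x.mean()))  # select the numeric columns so mean() never sees the string column
"""
apply returns the aggregated result, one row per group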
        age  visits
animal             
cat     2.5     2.0
dog     5.0     2.0
snake   2.5     1.5
"""
print(df.groupby(['animal']).transform(lambda x:x.mean()))
"""
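transform returns a result with the same index and length as df, here (len(df), 2): each group's value is repeated on every row of that group. Note: when combined with groupby(), deduplicate the values if you need one value per group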
   age  visits
a  2.5     2.0
b  2.5     2.0
c  2.5     1.5
d  5.0     2.0
e  5.0     2.0
f  2.5     2.0
g  2.5     1.5
h  2.5     2.0
i  5.0     2.0
j  5.0     2.0
"""
print(df.groupby(['animal']).apply(lambda x: x['age'] - x['visits']))
# transform hands the function one column (a Series) at a time, so a function that computes column 'a' minus column 'b' raises an exception:
# print(df.groupby(['animal']).transform(lambda x: x['age'] - x['visits']))  # raises an error

# list of functions
print(df['visits'].transform([np.sqrt, np.exp]))

# dict mapping axis labels to functions
print(df[['age', 'visits']].transform({'age': np.sqrt, 'visits': np.exp}))

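(5) Summary

1. map, apply, applymap

map() is the pandas.Series.map() method; it maps every value of a Series.
apply() is a DataFrame method that applies func along the rows or columns of the DataFrame; it can also be called on a single Series.
applymap() is also a DataFrame method; it applies func to every element of the DataFrame.

2. apply() vs. transform()

transform passes the function one Series at a time, so a function that computes column 'a' minus column 'b' raises an exception;
transform returns a result with as many rows as df, one per original row. Note: when combined with groupby(), deduplicate the values if you need one value per group;
unlike transform, groupby().apply() passes each group to the function as a whole DataFrame, so operations across columns work;
transform is documented to accept a function name, a list of functions, or a dict mapping labels to functions, so different labels can get different computations, whereas apply is documented primarily for a callable.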
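One common use of the "same length as df" property is to write a per-group statistic back onto the original rows. The sketch below fills missing ages with the group mean; the data and the column name `age_filled` are assumptions made for illustration.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'animal': ['cat', 'cat', 'dog', 'dog', 'dog'],
    'age': [2.0, np.nan, 4.0, 6.0, np.nan],
})

# Aggregation: one value per group.
per_group = df.groupby('animal')['age'].mean()

# transform: the group mean repeated on every row of the group,
# aligned with the original index.
per_row = df.groupby('animal')['age'].transform('mean')

# Because per_row lines up with df, it can fill the gaps directly.
df['age_filled'] = df['age'].fillna(per_row)

print(per_group)
print(df)
```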
