Reshaping with Hierarchical Indexing in pandas

小白的成长

2020-02-21 13:42

When working with data, we sometimes need to rearrange its structure, an operation known as reshaping or pivoting.

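Hierarchical indexing gives a consistent way to rearrange the data in a DataFrame. There are two main operations:

stack: rotates the columns of the data into rows
unstack: rotates the rows of the data into columns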

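A few simple examples illustrate this (the session assumes `import numpy as np` and `import pandas as pd` have already been run):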

  
In [15]: data = pd.DataFrame(np.arange(6).reshape((2, 3)),
    ...: index=pd.Index(['Oh', 'Co'], name='state'),columns=pd.Index(['one', 'two', 'three'], name='number'))

In [16]: data
Out[16]: 
number  one  two  three
state
Oh        0    1      2
Co        3    4      5

In [17]: res = data.stack()  # rotate the column index into the row index; the result is a Series
In [18]: res
Out[18]: 
state  number
Oh     one       0
       two       1
       three     2
Co     one       3
       two       4
       three     5
dtype: int32

In [20]: res.unstack()  # the reverse operation: rotate the inner row index back into columns
Out[20]: 
number  one  two  three
state
Oh        0    1      2
Co        3    4      5

In [21]: data
Out[21]: 
number  one  two  three
state
Oh        0    1      2
Co        3    4      5

In [24]: res1 = data.unstack()  # this also yields a Series

In [25]: res1
Out[25]: 
number  state
one     Oh       0
        Co       3
two     Oh       1
        Co       4
three   Oh       2
        Co       5
dtype: int32

In [41]: res1.stack()
---------------------------------------------------------------
AttributeError                Traceback (most recent call last)
<ipython-input-41-d2140643737a> in <module>()
----> 1 res1.stack()

D:\projects\env\Lib\site-packages\pandas\core\generic.py in __getattr__(self, name)
   4370             if self._info_axis._can_hold_identifiers_and_holds_name(name):
   4371                 return self[name]
-> 4372             return object.__getattribute__(self, name)
   4373
   4374     def __setattr__(self, name, value):

AttributeError: 'Series' object has no attribute 'stack'

In [27]: res1.unstack()
Out[27]: 
state   Oh  Co
number
one      0   3
two      1   4
three    2   5

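As the examples above show, when a DataFrame has a single-level index and single-level columns, both unstack and stack return a Series. If the axis being rotated has more than one level, the result is still a DataFrame.

A Series, by contrast, only has the unstack method.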
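As a rough check of the return types described above, the following sketch rebuilds the same `data` frame and adds a second frame, `wide`, with two-level columns. `wide` and its level names `outer` and `inner` are made up for illustration and do not appear in the original session.

```python
import numpy as np
import pandas as pd

data = pd.DataFrame(
    np.arange(6).reshape((2, 3)),
    index=pd.Index(['Oh', 'Co'], name='state'),
    columns=pd.Index(['one', 'two', 'three'], name='number'),
)

# Single-level columns: stacking leaves no column level behind,
# so the result is a Series indexed by (state, number).
stacked = data.stack()
print(type(stacked).__name__)

# Series exposes unstack but not stack.
print(hasattr(pd.Series, 'unstack'), hasattr(pd.Series, 'stack'))

# Hypothetical frame with two-level columns: stacking the inner level
# leaves the outer level as columns, so the result is a DataFrame.
wide = pd.DataFrame(
    np.arange(8).reshape((2, 4)),
    index=pd.Index(['Oh', 'Co'], name='state'),
    columns=pd.MultiIndex.from_product(
        [['a', 'b'], ['x', 'y']], names=['outer', 'inner']
    ),
)
partly_stacked = wide.stack()
print(type(partly_stacked).__name__)
```

Some pandas releases may emit a FutureWarning about the stack implementation here; the return types shown are unaffected.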

By default, both unstack and stack operate on the innermost level. To operate on a different level, pass its number or its name:

In [63]: res.unstack(level=0)
Out[63]: 
state   Oh  Co
number
one      0   3
two      1   4
three    2   5

In [64]: res.unstack(level='state')
Out[64]: 
state   Oh  Co
number
one      0   3
two      1   4
three    2   5
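A small self-contained sketch of the level argument, assuming the same `data` frame as in the session: it compares the default call with an explicit `level=-1`, and the numeric level with the named one. The variable names are only illustrative.

```python
import numpy as np
import pandas as pd

data = pd.DataFrame(
    np.arange(6).reshape((2, 3)),
    index=pd.Index(['Oh', 'Co'], name='state'),
    columns=pd.Index(['one', 'two', 'three'], name='number'),
)
res = data.stack()  # Series indexed by (state, number)

# The default is the innermost level, i.e. level=-1 ('number' here).
default = res.unstack()
innermost = res.unstack(level=-1)
print(default.equals(innermost))

# Level 0 and the level named 'state' refer to the same level.
by_number = res.unstack(level=0)
by_name = res.unstack(level='state')
print(by_number.equals(by_name))
print(by_number)
```

Referring to levels by name tends to be more robust than by position, since the code keeps working if the order of the levels changes.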

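Rotation interacts with missing values in two ways. unstack can introduce missing values when some combination of labels does not exist in the data; its fill_value argument supplies a replacement. In the pandas version used here, stack drops missing values by default, and passing dropna=False keeps them.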
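To make the missing-value behaviour concrete, here is a minimal sketch with two made-up Series, `s1` and `s2`, whose labels only partly overlap. Because the default handling of missing values in stack has changed across pandas releases, the sketch calls `dropna()` explicitly rather than relying on the `dropna` argument.

```python
import pandas as pd

# Hypothetical data: the two groups do not share all inner labels.
s1 = pd.Series([0, 1, 2, 3], index=['a', 'b', 'c', 'd'])
s2 = pd.Series([4, 5, 6], index=['c', 'd', 'e'])
data2 = pd.concat([s1, s2], keys=['one', 'two'])

# unstack must create cells such as ('one', 'e') that have no data,
# so they come out as NaN.
wide = data2.unstack()
missing = int(wide.isna().sum().sum())
print(wide)
print(missing)

# fill_value replaces those holes during the unstack.
filled = data2.unstack(fill_value=-1)
print(filled)

# Going back to long form and dropping the holes explicitly.
long_again = wide.stack().dropna()
print(len(long_again))
```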

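Pivoting

pivot(index, columns, values): uses the column named by index as the row index and the column named by columns as the column index, and fills the table with the entries of the column named by values.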

In [77]: df = pd.DataFrame({'book': ['java', 'java', 'R', 'R', 'py', 'py'],
    ...:                    'info': ['P', 'Q', 'P', 'Q', 'P', 'Q'],
    ...:                    'val': [46, 33, 50, 44, 66, 55]})
In [78]: df
Out[78]: 
   book info  val
0  java    P   46
1  java    Q   33
2     R    P   50
3     R    Q   44
4    py    P   66
5    py    Q   55
In [79]: df.pivot(index='book', columns='info')    # book becomes the row index, info the column index (pandas 2.0+ requires keyword arguments)
Out[79]: 
     val
info   P   Q
book
R     50  44
java  46  33
py    66  55
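The output above has a two-level column index because no values column was named. As a sketch, assuming the same `df`, passing `values='val'` selects that single column and produces flat column labels:

```python
import pandas as pd

df = pd.DataFrame({
    'book': ['java', 'java', 'R', 'R', 'py', 'py'],
    'info': ['P', 'Q', 'P', 'Q', 'P', 'Q'],
    'val': [46, 33, 50, 44, 66, 55],
})

# Without values: every remaining column is kept, so the column index
# has two levels (the column name 'val', then 'info').
nested = df.pivot(index='book', columns='info')
print(nested.columns.nlevels)

# With values: only 'val' is used and the columns are just the
# distinct 'info' labels.
flat = df.pivot(index='book', columns='info', values='val')
print(flat)
```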

pivot can be implemented equivalently with set_index followed by unstack:

In [84]: df.set_index(['book','info']).unstack()
Out[84]: 
     val
info   P   Q
book
R     50  44
java  46  33
py    66  55
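The equivalence can also be checked programmatically rather than by eye. The following is a minimal sketch, assuming the same `df` as above, that compares the two results with `DataFrame.equals`:

```python
import pandas as pd

df = pd.DataFrame({
    'book': ['java', 'java', 'R', 'R', 'py', 'py'],
    'info': ['P', 'Q', 'P', 'Q', 'P', 'Q'],
    'val': [46, 33, 50, 44, 66, 55],
})

# Route 1: pivot directly.
via_pivot = df.pivot(index='book', columns='info')

# Route 2: move both key columns into the index, then rotate the
# innermost index level ('info') into the columns.
via_unstack = df.set_index(['book', 'info']).unstack()

print(via_pivot.equals(via_unstack))
```

The set_index/unstack route is handy when the key columns are already in the index, or when a level other than the innermost one should become the columns.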
