【pandas Notes】“Label” Indexing vs. “Integer” Indexing in pandas

Original post

2020-06-15 03:19


I. The Index

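When pandas builds a Series or a DataFrame, it also creates an index: a sequence of labels, one for each data item. A DataFrame has two of them, a row index and a column index. The index works like a key: given a key, you can locate the corresponding value, so a Series can also be thought of as a dictionary.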

In [93]: obj = pd.Series(np.arange(1,5),index=['one','two','three','four'])
In [94]: obj
Out[94]:
one      1
two      2
three    3
four     4
dtype: int64
In [95]: obj['one']    # look up the value by its label (key)
Out[95]: 1
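
To make the dictionary analogy concrete, here is a small sketch that rebuilds the same `obj` and uses only standard Series features (`in`, `[]`, `.get` and `.to_dict`). As with a dict, `in` tests the keys (the index labels), not the values.

```python
import numpy as np
import pandas as pd

# Same Series as in the session above: labels 'one'..'four' map to 1..4
obj = pd.Series(np.arange(1, 5), index=['one', 'two', 'three', 'four'])

# Like a dict, `in` checks the keys (index labels), not the values
print('two' in obj)   # True
print(2 in obj)       # False: 2 is a value, not a label

# Label lookup, plus dict-style .get() with a default for a missing key
print(obj['three'])         # 3
print(obj.get('five', -1))  # -1

# Converting to a plain dict keeps the label -> value mapping
print(obj.to_dict())
```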

II. Selecting Data by Index

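1. Indexing in pandas works much like indexing in standard Python, except that pandas index values are not limited to integers: they can also be the labels described above.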

In [97]: obj[[0,1]]    # integer positions start at 0
Out[97]:
one    1
two    2
dtype: int64
In [98]: obj[['one','two']]  # axis labels
Out[98]:
one    1
two    2
dtype: int64
In [100]: obj[[-1,-3]]       # negative positions count from -1 (the last element)
Out[100]:
four    4
two     2
dtype: int64
In [101]: obj[['four','two']]
Out[101]: 
four    4
two     2
dtype: int64
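
The `obj[[0,1]]` and `obj[[-1,-3]]` calls above work only because pandas falls back to positions when the labels are strings. Recent pandas releases emit a FutureWarning for this fallback in `Series.__getitem__` and point to `iloc` instead. A sketch of the same selections written explicitly, with `iloc` for positions and `loc` for labels:

```python
import numpy as np
import pandas as pd

obj = pd.Series(np.arange(1, 5), index=['one', 'two', 'three', 'four'])

# Positional selection: positions start at 0, negative positions count from the end
first_two = obj.iloc[[0, 1]]    # labels 'one', 'two'
from_end = obj.iloc[[-1, -3]]   # labels 'four', 'two'

# Label selection: the same elements, addressed by their labels
by_label = obj.loc[['four', 'two']]

print(first_two)
print(from_end)
print(by_label.equals(from_end))  # True
```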

As in standard Python, integer positions start at 0, and negative positions start at -1 (the last element). Pandas indexing also supports slicing.

In [105]: a = pd.DataFrame(np.arange(16).reshape(4,4),index=['a','b','c','d'],columns=['one','two','three','four'])
In [106]: a
Out[106]:
   one  two  three  four
a    0    1      2     3
b    4    5      6     7
c    8    9     10    11
d   12   13     14    15
In [107]: a[:2]
Out[107]:
   one  two  three  four
a    0    1      2     3
b    4    5      6     7
In [108]: a[:'c']
Out[108]:
   one  two  three  four
a    0    1      2     3
b    4    5      6     7
c    8    9     10    11

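The difference is visible here: a slice with integer positions is half-open (the start is included, the stop is not), i.e. 0 <= position < 2, while a slice with labels includes both ends, i.e. 'a' <= label <= 'c'. Keep this in mind when switching between the two kinds of index.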
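
A minimal sketch of the two endpoint rules on the same frame `a`, using `iloc` and `loc` so that each slice is unambiguous. The last lines show one way to reproduce an inclusive label slice positionally, via `Index.get_loc` plus one.

```python
import numpy as np
import pandas as pd

a = pd.DataFrame(np.arange(16).reshape(4, 4),
                 index=['a', 'b', 'c', 'd'],
                 columns=['one', 'two', 'three', 'four'])

# Positional slice: half-open, stop position 2 is excluded -> rows 'a', 'b'
pos = a.iloc[0:2]

# Label slice: closed, stop label 'c' is included -> rows 'a', 'b', 'c'
lab = a.loc['a':'c']

print(list(pos.index))  # ['a', 'b']
print(list(lab.index))  # ['a', 'b', 'c']

# To get the same rows positionally, use the position of 'c' plus 1 as the stop
stop = a.index.get_loc('c') + 1
print(a.iloc[:stop].equals(lab))  # True
```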

2. A DataFrame can also select data with loc and iloc: loc selects by “label” and iloc selects by “integer” position.

In [112]: a.iloc[:2,:]   # select by integer position
Out[112]:
   one  two  three  four
a    0    1      2     3
b    4    5      6     7

In [113]: a.loc[:'c',:]       # select by label
Out[113]:
   one  two  three  four
a    0    1      2     3
b    4    5      6     7
c    8    9     10    11
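
Both `loc` and `iloc` take a row selector and a column selector separated by a comma, and the endpoint rule applies on each axis. A short sketch on the same frame `a`; the particular rows and columns are chosen only for illustration.

```python
import numpy as np
import pandas as pd

a = pd.DataFrame(np.arange(16).reshape(4, 4),
                 index=['a', 'b', 'c', 'd'],
                 columns=['one', 'two', 'three', 'four'])

# Labels on both axes: both slice ends are included
sub_loc = a.loc['b':'c', 'two':'three']

# Positions on both axes: the stop position 3 is excluded on each axis
sub_iloc = a.iloc[1:3, 1:3]

print(sub_loc)
print(sub_loc.equals(sub_iloc))  # True

# Single cell: row 'b', column 'three' is row position 1, column position 2
print(a.loc['b', 'three'], a.iloc[1, 2])  # 6 6
```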

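loc and iloc keep label-based and position-based indexing apart, which helps avoid mistakes. Before them, pandas also offered ix, which let you mix the two kinds of index in a single call.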

In [114]: a.ix[:'c',:2]
/usr/local/bin/ipython:1: FutureWarning: 
.ix is deprecated. Please use
.loc for label based indexing or
.iloc for positional indexing

See the documentation here:
http://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#ix-indexer-is-deprecated
Out[114]: 
   one  two
a    0    1
b    4    5
c    8    9

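ix was deprecated in pandas 0.20 and removed in pandas 1.0, presumably because letting labels and positions mix caused many user problems and bugs. So use loc when selecting by label and iloc when selecting by integer position, and the meaning of each selection stays unambiguous.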
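
Because `.ix` no longer exists in current pandas, the `a.ix[:'c',:2]` call above will not run today. A sketch of two ways to express the same mixed selection (rows by label, columns by position): convert one axis so that a single indexer suffices. These are common idioms, not the only options.

```python
import numpy as np
import pandas as pd

a = pd.DataFrame(np.arange(16).reshape(4, 4),
                 index=['a', 'b', 'c', 'd'],
                 columns=['one', 'two', 'three', 'four'])

# Option 1: stay label-based; turn the column positions into column labels
via_loc = a.loc[:'c', a.columns[:2]]

# Option 2: stay position-based; turn the row label into a position
# (+1 because an iloc slice excludes its stop position)
via_iloc = a.iloc[:a.index.get_loc('c') + 1, :2]

print(via_loc)
print(via_loc.equals(via_iloc))  # True
```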

3. Using integers as labels also leads to ambiguity.

In [122]: b = pd.DataFrame(np.arange(16).reshape(4,4),index=[0,1,2,3],columns=['one','two','three','four'])
In [123]: b
Out[123]:
   one  two  three  four
0    0    1      2     3
1    4    5      6     7
2    8    9     10    11
3   12   13     14    15
In [124]: b[:2]       # integer position or label?
Out[124]:
   one  two  three  four
0    0    1      2     3
1    4    5      6     7

In [125]: b.loc[:2]
Out[125]:
   one  two  three  four
0    0    1      2     3
1    4    5      6     7
2    8    9     10    11

In [126]: b.iloc[:2]
Out[126]: 
   one  two  three  four
0    0    1      2     3
1    4    5      6     7

In the example above, frame b uses integers as its row labels, and that is where the ambiguity comes from.

b[:2]       # positional or label-based? It returned 2 rows, so here it behaved positionally

So if you use integers as labels, select data with loc or iloc. That way it is explicit which kind of index you mean, and you select exactly the data you intend.
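
The ambiguity in frame `b` is easy to miss because its labels 0 to 3 coincide with the positions. A sketch with a hypothetical Series `s` whose integer labels run in reverse order makes the difference visible. Note that on an integer index, a plain `s[0]` is looked up as a label, not as a position.

```python
import pandas as pd

# Hypothetical example: integer labels in reverse order, so label != position
s = pd.Series(['w', 'x', 'y', 'z'], index=[3, 2, 1, 0])

print(s.loc[0])   # z  (the element whose label is 0)
print(s.iloc[0])  # w  (the element at position 0)
print(s[0])       # z  (integer index: [] with a scalar uses the label)

# Slices differ too: loc includes the stop label, iloc excludes the stop position
print(list(s.loc[:1]))   # ['w', 'x', 'y']  (labels 3, 2, 1)
print(list(s.iloc[:1]))  # ['w']
```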
