pandas

Series (one-dimensional array)

Test data

import pandas as pd
from pandas import Series,DataFrame
import numpy as np

# Create a Series without an index (one-dimensional data)
'''
The index argument is optional.
Without it, pandas assigns a default index that works like array positions: [0, ..., len(data) - 1]
'''
sel = Series([1, 2, 3, 4])
print(sel)

0    1
1    2
2    3
3    4
dtype: int64
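
As a quick check of the default index described above, here is a minimal sketch. The variable name `s` is only illustrative.

```python
import pandas as pd

# Without an index argument, pandas assigns positions 0..len(data)-1 as labels.
s = pd.Series([1, 2, 3, 4])

print(type(s.index).__name__)  # the default index is a RangeIndex
print(s.index.tolist())
```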

Custom index with index=list: the two examples below are equivalent

sel =  Series(data = [1,2,3,4], index = list('abcd'))
print (sel)
print ('*'*20)
sel =  Series(data = [1,2,3,4], index = ['a','b','c','d'])
print (sel)

a    1
b    2
c    3
d    4
dtype: int64
********************
a    1
b    2
c    3
d    4
dtype: int64

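Accessing the contents

Getting the values

# Assumed setup, not shown in the original: the outputs from here on imply this Series
sel = Series(data=['xiaohai', 'lilei', 'hanmeimei', 'dlay'], index=list('dcba'))
print(sel.values)
['xiaohai' 'lilei' 'hanmeimei' 'dlay']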

Getting the index

print(sel.index)
Index(['d', 'c', 'b', 'a'], dtype='object')

Getting (index, value) pairs

# iteritems() was removed in pandas 2.0; items() is the replacement
print(list(sel.items()))
[('d', 'xiaohai'), ('c', 'lilei'), ('b', 'hanmeimei'), ('a', 'dlay')]
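
The pairs can also be consumed directly. This is a small sketch that rebuilds the same Series as an assumption, since the original does not show how it was created.

```python
import pandas as pd

# Assumed data, matching the outputs shown in this section.
sel = pd.Series(['xiaohai', 'lilei', 'hanmeimei', 'dlay'], index=list('dcba'))

# items() yields (label, value) pairs lazily; list() materialises them.
pairs = list(sel.items())

# The same pairs turn into a plain dict keyed by label.
as_dict = dict(sel.items())
print(pairs[0], as_dict['c'])
```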

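A Series supports both label-based and position-based access

A label index uses the custom index values to look up data

print('Label index', sel['d'])
Label index xiaohai

A position index uses the default 0-based positions to look up data

# iloc makes the positional lookup explicit; plain sel[1] on a string index is deprecated in recent pandas
print('Position index', sel.iloc[1])
Position index lilei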
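
The difference between the two lookups matters most when the labels are themselves integers. The following sketch uses a hypothetical Series `s` whose integer labels deliberately differ from the positions.

```python
import pandas as pd

# Assumed data, matching the outputs shown in this section.
sel = pd.Series(['xiaohai', 'lilei', 'hanmeimei', 'dlay'], index=list('dcba'))
by_label = sel.loc['d']      # label lookup
by_position = sel.iloc[1]    # position lookup, 0-based

# Hypothetical Series with integer labels in a shuffled order.
s = pd.Series(['x', 'y', 'z'], index=[2, 0, 1])
label_zero = s.loc[0]        # the row labelled 0
position_zero = s.iloc[0]    # the first row
print(by_label, by_position, label_zero, position_zero)
```

Using loc and iloc explicitly avoids having to remember which rule plain square brackets apply.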

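Getting non-contiguous data

# display can be read as a print that renders more clearly (it is available in IPython/Jupyter)
display('Label index', sel[['a', 'c']])
print('*' * 20)
print('Position index', sel.iloc[[1, 3]])

'Label index'
a     dlay
c    lilei
dtype: object
********************
Position index c    lilei
a     dlay
dtype: object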

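Getting data with slices

# Label slice: both endpoints are included.
# The index order here is d, c, b, a, so 'b' comes after 'd' and 'b':'d' selects nothing.
display('Label slice', sel['b':'d'])
print('*' * 20)
# Position slice works like a list slice: start included, stop excluded
print('Position slice', sel.iloc[1:3])

'Label slice'
Series([], dtype: object)
********************
Position slice c        lilei
b    hanmeimei
dtype: object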
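
The empty result above comes from the order of the labels, not from the slice syntax. This sketch rebuilds the Series (an assumption, as before) and compares both slice directions.

```python
import pandas as pd

# Assumed data, matching the outputs shown in this section.
sel = pd.Series(['xiaohai', 'lilei', 'hanmeimei', 'dlay'], index=list('dcba'))

# Label slices follow index order and include both endpoints.
empty = sel.loc['b':'d']   # 'b' sits after 'd', so nothing is selected
three = sel.loc['d':'b']   # d, c and b

# Positional slices exclude the stop position, like list slices.
two = sel.iloc[1:3]
print(len(empty), three.tolist(), two.index.tolist())
```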

Reassigning the index (reindex)

sel.index = list('dcba')
print(sel)
print('*' * 20)
# reindex returns a new Series. Where it introduces NaN, int values are upcast to float64; strings stay object
# reindex reorders the existing rows to match the given list: known labels keep their values, new labels get NaN
print(sel.reindex(['b', 'a', 'c', 'd', 'e']))

d      xiaohai
c        lilei
b    hanmeimei
a         dlay
dtype: object
********************
b    hanmeimei
a         dlay
c        lilei
d      xiaohai
e          NaN
dtype: object
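
The dtype change mentioned in the comment is easier to see with numbers. Here is a minimal sketch with a hypothetical integer Series `nums`; `fill_value` is a standard reindex argument that supplies a value for new labels.

```python
import pandas as pd

nums = pd.Series([1, 2, 3], index=list('abc'))

# A label missing from the original index becomes NaN, which forces float64.
with_nan = nums.reindex(['c', 'b', 'a', 'z'])

# Supplying fill_value avoids NaN, so the integer dtype is kept.
filled = nums.reindex(['c', 'b', 'a', 'z'], fill_value=0)

print(with_nan.dtype, filled.dtype)
```

In both cases `nums` itself is left unchanged.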

Dropping rows or columns with drop: axis=0 is rows (the default), axis=1 is columns

se1=pd.Series(range(10,15))
print(se1)
print ('*'*20)
print(se1.drop([2,3]))

0    10
1    11
2    12
3    13
4    14
dtype: int64
********************
0    10
1    11
4    14
dtype: int64
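
A short sketch of two details the example above relies on: drop works on labels and returns a new object. The small DataFrame `df` is hypothetical and only there to show axis=1.

```python
import pandas as pd

se1 = pd.Series(range(10, 15))

# drop takes labels, not positions, and returns a new Series.
dropped = se1.drop([2, 3])

# errors='ignore' skips labels that do not exist instead of raising KeyError.
tolerant = se1.drop([3, 99], errors='ignore')

# On a DataFrame, axis=1 removes a column.
df = pd.DataFrame({'a': [1, 2], 'b': [3, 4]})
no_b = df.drop('b', axis=1)
print(dropped.tolist(), tolerant.index.tolist(), no_b.columns.tolist())
```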

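Arithmetic (aligned on the index)

Introduction

'''
Arithmetic on Series is aligned on the index.
The operators + - * / can be applied to two Series,
and pandas pairs up the values by index label before computing.
When the two indexes differ, the result is stored as floating point, because NaN cannot be held in an integer Series.
A label that appears in only one of the two Series yields NaN at that position.
'''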

Test data

series1 = pd.Series([1,2,3,4],['London','HongKong','Humbai','lagos'])
series2 = pd.Series([1,3,6,4],['London','Accra','lagos','Delhi'])
print(series1)
print ('*'*20)
print(series2)

London      1
HongKong    2
Humbai      3
lagos       4
dtype: int64
********************
London    1
Accra     3
lagos     6
Delhi     4
dtype: int64

Addition

print(series1+series2)
Accra        NaN
Delhi        NaN
HongKong     NaN
Humbai       NaN
London       2.0
lagos       10.0
dtype: float64

Subtraction

print(series1-series2)
Accra       NaN
Delhi       NaN
HongKong    NaN
Humbai      NaN
London      0.0
lagos      -2.0
dtype: float64

Multiplication

print(series1*series2)
Accra        NaN
Delhi        NaN
HongKong     NaN
Humbai       NaN
London       1.0
lagos       24.0
dtype: float64
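
If NaN for unmatched labels is not what you want, there are two common ways around it. This sketch reuses the test data above; `add` with `fill_value` is the method form of the + operator.

```python
import pandas as pd

series1 = pd.Series([1, 2, 3, 4], ['London', 'HongKong', 'Humbai', 'lagos'])
series2 = pd.Series([1, 3, 6, 4], ['London', 'Accra', 'lagos', 'Delhi'])

# Treat a label missing on one side as 0 instead of producing NaN.
total = series1.add(series2, fill_value=0)

# Or keep only the labels present in both operands.
common = (series1 + series2).dropna()
print(total)
print(common)
```

The same `fill_value` argument exists on `sub`, `mul` and `div`.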

DataFrame (two-dimensional table)

Constructor parameters

# First argument: the data source (rows by columns)
# index: row labels
# columns: column labels

Test data

df1 = DataFrame(np.random.randint(0,10,(4,4)),index=[1,2,3,4],columns=['a','b','c','d'])
print(df1)

   a  b  c  d
1  0  2  7  3
2  3  1  2  9
3  8  8  8  1
4  2  0  5  9

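Converting a dict to a DataFrame

Default conversion (row labels come from index, column labels from the dict keys)

# Named data_dict rather than dict so the built-in dict type is not shadowed
data_dict = {
    'Province': ['Guangdong', 'Beijing', 'Qinghai', 'Fujian'],
    'pop': [1.3, 2.5, 1.1, 0.7],
    'year': [2018, 2018, 2018, 2018]}
df2 = pd.DataFrame(data_dict)  # no index given, so a default 0-based index is used
print(df2)
print('*' * 20)
df2 = pd.DataFrame(data_dict, index=[1, 2, 3, 4])  # with an explicit index, those labels are shown
print(df2)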

    Province  pop  year
0  Guangdong  1.3  2018
1    Beijing  2.5  2018
2    Qinghai  1.1  2018
3     Fujian  0.7  2018
********************
    Province  pop  year
1  Guangdong  1.3  2018
2    Beijing  2.5  2018
3    Qinghai  1.1  2018
4     Fujian  0.7  2018

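You can also use from_dict. With its default orient='columns' it gives the same result as the constructor above; orient='index' treats the dict keys as row labels instead.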

dict2={"a":[1,2,3],"b":[4,5,6]}
df6=pd.DataFrame.from_dict(dict2)
print(df6)

   a  b
0  1  4
1  2  5
2  3  6
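
To make the difference from the plain constructor concrete, this sketch builds the same dict both ways. The variable names are only illustrative.

```python
import pandas as pd

dict2 = {"a": [1, 2, 3], "b": [4, 5, 6]}

by_columns = pd.DataFrame.from_dict(dict2)               # keys become columns
by_rows = pd.DataFrame.from_dict(dict2, orient='index')  # keys become row labels

print(by_rows)
```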

When the dict values are Series, they are aligned by index label; a label missing from one Series is filled with NaN

data = {
    'Name':pd.Series(['zs','ls','we'],index=['a','b','c']),
    'Age':pd.Series(['10','20','30','40'],index=['a','b','c','d']),
    'country':pd.Series(['中國','日本','韓國'],index=['a','c','b'])
}
df = pd.DataFrame(data)  
print(df)

  Name Age country
a   zs  10      中國
b   ls  20      韓國
c   we  30      日本
d  NaN  40     NaN

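The to_dict() method converts a DataFrame to a dictionary

# Named df_as_dict rather than dict so the built-in dict type is not shadowed
df_as_dict = df.to_dict()
print(df_as_dict)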
{'Name': {'a': 'zs', 'b': 'ls', 'c': 'we', 'd': nan}, 'Age': {'a': '10', 'b': '20', 'c': '30', 'd': '40'}, 'country': {'a': '中國', 'b': '韓國', 'c': '日本', 'd': nan}}
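
The nested layout above is only the default. This sketch shows two other layouts selected with the `orient` argument, using a small hypothetical DataFrame.

```python
import pandas as pd

df = pd.DataFrame({'Name': ['zs', 'ls'], 'Age': [10, 20]}, index=['a', 'b'])

nested = df.to_dict()                   # {column: {index: value}}
records = df.to_dict(orient='records')  # one dict per row; the index is dropped
lists = df.to_dict(orient='list')       # {column: [values]}
print(records)
```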

Common DataFrame attributes

Test data

df_dict = {
    'name':['James','Curry','Iversion'],
    'age':[18,20,19],
    'national':['us','China','us']
}
df = pd.DataFrame(data=df_dict,index=['0','1','2'])
print(df)

       name  age national
0     James   18       us
1     Curry   20    China
2  Iversion   19       us

Getting the number of rows and columns

print(df.shape)
(3, 3)

Getting the row index

print(df.index.tolist())
['0', '1', '2']

Getting the column index

print(df.columns.tolist())
['name', 'age', 'national']

Getting the data types

print(df.dtypes)
name        object
age          int64
national    object
dtype: object

Getting the number of dimensions

print(df.ndim)
2

Getting the values

# The values attribute returns the DataFrame's data as a two-dimensional ndarray
print(df.values)
[['James' 18 'us']
 ['Curry' 20 'China']
 ['Iversion' 19 'us']]

Showing a summary of df. info() prints the summary itself and returns None, so print(df.info()) would print an extra None

df.info()

<class 'pandas.core.frame.DataFrame'>
Index: 3 entries, 0 to 2
Data columns (total 3 columns):
name        3 non-null object
age         3 non-null int64
national    3 non-null object
dtypes: int64(1), object(2)
memory usage: 96.0+ bytes
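
Because info() returns None, capturing the summary as text needs the `buf` argument. This is a small sketch with a hypothetical two-row DataFrame.

```python
import io
import pandas as pd

df = pd.DataFrame({'name': ['James', 'Curry'], 'age': [18, 20]})

# info() writes to stdout by default and returns None.
returned = df.info()

# Passing buf redirects the summary into a text buffer instead.
buffer = io.StringIO()
df.info(buf=buffer)
summary = buffer.getvalue()
print(returned, len(summary) > 0)
```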

Showing the first rows (5 by default)

print(df.head(2))

    name  age national
0  James   18       us
1  Curry   20    China

Showing the last rows

print(df.tail(1))

       name  age national
2  Iversion   19       us

Selecting columns from a DataFrame: selecting a single column returns a Series

display(df['name'])
print(type(df['name']))

0       James
1       Curry
2    Iversion
Name: name, dtype: object
<class 'pandas.core.series.Series'>

Selecting several columns returns a DataFrame

print(df[['name','age']])
print(type(df[['name','age']]))

       name  age
0     James   18
1     Curry   20
2  Iversion   19
<class 'pandas.core.frame.DataFrame'>

Getting one row

print(df[0:1])

    name  age national
0  James   18       us

Getting several rows

print(df[1:3])

       name  age national
1     Curry   20    China
2  Iversion   19       us

Getting specific columns from several rows (rows and columns can be selected together)

print(df[1:3][['name','age']])
       name  age
1     Curry   20
2  Iversion   19
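
The chained form above selects in two steps. As a sketch, the same selection can be written in one step with iloc or loc, which is usually preferable when the result will be modified.

```python
import pandas as pd

df = pd.DataFrame(
    {'name': ['James', 'Curry', 'Iversion'],
     'age': [18, 20, 19],
     'national': ['us', 'China', 'us']},
    index=['0', '1', '2'])

# Two steps: slice rows by position, then pick columns by name.
chained = df[1:3][['name', 'age']]

# One step, everything by position: rows 1 and 2, columns 0 and 1.
by_position = df.iloc[1:3, [0, 1]]

# One step, everything by label: the label slice '1':'2' includes '2'.
by_label = df.loc['1':'2', ['name', 'age']]

print(chained.equals(by_position), chained.equals(by_label))
```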

Getting row data

Introduction to loc and iloc

'''
df.loc selects rows by label
df.iloc selects rows by position
'''
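
The distinction is clearest when the labels are integers that do not match the positions. The following sketch uses a hypothetical DataFrame with labels 10, 20, 30.

```python
import pandas as pd

# Integer labels that deliberately differ from the positions 0, 1, 2.
df = pd.DataFrame({'name': ['James', 'Curry', 'Iversion']}, index=[10, 20, 30])

by_label = df.loc[10, 'name']   # the row whose label is 10
by_position = df.iloc[0, 0]     # first row, first column

# Label slices include the stop label; position slices exclude the stop.
label_rows = df.loc[10:20]
position_rows = df.iloc[0:1]
print(by_label, by_position, len(label_rows), len(position_rows))
```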

Test data

df_dict = {
    'name':['James','Curry','Iversion'],
    'age':[18,20,19],
    'national':['us','China','us']
}
df = pd.DataFrame(data=df_dict,index=['0','1','2'])
print(df)

       name  age national
0     James   18       us
1     Curry   20    China
2  Iversion   19       us

loc: a single value at one row and one column

print(df.loc['0','name'])
James

loc: all columns of one row (row, column)

print(df.loc['0',:])
name        James
age            18
national       us
Name: 0, dtype: object

loc: several columns of one row

print(df.loc['0',['name','age']])
name    James
age        18
Name: 0, dtype: object

loc: non-adjacent rows and columns

print(df.loc[['0','2'],['name','national']])
       name national
0     James       us
2  Iversion       us

loc: a contiguous range of rows with non-adjacent columns. Note that the label slice includes both endpoints

print(df.loc['0':'2',['name','national']])
       name national
0     James       us
1     Curry    China
2  Iversion       us

iloc: one row

print(df.iloc[1])
name        Curry
age            20
national    China
Name: 1, dtype: object

iloc: contiguous rows

print(df.iloc[0:2])
    name  age national
0  James   18       us
1  Curry   20    China

iloc: non-adjacent rows

print(df.iloc[[0,2],:])
       name  age national
0     James   18       us
2  Iversion   19       us

iloc: one column

print(df.iloc[:,1])
0    18
1    20
2    19
Name: age, dtype: int64

iloc: a single value

print(df.iloc[1,0])
Curry

Modifying a value

df.iloc[0,0]='panda'
print(df)
       name  age national
0     panda   18       us
1     Curry   20    China
2  Iversion   19       us
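
Assignment works the same way through loc and at. This sketch is not from the original; the new values are arbitrary and only illustrate the three forms.

```python
import pandas as pd

df = pd.DataFrame(
    {'name': ['James', 'Curry', 'Iversion'], 'age': [18, 20, 19]},
    index=['0', '1', '2'])

df.loc['1', 'age'] = 21               # one cell, addressed by labels
df.loc[df['age'] < 20, 'age'] += 1    # every row matching a condition
df.at['2', 'name'] = 'Allen'          # at: scalar access by label
print(df)
```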

Sorting: ascending=False sorts in descending order; ascending order is the default

df = df.sort_values(by='age',ascending=False)
print(df)
       name  age national
1     Curry   20    China
2  Iversion   19       us
0     panda   18       us
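
sort_values also accepts several columns, each with its own direction. This is a minimal sketch on a fresh copy of the test data.

```python
import pandas as pd

df = pd.DataFrame(
    {'name': ['James', 'Curry', 'Iversion'],
     'age': [18, 20, 19],
     'national': ['us', 'China', 'us']},
    index=['0', '1', '2'])

# Sort by country ascending, then by age descending within each country.
by_two = df.sort_values(by=['national', 'age'], ascending=[True, False])

# sort_index puts the rows back in label order.
restored = by_two.sort_index()
print(by_two)
```

Both calls return new DataFrames, so `df` keeps its original order unless the result is assigned back.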

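Renaming row and column labels (for example, appending _ABC to each)

Test data

# Create a DataFrame: index gives the row labels, columns gives the column labels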
df1 = pd.DataFrame(np.arange(9).reshape(3, 3), index = ['bj', 'sh', 'gz'], columns=['a', 'b', 'c'])
print(df1)
    a  b  c
bj  0  1  2
sh  3  4  5
gz  6  7  8

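Renaming with a custom function

# Custom mapping function (x is an existing row or column label)
def test_map(x):
    return x+'_ABC'
# inplace: bool, default False. With False a new DataFrame is returned; with True the original df is modified and None is returned.
print(df1.rename(index=test_map, columns=test_map, inplace=False))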

        a_ABC  b_ABC  c_ABC
bj_ABC      0      1      2
sh_ABC      3      4      5
gz_ABC      6      7      8

Renaming with a dict

# rename also accepts a dict, which renames only the listed labels
df3 = df1.rename(index={'bj':'beijing'}, columns = {'a':'aa'}) 
print(df3)

         aa  b  c
beijing   0  1  2
sh        3  4  5
gz        6  7  8
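
A few variations on rename, sketched under the same test data: any callable can act as the mapper, dict keys that do not exist are skipped by default, and add_suffix is a shortcut that touches only the column labels of a DataFrame.

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame(np.arange(9).reshape(3, 3),
                   index=['bj', 'sh', 'gz'], columns=['a', 'b', 'c'])

upper_cols = df1.rename(columns=str.upper)        # built-in callable as mapper
unchanged = df1.rename(index={'missing': 'x'})    # unknown keys are ignored
suffixed = df1.add_suffix('_ABC')                 # columns only
print(suffixed)
```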

Turning row or column values into an index

Test data

df1=pd.DataFrame({'X':range(5),'Y':range(5),'S':list("abcde"),'Z':[1,1,2,2,2]})
print(df1)
   X  Y  S  Z
0  0  0  a  1
1  1  1  b  1
2  2  2  c  2
3  3  3  d  2
4  4  4  e  2

Using one of the data columns as the row index (drop=False keeps that column in the data as well; positional access still works)

result = df1.set_index('S',drop=False)
print(result)
print ('*'*20)
result.index.name=None
print(result)

   X  Y  S  Z
S            
a  0  0  a  1
b  1  1  b  1
c  2  2  c  2
d  3  3  d  2
e  4  4  e  2
********************
   X  Y  S  Z
a  0  0  a  1
b  1  1  b  1
c  2  2  c  2
d  3  3  d  2
e  4  4  e  2
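
For comparison, here is a sketch of the default drop=True and of reset_index, which reverses set_index.

```python
import pandas as pd

df1 = pd.DataFrame({'X': range(5), 'Y': range(5),
                    'S': list('abcde'), 'Z': [1, 1, 2, 2, 2]})

indexed = df1.set_index('S')        # default drop=True moves S out of the columns
restored = indexed.reset_index()    # S comes back as a column, index is 0..4 again
print(indexed.loc['c', 'X'], restored.columns.tolist())
```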

Using a row as the column index

# set_axis returns a new DataFrame; its inplace parameter was removed in pandas 2.0
result = df1.set_axis(df1.iloc[0], axis=1)
display(result)
print('*' * 20)
result.columns.name = None
display(result)

0  0  0  a  1
0  0  0  a  1
1  1  1  b  1
2  2  2  c  2
3  3  3  d  2
4  4  4  e  2
********************
   0  0  a  1
0  0  0  a  1
1  1  1  b  1
2  2  2  c  2
3  3  3  d  2
4  4  4  e  2
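
A common use of this technique is promoting a header row that was read in as data. The sketch below assumes a hypothetical frame `raw` whose first row holds the intended column names.

```python
import pandas as pd

# Hypothetical raw data: the first row holds the intended column names.
raw = pd.DataFrame([['id', 'city'], [1, 'bj'], [2, 'sh']])

header = raw.iloc[0].tolist()

# Drop the header row, apply it as column labels, then renumber the rows.
table = raw.iloc[1:].set_axis(header, axis=1).reset_index(drop=True)
print(table)
```

Passing a plain list rather than the row Series avoids carrying the row's name over as the name of the column index.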

ywmack
