1. Pandas Basics: Series and DataFrame

Original

Pluto0054

2020-06-14 02:30

0 Introduction

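Pandas is a data analysis library built on top of NumPy. Its main data structures are:

Series: a one-dimensional labeled array, similar to a one-dimensional NumPy array and also close to Python's built-in list;
Time-Series: a Series indexed by time;
DataFrame: a two-dimensional tabular data structure, which can be thought of as a container of Series;
Panel: a three-dimensional array, which can be thought of as a container of DataFrames (Panel was removed in pandas 0.25);
Panel4D: a four-dimensional data container (no longer part of current pandas);
PanelND: a module with a set of factory functions for creating N-dimensional named containers like Panel4D (no longer part of current pandas).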
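As a small illustration of the "Time-Series" entry in the list above, here is a minimal sketch of a Series indexed by dates. The names `dates` and `ts` and the sample values are assumptions made up for this example.

```python
import pandas as pd

# Three consecutive days used as the index (illustrative dates).
dates = pd.date_range("2020-01-01", periods=3, freq="D")

# A Series whose index is a DatetimeIndex, i.e. a time series.
ts = pd.Series([1.0, 2.0, 3.0], index=dates)

print(type(ts.index).__name__)

# Values are looked up by timestamp instead of by integer position.
second_day = ts.loc[pd.Timestamp("2020-01-02")]
print(second_day)
```

Apart from the index type, `ts` behaves like any other Series.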

1 Series

import pandas as pd
import numpy as np

First, build a Series as shown below. An integer index starting at 0 is added automatically.

s1 = pd.Series([4, 7, -5, 3])
print(s1)

0    4
1    7
2   -5
3    3
dtype: int64

View the values of the Series

s1.values

array([ 4,  7, -5,  3], dtype=int64)

View the index of the Series

s1.index

RangeIndex(start=0, stop=4, step=1)

You can also specify the index of a Series explicitly

s2 = pd.Series([4.0, 6.5, -0.5, 4.2], index=['d', 'b', 'a', 'c'])
print(s2)

d    4.0
b    6.5
a   -0.5
c    4.2
dtype: float64

Get a value by its index label

s2["a"]

-0.5
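Label lookup can also be written explicitly with `.loc`, and the same element can be reached by position with `.iloc`. The following is a minimal sketch that rebuilds `s2` so it runs on its own; the variable names `by_label`, `by_position`, and `subset` are illustrative.

```python
import pandas as pd

s2 = pd.Series([4.0, 6.5, -0.5, 4.2], index=['d', 'b', 'a', 'c'])

# Explicit label-based lookup: same result as s2["a"].
by_label = s2.loc['a']

# Position-based lookup: 'a' is the third element (position 2).
by_position = s2.iloc[2]

# Passing a list of labels returns a new Series with just those entries.
subset = s2[['a', 'c']]

print(by_label, by_position)
print(subset)
```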

Use in to check whether a label is in the index of the Series

'b' in s2

True
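Note that `in` tests the index labels, not the stored values. A short sketch of the difference, rebuilding `s2` so that it is self-contained (the result variable names are illustrative):

```python
import pandas as pd

s2 = pd.Series([4.0, 6.5, -0.5, 4.2], index=['d', 'b', 'a', 'c'])

# 'b' is an index label, so this membership test succeeds.
label_found = 'b' in s2

# 6.5 is a value, not a label, so the same test fails.
value_as_label = 6.5 in s2

# To search the values, test against the underlying array instead.
value_found = 6.5 in s2.values

print(label_found, value_as_label, value_found)
```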

Build a Series from a dictionary

# A Series can be viewed as a fixed-length ordered dictionary
dic1 = {'apple':5, 'pen':3, 'applepen':10}
s3 = pd.Series(dic1)
print(s3)

apple        5
pen          3
applepen    10
dtype: int64
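When a dictionary is combined with an explicit `index`, pandas matches on the keys: the requested order is used, and labels missing from the dictionary get NaN. A minimal sketch, where `s4` and the extra label `'pineapple'` are made up for illustration:

```python
import pandas as pd

dic1 = {'apple': 5, 'pen': 3, 'applepen': 10}

# Only the labels listed in `index` are kept, in that order.
# 'pineapple' has no entry in dic1, so its value is NaN,
# which also turns the dtype into float64.
s4 = pd.Series(dic1, index=['pen', 'apple', 'pineapple'])

print(s4)
print(s4.isnull())
```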

2 DataFrame

First, create a DataFrame from a dictionary. An integer index starting at 0 is again added automatically.

# DataFrame
data = {'year':[2014,2015,2016,2017],
        'income':[10000,30000,50000,80000],
        'play':[5000,20000,30000,30000]}
df1 = pd.DataFrame(data)
df1

	year	income	play
0	2014	10000	5000
1	2015	30000	20000
2	2016	50000	30000
3	2017	80000	30000
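The introduction describes a DataFrame as a container of Series. One way to see this is to select a single column, which comes back as a Series sharing the DataFrame's index. A minimal sketch, rebuilding `df1` so it runs on its own (`income` is an illustrative variable name):

```python
import pandas as pd

data = {'year': [2014, 2015, 2016, 2017],
        'income': [10000, 30000, 50000, 80000],
        'play': [5000, 20000, 30000, 30000]}
df1 = pd.DataFrame(data)

# Selecting one column returns a Series named after the column.
income = df1['income']

print(type(income).__name__)
print(income.name)

# The column shares the row index of the DataFrame it came from.
print(income.index.equals(df1.index))
```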

Build a DataFrame from a two-dimensional array; the column labels and the index are generated automatically

df2 = pd.DataFrame(np.arange(12).reshape((3,4)))
df2

	0	1	2	3
0	0	1	2	3
1	4	5	6	7
2	8	9	10	11

Build a DataFrame from a two-dimensional array, specifying the column labels (columns) and the row index (index)

df3 = pd.DataFrame(np.arange(12).reshape((3,4)), index=['a','c','b'], columns=[2,33,44,5])
df3

	2	33	44	5
a	0	1	2	3
c	4	5	6	7
b	8	9	10	11

Get the column labels

df1.columns  # column labels

Index(['year', 'income', 'play'], dtype='object')

Get the row index

df1.index  # row index

RangeIndex(start=0, stop=4, step=1)

View the values

df1.values

array([[ 2014, 10000,  5000],
       [ 2015, 30000, 20000],
       [ 2016, 50000, 30000],
       [ 2017, 80000, 30000]], dtype=int64)

Call .describe() to summarize the table; it reports statistics such as the mean and standard deviation of each column

df1.describe()

	year	income	play
count	4.000000	4.000000	4.000000
mean	2015.500000	42500.000000	21250.000000
std	1.290994	29860.788112	11814.539066
min	2014.000000	10000.000000	5000.000000
25%	2014.750000	25000.000000	16250.000000
50%	2015.500000	40000.000000	25000.000000
75%	2016.250000	57500.000000	30000.000000
max	2017.000000	80000.000000	30000.000000
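The individual figures in the summary can also be read out of the result or computed directly. The sketch below rebuilds `df1` and checks a few of the numbers shown above; the variable names are illustrative.

```python
import pandas as pd

data = {'year': [2014, 2015, 2016, 2017],
        'income': [10000, 30000, 50000, 80000],
        'play': [5000, 20000, 30000, 30000]}
df1 = pd.DataFrame(data)

# describe() returns a DataFrame, so single cells can be read with .loc.
summary = df1.describe()
mean_income = summary.loc['mean', 'income']

# The same statistics are available as separate methods.
std_year = df1['year'].std()        # sample standard deviation (ddof=1)
median_play = df1['play'].median()  # corresponds to the 50% row

print(mean_income, std_year, median_play)
```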

Use .T to transpose: rows become columns and columns become rows

df1.T

	0	1	2	3
year	2014	2015	2016	2017
income	10000	30000	50000	80000
play	5000	20000	30000	30000

df3

	2	33	44	5
a	0	1	2	3
c	4	5	6	7
b	8	9	10	11

Sort by column labels

df3.sort_index(axis=1)  # sort by column labels

	2	5	33	44
a	0	3	1	2
c	4	7	5	6
b	8	11	9	10

Sort by row labels

df3.sort_index(axis=0)  # sort by row labels

	2	33	44	5
a	0	1	2	3
b	8	9	10	11
c	4	5	6	7

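Sort the rows by the values in a specified column (column 44 is already in ascending order here, so the row order does not change)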

df3.sort_values(by=44)

	2	33	44	5
a	0	1	2	3
c	4	5	6	7
b	8	9	10	11
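Because column 44 is already ascending, a descending sort makes the effect easier to see. The sketch below rebuilds `df3`, sorts with `ascending=False`, and shows that sorting returns a new DataFrame rather than modifying `df3`; `desc` is an illustrative variable name.

```python
import numpy as np
import pandas as pd

df3 = pd.DataFrame(np.arange(12).reshape((3, 4)),
                   index=['a', 'c', 'b'], columns=[2, 33, 44, 5])

# Sort rows by column 44, largest value first.
desc = df3.sort_values(by=44, ascending=False)
print(desc)

# sort_values returns a new object; the original order is untouched.
print(df3.index.tolist())
```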
