pandas: Easy Data Type Conversion

Original

2020-05-17 11:32

First, an overview of the pandas data types:

Pandas dtype	Python type	NumPy type	Usage
object	str or mixed	string_, unicode_, mixed types	Text or mixed numeric and non-numeric values
int64	int	int_, int8, int16, int32, int64, uint8, uint16, uint32, uint64	Integer numbers
float64	float	float_, float16, float32, float64	Floating point numbers
bool	bool	bool_	True/False values
datetime64	NA	datetime64[ns]	Date and time values
timedelta64[ns]	NA	timedelta64[ns]	Differences between two datetimes
category	NA	NA	Finite list of text values

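Note: when data is loaded, an object column can hold values of any other type, strings included, so the dtype alone cannot tell text apart from mixed data. pandas 1.0.0 therefore added a dedicated data type for text, StringDtype (alias "string"). It is a welcome improvement.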
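To see how the rows of the table map onto real columns, here is a minimal sketch that builds one column per dtype and then contrasts the default object dtype for text with the opt-in StringDtype. The column names and values are made up for illustration, and the dtype names in the comments are the ones printed by pandas 1.x and 2.x; other releases may print them differently.

```python
import pandas as pd

# Hypothetical columns, one per row of the dtype table.
df = pd.DataFrame(
    {
        "obj": ["a", 1, 2.5],                    # mixed values -> object
        "int": [1, 2, 3],                        # -> int64
        "flt": [0.5, 1.5, 2.5],                  # -> float64
        "flag": [True, False, True],             # -> bool
        "when": pd.to_datetime(["2020-05-17", "2020-05-18", "2020-05-19"]),  # -> datetime64[ns]
        "cat": pd.Categorical(["x", "y", "x"]),  # -> category
    }
)
# Subtracting datetimes yields a timedelta column -> timedelta64[ns]
df["delta"] = df["when"] - df["when"].iloc[0]
print(df.dtypes)

# Text defaults to object; StringDtype has to be requested explicitly.
names = pd.Series(["Jack", "Tony", "Posi"])
print(names.dtype)                   # object (pandas 1.x/2.x)
print(names.astype("string").dtype)  # string
```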

Two important conversion methods
1. astype

DataFrame.astype(dtype, copy=True, errors='raise')
or
Series.astype(dtype, copy=True, errors='raise')

astype converts the data to a single target type; alternatively, pass a dict whose keys are column names and whose values are the target dtypes.
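A short sketch of both call styles on a small made-up DataFrame (column "c", holding digits as strings, is hypothetical). It also shows what the default errors='raise' means in practice: a value that cannot be converted raises a ValueError rather than being skipped.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"a": [1, 2, 3], "b": [0.55, 0.66, 1.55], "c": ["1", "2", "3"]})

# Single target type for one column.
a32 = df["a"].astype(np.int32)

# Dict form: keys are column names, values are the target dtypes.
# Columns not listed (here "b") keep their current dtype.
converted = df.astype({"a": "int16", "c": "int64"})
print(converted.dtypes)

# With the default errors="raise", an unconvertible value raises ValueError.
raised = False
try:
    pd.Series(["1", "two"]).astype("int64")
except ValueError as exc:
    raised = True
    print("conversion failed:", exc)
```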

2. convert_dtypes

DataFrame.convert_dtypes(infer_objects=True, convert_string=True, convert_integer=True, convert_boolean=True)
or
Series.convert_dtypes(infer_objects=True, convert_string=True, convert_integer=True, convert_boolean=True)

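convert_dtypes infers the best possible dtype for each column and converts to the nullable extension types (Int64, string, boolean, ...). In my experience it is only really smart about strings. It never downcasts integers: an int64 column becomes Int64, an int32 column stays 32-bit as Int32, and integers inferred from other types default to Int64, so memory usage stays high. Since pandas 1.2 the method also has a convert_floating=True parameter, so on newer versions float columns come back as Float64 rather than float64.
Example: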
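A minimal sketch checking the claim above on made-up data (all column names are hypothetical; the float column with a missing value mirrors an example in the pandas documentation for convert_dtypes). Integer columns keep their existing width as a nullable type, integers inferred from floats default to Int64, and nothing is narrowed to save memory.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    {
        "i64": [1, 2, 3],                                # int64
        "i32": pd.Series([1, 2, 3], dtype=np.int32),     # int32
        "flt_int": [10, np.nan, 20],                     # float64 holding whole numbers
        "name": ["Jack", "Tony", "Posi"],                # text (object)
    }
)
out = df.convert_dtypes()
print(out.dtypes)

# Int64 keeps 8 bytes per value and adds a boolean mask, so it is not smaller.
print(df["i64"].memory_usage(index=False), out["i64"].memory_usage(index=False))
```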

import numpy as np
import pandas as pd

df = pd.DataFrame({'a': [1, 2, 3], 'b': [0.55, 0.66, 1.55], 'c': ['Jack', 'Tony', 'Posi']})
df.dtypes
a      int64
b    float64
c     object
dtype: object

df['a'] = df['a'].astype(np.int32)
df.dtypes
a      int32
b    float64
c     object
dtype: object

df.convert_dtypes().dtypes
a      Int32
b    float64
c     string
dtype: object
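The convert_* flags in the signature switch off individual conversions. A sketch that rebuilds the example DataFrame above and turns two of them off; with pandas 1.x/2.x, column c should stay object in the first result and column a should stay int32 in the second.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'a': [1, 2, 3], 'b': [0.55, 0.66, 1.55], 'c': ['Jack', 'Tony', 'Posi']})
df['a'] = df['a'].astype(np.int32)

# Disable one kind of conversion at a time.
keep_text = df.convert_dtypes(convert_string=False)   # c is left as object
keep_ints = df.convert_dtypes(convert_integer=False)  # a is left as int32
print(keep_text.dtypes)
print(keep_ints.dtypes)
```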

Reference:
https://pandas.pydata.org/pandas-docs/stable/getting_started/basics.html#basics-dtypes
