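NumPy Overview

NumPy (Numerical Python) supplies the numerical computing capability that plain Python lacks.
NumPy is the underlying library for many other data-analysis and machine-learning libraries.
NumPy's core is implemented in C, so it runs efficiently.
It is free and open source.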

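History of NumPy

1995: Numeric (Numerical Python) was released.
2001: Numarray was released as an alternative to Numeric, with more flexible multidimensional arrays.
2005: Numeric + Numarray -> NumPy.
2006: NumPy became a project independent of SciPy.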

The core of NumPy: multidimensional arrays

import numpy as np
ary = np.array([1, 2, 3, 4])
print(ary)

NumPy Basics

The ndarray

import numpy as np
ary = np.array([1, 2, 3, 4])
print(ary)

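The ndarray object in memory

Metadata

Stores the description of the target array, such as its dimensions and element type.

Actual data

Stores the complete array data.

Storing the actual data separately from the metadata uses memory more efficiently. It also improves performance, because many operations (reshaping, for example) only need to touch the metadata and never access the actual data.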
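
As a rough illustration of the split between metadata and actual data, the sketch below reshapes and transposes an array and checks that only the description changes. The variable names are made up for this example, and the stride values assume a C-contiguous int32 array.

```python
import numpy as np

a = np.arange(12, dtype=np.int32)   # 12 int32 values in one data buffer
b = a.reshape(3, 4)                 # new metadata, same data buffer

# Metadata: describes how to interpret the buffer.
print(b.shape)    # (3, 4)
print(b.dtype)    # int32
print(b.strides)  # (16, 4): bytes to step to the next row / next column

# Transposing only rewrites the metadata (shape and strides are swapped).
t = b.T
print(t.shape, t.strides)  # (4, 3) (4, 16)

# All three objects describe the same actual data.
print(np.shares_memory(a, b), np.shares_memory(a, t))  # True True
```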

Characteristics of the ndarray

An ndarray is a homogeneous array: all of its elements must have the same data type.
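
A small sketch of what homogeneity means in practice: when the inputs have mixed types, NumPy picks one common type for every element, and values assigned later are converted to the array's type. The variable names are illustrative only.

```python
import numpy as np

mixed_numbers = np.array([1, 2.5, 3])   # ints and a float
print(mixed_numbers.dtype)              # float64: every element became a float

mixed_text = np.array([1, 'a'])         # an int and a string
print(mixed_text.dtype.kind)            # U: every element became a Unicode string

ints = np.array([1, 2, 3], dtype=np.int32)
ints[0] = 9.99                          # the float is truncated to fit int32
print(ints)                             # [9 2 3]
```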

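Creating ndarray objects

np.array(any structure that can be interpreted as an array)

np.arange(start [default 0], stop, step [default 1])

np.zeros(number of elements or shape, dtype='element type')

np.ones(number of elements or shape, dtype='element type')

np.zeros_like(ary)

np.ones_like(ary)

Example: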

"""
demo02_ndarray.py
"""
import numpy as np
a = np.array([[1, 2, 3, 4],
              [5, 6, 7, 8]])
print(a, a.shape)
# start 1, stop 10 (exclusive), step 2
b = np.arange(1, 10, 2)
print(b)

# Create an array of 5 elements, all 0
c = np.zeros(5, dtype='int32')
print(c, c.dtype)

# Create an array of 5 elements, all 1
d = np.ones(5, dtype='int32')
print(d, d.dtype)
# Create arrays e and f with the same structure as a: e is all 0, f is all 1
e = np.zeros_like(a)
f = np.ones_like(a)
print(e)
print(f / 5)

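Basic operations on ndarray attributes

Shape of the array: ndarray.shape

Element type: ndarray.dtype

Number of elements: ndarray.size (len(ndarray) returns only the length of the first dimension)

Indexing an element (subscript): ary[0]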

"""
demo03_attr.py
"""
import numpy as np

# Test the shape of the array
a = np.arange(1, 10)
print(a, a.shape)
a.shape = (3, 3)
print(a, a.shape)

# Test the element type
print(a.dtype)
b = a.astype(float)
print(b, b.dtype)

# astype returns a copy, so changing b leaves a untouched
b[0][0] = 999
print(b)
print(a)

# Test the number of elements: size counts all elements, len() only the first dimension
print('a.size:', a.size, 'len(a):', len(a))

# Indexing array elements
c = np.arange(1, 19).reshape(3, 2, 3)
print(c)
print(c[0])
print(c[0][0])
print(c[0][0][0])
print(c[0, 0, 0])

# Iterate over every element of c and print it
for i in range(c.shape[0]):
    for j in range(c.shape[1]):
        for k in range(c.shape[2]):
            print(c[i, j, k], end=' ')
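
Because size and len() are easy to confuse, here is a short sketch that puts them side by side on a 3-D array with the same shape as c in the demo above.

```python
import numpy as np

c = np.arange(1, 19).reshape(3, 2, 3)

print(c.size)      # 18: total number of elements
print(len(c))      # 3: length of the first dimension only
print(c.shape[0])  # 3: same value as len(c)
print(c.ndim)      # 3: number of dimensions
```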

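ndarray attribute operations in detail

Built-in basic data types

Type name           Type identifier
Boolean             bool_
Signed integer      int8 (-128 to 127) / int16 / int32 / int64
Unsigned integer    uint8 / uint16 / uint32 / uint64
Floating point      float16 / float32 / float64
Complex             complex64 / complex128
String              str_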
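
To make the value ranges in the table concrete, the following sketch queries them with np.iinfo and shows what can happen when a result does not fit the element type. The array names are invented for the example.

```python
import numpy as np

# Value range of a few integer types
for t in (np.int8, np.int16, np.uint8):
    info = np.iinfo(t)
    print(t.__name__, info.min, info.max)

small = np.array([100, 100], dtype=np.int8)
wrapped = small + small                    # 200 does not fit in int8 and wraps around
widened = small.astype(np.int16) * 2       # converting to a wider type first avoids this
print(wrapped)   # [-56 -56]
print(widened)   # [200 200]
```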

Storing custom compound types in an ndarray

"""
demo03_ctype.py  test custom compound types
"""
import numpy as np

data = [
	('zs', [90, 80, 70], 15),
	('ls', [86, 76, 69], 16),
	('ww', [22, 11, 34], 17)]

# First way to set the dtype
# U3:     3 Unicode characters
# 3int32: 3 int32 integers (a list)
# int32:  1 int32 integer
a = np.array(data, dtype='U3, 3int32, int32')
print(a)
# Get the name of the third user; 'f0' is the first field
print(a[2]['f0'])

# Second way to set the dtype
# age is a single int32, so no shape is given for it
b = np.array(data, dtype=[
    ('name',   'str_',  2),
    ('scores', 'int32', 3),
    ('age',    'int32')])
print(b)
print(b[1]['scores'])

# Third way to set the dtype
c = np.array(data, dtype={
		'names':['name', 'scores', 'age'],
		'formats':['U3', '3int32', 'int32']})
print(c)
print(c[2]['age'])

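# Fourth way to set the dtype
# 0, 16 and 28 are the byte offsets at which each field is stored
# name is stored at byte offset 0, scores at byte offset 16, age at byte offset 28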
d = np.array(data, dtype={
		'name': ('U3', 0),
		'scores': ('3int32', 16),
		'age': ('int32', 28)})
print(d)
print(d[2]['age'])

# Storing date values in an ndarray
f = np.array(['2011', '2012-01-01',
    '2013-11-11 11:11:11', '2013-01-01'])
print(f)
# datetime64[D]: a date-time with day precision
g = f.astype('M8[D]')
print(g, g.dtype)
print(g[3] - g[1])
print(g.astype('int32'))

print(np.array([0]).astype('M8[s]'))
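
Once the fields have names, a whole column can be read at once by indexing with the field name. The sketch below assumes the same (name, scores, age) layout as the demo above; the records array and its dtype spelling are illustrative rather than taken from the original script.

```python
import numpy as np

# Illustrative records in the same (name, scores, age) layout as the demo
records = np.array(
    [('zs', [90, 80, 70], 15),
     ('ls', [86, 76, 69], 16),
     ('ww', [22, 11, 34], 17)],
    dtype=[('name', 'U3'), ('scores', 'int32', (3,)), ('age', 'int32')])

print(records['name'])                        # names of all records
print(records['scores'].shape)                # (3, 3): one row of scores per record
print(records['scores'].mean(axis=1))         # average score of each record
print(records[records['age'] > 15]['name'])   # names of the records with age above 15
```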

Type character codes

Type                                Character code
bool_                               ?
int8 / int16 / int32 / int64        i1 / i2 / i4 / i8
uint8 / uint16 / uint32 / uint64    u1 / u2 / u4 / u8
float16 / float32 / float64         f2 / f4 / f8
complex64 / complex128              c8 / c16
str_                                U<number of characters> (each character takes 4 bytes)
datetime64                          M8[Y | M | D | h | m | s]
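
The character codes can be checked against the full type names with np.dtype. This is only a quick sanity-check sketch, not part of the original demos.

```python
import numpy as np

print(np.dtype('i4') == np.int32)        # True
print(np.dtype('u1') == np.uint8)        # True
print(np.dtype('f8') == np.float64)      # True
print(np.dtype('c16') == np.complex128)  # True
print(np.dtype('?') == np.bool_)         # True
print(np.dtype('U3').itemsize)           # 12: 3 characters of 4 bytes each
print(np.dtype('M8[D]'))                 # datetime64[D]
```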

Dimension operations on ndarray objects

Reshaping as a view (data is shared): ary.reshape() ary.ravel()

import numpy as np

a = np.arange(1, 9)
print(a)
# Reshape as a view
b = a.reshape(2, 4)
print(b)
b[0, 0] = 999
print(b)
c = a.ravel()
print(c)
c[1] = 888
print(c)
print(a)

Reshaping as a copy (data is not shared): a.flatten() a.copy()

print('-' * 45)
d = b.flatten()
print(b)
print(d)
d[2] = 777
print(b)
print(d)

In-place reshaping, which changes the dimensions of the original array directly: a.shape a.resize()

b.shape = (4, 2)
print(b)
b.resize(2, 2, 2)
print(b)
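
Whether a reshaping call returned a view or a copy can be checked with np.shares_memory. The sketch below uses illustrative variable names and a contiguous 1-D array, for which ravel() returns a view; for non-contiguous input ravel() may have to copy.

```python
import numpy as np

a = np.arange(1, 9)

view = a.reshape(2, 4)   # view: shares the data of a
flat_view = a.ravel()    # view here, because a is contiguous
copy = a.flatten()       # always an independent copy

print(np.shares_memory(a, view))       # True
print(np.shares_memory(a, flat_view))  # True
print(np.shares_memory(a, copy))       # False

view[0, 0] = 999
print(a[0], copy[0])  # 999 1: the change is visible through a but not through the copy
```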

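Slicing ndarray objects

# Slicing an array takes the same parameters as slicing a list
# Positive step: slice from front to back
# Negative step: slice from back to front
ary[start:stop:step]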

import numpy as np
a = np.arange(1, 10)
print(a)  # 1 2 3 4 5 6 7 8 9
print(a[:3])
print(a[3:6])
print(a[6:])
print(a[::-1])
print(a[:-4:-1])
print(a[-4:-7:-1])
print(a[-7::-1])
print(a[:])
print(a[::3])
print(a[1::3])

Slicing multidimensional arrays:

# Use commas as separators and slice each dimension (page / row / column) separately
ary[::, ::, ...]
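
A concrete sketch of per-dimension slicing on a small 3-D array, read as 2 pages of 3 rows and 3 columns. The array is made up for this example.

```python
import numpy as np

a = np.arange(1, 19).reshape(2, 3, 3)  # 2 pages, 3 rows, 3 columns

print(a[0])            # the first page
print(a[:, 0, :])      # first row of every page:    [[1 2 3] [10 11 12]]
print(a[:, :, -1])     # last column of every page:  [[3 6 9] [12 15 18]]
print(a[1, ::2, 1:])   # page 1, rows 0 and 2, columns 1 onward: [[11 12] [17 18]]
```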

Mask operations on ndarray arrays

"""
demo07_mask.py  mask operations on ndarray
"""
import numpy as np
a = np.arange(1, 10)
mask = (a%2==0)
print(a)
print(mask)
print(a[mask])
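# Use an index mask (a list of positions) to reorder the array
mask = [8, 1, 2, 7, 3, 4, 6, 5, 0]
print(a[mask])

# Print the numbers below 100 that are multiples of both 3 and 7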
b = np.arange(100)
print(b[(b%3==0) & (b%7==0)])
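
The two kinds of mask behave differently: a boolean mask keeps the elements where the mask is True, while an index mask picks elements by position and can therefore reorder them. In the sketch below np.argsort is used to produce an index mask that sorts the array; the data values are invented for the example.

```python
import numpy as np

a = np.array([30, 10, 50, 20, 40])

# Boolean mask: same length as a, keeps the elements where it is True
bool_mask = a > 25
print(a[bool_mask])            # [30 50 40]

# Index mask: positions to pick, in the order they should appear
order = np.argsort(a)
print(order)                   # [1 3 0 4 2]
print(a[order])                # [10 20 30 40 50]

# Conditions are combined with & (and), | (or), ~ (not); each needs parentheses
print(a[(a > 15) & (a < 45)])  # [30 20 40]
```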

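Combining and splitting multidimensional arrays

Vertical direction:

import numpy as np
a = np.arange(1, 7).reshape(2, 3)
b = np.arange(7, 13).reshape(2, 3)
# Combine in the vertical direction
c = np.vstack((a, b))
print(c, c.shape)
# Split in the vertical direction
a, b = np.vsplit(c, 2)
print(a, b, sep='\n')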

Horizontal direction:

# Combine and split in the horizontal direction
d = np.hstack((a, b))
print(d, d.shape)
a, b = np.hsplit(d, 2)
print(a, b, sep='\n')

Depth direction:

# Combine and split in the depth direction
e = np.dstack((a, b))
print(e, e.shape)
a, b = np.dsplit(e, 2)
print(a, b, sep='\n')

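Related functions for combining and splitting:

# Combine a and b along the given axis
# If a and b are both 2-D arrays:
#  0: combine vertically
#  1: combine horizontally
# If a and b are both 3-D arrays:
#  0: combine vertically
#  1: combine horizontally
#  2: combine in depth
c = np.concatenate((a, b), axis=0)
# Split c into two parts, a and b, along the given axis
a, b = np.split(c, 2, axis=0)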

Combining simple 1-D arrays:

a = a.ravel()
b = b.ravel()
# Stack a and b as 2 rows (np.row_stack is a deprecated alias of np.vstack)
c = np.vstack((a, b))
# Stack a and b as 2 columns
d = np.column_stack((a, b))
print(c)
print(d)
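
The snippets in this section share the variables a and b, so here is a self-contained sketch that relates the axis argument of np.concatenate and np.split to the vertical, horizontal and depth helpers. The two input arrays are assumed for illustration.

```python
import numpy as np

a = np.arange(1, 7).reshape(2, 3)
b = np.arange(7, 13).reshape(2, 3)

v = np.concatenate((a, b), axis=0)   # same result as np.vstack((a, b))
h = np.concatenate((a, b), axis=1)   # same result as np.hstack((a, b))
print(v.shape, h.shape)              # (4, 3) (2, 6)
print(np.array_equal(v, np.vstack((a, b))))  # True
print(np.array_equal(h, np.hstack((a, b))))  # True

# Splitting along the same axis recovers the original pieces
top, bottom = np.split(v, 2, axis=0)
print(np.array_equal(top, a), np.array_equal(bottom, b))  # True True

# dstack adds a third dimension when the inputs are 2-D
d = np.dstack((a, b))
print(d.shape)                       # (2, 3, 2)
```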

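Other commonly used ndarray attributes

ndim        number of dimensions
itemsize    size of one element in bytes
nbytes      total size of the array in bytes
real        real parts of all elements of a complex array
imag        imaginary parts of all elements of a complex array
T           transposed view of the array
flat        flat iterator over a multidimensional array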

"""
demo09_attrs.py  test commonly used attributes
"""
import numpy as np

data = np.array([[1+1j, 2+4j, 3+7j],
                 [4+2j, 5+5j, 6+8j],
                 [7+3j, 8+6j, 9+9j]])
print(data.dtype)
print(data.ndim)
print(data.itemsize)
print(data.nbytes)
print(data.real)
print(data.imag)
print(data.T)

for item in data.flat:
    print(item, end=' ')
