When parsing data, Python can convert freely between its various data types. While working through the exercises on the Stanford online course *Cryptography*, I found I was not very familiar with these conversions, so I am writing this post to record the usage.
Navigation
From: | Number | String | Bytes |
---|---|---|---|
To number | base conversion | string to integer | bytes to integer |
To string | str() | string encode/decode | decode('hex') (Python 2 only) |
To bytes | integer to bytes | string to bytes | — |
There are also the common single-character conversions:
Function | Purpose | Mnemonic | Notes |
---|---|---|---|
chr | converts a number to the corresponding ASCII character | chr looks like char, so it yields a char | 0~255 in Python 2; Python 3's chr covers all Unicode code points |
ord | converts a single character to its ASCII code | ends in d, as in digit | |
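A minimal round-trip sketch of the two functions:

```python
# chr() maps an integer code to its character; ord() is the inverse.
assert chr(65) == 'A'
assert ord('A') == 65

# They undo each other over the whole single-byte range:
assert all(ord(chr(n)) == n for n in range(256))
```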
Base conversion
Decimal to hexadecimal:
hex(16) ==> 0x10
Hexadecimal to decimal:
int(STRING, BASE) converts the string STRING, interpreted in base BASE, to a decimal int; note that the first argument must be a string.
int('0x10', 16) ==> 16
Similarly there are oct() for octal and bin() for binary.
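A small sketch of the three helpers and their inverse via int() (Python 3 prefixes shown):

```python
# hex()/oct()/bin() render an int in the given base, with a prefix.
assert hex(255) == '0xff'
assert oct(8) == '0o10'     # Python 3 uses the 0o prefix for octal
assert bin(5) == '0b101'

# int() parses the string back, given the matching base;
# the prefix is accepted either way.
assert int('ff', 16) == 255
assert int('0o10', 8) == 8
assert int('0b101', 2) == 5

# Base 0 tells int() to infer the base from the prefix itself.
assert int('0xff', 0) == 255
```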
Hexadecimal string to binary string:
hex_str = '00fe'
bin(int('1' + hex_str, 16))[3:]   # keeps the leading zeros
# result: '0000000011111110'
bin(int(hex_str, 16))[2:]         # drops the leading zeros
# result: '11111110'
Binary string to hexadecimal string:
bin_str = '0b0111000011001100'
hex(int(bin_str, 2))
# result: '0x70cc'
String to integer
Decimal string:
int('10') ==> 10
Hexadecimal string:
int('10', 16) ==> 16
# or
int('0x10', 16) ==> 16
Bytes to integer
Use the struct module, common for handling network packets and compatible with C data layouts.
The formats supported by struct are listed below:
Format | C type | Python type | Bytes | Notes |
---|---|---|---|---|
x | pad byte | no value | 1 | |
c | char | string of length 1 | 1 | |
b | signed char | integer | 1 | |
B | unsigned char | integer | 1 | |
? | _Bool | bool | 1 | |
h | short | integer | 2 | |
H | unsigned short | integer | 2 | |
i | int | integer | 4 | |
I | unsigned int | integer or long | 4 | |
l | long | integer | 4 | |
L | unsigned long | long | 4 | |
q | long long | long | 8 | native mode requires platform long long support; always available in standard mode |
Q | unsigned long long | long | 8 | same as q |
f | float | float | 4 | |
d | double | float | 8 | |
s | char[] | string | 1 per char | the count is the string length, e.g. '10s' |
p | char[] | string | 1 per char | Pascal-style string: the first byte stores the length |
P | void * | long | platform-dependent (4 or 8) | a pointer; native mode only |
Byte order and alignment: the character goes in the first position of the format string
CHARACTER | BYTE ORDER | SIZE | ALIGNMENT |
---|---|---|---|
@ | native | native | native |
= | native | standard | none |
< | little-endian | standard | none |
> | big-endian | standard | none |
! | network (= big-endian) | standard | none |
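The effect of the leading character can be checked with struct.calcsize and struct.pack; a small sketch:

```python
import struct

# With standard sizes ('<', '>', '=', '!') there is no padding:
assert struct.calcsize('<hi') == 6    # 2-byte short + 4-byte int

# '@' (the default) uses native sizes and alignment, so the same
# format may be larger, e.g. 8 on x86-64 due to padding after the short.

# The byte-order character decides which end comes first:
assert struct.pack('<H', 1) == b'\x01\x00'   # little-endian
assert struct.pack('>H', 1) == b'\x00\x01'   # big-endian
assert struct.pack('!H', 1) == b'\x00\x01'   # network order = big-endian
```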
Unpack as short integers:
struct.unpack('<hh', bytes(b'\x01\x00\x00\x00')) ==> (1, 0)
Unpack as an unsigned long:
struct.unpack('<L', bytes(b'\x01\x00\x00\x00')) ==> (1,)
Integer to bytes
Pack each value into two bytes:
struct.pack('<HH', 1,2) ==> b'\x01\x00\x02\x00'
Pack each value into four bytes:
struct.pack('<LL', 1,2) ==> b'\x01\x00\x00\x00\x02\x00\x00\x00'
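In Python 3, plain integers can also skip struct entirely via int.to_bytes / int.from_bytes; a sketch:

```python
# Equivalent of struct.pack('<L', 1) for a single integer:
assert (1).to_bytes(4, 'little') == b'\x01\x00\x00\x00'

# And the reverse of struct.unpack:
assert int.from_bytes(b'\x01\x00\x00\x00', 'little') == 1

# Byte order matters: 0x0102 big-endian vs little-endian.
assert (0x0102).to_bytes(2, 'big') == b'\x01\x02'
assert (0x0102).to_bytes(2, 'little') == b'\x02\x01'
```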
Integer to string
Just use the built-in function:
str(100)
String to bytes
I also implemented encode('hex') and decode('hex') in C++; see the end of this post.
The difference between decode and encode
The decode function re-decodes: it takes the string CT as displayed, 69dda8455c7dd425, two characters at a time, and decodes each pair into a raw byte, giving \x69\xdd\xa8\x45\x5c\x7d\xd4\x25.
CT = '69dda8455c7dd425'
print "%r" % CT.decode('hex')  # Python 2 syntax
The encode function re-encodes: it takes each single character of CT and encodes it as its ASCII value, displayed as two hex digits. The code below prints 36396464613834353563376464343235, i.e. the first character '6' becomes 0x36, the second character '9' becomes 0x39, and so on.
CT = '69dda8455c7dd425'
print "%r" % CT.encode('hex')  # Python 2 syntax
A way to remember it: decode halves the string's length, encode doubles it.
decode('ascii') decodes to a Unicode string; in Python 2 the output carries a u prefix.
encode('ascii') encodes the string as ASCII; since Python's default string handling is already Unicode-aware, the output is presumably identical to the original string.
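The 'hex' codec on str only exists in Python 2. In Python 3 the same conversions go through bytes.fromhex, bytes.hex (3.5+), or binascii; a sketch using the CT value above:

```python
import binascii

CT = '69dda8455c7dd425'

# Python 2's CT.decode('hex'):
raw = bytes.fromhex(CT)
assert raw == b'\x69\xdd\xa8\x45\x5c\x7d\xd4\x25'
assert raw.hex() == CT                    # and back again

# Python 2's CT.encode('hex') hex-encodes the ASCII bytes of CT itself:
assert binascii.hexlify(CT.encode('ascii')) == b'36396464613834353563376464343235'
```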
Encode a string to bytes:
'12abc'.encode('ascii') ==> b'12abc'
From an array of numbers or character codes:
bytes([1,2, ord('1'),ord('2')]) ==> b'\x01\x0212'
From a hexadecimal string:
bytes().fromhex('010210') ==> b'\x01\x02\x10'
From a plain character string, via ord:
bytes(map(ord, '\x01\x02\x31\x32')) ==> b'\x01\x0212'
From an array of hex integers:
bytes([0x01,0x02,0x31,0x32]) ==> b'\x01\x0212'
Bytes to string
Decode bytes into a string:
bytes(b'\x31\x32\x61\x62').decode('ascii') ==> 12ab
Bytes to a hex-escaped representation, with printable ASCII mixed in:
str(bytes(b'\x01\x0212'))[2:-1] ==> \x01\x0212
Bytes to hex, a fixed two characters per byte (needs import binascii):
import binascii
str(binascii.b2a_hex(b'\x01\x0212'))[2:-1] ==> 01023132
Bytes to an array of hex values:
[hex(x) for x in bytes(b'\x01\x0212')] ==> ['0x1', '0x2', '0x31', '0x32']
Question: when should a string literal be prefixed with 'r', 'b', or 'u'? The official documentation actually covers this. Note that in Python 2, the 'b' prefix is simply ignored.
The Python 2.x documentation states:
A prefix of ‘b’ or ‘B’ is ignored in Python 2; it indicates that the literal should become a bytes literal in Python 3 (e.g. when code is automatically converted with 2to3). A ‘u’ or ‘b’ prefix may be followed by an ‘r’ prefix.
Prefixing a string with 'b' is ignored by Python 2; its only purpose is Python 3 compatibility, so that Python 3 (e.g. after automatic conversion with 2to3) stores the literal as the bytes type (values 0~255).
The Python 3.3 documentation states:
Bytes literals are always prefixed with ‘b’ or ‘B’; they produce an instance of the bytes type instead of the str type. They may only contain ASCII characters; bytes with a numeric value of 128 or greater must be expressed with escapes.
The bytes type is always written with a 'b' prefix and may only contain ASCII characters directly; byte values of 128 or greater must be written as escapes.
Below is a Stack Overflow answer that I found helpful and want to share:
In Python 2.x
Pre-3.0 versions of Python lacked this kind of distinction between text and binary data. Instead, there was:
unicode = u’…’ literals = sequence of Unicode characters = 3.x str
str = ‘…’ literals = sequences of confounded bytes/characters
Usually text, encoded in some unspecified encoding.
But also used to represent binary data like struct.pack output.
Python 3.x makes a clear distinction between the types:
str = ‘…’ literals = a sequence of Unicode characters (UTF-16 or UTF-32, depending on how Python was compiled)
bytes = b’…’ literals = a sequence of octets (integers between 0 and 255)
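A quick sketch of that distinction in Python 3: str holds Unicode code points, bytes holds raw octets, and encode/decode bridge the two.

```python
s = '12ab'                      # str: a sequence of Unicode characters
b = s.encode('ascii')           # bytes: a sequence of octets
assert isinstance(s, str) and isinstance(b, bytes)
assert b == b'12ab'
assert b.decode('ascii') == s   # round-trips back to the same text

# Python 3 refuses to mix the two types implicitly:
try:
    s + b
except TypeError:
    pass                        # concatenation across the divide is an error
else:
    raise AssertionError('expected TypeError')
```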
Implementing encode in C++
Just keeping a note here: handling strings in C++ while doing the Cryptography exercises was quite painful, so I am recording this to avoid reinventing the wheel.
#include <cstring>  // for strlen

static unsigned char ByteMap[] = {
    '0', '1', '2', '3', '4', '5', '6', '7',
    '8', '9', 'a', 'b', 'c', 'd', 'e', 'f'
};

unsigned char hex_2_dec(unsigned char c) {
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    return 0;  // invalid hex digit; the original left this case undefined
}

// Note: dest must hold at least 2 * len_of_src + 1 bytes; the last byte is '\0'.
void str_encode(unsigned char *src, unsigned char *dest, int len_of_src) {
    int t1;
    for (int i = 0; i < len_of_src; ++i) {
        t1 = (int) src[i];
        dest[2 * i] = ByteMap[t1 / 16];
        dest[2 * i + 1] = ByteMap[t1 % 16];
    }
    dest[2 * len_of_src] = 0;  // must terminate with '\0'
}

void str_decode(unsigned char *src, unsigned char *dest) {
    int len_of_src = strlen((char *) src);
    unsigned char t1;
    for (int i = 1; i <= len_of_src; i += 2) {
        t1 = hex_2_dec(src[i - 1]);
        t1 = 16 * t1 + hex_2_dec(src[i]);
        dest[i / 2] = t1;
    }
}
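For reference, the C++ routines above imitate what Python's binascii already provides; a quick sanity check of the expected behavior (my own comparison, not part of the original post):

```python
import binascii

# str_encode: raw bytes -> lowercase hex text
assert binascii.hexlify(b'\x01\xab\xff') == b'01abff'

# str_decode: hex text -> raw bytes
assert binascii.unhexlify('01abff') == b'\x01\xab\xff'
```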