Encoding problems when using requests in Python

Original

2020-07-04 08:28

From the official documentation:

Compliance

Requests is intended to be compliant with all relevant specifications and RFCs where that compliance will not cause difficulties for users. This attention to the specification can lead to some behaviour that may seem unusual to those not familiar with the relevant specification.

Encodings

When you receive a response, Requests makes a guess at the encoding to use for decoding the response when you access the Response.text attribute. Requests will first check for an encoding in the HTTP header, and if none is present, will use chardet to attempt to guess the encoding.

The only time Requests will not do this is if no explicit charset is present in the HTTP headers and the Content-Type header contains text. In this situation, RFC 2616 specifies that the default charset must be ISO-8859-1. Requests follows the specification in this case. If you require a different encoding, you can manually set the Response.encoding property, or use the raw Response.content.

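In other words:

When you receive a response, Requests guesses its encoding and uses that guess to decode the body when you access the Response.text attribute. It first looks for an encoding in the HTTP headers and, if none is present, uses chardet to try to guess one.

The only case in which Requests does not guess is when the HTTP headers carry no explicit charset and the Content-Type header contains text.

In that case RFC 2616 specifies that the default charset must be ISO-8859-1, and Requests follows the specification. If you need a different encoding, you can set the Response.encoding property manually or use the raw Response.content.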
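The rule described above can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions: `header_encoding` is a hypothetical helper, it uses the standard library's `email.message` to parse the header, and it is not Requests' own implementation.

```python
from email.message import Message


def header_encoding(content_type):
    """Return the encoding implied by a Content-Type header value, or None.

    Hypothetical helper that mirrors the rule quoted above; it is not
    Requests' actual implementation.
    """
    if not content_type:
        return None  # no header at all: the caller has to guess from the body

    msg = Message()
    msg["Content-Type"] = content_type

    charset = msg.get_content_charset()  # lower-cased charset parameter, or None
    if charset:
        return charset  # an explicit charset always wins

    if "text" in msg.get_content_type():
        return "ISO-8859-1"  # RFC 2616 default for text/* without a charset

    return None  # non-text type without a charset: guess from the body


print(header_encoding("text/html; charset=GBK"))    # gbk
print(header_encoding("text/html"))                 # ISO-8859-1
print(header_encoding("application/octet-stream"))  # None
```

The second call is the case that matters here: a `text/html` response without a charset parameter is treated as ISO-8859-1, whatever the page body itself declares.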

Testing

Testing shows that the encoding Requests settles on is not always correct. Here is an example.

This is the response that came back:

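The header section clearly specifies charset="gbk", so according to the documentation the default ISO-8859-1 should not be used for decoding, yet that is exactly what happens. (One possible explanation: if the charset is declared only in the HTML <meta> tag and not in the HTTP Content-Type header, Requests does not see it, because it inspects only the HTTP headers.)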

import requests

r = requests.get(url)  # url: the page being fetched (not given in this post)
print(r.encoding)
# Result: ISO-8859-1
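To see why the wrong encoding produces garbled text, the effect can be reproduced without any network access. This is a minimal sketch using only built-in codecs; the sample string is an assumption chosen for illustration, not data from the response above.

```python
# A short Chinese string, encoded the way a GBK page would send it.
original = "中文编码"
raw = original.encode("gbk")

# Decoding GBK bytes as ISO-8859-1 never fails, because every byte value
# maps to some character, but the result is one wrong character per byte.
garbled = raw.decode("iso-8859-1")

# Decoding with the right codec restores the text.
correct = raw.decode("gbk")

# The damage is reversible: re-encode as ISO-8859-1 to get the original
# bytes back, then decode them as GBK.
recovered = garbled.encode("iso-8859-1").decode("gbk")

print(len(original), len(raw), len(garbled))  # 4 8 8
print(correct == original)    # True
print(recovered == original)  # True
```

Because ISO-8859-1 decoding never raises an error, the mistake goes unnoticed until the text is displayed.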

The output is garbled. The fix is to set the encoding manually; when r.text is accessed, the body is then decoded with the encoding you specified.

r = requests.get(url)

r.encoding = 'gbk'  # override the encoding before reading r.text
print(r.headers['content-type'])
data = r.text
print(data)

# The printed output is no longer garbled
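Hard-coding 'gbk' works for a single known site. For pages whose encoding varies, one option is to read the charset the page declares in its own <meta> tag. The following is a rough sketch under stated assumptions: `decode_html`, `META_CHARSET`, and the 2048-byte search window are hypothetical names and choices made for this illustration, and the regular expression is a simplification rather than a full HTML parser.

```python
import re

# Matches both <meta charset="gbk"> and
# <meta http-equiv="Content-Type" content="text/html; charset=gbk">.
META_CHARSET = re.compile(rb'<meta[^>]+charset=["\']?([\w-]+)', re.IGNORECASE)


def decode_html(body, header_charset=None, default="utf-8"):
    """Decode raw HTML bytes: header charset first, then <meta>, then default."""
    encoding = header_charset
    if encoding is None:
        match = META_CHARSET.search(body[:2048])  # the declaration sits near the top
        encoding = match.group(1).decode("ascii") if match else default
    try:
        return body.decode(encoding, errors="replace")
    except LookupError:  # the page named an encoding Python does not know
        return body.decode(default, errors="replace")


page = '<html><head><meta charset="gbk"><title>中文</title></head></html>'.encode("gbk")
print(decode_html(page))
```

With Requests, the raw bytes would come from r.content, for example decode_html(r.content). Requests also exposes r.apparent_encoding, the encoding it guesses from the body, which can be assigned to r.encoding as an alternative to a fixed value.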
