Fixing a PDF download bug in Python

Original

2020-02-23 17:24

The code is as follows:

1. Bug: the binary body returned by requests, resp.content, is empty (b''), so the PDF cannot be downloaded.
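2. Cause: the request is sent with stream=True. This makes requests.get() return as soon as the response headers arrive, while the body is still unread on the connection. response.close() then closes the underlying raw connection, which discards the unread body. Anything that reads the body afterwards (resp.content or iter_content()) gets nothing, so resp.content comes back as b''. In short, closing the connection before the data has been saved loses the data. Without stream=True, requests downloads the whole body inside get(), so closing afterwards is harmless.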
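3. Fix: call response.close() only after the body has been read and written to disk. Alternatively, do not call it manually at all, for example by using the response in a with statement.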
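The cause above can be reproduced without third-party packages. This is a sketch under stated assumptions, not the author's code. It serves a fake PDF from a throwaway local http.server and reads it with the standard library's urllib.request. That module returns an http.client response object, the same kind of object urllib3 (and therefore requests) wraps. Closing the response before reading yields b'', while reading first returns the full body. PAYLOAD, PdfHandler, fetch() and run_demo() are hypothetical names chosen for this illustration.

```python
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer

# Hypothetical stand-in for a PDF body (not a real PDF)
PAYLOAD = b"%PDF-1.4 fake pdf body\n" * 100


class PdfHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        self.send_response(200)
        self.send_header("Content-Type", "application/pdf")
        self.send_header("Content-Length", str(len(PAYLOAD)))
        self.end_headers()
        self.wfile.write(PAYLOAD)

    def log_message(self, *args):
        pass  # keep the demo quiet


def fetch(url, close_first):
    resp = urllib.request.urlopen(url, timeout=5)  # headers read, body still pending
    if close_first:
        resp.close()  # closing now discards the unread body
    data = resp.read()  # after close(), http.client returns b""
    resp.close()
    return data


def run_demo():
    # Port 0 lets the OS pick a free port
    server = ThreadingHTTPServer(("127.0.0.1", 0), PdfHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    url = f"http://127.0.0.1:{server.server_address[1]}/report.pdf"
    try:
        early = fetch(url, close_first=True)
        late = fetch(url, close_first=False)
    finally:
        server.shutdown()
        server.server_close()
    return early, late


if __name__ == "__main__":
    early, late = run_demo()
    print(len(early), len(late))
```

The only difference between the two calls is whether close() runs before or after the read. That is exactly the ordering the fix is about.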

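```python
import logging
import random
import socket
import time

import requests


def get_proxies():
    # Placeholder: return a requests-style proxies dict, e.g. {"https": "http://host:port"};
    # an empty dict means "connect directly".
    proxy = {}
    return proxy


def pdf_url_requests(pdf_url):
    headers = {
        "User-Agent": "Mozilla/5.0 (Windows NT 6.1; WOW64; rv:52.0) Gecko/20100101 Firefox/52.0",
        "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3",
        # "br" dropped: urllib3 only decodes Brotli when a Brotli package is installed;
        # otherwise the still-compressed bytes would be written into the PDF.
        "Accept-Encoding": "gzip, deflate",
        "Accept-Language": "zh-CN,zh;q=0.9",
    }
    proxy = get_proxies()  # must exist before the first requests.get() call
    for _ in range(4):  # up to 4 attempts
        try:
            # stream=True: only the headers are read here; the body stays on the
            # connection until iter_content() consumes it, so do NOT close resp here.
            resp = requests.get(pdf_url, headers=headers, proxies=proxy, stream=True, timeout=20)
        except socket.error as err:  # OSError; requests' own exceptions subclass it too
            logging.warning(f"{pdf_url}: socket error or timeout, retrying after a short sleep: {err}")
            time.sleep(random.uniform(0.5, 1.5))
            proxy = get_proxies()
        except Exception as _e:
            time.sleep(random.uniform(0.5, 1.5))
            logging.exception(f"{pdf_url}: unexpected error in pdf_url_requests: {_e}")
            proxy = get_proxies()
        else:
            if not resp:  # a Response is falsy for 4xx/5xx status codes
                resp.close()  # nothing worth reading, release the connection
                continue
            return resp
    return None  # every attempt failed


def download_pdf(pdf_url, pdf_name):
    r = pdf_url_requests(pdf_url)
    if r is None:
        logging.error(f"{pdf_url}: giving up after repeated failures")
        return
    pdf_path = f"E:/pdf/AuditReport/{pdf_name}"
    with open(pdf_path, "wb") as f:
        for content in r.iter_content(chunk_size=512):
            if content:  # skip empty keep-alive chunks
                f.write(content)
    r.close()  # close only after the whole body has been written to disk
```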
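A variant of download_pdf() can make the ordering explicit and also detect the symptom. contextlib.closing() guarantees that close() runs, but only after the loop has consumed every chunk. The saved file is then rejected if it is empty or does not start with the %PDF- signature. This is a sketch that assumes resp behaves like a requests.Response, i.e. it provides iter_content(chunk_size=...) and close(). The save_pdf helper and the ".part" temporary file are hypothetical additions, not part of the original code. The signature check also assumes the server returns a standard PDF that begins with %PDF-.

```python
import contextlib
import os


def save_pdf(resp, pdf_path, chunk_size=512):
    """Stream resp into pdf_path and return the number of bytes written.

    Assumes resp behaves like requests.Response: iter_content(chunk_size=...) and close().
    """
    tmp_path = pdf_path + ".part"  # hypothetical temporary name, renamed on success
    written = 0
    # closing() calls resp.close() when the block exits, i.e. after the body is consumed
    with contextlib.closing(resp), open(tmp_path, "wb") as f:
        for chunk in resp.iter_content(chunk_size=chunk_size):
            if chunk:  # skip empty keep-alive chunks
                f.write(chunk)
                written += len(chunk)
    with open(tmp_path, "rb") as f:
        signature = f.read(5)
    if written == 0 or signature != b"%PDF-":
        # Catches the b'' symptom instead of silently leaving an empty file behind
        os.remove(tmp_path)
        raise ValueError(f"{pdf_path}: empty or non-PDF body ({written} bytes)")
    os.replace(tmp_path, pdf_path)  # atomic rename when both paths are on the same filesystem
    return written
```

In download_pdf(), once pdf_url_requests() has returned a response rather than None, the write loop and r.close() could be replaced by save_pdf(r, pdf_path).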
