Using Xdaili (訊代理) dynamic forwarding in Scrapy

In the Scrapy source code, locate the file http11.py. Its path relative to the Python installation directory is:
Lib/site-packages/scrapy/core/downloader/handlers/http11.py
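
If the install location is not obvious (inside a virtualenv, for example), the file can also be located from Python. This is a minimal sketch: `locate_module_file` is a hypothetical helper name, not part of Scrapy, and it returns None when the package is not installed in the current environment.

```python
import importlib.util


def locate_module_file(module_name):
    """Return the source file path of a module, or None if it cannot be found."""
    try:
        spec = importlib.util.find_spec(module_name)
    except (ImportError, ValueError):
        # The parent package (e.g. scrapy) is not installed in this environment.
        return None
    return spec.origin if spec is not None else None


if __name__ == "__main__":
    # Prints the absolute path of http11.py, or None if Scrapy is missing.
    print(locate_module_file("scrapy.core.downloader.handlers.http11"))
```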

Find the following lines and comment them out:
if isinstance(agent, self._TunnelingAgent):
   headers.removeHeader(b'Proxy-Authorization')

Otherwise Scrapy strips the Proxy-Authorization header from tunneled requests, and dynamic forwarding stops working.

Custom downloader middleware:
import hashlib
import time


class ProxyIPMiddleware(object):
    '''
    Randomly rotate the proxy IP
    '''
    def __init__(self):
        self.orderno = "xxxxxxxxxxxx"  # order number
        self.secret = "xxxxxxxxxxx"  # secret key

    def process_request(self, request, spider):
        print('====ProxyIPMiddleware====')
        # URL scheme of the request: "http" or "https"
        protocol = request.url.split(':')[0].strip().lower()
        print(request.url, 'protocol:', protocol)
        ip = "forward.xdaili.cn"  # proxy host
        port = "80"  # port number
        ip_port = ip + ":" + port
        proxy = {"http": "http://" + ip_port, "https": "https://" + ip_port}
        timestamp = str(int(time.time()))  # Unix timestamp in seconds
        string = "orderno=" + self.orderno + "," + "secret=" + self.secret + "," + "timestamp=" + timestamp
        md5_string = hashlib.md5(string.encode()).hexdigest()  # MD5 hash, a fixed-length hex string
        sign = md5_string.upper()  # convert to uppercase
        # authentication info
        auth = "sign=" + sign + "&" + "orderno=" + self.orderno + "&" + "timestamp=" + timestamp
        print('auth:', auth)
        request.headers['Proxy-Authorization'] = auth
        # the HTTP proxy only serves HTTP sites; the HTTPS proxy only serves HTTPS sites
        request.meta['proxy'] = proxy[protocol]
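
The signing step in the middleware above can be pulled out into a small function so it can be checked without running a crawl. This is a sketch under assumptions: `build_proxy_auth` and its optional `timestamp` parameter are hypothetical names introduced here for illustration, and the string layout simply mirrors the middleware code rather than any official specification.

```python
import hashlib
import time


def build_proxy_auth(orderno, secret, timestamp=None):
    """Build the Proxy-Authorization value the same way the middleware does."""
    if timestamp is None:
        # Default to the current Unix time in seconds, as in the middleware.
        timestamp = str(int(time.time()))
    plain = "orderno=" + orderno + ",secret=" + secret + ",timestamp=" + timestamp
    # Uppercase hex MD5 digest of the comma-separated string.
    sign = hashlib.md5(plain.encode()).hexdigest().upper()
    return "sign=" + sign + "&orderno=" + orderno + "&timestamp=" + timestamp


if __name__ == "__main__":
    # Placeholder credentials; replace with a real order number and secret.
    print(build_proxy_auth("xxxxxxxxxxxx", "xxxxxxxxxxx"))
```

Passing a fixed timestamp makes the output reproducible for testing. For the middleware itself to take effect, it also has to be registered under DOWNLOADER_MIDDLEWARES in the project's settings.py.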

 

 

 
