1 403 error
A Scrapy project obeys robots.txt by default: settings.py contains ROBOTSTXT_OBEY = True. Change it to False.
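2 Anti-scraping measures: the crawler needs to pose as a browser
Find the install directory of the scrapy library, e.g. D:\Python\Lib\site-packages\scrapy\settings
Open default_settings.py in that directory
Find USER_AGENT:
Change it to: USER_AGENT = 'Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:76.0) Gecko/20100101 Firefox/76.0'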
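The same user agent can probably be set without touching the installed library, because values in a project's own settings.py take precedence over default_settings.py. A minimal sketch, assuming a project created with scrapy startproject; the value is the one from the note above:

```python
# settings.py of the Scrapy project (not the library's default_settings.py).
# Project-level settings override Scrapy's built-in defaults, so the
# installed package stays untouched and the change survives upgrades.
USER_AGENT = (
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:76.0) "
    "Gecko/20100101 Firefox/76.0"
)
```

Keeping the override in the project means it applies only to this crawler rather than to every Scrapy project on the machine.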
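3 How to check whether your XPath expression is correct
Debug with scrapy shell followed by the URL to test, e.g. scrapy shell http://www.baidu.com, then evaluate the expression with response.xpath() at the prompt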
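When no network is available, an expression can also be tried offline against a saved snippet. The sketch below is only an approximation: it uses the standard library's xml.etree.ElementTree, which supports a limited XPath subset and needs well-formed markup, whereas scrapy shell evaluates full XPath on real pages. SAMPLE and check_xpath are hypothetical names made up for this illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical, well-formed sample markup standing in for a downloaded page.
SAMPLE = """
<html>
  <body>
    <div class="item"><a href="/a">First</a></div>
    <div class="item"><a href="/b">Second</a></div>
  </body>
</html>
"""


def check_xpath(markup, path):
    """Return the text of every element that the path matches."""
    root = ET.fromstring(markup)
    return [element.text for element in root.findall(path)]


# An empty list means the path matched nothing and needs another look.
titles = check_xpath(SAMPLE, ".//div[@class='item']/a")
print(titles)
```

An empty result here is the same signal as an empty list from response.xpath() in scrapy shell: the path does not match the document as written.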
4 ValueError: unsupported format character 'C'
https://blog.csdn.net/xlsj228/article/details/106379997
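One common way to hit this error (not necessarily the case described at the link above) is %-formatting a string that contains a literal percent sign, such as a percent-encoded URL. A minimal sketch, assuming a hypothetical URL template in which %d is the only intended placeholder:

```python
# Hypothetical template: "%C3%A9" is percent-encoded text, "%d" is the
# only placeholder the author meant to fill in.
template = "http://example.com/search?q=%C3%A9&page=%d"

error = None
try:
    url = template % 2
except ValueError as exc:
    # The % operator reads "%C" as a format specifier and rejects it.
    error = str(exc)

# Fix 1: double every literal percent sign so it is not parsed as a specifier.
escaped = "http://example.com/search?q=%%C3%%A9&page=%d" % 2

# Fix 2: use str.format, which leaves percent signs alone.
formatted = "http://example.com/search?q=%C3%A9&page={}".format(2)
```

Both fixes yield the same URL; str.format is often the easier choice when the string is copied from a browser and already full of percent-encoded characters.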