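Installing and Configuring the Scrapy Crawler Framework
I suddenly got the idea of learning web scraping, so I set out to install the Scrapy framework. The setup turned out to be really painful, though: Scrapy depends on quite a few components, and those components have their own requirements on the Python version...
It took a lot of effort, plenty of material from experts around the web, and several trips through Stack Overflow, but I finally got it installed = _ =. Here are the installation steps and what I learned along the way.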
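1. Download and install Python
Not much to say here: grab an installer from the Python website, https://www.python.org/downloads/, but remember which version you install. I had previously installed version 3.5.
Then configure the Python environment variable (this may be unnecessary to spell out, since most people already know how to set environment variables):
Computer -> Properties -> Advanced -> Environment Variables -> append the Python path to the end of Path under System variables and path under User variables (my install directory is D:\Python)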
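Open the Windows command prompt (cmd) and type python. If the interpreter starts and prints its version banner followed by the >>> prompt, Python is installed and the environment variable is configured correctly.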
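As an extra check on the step above, a short script can report which interpreter is actually running and whether its directory is listed in PATH. This is only a minimal sketch: the helper name interpreter_dir_on_path is made up for illustration and is not part of Python or Scrapy.

```python
import os
import sys


def interpreter_dir_on_path(executable=sys.executable, path=None):
    """Return True if the directory holding the interpreter is listed in PATH.

    Hypothetical helper for illustration only.
    """
    if path is None:
        path = os.environ.get("PATH", "")
    # Normalise both sides so that case and trailing slashes do not matter.
    exe_dir = os.path.normcase(os.path.normpath(os.path.dirname(executable)))
    entries = [
        os.path.normcase(os.path.normpath(entry))
        for entry in path.split(os.pathsep)
        if entry
    ]
    return exe_dir in entries


print("Python version:", sys.version.split()[0])
print("Interpreter:", sys.executable)
print("Interpreter directory on PATH:", interpreter_dir_on_path())
```

Save it as a .py file and run it with python from cmd. If the last line prints False, the Path entry probably points at a different directory than the interpreter that is actually running.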
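2. Install Scrapy
First, upgrade pip
Open the cmd command line and type: pip install --upgrade pip
Try installing Scrapy
Type: pip install Scrapy
This reported an error.
The cause was that pip timed out while downloading third-party packages. Type: pip --default-timeout=100 install -U Pillow to install Pillow with a longer timeout.
Then type again: pip install Scrapy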
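This time pip complained that the Twisted version did not match my Python version. I raised the timeout once more. Note that --default-timeout only applies to the pip command it is passed to, so it has to go on the install command itself.
Type: pip --default-timeout=100 install Scrapy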
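Since --default-timeout is a per-invocation option of pip, one way to avoid forgetting it is to build the install command in one place. The following is a rough sketch, not anything pip itself provides: the helper names pip_install_command and pip_install are hypothetical, and it assumes pip is invoked as python -m pip.

```python
import subprocess
import sys


def pip_install_command(package, timeout=100, upgrade=False):
    """Build a pip command line that carries the timeout on the same invocation.

    Hypothetical helper for illustration only.
    """
    cmd = [
        sys.executable, "-m", "pip",
        "--default-timeout={}".format(timeout),
        "install",
    ]
    if upgrade:
        cmd.append("-U")
    cmd.append(package)
    return cmd


def pip_install(package, timeout=100, upgrade=False):
    """Run the install and return pip's exit code (0 means success)."""
    return subprocess.call(pip_install_command(package, timeout, upgrade))


# Only show the command here; call pip_install("Scrapy") to actually run it.
print(" ".join(pip_install_command("Scrapy")))
```

To make the timeout stick without repeating the option, pip also reads it from its configuration file (the timeout setting) or from the PIP_DEFAULT_TIMEOUT environment variable.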
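At this point it failed yet again... (/(ㄒoㄒ)/~~)
The cause of this error was that Microsoft Visual C++ 14.0 was missing; pip needs it to compile Twisted when no prebuilt package is available.
Fix: manually download the matching build from http://www.lfd.uci.edu/~gohlke/pythonlibs/#twisted (the number after cp is the Python version, so for my Python 3.5 it is cp35; amd64 means 64-bit).
Put the downloaded file in a known directory, cd to that directory in cmd, and type: pip install Twisted-17.5.0-cp35-cp35m-win_amd64.whl (this installs the file you just downloaded)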
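The cp35 and amd64 parts of the file name can be worked out from the interpreter you are running rather than guessed. Here is a small sketch under some assumptions: the function names python_tag and windows_platform_tag are invented for this example, it only covers x86 Windows (32-bit or 64-bit), and it does not compute the middle part of the file name (cp35m in my case).

```python
import struct
import sys


def python_tag(version_info=sys.version_info):
    """Return the CPython tag used in wheel file names, e.g. 'cp35' for Python 3.5.

    Hypothetical helper for illustration only.
    """
    return "cp{}{}".format(version_info[0], version_info[1])


def windows_platform_tag(pointer_bits=None):
    """Return 'win_amd64' for a 64-bit interpreter and 'win32' for a 32-bit one.

    Assumes an x86-family Windows build of Python.
    """
    if pointer_bits is None:
        pointer_bits = struct.calcsize("P") * 8  # pointer size in bits
    return "win_amd64" if pointer_bits == 64 else "win32"


print(python_tag(), windows_platform_tag())
```

What matters is whether the Python interpreter is 32-bit or 64-bit, not Windows itself: a 32-bit Python on 64-bit Windows still needs the win32 file.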
Finally, run: pip install Scrapy
It printed Successfully installed Scrapy-1.4.0, so the installation finally worked...
Test Scrapy
With the installation done, let's test Scrapy. In the console, type: scrapy bench
If you get the following error:
Traceback (most recent call last):
File "c:\users\aobo\appdata\local\programs\python\python35\lib\site-packages\twisted\internet\defer.py", line 1260, in _inlineCallbacks
result = g.send(result)
File "c:\users\aobo\appdata\local\programs\python\python35\lib\site-packages\scrapy\crawler.py", line 72, in crawl
self.engine = self._create_engine()
File "c:\users\aobo\appdata\local\programs\python\python35\lib\site-packages\scrapy\crawler.py", line 97, in _create_engine
return ExecutionEngine(self, lambda _: self.stop())
File "c:\users\aobo\appdata\local\programs\python\python35\lib\site-packages\scrapy\core\engine.py", line 68, in __init__
self.downloader = downloader_cls(crawler)
File "c:\users\aobo\appdata\local\programs\python\python35\lib\site-packages\scrapy\core\downloader\__init__.py", line 88, in __init__
self.middleware = DownloaderMiddlewareManager.from_crawler(crawler)
File "c:\users\aobo\appdata\local\programs\python\python35\lib\site-packages\scrapy\middleware.py", line 58, in from_crawler
return cls.from_settings(crawler.settings, crawler)
File "c:\users\aobo\appdata\local\programs\python\python35\lib\site-packages\scrapy\middleware.py", line 34, in from_settings
mwcls = load_object(clspath)
File "c:\users\aobo\appdata\local\programs\python\python35\lib\site-packages\scrapy\utils\misc.py", line 44, in load_object
mod = import_module(module)
File "c:\users\aobo\appdata\local\programs\python\python35\lib\importlib\__init__.py", line 126, in import_module
return _bootstrap._gcd_import(name[level:], package, level)
File "<frozen importlib._bootstrap>", line 986, in _gcd_import
File "<frozen importlib._bootstrap>", line 969, in _find_and_load
File "<frozen importlib._bootstrap>", line 958, in _find_and_load_unlocked
File "<frozen importlib._bootstrap>", line 673, in _load_unlocked
File "<frozen importlib._bootstrap_external>", line 662, in exec_module
File "<frozen importlib._bootstrap>", line 222, in _call_with_frames_removed
File "c:\users\aobo\appdata\local\programs\python\python35\lib\site-packages\scrapy\downloadermiddlewares\retry.py", line 23, in <module>
from scrapy.xlib.tx import ResponseFailed
File "c:\users\aobo\appdata\local\programs\python\python35\lib\site-packages\scrapy\xlib\tx\__init__.py", line 3, in <module>
from twisted.web import client
File "c:\users\aobo\appdata\local\programs\python\python35\lib\site-packages\twisted\web\client.py", line 42, in <module>
from twisted.internet.endpoints import TCP4ClientEndpoint, SSL4ClientEndpoint
File "c:\users\aobo\appdata\local\programs\python\python35\lib\site-packages\twisted\internet\endpoints.py", line 36, in <module>
from twisted.internet.stdio import StandardIO, PipeAddress
File "c:\users\aobo\appdata\local\programs\python\python35\lib\site-packages\twisted\internet\stdio.py", line 30, in <module>
from twisted.internet import _win32stdio
File "c:\users\aobo\appdata\local\programs\python\python35\lib\site-packages\twisted\internet\_win32stdio.py", line 9, in <module>
import win32api
ImportError: DLL load failed: 找不到指定的模塊。
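The last line of the traceback says that the specified module could not be found while importing win32api. win32api is part of pywin32, so this means pywin32 is missing from the system. Let's download it.
https://sourceforge.net/projects/pywin32/files/
http://www.lfd.uci.edu/~gohlke/pythonlibs/
Note that the pywin32 build you download must match your Python installation (same Python version, and the same 32-bit or 64-bit architecture).
If the sites above are too slow and the download fails, search for another download source.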
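Before running the benchmark again, it can help to confirm that the modules named in the traceback now import cleanly. The sketch below simply tries each import and collects the failures; the helper name failed_imports is made up for illustration.

```python
import importlib


def failed_imports(names):
    """Try to import each module; return {name: error message} for those that fail.

    Hypothetical helper for illustration only.
    """
    failures = {}
    for name in names:
        try:
            importlib.import_module(name)
        except ImportError as exc:
            # Covers both a missing module and a DLL that fails to load.
            failures[name] = str(exc)
    return failures


# win32api comes from pywin32; twisted and scrapy were installed earlier.
for name, error in failed_imports(["win32api", "twisted", "scrapy"]).items():
    print("{}: {}".format(name, error))
```

If nothing is printed, all three modules imported successfully. Actually importing the module, rather than only checking that its files exist, matters here because the error above was a DLL load failure raised at import time.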
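Once it is installed, type in the console: scrapy bench
The test result shows how many pages per minute your machine can crawl on average.
Finally, the Python Extension Packages site (http://www.lfd.uci.edu/~gohlke/pythonlibs/, linked above) is a download page for Python extension packages on Windows. It hosts prebuilt versions of many of the extension packages you are likely to need for Python development, so it is worth bookmarking.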