Problem Description
I recently read a book on web scraping[1] and typed the code from page 161 into my computer exactly as printed to build a crawler spider, but running it produced the following error:
Error closing cursor
Traceback (most recent call last):
  File "E:\StudyCard\BigData\WebScrape\PWSfDScode.pwsenv\lib\site-packages\sqlalchemy\engine\result.py", line 1324, in fetchone
    row = self._fetchone_impl()
  File "E:\StudyCard\BigData\WebScrape\PWSfDScode.pwsenv\lib\site-packages\sqlalchemy\engine\result.py", line 1204, in _fetchone_impl
    return self.cursor.fetchone()
sqlite3.ProgrammingError: Cannot operate on a closed database.

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "E:\StudyCard\BigData\WebScrape\PWSfDScode.pwsenv\lib\site-packages\sqlalchemy\engine\base.py", line 1339, in _safe_close_cursor
    cursor.close()
sqlite3.ProgrammingError: Cannot operate on a closed database.

Traceback (most recent call last):
  File "E:\StudyCard\BigData\WebScrape\PWSfDScode.pwsenv\lib\site-packages\sqlalchemy\engine\result.py", line 1324, in fetchone
    row = self._fetchone_impl()
  File "E:\StudyCard\BigData\WebScrape\PWSfDScode.pwsenv\lib\site-packages\sqlalchemy\engine\result.py", line 1204, in _fetchone_impl
    return self.cursor.fetchone()
sqlite3.ProgrammingError: Cannot operate on a closed database.
I remember running into this same error the last time I read this book. It appears to be caused by the records library: the query result is fetched lazily, after the connection that produced it has already been closed.
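For context, the underlying failure can be reproduced with nothing but the standard-library sqlite3 module: fetching from a cursor after its connection has been closed raises exactly the ProgrammingError seen in the traceback. This is a minimal sketch of the failure mode, not the book's code:

```python
import sqlite3

# Simulate a lazily-consumed result set that outlives its connection.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE pages (url TEXT)")
conn.execute("INSERT INTO pages VALUES ('http://example.com')")
cur = conn.execute("SELECT url FROM pages")

# The connection is closed before the rows are actually fetched --
# analogous to what happens inside records/SQLAlchemy above.
conn.close()

try:
    cur.fetchone()
except sqlite3.ProgrammingError as exc:
    print(exc)  # → Cannot operate on a closed database.
```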
Solution
Add one line after the code that creates the database:
import requests
import records
from bs4 import BeautifulSoup
from urllib.parse import urljoin
from sqlalchemy.exc import IntegrityError
db = records.Database('sqlite:///crawler_database.db')
db = db.get_connection()  # newly added line
With that change, the code runs normally: every query now goes through one explicitly held connection instead of a connection that is opened and closed per query.
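The same single-connection pattern can be sketched with only the standard-library sqlite3 module, including the duplicate-URL handling that the book's `IntegrityError` import suggests. The table name `pages` and the helper `store_url` are illustrative assumptions, not the book's code, and an in-memory database stands in for `crawler_database.db`:

```python
import sqlite3

# Open one long-lived connection up front and reuse it for every
# statement, instead of letting each query open and close its own.
db = sqlite3.connect(":memory:")  # the post uses sqlite:///crawler_database.db
db.execute("CREATE TABLE IF NOT EXISTS pages (url TEXT PRIMARY KEY)")

def store_url(url):
    # A duplicate URL violates the PRIMARY KEY constraint; skipping it
    # mirrors catching sqlalchemy.exc.IntegrityError in the book's crawler.
    try:
        with db:  # commits on success, rolls back on error
            db.execute("INSERT INTO pages (url) VALUES (?)", (url,))
    except sqlite3.IntegrityError:
        pass

store_url("http://example.com")
store_url("http://example.com")  # duplicate: silently skipped
print(db.execute("SELECT COUNT(*) FROM pages").fetchone()[0])  # → 1
```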
Seppe vanden Broucke, Bart Baesens. Practical Web Scraping for Data Science. Apress, 2018. ↩︎