How to scrape web pages with PyQt 5.6

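From PyQt 5.6 onward, Qt's new-generation browser engine is Chromium (via Qt WebEngine), which differs considerably from the earlier WebKit. After a long stretch of testing, I finally got it working!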


# -*- coding: utf-8 -*-


import sys
from PyQt5.QtCore import QUrl
from PyQt5.QtWidgets import QApplication
from PyQt5.QtWebEngineWidgets import QWebEnginePage, QWebEngineView

class Render(QWebEngineView):
    def __init__(self, url):
        # A QApplication must exist before any widget is constructed
        self.app = QApplication(sys.argv)
        QWebEngineView.__init__(self)
        self.loadFinished.connect(self._loadFinished)
        self.load(QUrl(url))
        # Blocks here until self.app.quit() is called
        self.app.exec_()

    def _loadFinished(self, result):
        # This is an async call, you need to wait for this
        # to be called before closing the app
        self.page().toHtml(self.callable)

    def callable(self, data):
        self.html = data
        # Data has been stored, it's safe to quit the app
        self.app.quit()



import lxml.html

# Define the URL of the page to fetch
url = 'https://xxxxxxxxxxxx'

r = Render(url)
result = r.html
tree = lxml.html.fromstring(result)
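
Once `tree` has been built, the usual next step is to query it with XPath. Below is a minimal sketch of that step. It assumes a made-up HTML snippet standing in for `r.html` and a hypothetical base URL `https://example.com/`; neither comes from the original code, so adjust the XPath expressions to the page you actually render.

```python
import lxml.html

# Hypothetical stand-in for the string that Render(url).html would hold.
sample_html = """
<html><body>
  <h1>Example page</h1>
  <a href="/first">First link</a>
  <a href="https://example.com/second">Second link</a>
</body></html>
"""

tree = lxml.html.fromstring(sample_html)

# Turn relative hrefs into absolute ones (the base URL is an assumption).
tree.make_links_absolute("https://example.com/")

# Text of the first <h1>, and the href attribute of every <a>.
title = tree.xpath("//h1/text()")[0]
links = tree.xpath("//a/@href")

# Visible text of each link, in document order.
link_texts = [a.text_content() for a in tree.xpath("//a")]

print(title)
print(links)
print(link_texts)
```

With the real renderer, replace `sample_html` with `r.html` and pass the page's own URL to `make_links_absolute`.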

Reference article:


https://stackoverflow.com/questions/37754138/how-to-render-html-with-pyqt5s-qwebengineview


