Web scraping with selenium

Original

slibra_L

2020-06-29 01:13

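Some websites have very strong anti-scraping mechanisms, and their information can only be scraped by simulating a human visitor's actions more realistically. This is where selenium comes in.

1. What is selenium

So what is selenium? It is a powerful browser-automation library that can be used from Python.

What can it do? With a few lines of code it can control a browser and perform actions such as opening pages, typing text, and clicking, just as if a real user were operating it.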

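2. Downloading the driver

First install the browser driver: download it, then copy the exe file into the Python root directory (the root directory of a virtual environment also works).

ChromeDriver-to-Chrome version mapping table and ChromeDriver download link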
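Copying the driver into the Python directory only helps if that directory is searched when programs are looked up, which is an assumption about how Python was installed. A minimal sketch for checking this, using only the standard library (the helper name find_driver is hypothetical):

```python
import shutil

def find_driver(name="chromedriver"):
    """Return the full path of the driver executable if it is on PATH, else None."""
    # On Windows, shutil.which also tries the usual extensions such as .exe.
    return shutil.which(name)

path = find_driver()
if path:
    print(f"chromedriver found at {path}")
else:
    print("chromedriver not found on PATH")
```

If the second message appears, the exe was probably copied to a directory that is not on PATH.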

Then install selenium with pip (pip install selenium).
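After installing, it can be useful to confirm which selenium version ended up in the environment, since the element-lookup API changed between Selenium 3 and Selenium 4. A small sketch using the standard library (the helper name installed_version is hypothetical):

```python
from importlib import metadata

def installed_version(package):
    """Return the installed version string of a package, or None if it is missing."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None

# Prints a version string if selenium is installed, otherwise None.
print(installed_version("selenium"))
```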

3. Scraping information

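# Setup for a local Chrome browser
from selenium import webdriver  # the webdriver module from the selenium library
from selenium.webdriver.common.by import By  # locator strategies; find_element_by_* was removed in Selenium 4.3
import time  # time module, used for simple pauses
driver = webdriver.Chrome()  # use Chrome as the engine; this opens a real Chrome window

driver.get('https://localprod.pandateacher.com/python-manuscript/hello-spiderman/')  # visit the page
time.sleep(2)  # pause two seconds while the page loads

teacher = driver.find_element(By.ID, 'teacher')  # input box under "Enter your favorite teacher"
teacher.send_keys('必須是吳楓呀')  # type the text
assistant = driver.find_element(By.NAME, 'assistant')  # input box under "Enter your favorite teaching assistant"
assistant.send_keys('都喜歡')  # type the text
button = driver.find_element(By.CLASS_NAME, 'sub')  # the Submit button
button.click()  # click Submit
time.sleep(1)
driver.quit()  # close the browser and end the driver session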
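The fixed time.sleep pauses above either waste time or are too short on a slow connection. Selenium provides WebDriverWait in selenium.webdriver.support.ui for this purpose. The pure-Python helper below is only a hypothetical stand-in (wait_until is not part of selenium) that sketches the underlying idea: poll a condition until it succeeds or a timeout expires.

```python
import time

def wait_until(condition, timeout=10.0, poll=0.5):
    """Call condition() repeatedly until it returns a truthy value or time runs out."""
    deadline = time.monotonic() + timeout
    while True:
        try:
            result = condition()
            if result:
                return result
        except Exception:
            # Treat errors (for example, element not found yet) as "not ready".
            pass
        if time.monotonic() >= deadline:
            raise TimeoutError(f"condition not met within {timeout} seconds")
        time.sleep(poll)
```

With a live browser, the condition would be a function that looks up the teacher input box, so the script continues as soon as the element exists instead of always sleeping for two seconds.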
