Taking a Screenshot of a Specific Element with Python + Selenium

Original

2019-06-21 00:13

# -*- coding: utf-8 -*-

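from PIL import Image
from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

# Path to chromedriver (raw string, so the backslashes are not read as escape sequences)
chromedriver_path = r"C:\Program Files (x86)\Google\Chrome\Application\chromedriver.exe"

print("Starting scrape")
# Create the Chrome options object
options = webdriver.ChromeOptions()
options.add_argument('--no-sandbox')  # Works around the "DevToolsActivePort file doesn't exist" error
# options.add_argument('--window-size=1920,1080')  # Set the browser window size
options.add_argument('--start-maximized')  # Maximize the browser window
options.add_argument('--disable-gpu')  # Google's documentation mentions adding this flag to work around a bug
options.add_argument('--hide-scrollbars')  # Hide scrollbars, which helps on some unusual pages
# options.add_argument('--blink-settings=imagesEnabled=false')  # Skip loading images to speed up page loads
# options.add_argument('--headless')  # Run without a visible window; on Linux without a display, startup fails unless this is enabled
options.add_argument('test-type')
# Important: excluding these switches (the author calls this "developer mode")
# is meant to make it harder for sites to detect that Selenium is in use
options.add_experimental_option("excludeSwitches", ["ignore-certificate-errors",
                                                    "enable-automation"])
# options.add_experimental_option("prefs", {"profile.managed_default_content_settings.images": 2})  # Skip loading images to speed up page loads

# Selenium 4 takes the driver path through a Service object; the Selenium 3
# executable_path keyword is no longer accepted by recent releases
driver = webdriver.Chrome(service=Service(chromedriver_path), options=options)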
driver.get('http://www.baidu.com')
print(driver.title)
# Wait up to 20 seconds for the logo element to be present in the DOM
baidu_img = WebDriverWait(driver, 20).until(
    EC.presence_of_element_located((By.CSS_SELECTOR, '.index-logo-src'))
)
driver.save_screenshot("screenshot.png")  # Screenshot of the current browser page
# Bounding box of the element: top-left corner plus its width and height
left = baidu_img.location['x']
top = baidu_img.location['y']
right = baidu_img.location['x'] + baidu_img.size['width']
bottom = baidu_img.location['y'] + baidu_img.size['height']

im = Image.open('screenshot.png')
im = im.crop((left, top, right, bottom))  # Crop the page screenshot down to the element
im.save('baidu.png')
# driver.quit()
print("Scrape finished")
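The crop above uses the element's `location` and `size` directly, which works when one CSS pixel equals one screenshot pixel. On a scaled or high-DPI display the screenshot may be larger than the page's CSS dimensions, and the crop would then miss the element. A minimal sketch of a workaround, assuming the scale factor is the page's `window.devicePixelRatio` (the helper name `crop_box` is hypothetical, not part of Selenium or Pillow):

```python
def crop_box(location, size, pixel_ratio=1.0):
    """Return (left, top, right, bottom) in screenshot pixels.

    `location` and `size` are dicts shaped like WebElement.location and
    WebElement.size (CSS pixels). `pixel_ratio` is assumed to be the value
    of window.devicePixelRatio for the page being captured.
    """
    left = location["x"] * pixel_ratio
    top = location["y"] * pixel_ratio
    right = (location["x"] + size["width"]) * pixel_ratio
    bottom = (location["y"] + size["height"]) * pixel_ratio
    # Round to whole pixels so the crop rectangle is unambiguous
    return tuple(int(round(v)) for v in (left, top, right, bottom))
```

With a live session, the ratio could be read with `driver.execute_script("return window.devicePixelRatio")`, and the resulting tuple passed to `im.crop(...)` in place of the hand-built one. This still assumes the page has not been scrolled, since the screenshot covers only the visible viewport.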

screenshot.png, the screenshot of the whole browser page:

baidu.png, the screenshot of the Baidu logo element:
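As an alternative to cropping a full-page screenshot, Selenium's `WebElement` also exposes a `screenshot_as_png` property (and a `screenshot(filename)` method) that ask the driver to capture just that element. The sketch below is not the approach used in this post, and the helper name `save_element_png` is hypothetical; it accepts any object that provides `screenshot_as_png`, such as the `baidu_img` element located above.

```python
from pathlib import Path


def save_element_png(element, path):
    """Write the element's own screenshot to `path` and return the byte count."""
    png_bytes = element.screenshot_as_png  # PNG data as bytes, captured by the driver
    Path(path).write_bytes(png_bytes)
    return len(png_bytes)
```

In the script above this could be called as `save_element_png(baidu_img, "baidu.png")`, which would skip the Pillow crop step entirely. Whether the result matches the cropped image pixel for pixel may depend on the browser and driver version.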
