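Overview:

Scraping images from 站長之家 (Chinaz)
Parse the HTML with BeautifulSoup
Fetch each page the way a browser would. After a successful request, save every image as binary data, storing the images from each listing page in that page's own folder

Page 1: http://sc.chinaz.com/tupian/index.html
Page 2: http://sc.chinaz.com/tupian/index_2.html
Page 3: http://sc.chinaz.com/tupian/index_3.html
And so on, for 14 pages in total (the script loops over range(1,15))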
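
The URL pattern listed above can be sketched as a small helper. The names `BASE` and `page_url` are illustrative and do not appear in the original script; the only thing taken from the post is the pattern itself (no suffix for the first page, `index_<n>.html` afterwards).

```python
BASE = "http://sc.chinaz.com/tupian/"  # listing root taken from the URLs above


def page_url(page):
    """Return the listing URL for a 1-based page number."""
    if page < 1:
        raise ValueError("page numbers start at 1")
    # The first page has no numeric suffix; later pages use index_<n>.html
    return BASE + ("index.html" if page == 1 else "index_%d.html" % page)


# Same page range as range(1, 15) in the script
urls = [page_url(n) for n in range(1, 15)]
```

Keeping the special case for page 1 in one place avoids repeating the if/else inside the download loop.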

Source code

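# @Author: lomtom
# @Date:   2020/2/27 14:22
# @email: [email protected]

# Scrape images from Chinaz (站長之家)
# Parse the HTML with BeautifulSoup
# Send browser-like requests; save each downloaded image as binary data

# Page 1: http://sc.chinaz.com/tupian/index.html
# Page 2: http://sc.chinaz.com/tupian/index_2.html
# Page 3: http://sc.chinaz.com/tupian/index_3.html
# Iterate over 14 pages

import os
import requests
from bs4 import BeautifulSoup

# Browser-like request header (the User-Agent value is an assumed example)
HEADERS = {"User-Agent": "Mozilla/5.0"}

def getImage():
    for i in range(1, 15):
        # One folder per page; makedirs also creates the parent "images" folder
        download = "images/%d/" % i
        os.makedirs(download, exist_ok=True)
        # The first page has no numeric suffix
        if i == 1:
            url = "http://sc.chinaz.com/tupian/index.html"
        else:
            url = "http://sc.chinaz.com/tupian/index_%d.html" % i
        # Send the request; status code 200 means success
        response = requests.get(url, headers=HEADERS)
        if response.status_code != 200:
            print("Page %d failed with status %d" % (i, response.status_code))
            continue
        # Parse the page with BeautifulSoup (the html5lib parser must be installed)
        bs = BeautifulSoup(response.content, "html5lib")
        # Locate the div that holds the images
        wrap = bs.find("div", attrs={"id": "container"})
        if wrap is None:
            print("Page %d: image container not found" % i)
            continue
        # find_all_next returns every img from the container onward,
        # so tags without "alt" or "src2" are skipped below
        imglist = wrap.find_all_next("img")
        for img in imglist:
            # Image title and link; the page keeps the link in "src2"
            title = img.get("alt")
            src = img.get("src2")
            if not title or not src:
                continue
            # Write the image bytes to a file
            with open(download + title + ".jpg", "wb") as file:
                file.write(requests.get(src, headers=HEADERS).content)
        print("Page %d finished downloading" % i)

if __name__ == '__main__':
    getImage()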

Result screenshot

Author

1. Author's personal website
2. Author's CSDN
3. Author's 博客園 (cnblogs)
4. Author's 簡書 (Jianshu)
