Data Collection (1): Scraping Images with requests (3 Methods)


2020-02-21 21:26

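As an example, we scrape an image from a page on Baidu Tieba. The relevant HTML source is included below, so it does not matter if the URL stops working: the point is to study how the page is analyzed.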

<div id="post_content_87286618651" class="d_post_content j_d_post_content  clearfix">
            <img class="BDE_Image" src="http://imgsrc.baidu.com/forum/w%3D580/sign=b2310eb7be389b5038ffe05ab534e5f1/680c676d55fbb2fbc7f64cbb484a20a44423dc98.jpg" size="21406" changedsize="true" width="560" height="747" style="cursor: url(&quot;http://tb2.bdstatic.com/tb/static-pb/img/cur_zin.cur&quot;), pointer;">
</div>

First…

Open the web page

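# -*- coding: utf-8 -*-
import requests

url = 'http://tieba.baidu.com/p/4468445702'
# A timeout keeps the script from hanging if the server never answers
html = requests.get(url, timeout=10)
# Specify the encoding so that html.text decodes the page correctly
html.encoding = 'utf-8'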
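Why set the encoding by hand: `html.content` holds the raw bytes, while `html.text` decodes them using `html.encoding`. If the server's `Content-Type` header says `text/html` without a `charset`, requests falls back to the HTTP/1.1 default ISO-8859-1, and Chinese text turns into mojibake. The sketch below reproduces that effect with the standard library only; the sample string is a made-up stand-in for a real page body.

```python
# Raw bytes as a server would send them (hypothetical sample page body)
raw = '<title>百度貼吧</title>'.encode('utf-8')

# What html.text would look like if requests decoded with ISO-8859-1
wrong = raw.decode('iso-8859-1')

# What html.text looks like after html.encoding = 'utf-8'
right = raw.decode('utf-8')

print(wrong)  # garbled: every UTF-8 byte became a separate Latin-1 character
print(right)
```

If the page's charset is not known in advance, `html.encoding = html.apparent_encoding` lets requests guess it from the body instead of hard-coding `'utf-8'`.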

Then…

Get the image URL (3 methods)

Using the BeautifulSoup library

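from bs4 import BeautifulSoup

bs = BeautifulSoup(html.content, 'html.parser')
# Find the post's <div> by its id, then collect the <img> tags inside it
img_list = bs.find('div', {'id': 'post_content_87286618651'}).find_all('img')
# Take the src attribute of the first image
img_src = img_list[0]['src']
print(img_src)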
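The `src` value extracted here happens to be an absolute URL, but on many pages it is relative (`/path/a.jpg`) or protocol-relative (`//host/a.jpg`), and `requests.get` rejects both. A minimal stdlib sketch, assuming you want to resolve `src` against the page URL before downloading; `resolve_img_src` is a hypothetical helper name, and the sample paths are made up.

```python
from urllib.parse import urljoin


def resolve_img_src(page_url, src):
    """Hypothetical helper: turn an <img src> value into an absolute URL.

    urljoin leaves absolute URLs unchanged and resolves relative and
    protocol-relative ones against the page URL.
    """
    return urljoin(page_url, src)


page = 'http://tieba.baidu.com/p/4468445702'
print(resolve_img_src(page, 'http://imgsrc.baidu.com/forum/a.jpg'))
print(resolve_img_src(page, '/static/b.jpg'))
print(resolve_img_src(page, '//imgsrc.baidu.com/forum/c.jpg'))
```

Because `urljoin` is a no-op on absolute URLs, it is safe to call on every extracted `src`, whichever of the three methods produced it.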

Using XPath

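from lxml import etree

selector = etree.HTML(html.content)
# Select the <img> elements directly under the post's <div>
images = selector.xpath('//*[@id="post_content_87286618651"]/img')
# Print the src attribute of each matched image
for image in images:
    print(image.attrib.get('src'))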

Using a regular expression

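import re

# Use html.text (str), not html.content (bytes): a str pattern cannot search bytes
text = html.text
pattern = re.compile(r'<img .*src="(.*?)" size="21406"', re.S)
match = pattern.search(text)
# search() returns None when nothing matches, so check before calling group()
if match:
    print(match.group(1))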
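Since the original post may no longer exist, the extraction can be checked offline against the HTML snippet saved at the top of this post. A stdlib-only sketch for the regex method; its pattern is a variation on the one above that keys on the `BDE_Image` class instead of the `size` value, assuming (as in the saved snippet) that post images carry that class.

```python
import re

# The post's <div> from the saved snippet (style attribute omitted)
snippet = '''<div id="post_content_87286618651" class="d_post_content j_d_post_content  clearfix">
            <img class="BDE_Image" src="http://imgsrc.baidu.com/forum/w%3D580/sign=b2310eb7be389b5038ffe05ab534e5f1/680c676d55fbb2fbc7f64cbb484a20a44423dc98.jpg" size="21406" changedsize="true" width="560" height="747">
</div>'''

# [^>]* keeps the whole match inside a single tag, and [^"]+ stops at the
# closing quote of the src value, so no re.S or lazy quantifier is needed
pattern = re.compile(r'<img[^>]*class="BDE_Image"[^>]*src="([^"]+)"')

srcs = pattern.findall(snippet)
print(srcs)
```

`findall` returns every post image on the page at once, which is handy when a post contains more than one picture.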

Finally…

Write the image to a file

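img = requests.get(img_src, timeout=10)
# Open in binary write mode; 'ab' (append) would stack a second copy onto the file on a re-run
with open('baidu_tieba.jpg', 'wb') as f:
    f.write(img.content)
# No f.close() needed: leaving the with block closes the file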
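Hard-coding `baidu_tieba.jpg` means every download overwrites the previous one. A minimal sketch, assuming you want to name each file after the last path segment of its URL; `filename_from_url` and `save_bytes` are hypothetical helpers, and the placeholder bytes stand in for `img.content` so the demo runs without network access.

```python
import os
import tempfile
from urllib.parse import urlparse


def filename_from_url(url, default='image.jpg'):
    """Hypothetical helper: use the last path segment of the URL as the file name."""
    name = os.path.basename(urlparse(url).path)
    return name or default


def save_bytes(data, directory, url):
    """Hypothetical helper: write raw bytes (e.g. img.content) to directory/<name>."""
    path = os.path.join(directory, filename_from_url(url))
    # 'wb' truncates any existing file first, so a re-run never duplicates data
    with open(path, 'wb') as f:
        f.write(data)
    return path


# Demo with placeholder bytes standing in for img.content
img_url = ('http://imgsrc.baidu.com/forum/w%3D580/'
           'sign=b2310eb7be389b5038ffe05ab534e5f1/'
           '680c676d55fbb2fbc7f64cbb484a20a44423dc98.jpg')
fake_jpeg = b'\xff\xd8\xff\xe0 placeholder'
with tempfile.TemporaryDirectory() as tmp:
    save_bytes(fake_jpeg, tmp, img_url)
    path = save_bytes(fake_jpeg, tmp, img_url)  # second run overwrites
    with open(path, 'rb') as f:
        same_after_rerun = f.read() == fake_jpeg
    print(os.path.basename(path), same_after_rerun)
```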

czl389
