Example 1: Scraping "The Zen of Python" from the official Python site

import requests
url = 'https://www.python.org/dev/peps/pep-0020/'
res = requests.get(url)
text = res.text
text

The requests library fetches the page at url and returns its content.

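As you can see, what comes back is essentially the content shown under Elements in the browser's developer tools, only as a string. Next we use Python's built-in string method find to locate the index of "The Zen of Python" and then slice it out of that string.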

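# The +28 is meant to skip the opening <pre ...> tag; the offset is specific to this page's markup.
start = text.find('<pre') + 28
end = text.find('</pre>') - 1
with open('zen_of_python.txt', 'w', encoding='utf-8') as f:
    f.write(text[start:end])
print(text[start:end])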
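
The fixed offset of 28 depends on the exact length of the opening <pre ...> tag, so it stops working as soon as the page's markup changes. Below is a minimal sketch of an alternative, assuming the text we want sits inside the first <pre> element: a regular expression finds where the opening tag ends instead of counting characters. The helper name extract_pre and the sample string are made up for illustration and are not part of the original script.

```python
import html
import re

def extract_pre(page):
    """Return the text inside the first <pre>...</pre> element, or None."""
    # <pre\b[^>]*> matches the opening tag whatever attributes it carries;
    # (.*?) lazily captures everything up to the first closing </pre>.
    match = re.search(r'<pre\b[^>]*>(.*?)</pre>', page, re.DOTALL)
    if match is None:
        return None
    # Drop any nested tags, then decode entities such as &amp;
    inner = re.sub(r'<[^>]+>', '', match.group(1))
    return html.unescape(inner).strip()

# Hypothetical sample standing in for res.text
sample = '<div><pre class="literal-block">\nBeautiful is better than ugly.\nNow is better than never.\n</pre></div>'
print(extract_pre(sample))
```

With the real page, extract_pre(text) could take the place of the slice expression, provided the assumption about the <pre> element holds.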

Example 2: Scraping Douban Movies

import requests
import os

if not os.path.exists('image'):
    os.mkdir('image')

def parse_html(url):
    headers={"User-Agent":"Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/74.0.3729.169 Safari/537.36"}
    res=requests.get(url,headers=headers)
    text=res.text
    item=[]
    for i in range(25):
        text=text[text.find('alt')+3:]
        item.append(extract(text))
    return item

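def extract(text):
    # text begins right after 'alt'; assuming the markup is ="name" src="image URL" ...,
    # splitting on '"' leaves the name at index 1 and the image URL at index 3
    text = text.split('"')
    name = text[1]
    image = text[3]
    return name, image

def write_movies_file(item, stars):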
    print(item)
    with open('douban_film.txt','a',encoding='utf-8') as f:
        f.write('Rank: %d\tTitle: %s\n' % (stars, item[0]))
    r = requests.get(item[1])
    with open('image/' + str(item[0]) + '.jpg', 'wb') as f:
        f.write(r.content)
        
def main():
    stars = 1
    for offset in range(0, 250, 25):
        url = 'https://movie.douban.com/top250?start=' + str(offset) +'&filter='
        for item in parse_html(url):
            write_movies_file(item, stars)
            stars += 1

if __name__ == '__main__':
    main()
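
parse_html assumes that every page holds exactly 25 entries and that each occurrence of 'alt' belongs to a movie poster. Below is a minimal sketch of a parser that relies less on position, assuming each entry's markup looks like <img ... alt="title" src="image URL" ...>, which is the same layout extract depends on. The names parse_items and safe_filename and the sample markup are hypothetical and not part of the original script.

```python
import re

# Assumed layout: alt="<title>" followed by whitespace and src="<image URL>"
ITEM_PATTERN = re.compile(r'alt="([^"]*)"\s+src="([^"]*)"')

def parse_items(page):
    """Return a list of (name, image_url) tuples found in the page text."""
    return ITEM_PATTERN.findall(page)

def safe_filename(name):
    """Replace characters that are not allowed in file names on common systems."""
    return re.sub(r'[\\/:*?"<>|]', '_', name)

# Hypothetical sample standing in for res.text
sample = (
    '<img width="100" alt="Movie A" src="https://example.com/a.jpg" class="">'
    '<img width="100" alt="Movie B/C" src="https://example.com/b.jpg" class="">'
)
for name, image in parse_items(sample):
    print(safe_filename(name), image)
```

Under that assumption, parse_items(res.text) could stand in for the loop in parse_html, and safe_filename(item[0]) for str(item[0]) when building the image path, so that a title containing a slash does not break the open call.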
