Python Web Scraping: A BeautifulSoup4 Data Extraction Example

Original

2020-04-24 21:03


This article uses bs4 to scrape the bilibili site-wide ranking and write the results to an Excel spreadsheet.

Press F12 to inspect the page layout
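The selectors used in the script suggest a page layout roughly like the one sketched here. This is a minimal offline sketch, not the real page: `SAMPLE_HTML` is invented markup that only mirrors the class names the script looks for (`rank-item`, `num`, `img`, `title`, `data-box`, `pts`), and `parse_items` is a hypothetical helper. It uses the built-in `html.parser` instead of `lxml` so that it runs without an extra parser installed.

```python
import bs4

# Hypothetical markup that mirrors the class names the scraper relies on.
# The real page may be structured differently or change over time.
SAMPLE_HTML = """
<ul>
  <li class="rank-item">
    <div class="num">1</div>
    <div class="img"><a href="//example.com/video/1"><img src="//example.com/cover1.jpg"></a></div>
    <a class="title" href="//example.com/video/1">Sample video</a>
    <span class="data-box">123000</span>
    <span class="data-box">4567</span>
    <a href="//example.com/space/42"><span class="data-box">SampleUp</span></a>
    <div class="pts"><div>987654</div>score</div>
  </li>
</ul>
"""


def parse_items(html):
    """Return one dict per ranking entry found in the given HTML."""
    soup = bs4.BeautifulSoup(html, 'html.parser')
    rows = []
    for tag in soup.select('li.rank-item'):
        link_info = tag.find('a', {'class': 'title'})
        boxes = tag.find_all('span', {'class': 'data-box'})
        rows.append({
            'rank': tag.find('div', {'class': 'num'}).get_text(strip=True),
            'image': tag.find('div', {'class': 'img'}).find('img').attrs['src'],
            'title': link_info.get_text(strip=True),
            'link': link_info.attrs['href'],
            'plays': boxes[0].get_text(strip=True),
            'clicks': boxes[1].get_text(strip=True),
            'author': boxes[2].get_text(strip=True),
            # The uploader name sits inside an <a>, so the parent holds the link
            'author_link': 'https:' + boxes[2].parent.attrs['href'],
            'score': tag.find('div', {'class': 'pts'}).div.get_text(strip=True),
        })
    return rows


for row in parse_items(SAMPLE_HTML):
    print(row)
```

Testing the extraction logic against a saved snippet like this makes it easier to adjust the selectors when the page layout changes, without sending a request each time.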

Code

#!/usr/bin/env python
# -*- coding: utf-8 -*-
# @Time    : 2020/4/16 0016 20:46
# @Site    : bilibili site-wide ranking
# @Author  : Yuk
# @File    : bilibili_bs4.py
import bs4
import requests
import openpyxl

# Search criteria
recent = 1 # recent submissions
whole = 0 # all submissions
day = 1 # daily ranking
three_day = 3 # three-day ranking
weekend = 7 # weekly ranking
month = 30 # monthly ranking

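# Build the ranking URL
def get_url(type='all', tg=0, day=3, base_url='https://www.bilibili.com/ranking/'):
    """
    :param type: ranking type; defaults to 'all' (site-wide ranking)
    :param tg: submissions; defaults to 0 (0 = all submissions, 1 = recent submissions)
    :param day: period in days; defaults to 3 (three-day ranking)
    :param base_url: base path
    :return: the assembled URL
    """
    return base_url + type + '/0' + '/' + str(tg) + '/' + str(day)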

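headers = {'user-agent': 'Mozilla/5.0'}
days = weekend
res = requests.get(get_url(day=days), headers=headers, timeout=10)
res.raise_for_status()  # stop early on an HTTP error instead of parsing an error page
soup = bs4.BeautifulSoup(res.text, 'lxml')
wb = openpyxl.Workbook()
# Reuse the default sheet so the workbook has no empty extra sheet
ws = wb.active
ws.title = str(days) + '-day ranking'
# Set column widths
ws.column_dimensions['C'].width = 100
ws.column_dimensions['D'].width = 45
ws.column_dimensions['G'].width = 15
ws.column_dimensions['H'].width = 45
# Header row
ws.append(['Rank', 'Image', 'Title', 'Link', 'Plays', 'Clicks', 'Uploader', 'Uploader space', 'Overall score'])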
for tag in soup.select("li[class='rank-item']"):
    _list = [] # data for one row
    # Rank
    num = tag.find('div', {'class': 'num'}).string
    _list.append(num)
    # Image
    img = tag.find('div', {'class': 'img'}).find('img').attrs['src']
    _list.append(img)
    link_info = tag.find('a', {'class': 'title'})
    # Title
    title = link_info.string
    _list.append(title)
    # Link
    link = link_info.attrs['href']
    _list.append(link)
    # Plays, clicks, uploader, uploader's personal space
    data_box = tag.find_all('span', {'class': 'data-box'})
    play = data_box[0].text
    view = data_box[1].text
    author = data_box[2].text
    author_link = 'https:' + data_box[2].parent.attrs['href']
    _list.append(play)
    _list.append(view)
    _list.append(author)
    _list.append(author_link)
    # Overall score
    score = tag.find('div', {'class': 'pts'}).div.string
    _list.append(score)
    ws.append(_list)

# Save the workbook (openpyxl writes the .xlsx format, so use that extension)
wb.save('d:/bilibili_hot_videos_' + str(days) + '_day_ranking.xlsx')
wb.close()
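The Excel step can be tried on its own, without scraping anything. The following is a minimal sketch under a few assumptions: `rows_to_xlsx_bytes` and `read_xlsx_bytes` are hypothetical helper names, the sample row is made up, and the workbook is kept in memory with `io.BytesIO` rather than written to `d:/`. Since openpyxl produces the Office Open XML (.xlsx) format, a file written this way should carry the `.xlsx` extension.

```python
import io

import openpyxl


def rows_to_xlsx_bytes(header, rows, sheet_title='ranking'):
    """Write a header and data rows to a workbook and return it as bytes."""
    wb = openpyxl.Workbook()
    ws = wb.active          # the default sheet created with the workbook
    ws.title = sheet_title
    ws.append(header)
    for row in rows:
        ws.append(row)
    buffer = io.BytesIO()
    wb.save(buffer)         # save accepts a file name or a file-like object
    return buffer.getvalue()


def read_xlsx_bytes(data):
    """Load a workbook from bytes and return the active sheet as lists."""
    wb = openpyxl.load_workbook(io.BytesIO(data))
    ws = wb.active
    return [list(r) for r in ws.iter_rows(values_only=True)]


# Made-up sample data, only to show the round trip
data = rows_to_xlsx_bytes(['Rank', 'Title'], [[1, 'Sample video']])
print(read_xlsx_bytes(data))
```

Reading the workbook back is a quick way to confirm that the rows landed in the expected columns before opening the file in Excel.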

The generated Excel data

Viewing the daily ranking
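Switching periods presumably only requires passing a different `day` value when building the URL. A minimal sketch, assuming the URL scheme used in the script above (it may no longer match the live site): `build_ranking_url` is a hypothetical helper that mirrors `get_url`, with the `type` parameter renamed to `board` to avoid shadowing the built-in.

```python
# Period constants, matching the values used in the script
DAY = 1
THREE_DAY = 3
WEEK = 7
MONTH = 30


def build_ranking_url(board='all', tg=0, day=THREE_DAY,
                      base_url='https://www.bilibili.com/ranking/'):
    """Assemble a ranking URL: base/board/0/tg/day (scheme assumed from the script)."""
    return '{}{}/0/{}/{}'.format(base_url, board, tg, day)


print(build_ranking_url(day=DAY))
# https://www.bilibili.com/ranking/all/0/0/1
```

In the original script the equivalent change would be setting `days = day` instead of `days = weekend` before the request is made.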
