A simple Puppeteer example

Original

2019-09-17 18:35

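This script scrapes information on completed books from every category under the male and female channels of the iReader (掌閱) bookstore.
The two constant arrays near the top of the script, pids and cids, were collected beforehand by inspecting the hyperlinks on the page.
The page has no anti-scraping measures, which makes it suitable as a simple example.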
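The IDs were gathered by hand, but the same step could be scripted. The following is only a minimal sketch: `parseCategoryLink` is a hypothetical helper, and it assumes that category links carry `pid` and `cid` as query parameters in the same way the listing URL does, which has not been checked against the live site.

```javascript
// Hypothetical helper (not part of the original script).
// Assumes a category link carries pid and cid as query parameters,
// just like the listing URL used by the scraper.
function parseCategoryLink(href, base = "http://www.ireader.com/") {
    // The base lets relative hrefs such as "/index.php?..." resolve correctly.
    const params = new URL(href, base).searchParams;
    return {
        pid: Number(params.get("pid")),
        cid: Number(params.get("cid")),
    };
}

// Sample href made up for illustration, following the listing URL's format.
const sampleHref = "/index.php?ca=booksort.index&pca=booksort.index&pid=10&cid=11";
const parsed = parseCategoryLink(sampleHref);
console.log(parsed.pid, parsed.cid);
```

Using the built-in `URL` class avoids hand-written regular expressions for the query string.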

const fs = require("fs")
const puppeteer = require('puppeteer');

const url = "http://www.ireader.com/index.php?ca=booksort.index&pca=booksort.index&pid=$pid&order=score&status=3&cid=$cid&page=$page"
const pids = [10, 68]; // male channel, female channel
const cids = [[11, 27, 19, 22, 16, 39, 42, 50, 54, 57, 60], [69, 74, 82, 86, 89, 90, 91, 723]]; // category IDs within each channel

(async () => {
    const browser = await puppeteer.launch({
        // headless: false, 
        ignoreDefaultArgs: ["--enable-automation"], 
    });
    const page = await browser.newPage();
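    // f runs inside the page, so $ here is the jQuery instance the page itself provides.
    const f = () => {
        return Array.from($('.bookMation')).map(e => {
            // The book ID is taken from the bid parameter of the title link.
            const id = $('h3 a', e).attr('href').match(/bid=(\d+)/)[1]
            const title = $('h3 a', e).text()
            // Strip the "試讀" (trial read) label that shares the paragraph with the author name.
            const author = $('p.tryread', e).text().replace('試讀', '').trim()
            const desc = $('p.introduce', e).text()
            return {id, title, author, desc}
        })
    }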
    let result = [];
    for (const i in pids) {
        const pid = pids[i]
        for (const cid of cids[i]) {
            for (let pg = 1; pg < 4; pg++) { // only scrape the first three pages
                const u = url.replace("$cid", cid).replace("$pid", pid).replace("$page", pg)
                await page.goto(u);
                const res = await page.evaluate(f)
                res.forEach(e => { e.cid = cid; e.pid = pid })
                result = result.concat(res)
                console.log("page " + pg + " done")
            }
            console.log("cid " + cid + " done")
        }
        console.log("pid " + pid + " done")
    }
    fs.writeFileSync("d:/tmp/ireader_hot.json", JSON.stringify(result), {encoding: "utf-8"})
    console.log("all done")
    await browser.close();
})();
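The script above fills the URL template with chained `replace` calls inside three nested loops. As an alternative sketch, the URLs can be built with `URLSearchParams` and enumerated with a generator. The names `buildListUrl` and `listJobs` are illustrative and not part of the original script; the query parameters are copied from the template.

```javascript
// Build the listing URL for one (pid, cid, page) combination.
// Parameter names and fixed values are taken from the original URL template.
function buildListUrl(pid, cid, page) {
    const u = new URL("http://www.ireader.com/index.php");
    u.search = new URLSearchParams({
        ca: "booksort.index",
        pca: "booksort.index",
        pid: String(pid),
        order: "score",
        status: "3",
        cid: String(cid),
        page: String(page),
    }).toString();
    return u.toString();
}

// Yield one job per channel, category and page, in the same order as the nested loops.
function* listJobs(pids, cids, maxPage = 3) {
    for (const [i, pid] of pids.entries()) {
        for (const cid of cids[i]) {
            for (let page = 1; page <= maxPage; page++) {
                yield { pid, cid, page, url: buildListUrl(pid, cid, page) };
            }
        }
    }
}

// Small subset of the real IDs, just to show the shape of the output.
const jobs = [...listJobs([10, 68], [[11, 27], [69]])];
console.log(jobs.length, jobs[0].url);
```

Each job could then be passed to `page.goto(job.url)` in a single flat loop, keeping the URL logic separate from the browser automation.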
