JianBiaoKu (JianBiaoKu.com) Crawler Example: Converting Image Data to PDF

Original

2020-06-13 21:00

1. Getting every page of a single project document

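import re
from lxml import etree

def get_all_url(self):
    """Collect every page URL and the document title from the catalog page."""
    res = self.s.get(self.startUrl, headers=self.headers)
    res.encoding = 'utf-8'
    selector = etree.HTML(res.text)
    # Relative links of every page listed in the catalog
    urls = selector.xpath('//ul[@class="book_catalog"]//@href')
    params = '/webarbs/book/{}/'.format(self.projectId)
    urls = [self.domain + params + x for x in urls]
    # The last breadcrumb link holds the document title
    title = selector.xpath('//div[@class="location"]/span/a[last()]/text()')[0]
    # Replace characters that are not allowed in Windows file names
    title = re.sub(r'[|<>\\/:*"?]', '-', str(title))
    return urls, title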
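
The two pure steps in get_all_url (cleaning the title and joining the catalog links onto the book path) can be tried without touching the network. The sketch below is only an illustration: sanitize_title and build_page_urls are hypothetical helper names, and the project id and href values are made up, since the real catalog markup is not shown here.

```python
import re
from urllib.parse import urljoin

# Characters that Windows does not allow in file names
_ILLEGAL = re.compile(r'[|<>\\/:*"?]')

def sanitize_title(title):
    """Replace every illegal file-name character with a hyphen."""
    return _ILLEGAL.sub('-', title)

def build_page_urls(domain, project_id, hrefs):
    """Join relative catalog links onto the book's base path."""
    base = '{}/webarbs/book/{}/'.format(domain.rstrip('/'), project_id)
    return [urljoin(base, href) for href in hrefs]

# Hypothetical inputs, for illustration only
clean = sanitize_title('GB/T 50001: "Test"?')
pages = build_page_urls('http://www.jianbiaoku.com', '12345', ['1.shtml', '2.shtml'])
```

Using urljoin instead of plain string concatenation is a design choice of this sketch: it also copes with hrefs that are already absolute.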

2. Fetching all image URLs with asynchronous requests

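import asyncio
import re
import sys
from functools import partial
from aiohttp import ClientSession, TCPConnector

async def getImageUrls(self, url):
    """Fetch one page and return the URL of the image it embeds."""
    # Each call builds its own connector, so limit=10 applies per session,
    # not across all tasks.
    conn = TCPConnector(limit=10)
    async with ClientSession(connector=conn) as session:
        async with session.get(url, headers=self.headers) as response:
            html = await response.text()
            # Raw string with escaped dots; raises IndexError if no image is found
            img = re.findall(r'http://www\.zzguifan\.com.+?\.jpg', html)[0]
            return img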
            
def download(self, page, task):
    """Done-callback: download the image whose URL the finished task returned."""
    res = self.s.get(task.result(), headers=self.headers)
    # Page number = last URL segment minus its 6-character suffix
    imgName = page.split('/')[-1][:-6]
    savePath = sys.path[0] + '\\' + self.projectId + '\\'
    self.mkdir(savePath)
    fullPath = savePath + imgName + '.jpg'
    self.imageIds[int(imgName)] = fullPath  # keep the path keyed by page number for building the PDF later
    with open(fullPath, 'wb') as f:
        f.write(res.content)

async def taskManger(self):
    """Schedule one task per page URL and wait for all of them."""
    tasks = []
    for url in self.urls:
        task = asyncio.create_task(self.getImageUrls(url))
        task.add_done_callback(partial(self.download, url))  # bind the page URL as the callback's first argument
        tasks.append(task)
    await asyncio.gather(*tasks)
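
Because getImageUrls opens a separate session for every page, the connector limit does not cap how many requests run at once. One way to bound concurrency is an asyncio.Semaphore shared by all tasks. The following is a minimal sketch rather than the code above: fetch_all, extract_image_url and fake_fetch are hypothetical names, and fake_fetch stands in for the real aiohttp request so the example runs offline.

```python
import asyncio
import re

IMG_PATTERN = re.compile(r'http://www\.zzguifan\.com.+?\.jpg')

def extract_image_url(html):
    """Return the first matching image URL, or None if the page has none."""
    match = IMG_PATTERN.search(html)
    return match.group(0) if match else None

async def fetch_all(urls, fetch, limit=10):
    """Run fetch(url) for every URL with at most `limit` calls in flight."""
    semaphore = asyncio.Semaphore(limit)

    async def worker(url):
        async with semaphore:
            html = await fetch(url)
        return url, extract_image_url(html)

    return dict(await asyncio.gather(*(worker(u) for u in urls)))

async def fake_fetch(url):
    """Stand-in for an HTTP request (assumption: returns the page HTML)."""
    await asyncio.sleep(0)
    return '<img src="http://www.zzguifan.com/img/{}.jpg">'.format(url[-1])

result = asyncio.run(fetch_all(['page1', 'page2'], fake_fetch, limit=1))
```

Returning None for a page without an image, instead of indexing [0], lets the caller decide whether to skip or retry that page.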

3. Generating the PDF file

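from fpdf import FPDF
from PIL import Image

def makePdf(pdfFileName, listPages):
    """Combine the images in listPages into a single PDF."""
    # Size every page from the first image (pixel dimensions used as points)
    cover = Image.open(listPages[0])
    width, height = cover.size

    pdf = FPDF(unit="pt", format=[width, height])

    for page in listPages:
        pdf.add_page()
        pdf.image(page, 0, 0)

    pdf.output(pdfFileName, "F")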
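
Two details matter when handing the downloaded images to makePdf. The download callbacks fill imageIds in completion order, so the paths presumably need sorting by page number first. And using pixel dimensions directly as points amounts to assuming 72 dpi. The sketch below illustrates both; ordered_pages and page_size_pt are hypothetical helpers, and the sample paths are made up.

```python
def ordered_pages(image_ids):
    """Return saved image paths sorted by numeric page number."""
    return [path for _, path in sorted(image_ids.items())]

def page_size_pt(width_px, height_px, dpi=72):
    """Convert pixel dimensions to PDF points (1 pt = 1/72 inch)."""
    return width_px * 72 / dpi, height_px * 72 / dpi

# Hypothetical dictionary as the download callbacks might leave it
image_ids = {10: '10.jpg', 2: '2.jpg', 1: '1.jpg'}
pages = ordered_pages(image_ids)
size = page_size_pt(800, 600)
```

Sorting on the integer keys avoids the string ordering problem in which '10.jpg' would come before '2.jpg'.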

4. Output preview
