Scraping images from Baidu by keyword

Original

2020-06-16 02:08

# -*- coding: utf-8 -*-

import requests
import time
import os
import json


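def getManyPages(keyword, pages):
    """Query Baidu image search and return one list of result dicts per page."""
    params = []
    # 'pn' is the result offset and 'rn' the page size. This range starts at
    # offset 30, so results 0-29 are not requested.
    for i in range(30, 30 * pages + 30, 30):
        params.append({
            'tn': 'resultjson_com',
            'ipn': 'rj',
            'ct': 201326592,
            'is': '',
            'fp': 'result',
            'queryWord': keyword,
            'cl': 2,
            'lm': -1,
            'ie': 'utf-8',
            'oe': 'utf-8',
            'adpicid': '',
            'st': -1,
            'z': '',
            'ic': 0,
            'word': keyword,
            's': '',
            'se': '',
            'tab': '',
            'width': '',
            'height': '',
            'face': 0,
            'istype': 2,
            'qc': '',
            'nc': 1,
            'fr': '',
            'pn': i,
            'rn': 30,
            'gsm': '1e',
            '1488942260214': ''
        })
    url = 'https://image.baidu.com/search/acjson'
    urls = []
    for i in params:
        try:
            data = requests.get(url, params=i).json().get('data')
        except ValueError:
            # JSON decode errors are subclasses of ValueError; skip this page.
            print("Failed to parse the response")
            continue
        if data:  # skip pages whose response has no 'data' entry
            urls.append(data)
    return urls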


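def getImg(dataList, localPath):
    """Save every result's thumbnail into localPath as 0001.jpg, 0002.jpg, ..."""
    # makedirs also creates missing parent directories.
    os.makedirs(localPath, exist_ok=True)

    x = 1
    for page in dataList:
        for i in page or []:  # a page may be None if 'data' was absent
            thumb_url = i.get('thumbURL')
            if thumb_url is not None:
                print('Downloading: %s' % thumb_url)
                ir = requests.get(thumb_url)
                path = os.path.join(localPath, '%04d.jpg' % x)
                with open(path, 'wb') as f:
                    f.write(ir.content)
                x += 1
                time.sleep(0.5)  # pause between downloads
            else:
                print('Image URL is missing')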

if __name__ == '__main__':
    keyword = input("Enter a keyword: ")
    dataList = getManyPages(keyword, 100)
    getImg(dataList, 'E:\\data\\Mydata\\' + keyword)
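
The script above does three things: it computes the `pn` offsets, pulls `thumbURL` out of each result, and downloads each file. A minimal sketch of those steps as small, separately testable helpers follows. The helper names (`page_offsets`, `extract_thumb_urls`, `download`) and the 10-second timeout are assumptions for illustration, not part of the original script. The download helper uses `urllib` from the standard library instead of `requests`, so the sketch runs without extra packages.

```python
import urllib.request


def page_offsets(pages, page_size=30, start=30):
    """Return the 'pn' offsets that getManyPages requests for `pages` pages."""
    return list(range(start, start + page_size * pages, page_size))


def extract_thumb_urls(data_list):
    """Flatten the per-page result lists into a flat list of thumbnail URLs."""
    urls = []
    for page in data_list:
        for item in page or []:  # tolerate a page that is None
            url = item.get('thumbURL')
            if url:  # skip entries without a thumbnail (e.g. an empty dict)
                urls.append(url)
    return urls


def download(url, path, timeout=10):
    """Fetch one image to `path`; return True on success, False on failure."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            content = resp.read()
    except (OSError, ValueError):
        # OSError covers network, HTTP and timeout errors;
        # ValueError covers malformed URLs.
        return False
    with open(path, 'wb') as f:
        f.write(content)
    return True
```

With these helpers, the download loop could reduce to iterating over `extract_thumb_urls(dataList)` and calling `download` for each URL. A single failed request would then be skipped instead of stopping the whole run.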
