Python Project: Scraping Images with Scrapy

Original

2020-04-29 17:14

Main Code

# -*- coding: utf-8 -*-
import scrapy
from scrapy.linkextractors import LinkExtractor
from scrapy.spiders import CrawlSpider, Rule

count = 0


class BeautySpider(CrawlSpider):
    name = 'beauty'
    # allowed_domains = ['www.meinv.hk']  # putting "http://" in allowed_domains causes an error
    # "scrapy genspider beauty http://www.meinv.hk/" adds "http://" to the first element of start_urls automatically
    start_urls = ['http://www.meinv.hk/']
    rules = (
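        # Appending a trailing "/" to the link pattern caused the extraction to fail
        Rule(LinkExtractor(
            # In a regex, "?" matches zero or one occurrence of the preceding pattern
            # (after a quantifier it makes the match non-greedy), so a literal "?" needs a backslash.
            # The r prefix makes this a raw string, so the backslashes reach the regex engine unchanged.
            allow=r'http://www\.meinv\.hk/\?p=\d+'),  # writing "/?" instead of "\?" matches no post URLs
            callback='parse_item',
            ),  # with follow=True the spider also keeps following rule-matching URLs found on the parsed pages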
    )

    def parse_item(self, response):
        item = {}
        global count
        item['star_name'] = response.xpath('//h1[@class="title"]/text()').get()
        print(item['star_name'])
        item['image_urls'] = response.xpath('//div[@class="post-content"]//img/@src').extract()
        print(item['image_urls'])
        count += 1
        print(count)
        return item  # return vs. yield: both gave the same count here (27), since this callback produces exactly one item per page
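
On the return-versus-yield question raised in the comment above: Scrapy accepts either a single item or an iterable of items from a callback, so a callback that builds one item per page behaves the same either way. The difference only shows up when one page should produce several items. Here is a minimal pure-Python sketch of that idea. It does not use Scrapy, and the function names and the `normalize` helper are hypothetical, written only to imitate how a caller might treat both forms.

```python
def parse_with_return(page):
    # Returns one finished item; the function ends at the return statement.
    return {"star_name": page["title"], "image_urls": page["images"]}


def parse_with_yield(page):
    # A generator: it can hand back several items, here one per image.
    for url in page["images"]:
        yield {"star_name": page["title"], "image_urls": [url]}


def normalize(result):
    # Hypothetical helper: wrap a single dict so both styles can be iterated.
    if isinstance(result, dict):
        return [result]
    return list(result)


page = {"title": "demo", "images": ["a.jpg", "b.jpg"]}
returned = normalize(parse_with_return(page))
yielded = normalize(parse_with_yield(page))
print(len(returned), len(yielded))
```

With `return`, the whole page becomes one item. With `yield`, the same page could be split into one item per image, or mixed with follow-up requests.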


"""
Extraction rules for the images on the home page
"""
# item['star_name'] = response.xpath('//div[@class="posts-default-title"]/h2/a/text()').get()
# print('--------')
# print(item['star_name'])
# item['image_urls'] = response.xpath('//div[@class="posts-default-img"]//img/@src').extract()
# print(item['image_urls'])
# print('---------------')
# # item['description'] = response.xpath('//div[@id="description"]').get()
# return item
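
Both extraction rules store the links under `image_urls`, which is the field name Scrapy's built-in `ImagesPipeline` reads by default. The post does not show the project's settings, so the following is an assumption about how the downloads could be switched on, not the author's configuration. The folder name is a placeholder.

```python
# settings.py (sketch; not taken from the original project)

# Enable Scrapy's built-in image pipeline. It requires the Pillow package.
ITEM_PIPELINES = {
    "scrapy.pipelines.images.ImagesPipeline": 1,
}

# Folder where downloaded files are stored; the name is a placeholder.
IMAGES_STORE = "downloaded_images"

# These two values are the pipeline defaults, written out only for clarity.
IMAGES_URLS_FIELD = "image_urls"
IMAGES_RESULT_FIELD = "images"
```

With the pipeline enabled, each scraped item would also receive an `images` entry describing the downloaded files.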

# Source of the rules
"""
2020-04-27 17:18:04 [scrapy.core.scraper] DEBUG: Scraped from <200 http://www.meinv.hk/?p=2701>
{'star_name': '噓', 'image_urls': ['http://www.meinv.hk/wp-content/uploads/2018/02/2018020721314999.jpeg', 'http://www.meinv.hk/wp-content/uploads/2018/02/2018020721314456.jpg']}

"""
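
The post URL in the log above has the form `?p=2701`, which is what the spider's `allow` pattern has to match. `LinkExtractor` treats `allow` as a regular expression, so the pattern can be checked with the standard `re` module alone. The snippet below is a small sketch of that check. The variable names are illustrative, and the two URLs are copied from the log.

```python
import re

# Same pattern as the spider's `allow` argument, written as a raw string
# with the dots escaped as well.
pattern = re.compile(r"http://www\.meinv\.hk/\?p=\d+")

post_url = "http://www.meinv.hk/?p=2701"  # post URL from the log
image_url = "http://www.meinv.hk/wp-content/uploads/2018/02/2018020721314999.jpeg"

print(bool(pattern.search(post_url)))    # True
print(bool(pattern.search(image_url)))   # False

# Without the backslash, "?" makes the preceding "/" optional and no
# longer matches a literal question mark, so the post URL is rejected.
unescaped = re.compile(r"http://www\.meinv\.hk/?p=\d+")
print(bool(unescaped.search(post_url)))  # False
```

This is why the rule only works with `\?`: the unescaped form looks for `p=` directly after the host name or the slash.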


