Crawling all article titles in a CSDN category

Original

2018-09-03 19:35

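Python + Selenium + Redis.
The program uses Selenium to click the feed's "refresh" link over and over, reads the newly loaded titles after each click, and stores them in Redis.
Without Selenium, the same effect can be achieved by changing the IP address and the browser identity (User-Agent) between requests.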
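A minimal sketch of that Selenium-free variant, using only the Python standard library. It rests on assumptions the post does not state: the server-rendered HTML of the category page already contains the `feedlist_id` list with each title inside `<h2><a>`, sending a different User-Agent per request is enough to get a different set of titles, and the optional `proxy` argument is how the IP address would be changed. `USER_AGENTS`, `FeedTitleParser`, `parse_titles` and `fetch_titles` are hypothetical names, and the User-Agent strings are placeholders.

```python
import itertools
import urllib.request
from html.parser import HTMLParser

# Placeholder User-Agent strings (assumption: any realistic browser strings work).
USER_AGENTS = itertools.cycle([
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) Chrome/120.0",
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) Safari/605.1.15",
])


class FeedTitleParser(HTMLParser):
    """Collect the text of <h2><a> links inside the element with id="feedlist_id"."""

    def __init__(self):
        super().__init__()
        self.titles = []
        self._feed_tag = None   # tag name of the element with id="feedlist_id"
        self._depth = 0         # nesting depth of that tag; 0 means outside the feed
        self._in_h2 = False
        self._link_text = None  # text chunks while inside <h2><a>, else None

    def handle_starttag(self, tag, attrs):
        if self._depth == 0:
            if dict(attrs).get("id") == "feedlist_id":
                self._feed_tag, self._depth = tag, 1
            return
        if tag == self._feed_tag:
            self._depth += 1
        elif tag == "h2":
            self._in_h2 = True
        elif tag == "a" and self._in_h2:
            self._link_text = []

    def handle_endtag(self, tag):
        if self._depth == 0:
            return
        if tag == self._feed_tag:
            self._depth -= 1
        elif tag == "a" and self._link_text is not None:
            self.titles.append("".join(self._link_text).strip())
            self._link_text = None
        elif tag == "h2":
            self._in_h2 = False

    def handle_data(self, data):
        if self._link_text is not None:
            self._link_text.append(data)


def parse_titles(html):
    """Return the feed titles found in one HTML page."""
    parser = FeedTitleParser()
    parser.feed(html)
    parser.close()
    return parser.titles


def fetch_titles(url, proxy=None):
    """Fetch one page with the next User-Agent (and an optional proxy), then parse it."""
    handlers = [urllib.request.ProxyHandler({"http": proxy, "https": proxy})] if proxy else []
    opener = urllib.request.build_opener(*handlers)
    request = urllib.request.Request(url, headers={"User-Agent": next(USER_AGENTS)})
    with opener.open(request, timeout=10) as resp:
        html = resp.read().decode("utf-8", errors="replace")
    return parse_titles(html)
```

For example, `fetch_titles("https://blog.csdn.net/nav/cloud")` would return the titles found in a single fetch, and calling it repeatedly with different proxies would stand in for the refresh clicks. Whether the page delivers its titles in static HTML like this has not been verified here.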

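# -*- coding: utf-8 -*-
"""
Collect article titles from a CSDN category feed and store them in Redis.
author: ParkJiYeon
update: 18.05.25
"""

import time

from redis import StrictRedis
from selenium import webdriver
from selenium.webdriver.common.by import By  # Selenium 4 locator API


def GetTitle(url, redis, rounds=10):
    """Read the feed titles, click "refresh", and repeat this `rounds` times."""
    browser = webdriver.Chrome()
    try:
        browser.get(url)
        for _ in range(rounds):
            # Titles of feed entries li[2] to li[14] (XPath positions start at 1).
            for i in range(2, 15):
                xpath = "//*[@id='feedlist_id']/li[" + str(i) + "]/div/div[1]/h2/a"
                link = browser.find_element(By.XPATH, xpath)
                redis.rpush('text', link.text)
            # li[6] in the navigation bar is the "refresh" link.
            button = browser.find_element(By.XPATH, "//*[@id='nav']/div/div/ul/li[6]/a")
            button.click()
            time.sleep(5)  # wait for the refreshed feed to load
    finally:
        browser.quit()  # always close the browser, even after an error


def ShowTitle(redis):
    # Redis list indexes start at 0; LRANGE 0 -1 returns the whole list as bytes.
    for title in redis.lrange('text', 0, -1):
        print(title.decode("utf-8"))


if __name__ == "__main__":
    url = "https://blog.csdn.net/nav/cloud"
    redis = StrictRedis()
    GetTitle(url, redis)
    ShowTitle(redis)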
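Repeated refreshes may return titles that were already collected, and the script above pushes every one of them onto the `text` list. One way to keep only unique titles, sketched here as an assumption rather than the original design, is to check a Redis set before pushing: redis-py's `sadd` returns the number of members actually added, so 0 means the title is already stored. The `title_set` key and the `store_titles` helper are hypothetical, and `InMemoryRedis` is a tiny stand-in for a `StrictRedis` client so the sketch runs without a Redis server.

```python
class InMemoryRedis:
    """Demo-only stand-in for the three StrictRedis methods used below.

    A real client returns bytes from lrange; this stand-in returns what was stored.
    """

    def __init__(self):
        self._sets, self._lists = {}, {}

    def sadd(self, key, *values):
        members = self._sets.setdefault(key, set())
        before = len(members)
        members.update(values)
        return len(members) - before  # number of new members, like SADD

    def rpush(self, key, *values):
        items = self._lists.setdefault(key, [])
        items.extend(values)
        return len(items)  # new list length, like RPUSH

    def lrange(self, key, start, end):
        items = self._lists.get(key, [])
        # Redis treats `end` as inclusive; -1 means "up to the last element".
        stop = None if end == -1 else end + 1
        return items[start:stop]


def store_titles(client, titles, list_key="text", set_key="title_set"):
    """Append each title to the list only the first time it is seen."""
    stored = 0
    for title in titles:
        # SADD returns 1 if the member was new, 0 if it was already in the set.
        if client.sadd(set_key, title):
            client.rpush(list_key, title)
            stored += 1
    return stored


if __name__ == "__main__":
    client = InMemoryRedis()
    print(store_titles(client, ["A", "B", "A"]))  # prints 2
    print(store_titles(client, ["B", "C"]))       # prints 1
    print(client.lrange("text", 0, -1))           # prints ['A', 'B', 'C']
```

With a real connection, `store_titles(StrictRedis(), titles)` could replace the inner `redis.rpush('text', link.text)` call, at the cost of one extra round trip per title.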
