python

Original

2019-02-22 23:07

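A basic downloader written in Python that can fetch multiple pages. The code targets Python 2 (it uses urllib2 and print statements).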

# coding: utf-8
import urllib2
import time
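page = 1                # start with the first list page
url = [''] * 350        # storage for the article URLs: 7 pages x 50 links each
while page < 8:
        # Open the list page and read its content
        buf = urllib2.urlopen('http://blog.sina.com.cn/s/articlelist_1191258123_0_'+str(page)+'.html').read()
        i = 0
        title = buf.find(r'<a title=')   # start searching at the first title anchor
        href = buf.find(r'href=',title)
        html = buf.find(r'.html',href)
        # Take at most 50 URLs per page, and only while title, href and .html are all found
        while title !=-1 and href !=-1 and html !=-1 and i<50:
                # Skip the 6 characters of href=" and keep the 5 characters of .html;
                # offset by page so that later pages do not overwrite earlier ones
                url[(page-1)*50+i] = buf[href+6:html+5]
                print url[(page-1)*50+i]
                title = buf.find(r'<a title=',html)
                href = buf.find(r'href=',title)
                html = buf.find(r'.html',href)
                i = i+1
        else:
                print page,"find end "
        page = page+1
else:
        print 'all done'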
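j = 0
biaoti = ['']*350     # article titles; created once, outside the loop, so earlier entries are kept
while j<350:          # download each collected URL
        if not url[j]:          # skip slots that were never filled
                j = j+1
                continue
        content = urllib2.urlopen(url[j]).read()
        titname = content.find(r'SG_txta')   # locate the title marker
        end = content.find(r'</h',titname)
        biaoti[j] = content[titname+9:end]
        print biaoti[j]

        # Save the page, using the last 26 characters of the URL as file name and extension.
        # The hanhan/ directory must already exist.
        with open(r'hanhan/'+url[j][-26:],'w+') as f:
                f.write(content)
        print 'downloaded ',url[j]
        j=j+1
        time.sleep(4)   # pause between requests
else:
        print 'download finished'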
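
The link search above can be tried without touching the network. The following is a minimal Python 3 sketch, not the script's own code: the helper names extract_links and local_name and the two sample URLs are hypothetical, made up for illustration. It applies the same '<a title=' / 'href=' / '.html' marker search to an in-memory string and shows how the last 26 characters of a URL become the file name.

```python
def extract_links(buf, limit=50):
    """Collect up to `limit` article URLs with the marker search used above."""
    links = []
    title = buf.find('<a title=')
    while title != -1 and len(links) < limit:
        href = buf.find('href=', title)
        if href == -1:
            break
        html = buf.find('.html', href)
        if html == -1:
            break
        # href + 6 skips the characters of href=" ; html + 5 keeps .html
        links.append(buf[href + 6:html + 5])
        # Continue the search after the URL that was just taken
        title = buf.find('<a title=', html)
    return links


def local_name(url, length=26):
    """Return the last `length` characters of the URL, used as the file name."""
    return url[-length:]


# Hypothetical sample markup; the post IDs are invented for this sketch.
sample = (
    '<a title="First" href="http://blog.sina.com.cn/s/blog_0000000000000001.html">First</a>'
    '<a title="Second" href="http://blog.sina.com.cn/s/blog_0000000000000002.html">Second</a>'
)

links = extract_links(sample)
for link in links:
    print(link, '->', local_name(link))
```

Returning a list avoids the fixed 350-slot array and the empty entries it leaves behind. The marker search is still fragile, since it assumes that every title anchor is followed by a double-quoted href ending in .html.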
