Web Scraping Study Notes (1)

Original

2020-03-09 13:29

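A beginner's notes on the web scraping I learned during my first year of university.

Before starting on web scraping, I had already learned basic Python syntax.
For anyone who has learned any programming language, it is fairly easy to pick up.
You also need some background knowledge, such as the basics of the HTTP protocol.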

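Python 3 ships with a built-in module for this, urllib.request. The topics to cover:

urlopen: returns a response object; response.read() returns bytes, and bytes.decode("utf-8") turns them into a str
GET: passing parameters (Chinese characters in the URL raise an error because the request is encoded as ASCII, which has no Chinese characters, so they must be percent-encoded first)
POST
Custom handlers
URLError
requests (third-party)
Data parsing: XPath, bs4
Data storage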
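POST and URLError appear in the list above but not in the two examples that follow, so here is a minimal sketch of both using only the standard library. The endpoint `http://example.com/search`, the form field, and the helper names `build_post_request` and `safe_fetch` are assumptions for illustration, not something from the course. The request is only built, never sent, so the snippet runs without a network connection.

```python
import urllib.error
import urllib.parse
import urllib.request


def build_post_request(url, form):
    """Build (but do not send) a POST request carrying form data."""
    # urlencode percent-encodes non-ASCII values; the body must be bytes
    body = urllib.parse.urlencode(form).encode("utf-8")
    # Supplying data= makes urllib use POST instead of GET
    return urllib.request.Request(url, data=body)


def safe_fetch(target, timeout=5):
    """Return the response body as bytes, or None if the request fails."""
    try:
        with urllib.request.urlopen(target, timeout=timeout) as response:
            return response.read()
    except urllib.error.URLError as err:
        # HTTPError is a subclass of URLError, so HTTP status errors land here too
        print("request failed:", err.reason)
        return None


# Hypothetical endpoint and form field, for illustration only
req = build_post_request("http://example.com/search", {"wd": "哈哈"})
print(req.get_method())  # POST
print(req.data)          # b'wd=%E5%93%88%E5%93%88'
```

Passing `req` to `safe_fetch` would actually send the POST; wrapping `urlopen` this way keeps a single failed page from stopping a longer crawl.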

Here are two simple examples that I learned from a teacher; the comments are quite detailed.

import urllib.request

def load_data():
    url = "http://www.baidu.com/"
    # Send an HTTP GET request
    # response: the HTTP response object
    response = urllib.request.urlopen(url)
    print(response)
    # Read the body; read() returns bytes
    data = response.read()
    print(data)
    # Decode the bytes into a str
    str_data = data.decode("utf-8")
    print(str_data)
    # Write the raw bytes to a file (binary mode)
    with open("baidu.html", "wb") as f:
        f.write(data)
    # Convert a str to bytes
    str_name = "baidu"
    bytes_name = str_name.encode("utf-8")
    print(bytes_name)

    # Scraped data arrives as either str or bytes:
    # if you have bytes but need a str to write, call decode("utf-8")
    # if you have a str but need bytes to write, call encode("utf-8")
load_data()
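The str/bytes conversion in the comments above can be tried on its own, without any network request. This is a small sketch; the sample text and the variable names are just for illustration.

```python
text = "哈哈"

# str -> bytes: choose the encoding explicitly
raw = text.encode("utf-8")
print(type(raw), len(text), len(raw))  # 2 characters become 6 bytes in UTF-8

# bytes -> str: decode with the same encoding that produced the bytes
restored = raw.decode("utf-8")
print(restored == text)

# Decoding with a codec that cannot represent the data raises an error
ascii_failed = False
try:
    raw.decode("ascii")
except UnicodeDecodeError as err:
    ascii_failed = True
    print("decode failed:", err.reason)
```

The same rule applies to files: a file opened in "wb" mode takes bytes, and a file opened in "w" mode with an encoding takes str.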

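import urllib.request
import urllib.parse
import string

def get_method_params():

    url = "http://www.baidu.com/baidu?tn=monline_3_dg&ie=utf-8&wd="
    # Append the search keyword (Chinese characters) to the URL
    # What the request can actually carry is the percent-encoded form, e.g.
    # https://www.baidu.com/s?wd=%E7%BE%8E%E5%A5%B3

    name = "哈哈"
    final_url = url + name
    print(final_url)
    # The URL now contains Chinese characters, which ASCII cannot represent,
    # so percent-encode them; safe=string.printable leaves printable ASCII untouched
    encode_new_url = urllib.parse.quote(final_url, safe=string.printable)
    print(encode_new_url)
    # Send the network request
    response = urllib.request.urlopen(encode_new_url)
    print(response)
    # Read the body and decode it (decode() defaults to UTF-8)
    data = response.read().decode()
    print(data)
    # Save to a local file
    with open("02-encode.html", "w", encoding="utf-8") as f:
        f.write(data)
    # Passing final_url to urlopen without encoding it raises:
    # UnicodeEncodeError: 'ascii' codec can't encode
    # characters in position 10-11: ordinal not in range(128)
    # The request line is encoded as ASCII (0-127), which has no
    # Chinese characters, so they must be percent-encoded first

get_method_params()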
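An alternative to quoting the whole URL is to encode only the query parameters with urllib.parse.urlencode. The sketch below reuses the base URL and parameter names from the example above; splitting them into a dict is my own arrangement, not the teacher's version. It also shows one caveat of safe=string.printable: spaces are printable, so they are left unencoded.

```python
import string
import urllib.parse

base_url = "http://www.baidu.com/baidu"
params = {"tn": "monline_3_dg", "ie": "utf-8", "wd": "哈哈"}

# quote() percent-encodes a single value as UTF-8 bytes
encoded_value = urllib.parse.quote("哈哈")
print(encoded_value)  # %E5%93%88%E5%93%88

# urlencode() builds the whole query string and encodes each value
query = urllib.parse.urlencode(params)
final_url = base_url + "?" + query
print(final_url)

# unquote() reverses the encoding
print(urllib.parse.unquote(encoded_value))

# Caveat: with safe=string.printable a space stays a raw space,
# while urlencode turns it into "+"
kept_space = urllib.parse.quote("a b", safe=string.printable)
plus_space = urllib.parse.urlencode({"wd": "a b"})
print(repr(kept_space), plus_space)
```

For keywords that contain only Chinese characters both approaches give the same URL; the difference only shows up when the keyword contains spaces or other reserved characters.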
