台部落Air

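I previously wrote a Douban scraper using regular expressions, so here I'll just paste the source code. Source code: #-*- coding:utf-8 -*- # author:Air # software: PyCharm # QQ group for study and discussion: 916696436 import requests f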

2020-06-27 12:38:52
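The excerpt above says the Douban scraper was written with regular expressions, but the code is cut off. The following is a minimal sketch of the regex-extraction step only, assuming a simplified HTML snippet: the markup (`<em>`, `<span class="title">`, `<span class="rating_num">`) and the function name `parse_items` are hypothetical stand-ins and are not guaranteed to match the live page or the author's code.

```python
import re

# Hypothetical, simplified markup standing in for a Douban list page.
# The real page structure may differ; adjust the pattern to the actual HTML.
SAMPLE_HTML = """
<div class="item"><em>1</em><span class="title">Movie A</span>
<span class="rating_num">9.0</span></div>
<div class="item"><em>2</em><span class="title">Movie B</span>
<span class="rating_num">8.5</span></div>
"""

# Non-greedy groups capture rank, title and rating; re.S lets '.' match newlines.
ITEM_RE = re.compile(
    r'<em>(\d+)</em>.*?<span class="title">(.*?)</span>'
    r'.*?<span class="rating_num">([\d.]+)</span>',
    re.S,
)


def parse_items(html):
    """Return one dict per movie found in the HTML (hypothetical helper)."""
    return [
        {"rank": int(rank), "title": title, "rating": float(rating)}
        for rank, title, rating in ITEM_RE.findall(html)
    ]


if __name__ == "__main__":
    for item in parse_items(SAMPLE_HTML):
        print(item)
```

`re.findall` returns one tuple per match when the pattern has several groups, which is why each tuple is unpacked into a dict.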

Scraping all schools from the college entrance exam application system (https://gkcx.eol.cn/) (1) Method 1 1. Analyze the request 2. Construct the URL base_url='https://gkcx.eol.cn/gkcx/api?' data={

2020-06-27 12:38:52
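Step 2 above builds the request URL from `base_url` plus a `data` dict, but the dict is truncated. A minimal sketch of that step using the standard library's `urllib.parse.urlencode` follows; the keys `page` and `size` and the helper `build_url` are hypothetical, since the real parameters the API expects are not shown.

```python
from urllib.parse import urlencode

BASE_URL = "https://gkcx.eol.cn/gkcx/api?"


def build_url(page, size=20):
    """Append URL-encoded query parameters to the base URL (hypothetical helper)."""
    # Hypothetical parameters; the original `data` dict is truncated,
    # so the real keys expected by the API are unknown.
    data = {"page": page, "size": size}
    return BASE_URL + urlencode(data)


if __name__ == "__main__":
    # Build the URLs for the first three pages.
    for page in range(1, 4):
        print(build_url(page))
```

With `requests`, passing the dict as `requests.get("https://gkcx.eol.cn/gkcx/api", params=data)` performs the same encoding, so the manual concatenation is only needed when the full URL string itself is wanted.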

#-*- coding:utf-8 -*- # author:Air # software: PyCharm # QQ group for study and discussion: 916696436 from fake_useragent import UserAgent # create an object

2020-06-27 12:38:52

Import the library import pymysql (1) Create (insert) def insert(value): # Open the database connection (username, password, database name) db = pymysql.connect("localhost", "us

2020-06-27 12:38:52
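The `insert` function above is truncated after the connection call. Below is a minimal sketch of the usual commit/rollback pattern for a parameterized INSERT. To keep it runnable without a MySQL server, it uses the standard-library `sqlite3` module as a stand-in for `pymysql`; the `students` table and its columns are hypothetical, and unlike the original `insert(value)` the sketch receives the connection as an argument.

```python
import sqlite3

# sqlite3 stands in for pymysql so the sketch runs without a MySQL server.
# With pymysql, the connection would come from
# pymysql.connect(host=..., user=..., password=..., database=...)
# and the placeholders would be %s instead of ?.


def insert(db, value):
    """Insert one (name, age) row into the hypothetical students table."""
    cursor = db.cursor()
    try:
        # Placeholders let the driver escape values, which avoids SQL injection.
        cursor.execute("INSERT INTO students (name, age) VALUES (?, ?)", value)
        db.commit()
    except sqlite3.Error:
        # Undo the failed transaction, then re-raise so the caller sees the error.
        db.rollback()
        raise
    finally:
        cursor.close()


if __name__ == "__main__":
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE students (name TEXT, age INTEGER)")
    insert(db, ("Zhang San", 20))
    print(db.execute("SELECT name, age FROM students").fetchall())
    db.close()
```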

Import related libraries """ -*- coding:utf-8 -*- author:Air datetime:2019/7/26 22:26 software: PyCharm QQ group for study and discussion: 916696436 """ import

2019-07-31 03:12:44

Analyze the web page: the data for page 1 and page 2, respectively. Import related libraries import requests from parsel import Selector from multiprocessing import Pool from fake_usera

2019-07-31 03:12:44

Import related libraries """ -*- coding:utf-8 -*- author:Air datetime:2019/7/25 17:40 software: PyCharm QQ group for study and discussion: 916696436 """ import

2019-07-31 03:12:44

1. Scrape each movie's title, director, date, rating, number of ratings, and reviews from the web page 2. items class DoubanItem(scrapy.Item): # define the fields for your item here like:

2019-06-11 07:39:06

Import related libraries #-*- coding:utf-8 -*- # author:Air # datetime:2019/5/16 20:32 # software: PyCharm # QQ group for study and discussion: 916696436 import req

2019-06-11 07:39:06

import requests from bs4 import BeautifulSoup from lxml import etree import re import json from fake_useragent import

2019-05-12 11:32:54

import json # 1. Converting between strings and dicts/lists # string (JSON) ----- dict/list data='[{"name":"張三","age":"20"},{"name":"李四","age":"18"}]' li

2019-05-11 10:13:04
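The excerpt above starts converting between JSON strings and Python dicts/lists but stops mid-line. A minimal sketch of both directions with the standard `json` module, reusing the excerpt's own `data` string, might look like this; the variable names `people` and `text` are illustrative.

```python
import json

# JSON string -> Python list of dicts (same data as the excerpt).
data = '[{"name":"張三","age":"20"},{"name":"李四","age":"18"}]'
people = json.loads(data)
print(type(people), people[0]["name"])

# Python list/dict -> JSON string. ensure_ascii=False keeps Chinese
# characters readable instead of escaping them as \uXXXX sequences.
text = json.dumps(people, ensure_ascii=False)
print(text)

# Round trip: parsing the dumped text gives back an equal object.
assert json.loads(text) == people
```

Note that `"age"` stays a string in both directions because it is quoted in the source JSON; `json.loads` does not convert types on its own.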

import requests from bs4 import BeautifulSoup from fake_useragent import UserAgent useragent=UserAgent() headers={

2019-04-29 13:43:35

DataFrame.drop_duplicates(subset=None, keep='first', inplace=False) subset: the columns to compare; all columns are used by default. keep: {'first', 'last'

2019-04-21 12:24:48
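The parameters listed above can be illustrated on a small toy DataFrame; the column names and values are made up for illustration. Because `inplace=False` is the default, each call returns a new DataFrame and leaves `df` unchanged.

```python
import pandas as pd

# Toy data: rows 1 and 4 are identical; names "A" and "B" repeat.
df = pd.DataFrame({
    "name": ["A", "B", "A", "C", "B"],
    "score": [1, 2, 3, 4, 2],
})

# Default (subset=None): a row is a duplicate only if every column matches;
# keep="first" keeps the earliest occurrence.
print(df.drop_duplicates())

# Compare only the "name" column and keep the last occurrence of each name.
print(df.drop_duplicates(subset=["name"], keep="last"))

# keep=False drops every row whose "name" appears more than once.
print(df.drop_duplicates(subset=["name"], keep=False))
```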

Contents: I. Scraping the data II. Data cleaning. I. Scraping the data 1. Requesting the page (1) Import packages import requests from bs4 import BeautifulSoup import re import pymysql (2) Add request headers

2019-04-19 02:06:49

Scraping the Douban Movie Top 250 (poster image, rank, title, author, short review) (1) Import packages import requests from bs4 import BeautifulSoup import re (2) Send the request headers={

2019-04-17 02:29:22