Scraping 51job listings with Python (XPath), getting past the User-Agent check

Original

2020-02-20 16:39

# -*- coding:utf-8 -*-

import csv

import requests
from fake_useragent import UserAgent
from lxml import etree

# Send a random browser User-Agent instead of the default python-requests one.
agent = UserAgent()
url = "http://search.51job.com/list/010000%252C020000%252C030200%252C040000%252C080200,000000,0000,00,9,99,python,2,1.html?lang=c&stype=&postchannel=0000&workyear=99&cotype=99&degreefrom=99&jobterm=99&companysize=99&providesalary=99&lonlat=0%2C0&radius=-1&ord_field=0&confirmdate=9&fromType=&dibiaoid=0&address=&line=&specialarea=00&from=&welfare="
response = requests.get(
    url,
    headers={"User-Agent": agent.random},
    timeout=10,
)
response.raise_for_status()
# Let requests detect the encoding from the body so the Chinese text decodes correctly.
response.encoding = response.apparent_encoding
root = etree.HTML(response.text)
# Each job posting is one <div class="el"> inside the results table.
div_list = root.xpath('//div[@class="dw_table"]/div[@class="el"]')

# Open the CSV once instead of once per row; newline='' lets csv control line endings.
with open('51job.csv', 'a', encoding='gb18030', newline='') as f:
    writer = csv.writer(f)
    for div in div_list:
        name = div.xpath('p/span/a/text()')[0].strip()
        company = div.xpath('span[@class="t2"]/a/text()')[0]
        place = div.xpath('span[@class="t3"]/text()')[0]
        money = div.xpath('span[@class="t4"]/text()')
        # Named "posted" rather than "time" to avoid shadowing the time module.
        posted = div.xpath('span[@class="t5"]/text()')
        # Salary and posting date may be missing, so fall back to a placeholder.
        money = money[0] if money else "Negotiable"
        posted = posted[0] if posted else "No date"
        print("Job title: %s" % name)
        print("Company: %s" % company)
        print("Location: %s" % place)
        print("Salary: %s" % money)
        print("Posted: %s" % posted)
        print("----------------------------")
        # csv.writer quotes fields that contain commas and adds no trailing comma.
        writer.writerow([name, company, place, money, posted])
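
The two fragile steps in the script above, taking the first XPath match and writing comma-separated fields, can be tried offline without hitting the site. The following is a minimal sketch, not part of the original post: the helper names `first_or_default` and `write_rows` are hypothetical, and the sample values are made up to stand in for XPath results.

```python
import csv
import io


def first_or_default(values, default):
    """Return the first item of an XPath text() result, stripped, or default if empty."""
    return values[0].strip() if values else default


def write_rows(rows, stream):
    """Write rows with csv.writer so a comma inside a field gets quoted."""
    writer = csv.writer(stream, lineterminator="\n")
    writer.writerows(rows)


# Made-up values standing in for the XPath results of one listing.
sample = [
    first_or_default([" Python Developer "], "N/A"),
    first_or_default(["Example Tech Co., Ltd."], "N/A"),
    first_or_default([], "Negotiable"),  # the salary span is missing here
]
buffer = io.StringIO()
write_rows([sample], buffer)
print(buffer.getvalue())
```

Joining fields by hand with `','.join(...)` would split the company name above into two columns; going through `csv.writer` keeps it as one quoted field.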

hs947463167

