Baidu Translate crawler (for learning purposes only)

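The Baidu Translate crawler has the following prerequisites:

1 Log in to your own Baidu account; the request needs your cookie.

2 Use Chrome's mobile device emulation mode.

3 Be familiar with POST requests.

4 This is a practice crawler; an object-oriented version of the code will be added later.
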
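For point 3, it may help to see what a form-encoded POST body looks like before it goes on the wire. The sketch below uses only the standard library; `build_form_body` is a hypothetical helper name, and the fields are a subset of the ones the crawler sends.

```python
from urllib.parse import urlencode


def build_form_body(fields: dict) -> bytes:
    """Encode fields as application/x-www-form-urlencoded bytes."""
    return urlencode(fields).encode("utf-8")


body = build_form_body({"query": "good morning", "from": "en", "to": "zh"})
print(body)       # b'query=good+morning&from=en&to=zh'
print(len(body))  # 32, the value a client would send as Content-Length
```

The body length depends on the query, which is why a fixed `content-length` copied from the browser only matches the one request it was copied from.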
# First version of the Baidu Translate crawler (not object-oriented)
import requests
import execjs  # runs JavaScript code (PyExecJS package)
import json
import sys

# Request headers, copied from the browser request (including the cookie)
# content-length is left out: requests computes it from the body
headers = {
    "accept": "*/*",
    "accept-encoding": "gzip, deflate, br",
    "accept-language": "zh-CN,zh;q=0.9",
    "content-type": "application/x-www-form-urlencoded; charset=UTF-8",
    "cookie": "填入自己的cookie信息",  # placeholder: fill in your own cookie (tip: paste it into Sublime Text first so PyCharm does not scramble it)
    "origin": "https://fanyi.baidu.com",
    "referer": "https://fanyi.baidu.com/",
    "user-agent": "Mozilla/5.0 (Linux; Android 5.0; SM-G900P Build/LRX21T) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/65.0.3325.181 Mobile Safari/537.36",
    "x-requested-with": "XMLHttpRequest"
}
# query = sys.argv[1]
query = "good morning"
# 'baidufanyi_sign.js' is listed below
with open('baidufanyi_sign.js', 'r', encoding='utf-8') as f:
    ctx = execjs.compile(f.read())  # load Baidu Translate's JS
sign = ctx.call('e', query)  # generate the sign value for this query
# print(sign)
post_data = {   # form data sent with the POST request
    "query": query,
    "from": "en",
    "to": "zh",
    "token": "a5f643c1c49ffcfd86b9638e358a3376",
    "sign": sign,
    "simple_means_flag": "3",
}
post_url = "https://fanyi.baidu.com/v2transapi"  # Baidu Translate endpoint
r = requests.post(post_url, data=post_data, headers=headers)
print(r.content)
print(type(r.content.decode()))  # str
dict_ret = json.loads(r.content.decode(), strict=False)  # json.loads converts the JSON text into a Python dict
print(type(dict_ret))            # dict
ret = dict_ret["trans_result"]['data'][0]['dst']
print(ret)
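
The last lines of the script index straight into the response, so a rejected request (for example a stale cookie or sign) surfaces as a bare `KeyError`. A slightly more defensive version might look like the sketch below. `extract_translation` is a hypothetical helper, the sample payloads are made up to mirror the `trans_result` / `data` / `dst` path used above, and joining several `data` entries with newlines is an assumption about multi-line input.

```python
def extract_translation(payload: dict) -> str:
    """Return the translated text, or raise ValueError with the raw payload."""
    try:
        segments = payload["trans_result"]["data"]
        return "\n".join(segment["dst"] for segment in segments)
    except (KeyError, TypeError) as exc:
        raise ValueError(f"unexpected response: {payload!r}") from exc


# Made-up payload that follows the structure the script reads.
sample = {"trans_result": {"data": [{"dst": "早上好"}]}}
print(extract_translation(sample))

# Hypothetical failure payload: the helper reports what came back instead.
try:
    extract_translation({"error": "rejected"})
except ValueError as err:
    print(err)
```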

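This is the JavaScript that generates Baidu Translate's sign value. Save it as baidufanyi_sign.js, the file the Python script opens.
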
function n(r, o) {
    for (var t = 0; t < o.length - 2; t += 3) {
        var a = o.charAt(t + 2);
        a = a >= "a" ? a.charCodeAt(0) - 87 : Number(a),
            a = "+" === o.charAt(t + 1) ? r >>> a : r << a,
            r = "+" === o.charAt(t) ? r + a & 4294967295 : r ^ a
    }
    return r
}

var i = "320305.131321201";  // gtk seed; e() would otherwise read it from window.gtk
function e(r) {
        var o = r.match(/[\uD800-\uDBFF][\uDC00-\uDFFF]/g);
        if (null === o) {
            var t = r.length;
            t > 30 && (r = "" + r.substr(0, 10) + r.substr(Math.floor(t / 2) - 5, 10) + r.substr(-10, 10))
        } else {
            for (var e = r.split(/[\uD800-\uDBFF][\uDC00-\uDFFF]/), C = 0, h = e.length, f = []; h > C; C++)
                "" !== e[C] && f.push.apply(f, e[C].split("")),  // was a(e[C].split("")); a() is not defined in this file
                C !== h - 1 && f.push(o[C]);
            var g = f.length;
            g > 30 && (r = f.slice(0, 10).join("") + f.slice(Math.floor(g / 2) - 5, Math.floor(g / 2) + 5).join("") + f.slice(-10).join(""))
        }
        var u = void 0
            , l = "" + String.fromCharCode(103) + String.fromCharCode(116) + String.fromCharCode(107);
        u = null !== i ? i : (i = window[l] || "") || "";
        for (var d = u.split("."), m = Number(d[0]) || 0, s = Number(d[1]) || 0, S = [], c = 0, v = 0; v < r.length; v++) {
            var A = r.charCodeAt(v);
            128 > A ? S[c++] = A : (2048 > A ? S[c++] = A >> 6 | 192 : (55296 === (64512 & A) && v + 1 < r.length && 56320 === (64512 & r.charCodeAt(v + 1)) ? (A = 65536 + ((1023 & A) << 10) + (1023 & r.charCodeAt(++v)),
                S[c++] = A >> 18 | 240,
                S[c++] = A >> 12 & 63 | 128) : S[c++] = A >> 12 | 224,
                S[c++] = A >> 6 & 63 | 128),
                S[c++] = 63 & A | 128)
        }
        for (var p = m, F = "" + String.fromCharCode(43) + String.fromCharCode(45) + String.fromCharCode(97) + ("" + String.fromCharCode(94) + String.fromCharCode(43) + String.fromCharCode(54)), D = "" + String.fromCharCode(43) + String.fromCharCode(45) + String.fromCharCode(51) + ("" + String.fromCharCode(94) + String.fromCharCode(43) + String.fromCharCode(98)) + ("" + String.fromCharCode(43) + String.fromCharCode(45) + String.fromCharCode(102)), b = 0; b < S.length; b++)
            p += S[b],
                p = n(p, F);
        return p = n(p, D),
            p ^= s,
        0 > p && (p = (2147483647 & p) + 2147483648),
            p %= 1e6,
        p.toString() + "." + (p ^ m)
    }
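
For readers who would rather not depend on execjs and a JavaScript runtime, the function above can be ported to plain Python. The sketch below is my own port, not code from Baidu or from the original post: the names `baidu_sign` and `_mix` are hypothetical, the default `gtk` is the hard-coded value of `i` above, and it assumes the algorithm is exactly what the listing shows. Python strings are sequences of code points, so the two truncation branches of the JS (with and without surrogate pairs) collapse into one.

```python
MASK32 = 0xFFFFFFFF


def _mix(value: int, ops: str) -> int:
    """Port of the JS helper n(): apply 3-character op codes to a 32-bit value."""
    value &= MASK32
    for idx in range(0, len(ops) - 2, 3):
        amount_char = ops[idx + 2]
        # 'a'..'f' mean shifts of 10..15; digits mean themselves.
        amount = ord(amount_char) - 87 if amount_char >= "a" else int(amount_char)
        if ops[idx + 1] == "+":
            shifted = value >> amount             # JS: r >>> a
        else:
            shifted = (value << amount) & MASK32  # JS: r << a
        if ops[idx] == "+":
            value = (value + shifted) & MASK32    # JS: r + a & 4294967295
        else:
            value ^= shifted                      # JS: r ^ a
    return value


def baidu_sign(query: str, gtk: str = "320305.131321201") -> str:
    """Port of the JS function e(): compute the sign string for a query."""
    chars = list(query)
    if len(chars) > 30:
        # Keep the first 10, middle 10 and last 10 characters.
        mid = len(chars) // 2
        chars = chars[:10] + chars[mid - 5:mid + 5] + chars[-10:]
    m_text, _, s_text = gtk.partition(".")
    m, s = int(m_text or 0), int(s_text or 0)
    p = m
    for byte in "".join(chars).encode("utf-8"):
        p = _mix(p + byte, "+-a^+6")
    p = _mix(p, "+-3^+b+-f")
    p = ((p ^ s) & MASK32) % 1_000_000
    return f"{p}.{p ^ m}"


print(baidu_sign("hello"))  # 54706.276099
```

Working in unsigned 32-bit arithmetic throughout avoids the JS step that turns a negative result back into a positive one; masking with `MASK32` plays that role here.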

Disclaimer: this code is for technical research only; commercial use is prohibited.

Thanks to Baidu Translate, and thanks to https://blog.csdn.net/qq_38534107/article/details/90440403, where you can find an analysis of the JS code.

 

 
