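When scraping with Selenium, you may run into pages where the data is embedded as a JSON string inside a Script tag.
Suppose the Script tag contains the following: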
<script id="DATA_INFO" type="application/json" >
{
"user": {
"isLogin": true,
"userInfo": {
"id": 123456,
"nickname": "LiMing",
"intro": "人生苦短,我用python"
}
}
}
</script>
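In this case, drive.find_elements_by_xpath('//*[@id="DATA_INFO"]') can locate the element
(in Selenium 4 the equivalent call is drive.find_elements(By.XPATH, '//*[@id="DATA_INFO"]')),
but the element's .text property cannot be used to get the JSON data inside the Script tag: .text only returns rendered, visible text, and the content of a Script tag is never rendered.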
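If you would rather stay inside Selenium, one possible workaround is to read the element's textContent through JavaScript. The sketch below is only an illustration: read_script_json is a hypothetical helper name, and it assumes the driver object exposes Selenium's execute_script(script, *args) method.

```python
import json


def read_script_json(driver, element_id):
    """Return the parsed JSON held in <script id=element_id>, or None if absent.

    Hypothetical helper: `driver` is assumed to be a Selenium WebDriver
    (anything with an execute_script(script, *args) method works).
    """
    # textContent returns the raw text of the node even though it is not rendered
    raw = driver.execute_script(
        "var el = document.getElementById(arguments[0]);"
        "return el ? el.textContent : null;",
        element_id,
    )
    # A missing element or an empty tag yields None instead of a parse error
    return json.loads(raw) if raw else None
```

With the driver from this post, the call would look like data = read_script_json(drive, "DATA_INFO").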
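# Workaround: parse the page source with BeautifulSoup instead of relying on .text
from bs4 import BeautifulSoup
import json

# Get the current page source from Selenium (drive is the WebDriver instance)
html = drive.page_source
# Parse the page source with BeautifulSoup
soup = BeautifulSoup(html, 'lxml')
# Get the full JSON text inside the Script tag and load it into a dict
js_test = json.loads(soup.find("script", {"id": "DATA_INFO"}).get_text())
# Get the nickname value from the parsed dict (no need to parse the JSON a second time)
js_test001 = js_test.get("user").get("userInfo").get("nickname")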
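The chained .get() calls above raise AttributeError as soon as one intermediate key is missing, because .get() returns None and None has no .get(). A minimal sketch of a more forgiving lookup follows. The dig helper and the SAMPLE string are illustrative assumptions, not part of the original code; SAMPLE mirrors the JSON from the Script tag shown earlier, with the intro field left out.

```python
import json

# Sample data mirroring the Script tag content from this post (assumed for illustration)
SAMPLE = """
{"user": {"isLogin": true, "userInfo": {"id": 123456, "nickname": "LiMing"}}}
"""


def dig(data, *keys, default=None):
    """Walk nested dicts key by key; return `default` as soon as a key is missing."""
    current = data
    for key in keys:
        if not isinstance(current, dict) or key not in current:
            return default
        current = current[key]
    return current


data = json.loads(SAMPLE)
nickname = dig(data, "user", "userInfo", "nickname")             # existing path
missing = dig(data, "user", "profile", "nickname", default="")  # "profile" does not exist
print(nickname, repr(missing))
```

This keeps the scraper running when a page omits part of the structure, for example when the user is not logged in and userInfo is absent.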