Common scenarios for using XPath:
1. The Python lxml module
from lxml import etree
content = '''
<html>
<head>
<title>test</title>
</head>
<body>
<div>Extracting information with XPath</div>
</body>
</html>
'''
html = etree.HTML(content)
text = html.xpath('//title/text()')  # xpath() returns a list: ['test']
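2. The xpath() method of the response object in the Scrapy framework
def parse(self, response):
    # extract_first() returns the first match as a string, or None if nothing matches
    title = response.xpath('//title/text()').extract_first()
    yield {'title': title}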
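Unlike Scrapy's extract_first() (newer Scrapy versions also offer the equivalent .get()), lxml's xpath() returns a plain list, so a missing node gives [] rather than None. The sketch below mimics extract_first() on top of lxml with a hypothetical helper, first_or_none(); the helper is only an illustration and is not part of either library.

```python
from lxml import etree

# Minimal sample document with a <title> but no <h1>.
content = '<html><head><title>test</title></head><body></body></html>'


def first_or_none(results):
    """Hypothetical helper: return the first XPath result, or None if empty.

    Roughly mirrors what Scrapy's extract_first() does for a SelectorList.
    """
    return results[0] if results else None


html = etree.HTML(content)
titles = html.xpath('//title/text()')               # always a list in lxml
print(titles)                                        # ['test']
print(first_or_none(titles))                         # test
print(first_or_none(html.xpath('//h1/text()')))     # None: no <h1> present
```

Taking the first element explicitly keeps the "no match" case visible as None instead of raising an IndexError on an empty list.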
Common XPath locators:
1. Locate the sibling nodes that come after a given node:
//div[@class='name']/following-sibling::div
2. Locate the sibling nodes that come before a given node:
//div[@class='name']/preceding-sibling::div
3. Locate nodes whose text contains a given value:
//div[contains(text(), 'value')]
4. Locate nodes whose text equals a given value:
//div[text()='value']
5. Locate nodes whose text starts with a given value:
//div[starts-with(text(), '容')]
6. Extract the text of a node together with all of its descendants (string() uses only the first matching node):
string(//div[@class='price'])
7. Combine conditions with XPath logical operators (and):
//dt[contains(text(), '容') and contains(text(), '積') and contains(text(), '率')]
8. Locate the parent node:
//div[@name='name']/..
9. Extract an attribute of a node (here the href of links):
//a/@href
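To try locators 1 to 5 and 7 in one place, here is a small lxml sketch. The sample markup (class names, the <dl> list, the text values) is made up for illustration; only the XPath expressions follow the list above, with /text() appended so the results are plain strings.

```python
from lxml import etree

# Hypothetical sample markup; all <div> and <dl> elements are siblings under <body>.
content = '''
<html><body>
<div class="intro">first</div>
<div class="name">Alice</div>
<div class="age">30</div>
<div class="city">Paris</div>
<dl>
<dt>容積率</dt><dd>2.5</dd>
<dt>容量</dt><dd>100</dd>
</dl>
<div>some value here</div>
<div>value</div>
</body></html>
'''
html = etree.HTML(content)

# 1. <div> siblings after the class="name" div
after = html.xpath("//div[@class='name']/following-sibling::div/text()")
# 2. <div> siblings before the class="name" div
before = html.xpath("//div[@class='name']/preceding-sibling::div/text()")
# 3. text contains 'value'
contains_value = html.xpath("//div[contains(text(), 'value')]/text()")
# 4. text is exactly 'value'
equals_value = html.xpath("//div[text()='value']/text()")
# 5. text starts with '容'
starts = html.xpath("//dt[starts-with(text(), '容')]/text()")
# 7. all three conditions must hold
all_three = html.xpath(
    "//dt[contains(text(), '容') and contains(text(), '積') and contains(text(), '率')]/text()"
)

print(after)           # ['30', 'Paris', 'some value here', 'value']
print(before)          # ['first']
print(contains_value)  # ['some value here', 'value']
print(equals_value)    # ['value']
print(starts)          # ['容積率', '容量']
print(all_three)       # ['容積率']
```

Note that following-sibling:: is not limited to adjacent elements: it also picks up the two <div>s after the <dl>. Appending [1] to the step (following-sibling::div[1]) would keep only the nearest one.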
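Locators 6, 8 and 9 return different kinds of results (a single string, an element, attribute values), which is easiest to see side by side. The markup below is a hypothetical example: a price split across a <div> and a child <span>, and a link inside a div with name="name".

```python
from lxml import etree

# Hypothetical markup for the string(), parent and attribute locators.
content = '''
<html><body>
<div class="price">$<span>99</span>.00</div>
<div name="name"><a href="https://example.com/item">item</a></div>
</body></html>
'''
html = etree.HTML(content)

# 6. string() joins all descendant text of the first matching node into one string.
price = html.xpath("string(//div[@class='price'])")
# For comparison, text() returns only the div's own (direct) text nodes.
own_text = html.xpath("//div[@class='price']/text()")

# 8. '..' steps up to the parent element; the named div sits directly in <body>.
parent = html.xpath("//div[@name='name']/..")[0]

# 9. @href yields the attribute values as a list of strings.
hrefs = html.xpath("//a/@href")

print(price)       # $99.00
print(own_text)    # ['$', '.00']
print(parent.tag)  # body
print(hrefs)       # ['https://example.com/item']
```

Because string() evaluates to a single string rather than a node list, it is handy when a value is split across nested tags, as with the price here.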