17 - 05 - 26 Python: the difference between contents / children / descendants

Original

Sodaoo

2018-09-03 00:31

First, a few words about navigating the tree:

# Navigating Trees

The findAll function finds tags based on their name and attributes.

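But what if you need to find a tag based on its position in the document?

An HTML file can be mapped to a tree with explicit parent-child relationships, like this: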

html
  - body
    - div.wrapper
      - h1
      - div.content
      - table#giftList
        - tr
          - th
        - tr.gift#gift1
          - td
        .......

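In general, BeautifulSoup functions always operate on the descendants of the currently selected tag. For example, bsObj.body.h1 selects the first h1 tag that is a descendant of body.

Similarly, bsObj.div.findAll("img") finds the first div tag in the document, then returns a list of all img tags that are descendants of that div.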
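To make the descendant-based lookup concrete, here is a minimal sketch. The HTML string is a hypothetical stand-in loosely modeled on the tree above, not the real page, and find_all is the current spelling of findAll in bs4.

```python
from bs4 import BeautifulSoup

# Hypothetical markup, loosely modeled on the tree above (not the real page).
html = """
<html><body>
<div class="wrapper">
  <h1>Gifts</h1>
  <div class="content"><img src="logo.jpg"></div>
  <table id="giftList">
    <tr><th>Item</th></tr>
    <tr class="gift" id="gift1"><td><img src="img1.jpg"></td></tr>
  </table>
</div>
<div class="footer"><img src="footer.jpg"></div>
</body></html>
"""

bsObj = BeautifulSoup(html, "html.parser")

# Attribute access returns the first matching descendant, however deep it sits.
first_h1 = bsObj.body.h1

# bsObj.div is the first div in the document (div.wrapper);
# find_all then searches all of that div's descendants.
img_sources = [img["src"] for img in bsObj.div.find_all("img")]

print(first_h1.get_text(), img_sources)
```

The image inside div.footer is not returned, because it is not a descendant of the first div.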

But if you only want the direct children, you can use .children:

>>> from urllib.request import urlopen
>>> from bs4 import BeautifulSoup
>>> html = urlopen("www.pyth..ng.com/pages/page3.html")
>>> bsObj = BeautifulSoup(html, "html.parser")
>>> for child in bsObj.find("table", {"id": "giftList"}).children:
...     print(child)

This code prints the list of product rows in the giftList table.

(That is, every direct child of the giftList table, including its tags, attributes, and text.)
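The same loop can be tried without a network connection. The sketch below uses a small hypothetical table (the rows and prices are made up for illustration) and also shows that, when the markup contains line breaks, the whitespace strings between rows count as children too.

```python
from bs4 import BeautifulSoup
from bs4.element import Tag

# Hypothetical table; the rows and values are invented for illustration.
html = """<table id="giftList">
<tr><th>Item</th><th>Cost</th></tr>
<tr class="gift" id="gift1"><td>Basket</td><td>$15.00</td></tr>
<tr class="gift" id="gift2"><td>Dolls</td><td>$20.00</td></tr>
</table>"""

bsObj = BeautifulSoup(html, "html.parser")
table = bsObj.find("table", {"id": "giftList"})

# .children yields only direct children: the <tr> tags plus the
# newline strings that sit between them in the markup.
children = list(table.children)

# Keep only real tags to get the rows themselves.
row_tags = [child for child in children if isinstance(child, Tag)]
row_ids = [row.get("id") for row in row_tags]

print(len(children), [row.name for row in row_tags], row_ids)
```

The td cells never appear in this loop, because they are grandchildren of the table rather than direct children.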

# Note the difference between .contents / .children / .descendants:

A tag's .contents attribute returns the tag's children as a list:

>>> head_tag
<head><title>The Dormouse's story</title></head>
>>> head_tag.contents
[<title>The Dormouse's story</title>]
>>> title_tag = head_tag.contents[0]
>>> title_tag.contents
["The Dormouse's story"]

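The BeautifulSoup object itself always has children. In other words, the <html> tag is a child of the BeautifulSoup object:

>>> soup.contents[0].name
'html'

A string has no .contents attribute, because a string has no children.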
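Because strings are leaves of the tree, code that walks it usually checks the node type before touching .contents. A minimal sketch follows; the helper name child_count is hypothetical and not part of bs4.

```python
from bs4 import BeautifulSoup
from bs4.element import NavigableString, Tag

soup = BeautifulSoup(
    "<head><title>The Dormouse's story</title></head>", "html.parser"
)
title_tag = soup.head.contents[0]   # the <title> tag
text = title_tag.contents[0]        # the string inside <title>


def child_count(node):
    """Return the number of direct children, treating strings as leaves.

    Hypothetical helper for illustration only.
    """
    if isinstance(node, Tag):
        return len(node.contents)
    return 0


print(type(text).__name__, child_count(title_tag), child_count(text))
```

Here text is a NavigableString rather than a Tag, so the helper reports zero children for it instead of reading .contents.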

With a tag's .children generator, you can loop over the tag's children:

>>> for child in title_tag.children:
...     print(child)
The Dormouse's story

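In summary, the .contents and .children attributes contain only a tag's direct children.

For example, the <head> tag has a single direct child, the <title> tag:

>>> head_tag.contents
[<title>The Dormouse's story</title>]

But the <title> tag itself also has a child: the string "The Dormouse's story".

In this case, the string "The Dormouse's story" is a descendant of the <head> tag, but not a direct child of it.

.contents and .children do not return this "grandchild" node,

whereas the .descendants attribute iterates recursively over all of a tag's descendants: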

>>> for child in head_tag.descendants:
...     print(child)
<title>The Dormouse's story</title>
The Dormouse's story
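The three attributes can be compared side by side. This is a small sketch that rebuilds head_tag from a one-line document with the html.parser backend; the variable names are illustrative.

```python
from bs4 import BeautifulSoup

soup = BeautifulSoup(
    "<head><title>The Dormouse's story</title></head>", "html.parser"
)
head_tag = soup.head

contents = head_tag.contents               # list of direct children
children = list(head_tag.children)         # iterator over the same direct children
descendants = list(head_tag.descendants)   # children, grandchildren, and so on

print(len(contents), len(children), len(descendants))
```

Both .contents and .children see only the <title> tag, while .descendants also reaches the string nested inside it.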

-------------------- Adapted from "Web scraping..." / the official BeautifulSoup documentation.
