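Preface: this project needs to read information from XML files in order to visualize the data distribution, so here is a brief introduction to Python's xml.etree.ElementTree package. The examples use the following annotation file (000001.xml):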
<annotation>
<folder>data</folder>
<filename>000001.jpg</filename>
<path>D:\Program Files\Profiles\labelimg\data\000001.jpg</path>
<source>
<database>Unknown</database>
</source>
<size>
<width>1500</width>
<height>1000</height>
<depth>3</depth>
</size>
<segmented>0</segmented>
<object>
<name>e</name>
<pose>Unspecified</pose>
<truncated>0</truncated>
<difficult>0</difficult>
<bndbox>
<xmin>727</xmin>
<ymin>377</ymin>
<xmax>791</xmax>
<ymax>441</ymax>
</bndbox>
</object>
<object>
<name>o</name>
<pose>Unspecified</pose>
<truncated>0</truncated>
<difficult>0</difficult>
<bndbox>
<xmin>710</xmin>
<ymin>209</ymin>
<xmax>763</xmax>
<ymax>245</ymax>
</bndbox>
</object>
<object>
<name>l</name>
<pose>Unspecified</pose>
<truncated>0</truncated>
<difficult>0</difficult>
<bndbox>
<xmin>917</xmin>
<ymin>319</ymin>
<xmax>956</xmax>
<ymax>357</ymax>
</bndbox>
</object>
<object>
<name>r</name>
<pose>Unspecified</pose>
<truncated>0</truncated>
<difficult>0</difficult>
<bndbox>
<xmin>1461</xmin>
<ymin>217</ymin>
<xmax>1495</xmax>
<ymax>253</ymax>
</bndbox>
</object>
</annotation>
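1. Basic operations
---Parsing XML:
import xml.etree.ElementTree as ET
tree = ET.parse("path/to/your.xml")  # returns an ElementTree object
root = tree.getroot()  # get the root element
ET.dump(root)  # print the whole XML to stdout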
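---Every Element object has the following properties:
tag: a string identifying what kind of data the element represents
attrib: a dictionary holding the element's attributes
text: a string (or None) with the text between the start tag and the first child element
tail: a string (or None) with the text that follows the element's closing tag
zero or more child elements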
import xml.etree.ElementTree as ET  # cElementTree was removed in Python 3.9

tree = ET.parse("000001.xml")
root = tree.getroot()
print("root:", root)
print("tag:", root.tag)
print("attrib:", root.attrib)
print("text:", repr(root.text))  # repr() makes the whitespace visible
print("tail:", root.tail)
'''
root: <Element 'annotation' at 0x7f72dfc8ce08>
tag: annotation
attrib: {}
text: '\n'
tail: None
'''
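In the annotation file, attrib is always empty and tail is None for the root, so those two properties are easy to misread. The following is a minimal sketch on a made-up fragment (the `snippet` string and its `id`/`class` attributes are hypothetical, not part of the annotation file) where all four properties carry a value:

```python
import xml.etree.ElementTree as ET

# Hypothetical fragment with attributes and mixed text content.
snippet = '<p id="intro">Hello <b class="x">world</b>, again</p>'

p = ET.fromstring(snippet)  # parse from a string instead of a file
b = p[0]                    # first (and only) child of <p>

# text is what comes before the first child; tail is what follows the closing tag.
print(p.tag, p.attrib, repr(p.text), repr(p.tail))
print(b.tag, b.attrib, repr(b.text), repr(b.tail))
```

Note that the text ", again" belongs to the tail of b rather than to the text of p.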
---Simple traversal:
# Iterate over all direct children of the root
for child in root:
    print("tag:", child.tag, "attrib:", child.attrib, "text:", child.text)
'''
tag: folder attrib: {} text: data
tag: filename attrib: {} text: 000001.jpg
tag: path attrib: {} text: D:\Program Files\Profiles\labelimg\data\000001.jpg
tag: source attrib: {} text:
tag: size attrib: {} text:
tag: segmented attrib: {} text: 0
tag: object attrib: {} text:
tag: object attrib: {} text:
tag: object attrib: {} text:
tag: object attrib: {} text:
'''
# Index access, like a nested list
print(root[4][1].tag)  # height
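---Some handy search functions:
1. find(match)  # returns the first matching child element; match can be a tag name or an XPath-style path
2. findall(match)  # returns a list of all matching child elements
3. findtext(match, default=None)  # returns the text of the first matching child element, or default if nothing matches
4. iter(tag=None)  # creates a tree iterator rooted at the current element; if tag is not None, only elements with that tag are yielded
5. iterfind(match)  # like findall(), but returns an iterator instead of a list
for child in root.iter("name"):  # searches all descendants, not only direct children
    print(child.text)
for child in root.findall("object"):  # a bare tag name matches direct children only
    print(child.text)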
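Since the goal is to visualize the data distribution, the search functions above can be combined to pull the labels and bounding boxes out of an annotation. This is only a sketch: the helper name `read_objects` is made up for illustration, the XML string is a shortened copy of the file shown earlier, and it assumes every bndbox contains all four coordinates.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Shortened copy of the annotation file (two objects only).
xml_text = """<annotation>
<filename>000001.jpg</filename>
<size><width>1500</width><height>1000</height><depth>3</depth></size>
<object><name>e</name><bndbox><xmin>727</xmin><ymin>377</ymin><xmax>791</xmax><ymax>441</ymax></bndbox></object>
<object><name>o</name><bndbox><xmin>710</xmin><ymin>209</ymin><xmax>763</xmax><ymax>245</ymax></bndbox></object>
</annotation>"""

root = ET.fromstring(xml_text)


def read_objects(root):
    """Return a list of (label, xmin, ymin, xmax, ymax) tuples."""
    boxes = []
    for obj in root.findall("object"):            # direct children of the root
        label = obj.findtext("name", default="")  # text of the <name> child
        box = obj.find("bndbox")
        if box is None:                           # skip objects without a box
            continue
        coords = [int(box.findtext(k)) for k in ("xmin", "ymin", "xmax", "ymax")]
        boxes.append((label, *coords))
    return boxes


boxes = read_objects(root)
label_counts = Counter(label for label, *_ in boxes)  # per-class object count
width = int(root.findtext("size/width"))              # path reaches a grandchild

print(boxes)
print(label_counts)
print(width)
```

The resulting tuples and counts can be handed to whatever plotting tool is used for the distribution charts.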
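----Modifying XML
After making changes, save them with tree.write("path/to/output.xml").
1. Attribute-related (operations on the attributes inside a single tag)
- attrib: the dictionary holding the element's attributes
- keys(): returns a list of the element's attribute names
- items(): returns a list of (name, value) pairs
- get(key, default=None): returns the value of an attribute
- set(key, value): updates or adds an attribute
- del xxx.attrib[key]: deletes the corresponding attribute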
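The annotation file has no attributes, so here is a small sketch of the attribute operations on a made-up element (the `id`, `verified`, and `source` attributes are hypothetical and only serve as an illustration):

```python
import xml.etree.ElementTree as ET

# Hypothetical element with two attributes.
obj = ET.fromstring('<object id="1" verified="no"><name>e</name></object>')

print(obj.keys())                 # attribute names
print(obj.items())                # (name, value) pairs
print(obj.get("id"))              # value of an existing attribute
print(obj.get("missing", "n/a"))  # falls back to the default

obj.set("verified", "yes")   # update an existing attribute
obj.set("source", "manual")  # add a new attribute
del obj.attrib["id"]         # delete an attribute

print(ET.tostring(obj, encoding="unicode"))
```

Attribute values are always strings, so numbers have to be converted explicitly when they are read or written.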
2. Node-related
Removing a node: .remove(....), called on the parent element
Methods for adding child elements:
- append(subelement)
- extend(subelements)
- insert(index, element)
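The node operations can be sketched as follows. The small tree is a cut-down version of the annotation, and the `note` element is invented purely to have something to add:

```python
import xml.etree.ElementTree as ET

root = ET.fromstring(
    "<annotation><folder>data</folder>"
    "<object><name>e</name></object></annotation>"
)

# append: add one child at the end
new_obj = ET.Element("object")
ET.SubElement(new_obj, "name").text = "o"
root.append(new_obj)

# insert: add one child at a given position
filename = ET.Element("filename")
filename.text = "000001.jpg"
root.insert(1, filename)

# extend: add several children at once
segmented = ET.Element("segmented")
segmented.text = "0"
root.extend([segmented, ET.Element("note")])  # "note" is hypothetical

# remove: findall() returns a list, so the tree is not mutated while iterating it
for obj in root.findall("object"):
    if obj.findtext("name") == "e":
        root.remove(obj)

tags = [child.tag for child in root]
print(tags)
```

remove() only works on direct children of the element it is called on, so a nested node has to be removed through its own parent.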