An Introduction to Using DOM4J


Author: 冰雲 (icecloud(AT)sina.com)

Date: 2003.12.15

Copyright notice:

This article was written by 冰雲 and first published on CSDN. It may not be used for any commercial purpose without permission.

Some of the code in this article is quoted from the DOM4J documentation.

Reposting is welcome, but please keep the article and this copyright notice intact.

To get in touch, please email icecloud(AT)sina.com.

DOM4J is an open-source XML parsing package from dom4j.org. Its website describes it like this:

"Dom4j is an easy to use, open source library for working with XML, XPath and XSLT on the Java platform using the Java Collections Framework and with full support for DOM, SAX and JAXP."

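DOM4J is very easy to use: if you understand the basic XML-DOM model, you can use it. Its bundled guide, however, is only a single HTML page (though it covers a lot of ground), and there is very little material about it in Chinese. So I wrote this short tutorial to help everyone get started. It covers only basic usage; for more advanced use, please... explore on your own or look for other material.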

I had previously read an article on the IBM developer community (see the appendix) comparing the performance of several XML parsing packages. DOM4J performed very well, ranking at or near the top in many of the tests. (In fact, DOM4J's official documentation also cites this comparison.) So for this project I chose DOM4J as my XML parsing tool.

In China, JDOM is the more popular parser. Each has its strengths, but DOM4J's defining feature is its extensive use of interfaces, which is the main reason it is considered more flexible than JDOM. As the masters say, "program to an interface." More and more people are using DOM4J. If you are comfortable with JDOM, feel free to keep using it and read this article just for comparison; if you are about to choose a parser, you might as well choose DOM4J.

Its main interfaces are defined in the org.dom4j package:

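*Attribute*	`Attribute` defines an XML attribute.
*Branch*	`Branch` defines the common behavior of nodes that can contain child nodes, such as XML elements (`Element`) and documents (`Document`).
*CDATA*	`CDATA` defines an XML CDATA section.
*CharacterData*	`CharacterData` is a marker interface for character-based nodes such as `CDATA`, `Comment`, and `Text`.
*Comment*	`Comment` defines the behavior of an XML comment.
*Document*	`Document` defines an XML document.
*DocumentType*	`DocumentType` defines an XML DOCTYPE declaration.
*Element*	`Element` defines an XML element.
*ElementHandler*	`ElementHandler` defines a handler for `Element` objects.
*ElementPath*	`ElementPath` is used by `ElementHandler` to get information about the path hierarchy currently being processed.
*Entity*	`Entity` defines an XML entity.
*Node*	`Node` defines the polymorphic behavior of all XML nodes in dom4j.
*NodeFilter*	`NodeFilter` defines the behavior of a filter or predicate applied to dom4j nodes.
*ProcessingInstruction*	`ProcessingInstruction` defines an XML processing instruction.
*Text*	`Text` defines an XML text node.
*Visitor*	`Visitor` is used to implement the Visitor pattern.
*XPath*	`XPath` represents an XPath expression obtained by parsing a string.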

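The names alone give a rough idea of what each one does.

The key to understanding this set of interfaces is understanding how they inherit from one another: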

interface java.lang.Cloneable
  - interface org.dom4j.Node
    - interface org.dom4j.Attribute
    - interface org.dom4j.Branch
      - interface org.dom4j.Document
      - interface org.dom4j.Element
    - interface org.dom4j.CharacterData
      - interface org.dom4j.CDATA
      - interface org.dom4j.Comment
      - interface org.dom4j.Text
    - interface org.dom4j.DocumentType
    - interface org.dom4j.Entity
    - interface org.dom4j.ProcessingInstruction

The picture is clear at a glance: almost everything inherits from Node. Once you know these relationships, you can check a node's type before casting it and will not run into ClassCastException in your programs.
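To make that concrete, here is a minimal sketch, assuming dom4j is on the classpath (the class `NodeKinds` and its methods are hypothetical names for illustration). It checks a node's interface with `instanceof` before treating it as that type, following the hierarchy above. Because `CDATA`, `Comment` and `Text` all extend `CharacterData`, a single check covers all three.

```java
import org.dom4j.Attribute;
import org.dom4j.CharacterData;
import org.dom4j.Document;
import org.dom4j.DocumentException;
import org.dom4j.DocumentHelper;
import org.dom4j.Element;
import org.dom4j.Node;

// Hypothetical helper: describes nodes without risking a ClassCastException.
class NodeKinds {
    static String describe(Node node) {
        if (node instanceof Element) {               // Element extends Branch extends Node
            return "element " + node.getName();
        } else if (node instanceof Attribute) {      // Attribute extends Node directly
            return "attribute " + node.getName();
        } else if (node instanceof CharacterData) {  // covers CDATA, Comment and Text
            return "text-like \"" + node.getText() + "\"";
        }
        return "other " + node.getNodeTypeName();
    }

    // Describes the root element, its attributes, and its direct content nodes.
    static String describeRoot(String xml) throws DocumentException {
        Document doc = DocumentHelper.parseText(xml);
        Element root = doc.getRootElement();
        StringBuilder out = new StringBuilder(describe(root));
        // Attributes are not part of the element's content list, so list them separately.
        for (int i = 0; i < root.attributeCount(); i++) {
            out.append("; ").append(describe(root.attribute(i)));
        }
        // Content nodes: child elements, text, comments, and so on.
        for (int i = 0; i < root.nodeCount(); i++) {
            out.append("; ").append(describe(root.node(i)));
        }
        return out.toString();
    }
}
```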

Below are some examples (partly taken from the documentation bundled with DOM4J) that briefly show how to use it.

1. Reading and parsing an XML document

Reading and writing XML documents relies mainly on the org.dom4j.io package. It provides two different readers, DOMReader and SAXReader: DOMReader converts an existing W3C DOM tree, while SAXReader parses from files, streams and similar sources. Both produce the same dom4j Document interface, so the code that works with the result is identical either way. That is the benefit of relying on interfaces.

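// Read XML from a file: takes a file name and returns the XML document
import java.io.File;
import org.dom4j.Document;
import org.dom4j.DocumentException;
import org.dom4j.io.SAXReader;

public Document read(String fileName) throws DocumentException {
    SAXReader reader = new SAXReader();
    Document document = reader.read(new File(fileName));
    return document;
}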

The reader's read method is overloaded, so it can read from many different sources, such as an InputStream, a File, or a URL. The resulting Document object represents the entire XML document.

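In my experience, characters are decoded according to the encoding declared in the XML file header. If you run into garbled text, make sure the encoding declared in the header, the file's actual encoding, and any encoding you specify elsewhere all agree.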
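Here is a minimal sketch of two of those overloads, assuming dom4j is on the classpath (`ReadSources` and its methods are hypothetical names). It reads from an `InputStream` and from a `Reader`. With a byte stream, the parser decodes the bytes according to the encoding in the XML declaration, which is why that header must match the real encoding. With a `Reader`, the characters are already decoded, so the declared encoding is not used for decoding.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.io.StringReader;
import java.nio.charset.StandardCharsets;
import org.dom4j.Document;
import org.dom4j.DocumentException;
import org.dom4j.io.SAXReader;

class ReadSources {
    // Byte input: the header says ISO-8859-1 and the bytes really are ISO-8859-1,
    // so the parser decodes the accented character (U+00E9) correctly.
    static Document fromStream() throws DocumentException {
        String xml = "<?xml version=\"1.0\" encoding=\"ISO-8859-1\"?><name>Jos\u00e9</name>";
        InputStream in = new ByteArrayInputStream(xml.getBytes(StandardCharsets.ISO_8859_1));
        return new SAXReader().read(in);
    }

    // Character input: the Reader supplies already-decoded characters.
    static Document fromReader() throws DocumentException {
        return new SAXReader().read(new StringReader("<name>James</name>"));
    }
}
```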

2. Getting the root element

The second step after reading is to get the root element. Anyone familiar with XML knows that all XML processing starts from the root element.

public Element getRootElement(Document doc) {
    return doc.getRootElement();
}
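Here is a small runnable sketch, assuming dom4j is on the classpath (`RootDemo` and `rootOf` are hypothetical names). It parses a string and returns its root element. The root element belongs to the `Document`, but it has no parent *element*.

```java
import org.dom4j.Document;
import org.dom4j.DocumentException;
import org.dom4j.DocumentHelper;
import org.dom4j.Element;

class RootDemo {
    // Parses an XML string and returns its root element.
    static Element rootOf(String xml) throws DocumentException {
        Document doc = DocumentHelper.parseText(xml);
        // The root is owned by the Document; its getParent() returns null
        // because getParent() only ever returns an Element.
        return doc.getRootElement();
    }
}
```

For example, `RootDemo.rootOf("<library><book/></library>").getName()` gives `library`, and `isRootElement()` on that element is true.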

3. Traversing the XML tree

DOM4J provides at least three ways to traverse nodes:

1) Iteration (Iterator)

// Iterate over all child elements
for (Iterator i = root.elementIterator(); i.hasNext(); ) {
    Element element = (Element) i.next();
    // do something
}

// Iterate over the child elements named "foo"
for (Iterator i = root.elementIterator("foo"); i.hasNext(); ) {
    Element foo = (Element) i.next();
    // do something
}

// Iterate over the attributes
for (Iterator i = root.attributeIterator(); i.hasNext(); ) {
    Attribute attribute = (Attribute) i.next();
    // do something
}
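Here are the three loops run against a small in-memory document. This is a sketch assuming dom4j is on the classpath; `IterateDemo` and its method names are hypothetical. Raw `Iterator` types are kept as in the loops above, so the code compiles against both older (non-generic) and newer (generic) dom4j releases.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import org.dom4j.Attribute;
import org.dom4j.Element;

class IterateDemo {
    // Names of all child elements, in document order.
    static List<String> childNames(Element root) {
        List<String> names = new ArrayList<>();
        for (Iterator i = root.elementIterator(); i.hasNext(); ) {
            Element element = (Element) i.next();
            names.add(element.getName());
        }
        return names;
    }

    // Number of child elements with the given name.
    static int countNamed(Element root, String name) {
        int count = 0;
        for (Iterator i = root.elementIterator(name); i.hasNext(); ) {
            i.next();
            count++;
        }
        return count;
    }

    // Attribute name -> value.
    static Map<String, String> attributes(Element root) {
        Map<String, String> attrs = new LinkedHashMap<>();
        for (Iterator i = root.attributeIterator(); i.hasNext(); ) {
            Attribute attribute = (Attribute) i.next();
            attrs.put(attribute.getName(), attribute.getValue());
        }
        return attrs;
    }
}
```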

2) Recursion

Recursion can also use an Iterator to enumerate the children, but the documentation offers a different approach based on index access (nodeCount() and node(i)):

public void treeWalk(Document document) {
    treeWalk(document.getRootElement());
}

public void treeWalk(Element element) {
    for (int i = 0, size = element.nodeCount(); i < size; i++) {
        Node node = element.node(i);
        if (node instanceof Element) {
            treeWalk((Element) node);
        } else {
            // do something....
        }
    }
}
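Here is a self-contained version of the recursive walk, as a sketch assuming dom4j is on the classpath (`TreeWalkDemo` and `texts` are hypothetical names). The "do something" branch here collects the non-blank text nodes in document order. The index-based `nodeCount()`/`node(i)` access is the same as above.

```java
import java.util.ArrayList;
import java.util.List;
import org.dom4j.Document;
import org.dom4j.Element;
import org.dom4j.Node;

class TreeWalkDemo {
    // Collects all non-blank text in the document, in document order.
    static List<String> texts(Document document) {
        List<String> out = new ArrayList<>();
        treeWalk(document.getRootElement(), out);
        return out;
    }

    // Depth-first walk over the element's content nodes.
    static void treeWalk(Element element, List<String> out) {
        for (int i = 0, size = element.nodeCount(); i < size; i++) {
            Node node = element.node(i);
            if (node instanceof Element) {
                treeWalk((Element) node, out);          // recurse into child elements
            } else if (node.getNodeType() == Node.TEXT_NODE) {
                String text = node.getText().trim();
                if (!text.isEmpty()) {
                    out.add(text);                      // keep non-blank text only
                }
            }
        }
    }
}
```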

3) The Visitor pattern

The most exciting feature is DOM4J's support for the Visitor pattern, which can greatly reduce the amount of code while keeping it clear and easy to understand. Anyone who knows design patterns knows that Visitor is one of the GoF patterns. Its core idea is that the two kinds of objects call each other: each Visitable's accept(visitor) method calls back visitor.visit(this), so one Visitor can visit many Visitables and the right visit overload is chosen for each node type. Let's look at the Visitor pattern in DOM4J (the quick-start guide does not cover it).

All you need is a class that implements the Visitor interface; the simplest way is to extend VisitorSupport, as below.

public class MyVisitor extends VisitorSupport {
    public void visit(Element element) {
        System.out.println(element.getName());
    }

    public void visit(Attribute attr) {
        System.out.println(attr.getName());
    }
}

Usage: root.accept(new MyVisitor());

The Visitor interface provides many overloads of visit(), so each kind of XML object is visited in its own way. The code above gives simple implementations for Element and Attribute, the two most commonly used. VisitorSupport is the default adapter DOM4J provides for the Visitor interface (the Default Adapter pattern): it supplies empty implementations of all the visit(*) methods to keep your code short.

Note that this Visitor traverses all child nodes automatically: root.accept(new MyVisitor()) visits root itself, its attributes, and all of its descendants. The first time I used it, I thought I had to do the traversal myself, so I called the Visitor inside my own recursion. You can imagine the result: nodes were visited more than once.
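To see the automatic traversal, here is a minimal sketch, assuming dom4j is on the classpath (`CollectingVisitor` is a hypothetical name). It records each visit instead of printing it. A single `root.accept(...)` call reaches the root, its attributes, and every descendant, with no hand-written recursion.

```java
import java.util.ArrayList;
import java.util.List;
import org.dom4j.Attribute;
import org.dom4j.Element;
import org.dom4j.VisitorSupport;

// Records element and attribute visits in the order dom4j makes them.
class CollectingVisitor extends VisitorSupport {
    final List<String> visited = new ArrayList<>();

    @Override
    public void visit(Element element) {
        visited.add("element:" + element.getName());
    }

    @Override
    public void visit(Attribute attr) {
        visited.add("attribute:" + attr.getName());
    }
}
```

After calling `root.accept(v)` on the root of `<root a="1"><child b="2"/></root>`, `v.visited` should read `[element:root, attribute:a, element:child, attribute:b]`. Each element is visited before its attributes, and then its children.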

4. XPath support

DOM4J has good XPath support: to access a node, for example, you can select it directly with an XPath expression. (DOM4J's XPath support relies on the Jaxen library being on the classpath.)

public void bar(Document document) {
    List list = document.selectNodes("//foo/bar");
    Node node = document.selectSingleNode("//foo/bar/author");
    String name = node.valueOf("@name");
}
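Here is a runnable variant of `bar`, as a sketch. It assumes dom4j plus Jaxen are on the classpath. The sample document and the `XPathDemo` names are made up for illustration. Unlike `bar`, it guards against `selectSingleNode` returning `null` when nothing matches.

```java
import java.util.List;
import org.dom4j.Document;
import org.dom4j.Node;

class XPathDemo {
    // Number of nodes matched by //foo/bar.
    static int barCount(Document document) {
        List list = document.selectNodes("//foo/bar");
        return list.size();
    }

    // The name attribute of the first matching author, or null if none matches.
    static String authorName(Document document) {
        Node node = document.selectSingleNode("//foo/bar/author");
        return node == null ? null : node.valueOf("@name");
    }
}
```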

For example, if you want to find all hyperlinks in an XHTML document, the following code does it:

public void findLinks(Document document) throws DocumentException {
    List list = document.selectNodes("//a/@href");
    for (Iterator iter = list.iterator(); iter.hasNext(); ) {
        Attribute attribute = (Attribute) iter.next();
        String url = attribute.getValue();
    }
}
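One caveat worth checking: in XPath 1.0, an unprefixed name test such as `a` only matches elements in no namespace. If the XHTML declares the default namespace `http://www.w3.org/1999/xhtml`, `//a/@href` can come back empty. The sketch below sidesteps this with `local-name()`. It assumes dom4j and Jaxen are on the classpath, and `XhtmlLinks` is a hypothetical name.

```java
import java.util.ArrayList;
import java.util.List;
import org.dom4j.Attribute;
import org.dom4j.Document;

class XhtmlLinks {
    // Collects every href on an <a> element, whether or not <a> is namespaced.
    static List<String> findLinks(Document document) {
        List<String> urls = new ArrayList<>();
        // local-name() ignores the namespace URI, so this matches plain <a>
        // as well as <a> in the XHTML default namespace.
        List list = document.selectNodes("//*[local-name()='a']/@href");
        for (Object o : list) {
            urls.add(((Attribute) o).getValue());
        }
        return urls;
    }
}
```

A more precise alternative is to bind a prefix to the XHTML namespace and use that prefix in the expression (dom4j's `XPath` objects accept a prefix-to-URI map via `setNamespaceURIs`). `local-name()` is simply the shortest route.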

5. Converting between strings and XML

You often need to convert a string to XML, or the other way around:

// XML to string
Document document = ...;
String text = document.asXML();

// String to XML
String xml = "<person> <name>James</name> </person>";
Document doc = DocumentHelper.parseText(xml);
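Here is a round-trip sketch, assuming dom4j is on the classpath (`StringConversion` and its method names are hypothetical). `Document.asXML()` includes the `<?xml ...?>` declaration. Calling `asXML()` on an element serializes just that element and its descendants.

```java
import org.dom4j.Document;
import org.dom4j.DocumentException;
import org.dom4j.DocumentHelper;
import org.dom4j.Element;

class StringConversion {
    // String -> Document
    static Document parse(String text) throws DocumentException {
        return DocumentHelper.parseText(text);
    }

    // Document -> String (includes the XML declaration)
    static String serialize(Document document) {
        return document.asXML();
    }

    // Element -> String (only this element and its descendants)
    static String serialize(Element element) {
        return element.asXML();
    }
}
```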

6. Transforming XML with XSLT
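Here is a minimal sketch of such a transformation. It assumes the JDK's built-in JAXP `TransformerFactory` together with dom4j's `org.dom4j.io.DocumentSource` and `DocumentResult` adapters. The class name `XsltDemo` and the idea of passing the stylesheet as a string are illustrative choices, not part of the original text.

```java
import java.io.StringReader;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamSource;
import org.dom4j.Document;
import org.dom4j.io.DocumentResult;
import org.dom4j.io.DocumentSource;

class XsltDemo {
    // Applies an XSLT stylesheet (given as a string) to a dom4j Document
    // and returns the output as a new dom4j Document.
    static Document transform(Document document, String stylesheet) throws TransformerException {
        TransformerFactory factory = TransformerFactory.newInstance();
        Transformer transformer = factory.newTransformer(new StreamSource(new StringReader(stylesheet)));
        DocumentSource source = new DocumentSource(document); // dom4j tree exposed as a JAXP Source
        DocumentResult result = new DocumentResult();         // collects the output as a dom4j tree
        transformer.transform(source, result);
        return result.getDocument();
    }
}
```

`DocumentSource` and `DocumentResult` only adapt dom4j trees to JAXP's `Source`/`Result`, so any JAXP-compliant XSLT processor should work in place of the JDK default.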
