Fast and simple are not inherently at odds; on the contrary, simple things ought to be fast.
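While parsing XML with SAX, I ran into the following problems:
- SAX handlers are not as fast as expected, especially on large files
- SAX handlers are error-prone to write: they have to tell different elements apart, which takes a lot of conditional logic just to get at the information you want
- There is no uniform way to retrieve the information a SAX handler has extracted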
1. Stoppable
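By default, a SAX parser parses the whole file: even after you already have all the information you want, parsing carries on. This is why a SAX parser feels slow on large files.
Only an exception can stop a SAX parser from continuing, so the solution is simple: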
import org.xml.sax.helpers.DefaultHandler;

public interface Stoppable {
    boolean canStop();
}

public abstract class EnhancedHandler extends DefaultHandler implements Stoppable {
    private boolean canStop;
    @Override
    public boolean canStop() { return canStop; }
    protected void stop() { canStop = true; } // subclasses call this once they have collected enough information
}
import org.xml.sax.Attributes;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Unchecked, so it needs no throws clause; it signals "stop", not an error.
public class ShouldStopParsingException extends RuntimeException {
}

public class CompositeEnhancedHandler extends DefaultHandler {
    // A single shared instance is enough, because the exception carries no information.
    private static final RuntimeException SHOULD_STOP_EXCEPTION = new ShouldStopParsingException();
    private final EnhancedHandler[] handlers;

    public CompositeEnhancedHandler(EnhancedHandler... handlers) {
        this.handlers = handlers;
    }

    @Override
    public void characters(char[] ch, int start, int length) throws SAXException {
        for (EnhancedHandler handler : handlers) { handler.characters(ch, start, length); }
        throwExceptionIfCanStop();
    }

    @Override
    public void endElement(String uri, String localName, String qName) throws SAXException {
        for (EnhancedHandler handler : handlers) { handler.endElement(uri, localName, qName); }
        throwExceptionIfCanStop();
    }

    @Override
    public void startElement(String uri, String localName, String qName, Attributes attributes) throws SAXException {
        for (EnhancedHandler handler : handlers) { handler.startElement(uri, localName, qName, attributes); }
        throwExceptionIfCanStop();
    }

    // Abort only when every handler has collected enough information.
    private void throwExceptionIfCanStop() {
        for (EnhancedHandler handler : handlers) { if (!handler.canStop()) { return; } }
        throw SHOULD_STOP_EXCEPTION;
    }
}
// Handler1 and Handler2 are EnhancedHandler subclasses.
CompositeEnhancedHandler handler = new CompositeEnhancedHandler(new Handler1(), new Handler2());
try {
    SAXParser saxParser = SAXParserFactory.newInstance().newSAXParser();
    saxParser.parse(new File("england.xml"), handler);
} catch (ShouldStopParsingException se) {
    // All handlers have collected enough information; parsing simply stops here.
}
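The effect described above can be checked with a small, self-contained sketch that uses only the JDK's built-in SAX parser. The class names (StopParsing, FirstTitleHandler, StopDemo) and the sample XML are hypothetical and not part of ESAX; the handler counts every start tag it receives and, when early stopping is enabled, throws an unchecked exception right after reading the first <title>, in the same way ShouldStopParsingException is used above.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical unchecked exception whose only job is to abort the parse.
class StopParsing extends RuntimeException {
}

// Records the text of the first <title> and counts every start tag it receives.
class FirstTitleHandler extends DefaultHandler {
    private final boolean stopEarly;
    private final StringBuilder text = new StringBuilder();
    private boolean inTitle;
    int elementsSeen;
    String title;

    FirstTitleHandler(boolean stopEarly) { this.stopEarly = stopEarly; }

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        elementsSeen++;
        inTitle = title == null && qName.equals("title"); // only the first <title> matters
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        if (inTitle) text.append(ch, start, length);
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        if (inTitle) {
            inTitle = false;
            title = text.toString();
            if (stopEarly) throw new StopParsing(); // enough information: abandon the rest of the document
        }
    }
}

class StopDemo {
    // Three books; only the first title is needed.
    static final String XML = "<books><book><title>A</title></book>"
            + "<book><title>B</title></book><book><title>C</title></book></books>";

    static FirstTitleHandler run(boolean stopEarly) {
        FirstTitleHandler handler = new FirstTitleHandler(stopEarly);
        try {
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new ByteArrayInputStream(XML.getBytes(StandardCharsets.UTF_8)), handler);
        } catch (StopParsing expected) {
            // The handler already has its answer; no events for the remaining books are produced.
        } catch (Exception e) {
            throw new IllegalStateException("parse failed", e);
        }
        return handler;
    }

    public static void main(String[] args) {
        FirstTitleHandler full = run(false);
        FirstTitleHandler early = run(true);
        System.out.println(full.title + " after " + full.elementsSeen + " start tags (full parse)");
        System.out.println(early.title + " after " + early.elementsSeen + " start tags (stopped early)");
    }
}
```

Both runs find the same title, but the full parse sees all seven start tags while the stopped parse sees only the first three. An unchecked exception mirrors the approach above; a subclass of SAXException would also work, since the handler callbacks are declared to throw it.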
2. Subscribable
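Because a SAX handler cannot ask to receive events for specific elements only, handlers are hard to write and error-prone: each one has to check the name of the current element, track whether it is inside a particular element, and so on, so every handler repeats the same kind of logic.
The solution is an extra intermediate layer that asks each SAX handler which element it is interested in, and then forwards to each handler only the events for that element. (Alternatively, each handler could register its interests with the intermediate layer, but that is more complex; ESAX takes the first approach.)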
a). The Subscribable interface:
public interface Subscribable {
    String subscribe();
}
b). The intermediate layer, CompositeEnhancedHandler:
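public class CompositeEnhancedHandler extends DefaultHandler {
    private final AddableMap mapping = new AddableMap();
    // One handler list per currently open element (java.util.Deque / ArrayDeque). Keeping a stack
    // instead of a single "current" list means that characters() and endElement() of a parent
    // element go back to the parent's handlers once a child element has closed.
    private final Deque<List<EnhancedHandler>> openElements = new ArrayDeque<List<EnhancedHandler>>();

    public CompositeEnhancedHandler(EnhancedHandler... handlers) {
        ... ...
        for (EnhancedHandler handler : handlers) { mapping.get(handler.subscribe()).add(handler); }
    }

    @Override
    public void startElement(String uri, String localName, String qName, Attributes attributes) throws SAXException {
        List<EnhancedHandler> currentHandlers = mapping.get(qName);
        openElements.push(currentHandlers);
        for (EnhancedHandler handler : currentHandlers) { handler.startElement(uri, localName, qName, attributes); }
        ... ...
    }

    @Override
    public void characters(char[] ch, int start, int length) throws SAXException {
        for (EnhancedHandler handler : openElements.peek()) { handler.characters(ch, start, length); }
        ... ...
    }

    @Override
    public void endElement(String uri, String localName, String qName) throws SAXException {
        for (EnhancedHandler handler : openElements.pop()) { handler.endElement(uri, localName, qName); }
        ... ...
    }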

    private static class AddableMap {
        private final Map<String, List<EnhancedHandler>> container = new HashMap<String, List<EnhancedHandler>>();
        // Returns the handler list for qname, creating an empty one on first access.
        public List<EnhancedHandler> get(String qname) {
            if (!container.containsKey(qname)) { container.put(qname, new ArrayList<EnhancedHandler>()); }
            return container.get(qname);
        }
    }
}
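The dispatching idea can be tried out in isolation with the runnable sketch below (Java 8+). SubscribingHandler, TextCollector, SubscriptionDispatcher and the sample XML are hypothetical simplifications of EnhancedHandler and CompositeEnhancedHandler, with no stopping or reporting. The dispatcher keeps a stack of open elements, one handler list per open element, so that characters() and endElement() reach the handlers of the innermost open element; note that TextCollector never inspects an element name.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.Deque;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical base class: a handler names the one element it wants to receive.
abstract class SubscribingHandler extends DefaultHandler {
    abstract String subscribe();
}

// Collects the text of every occurrence of its subscribed element; no name checks needed.
class TextCollector extends SubscribingHandler {
    private final String element;
    private final StringBuilder buffer = new StringBuilder();
    final List<String> values = new ArrayList<>();

    TextCollector(String element) { this.element = element; }

    @Override String subscribe() { return element; }

    @Override public void startElement(String uri, String localName, String qName, Attributes atts) {
        buffer.setLength(0);
    }

    @Override public void characters(char[] ch, int start, int length) {
        buffer.append(ch, start, length);
    }

    @Override public void endElement(String uri, String localName, String qName) {
        values.add(buffer.toString());
    }
}

// Intermediate layer: routes each event only to the handlers subscribed to that element.
class SubscriptionDispatcher extends DefaultHandler {
    private final Map<String, List<SubscribingHandler>> byElement = new HashMap<>();
    private final Deque<List<SubscribingHandler>> openElements = new ArrayDeque<>();

    SubscriptionDispatcher(SubscribingHandler... handlers) {
        for (SubscribingHandler h : handlers) {
            byElement.computeIfAbsent(h.subscribe(), k -> new ArrayList<>()).add(h);
        }
    }

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) throws SAXException {
        List<SubscribingHandler> targets = byElement.getOrDefault(qName, Collections.emptyList());
        openElements.push(targets);
        for (SubscribingHandler h : targets) h.startElement(uri, localName, qName, atts);
    }

    @Override
    public void characters(char[] ch, int start, int length) throws SAXException {
        List<SubscribingHandler> targets = openElements.peek();
        if (targets == null) return; // no element open yet
        for (SubscribingHandler h : targets) h.characters(ch, start, length);
    }

    @Override
    public void endElement(String uri, String localName, String qName) throws SAXException {
        for (SubscribingHandler h : openElements.pop()) h.endElement(uri, localName, qName);
    }
}

class SubscribeDemo {
    static final String XML = "<cup><match><team>Arsenal</team><team>Chelsea</team><score>2-1</score></match>"
            + "<match><team>Leeds</team><team>Everton</team><score>0-0</score></match></cup>";

    static List<List<String>> run() {
        TextCollector teams = new TextCollector("team");
        TextCollector scores = new TextCollector("score");
        try {
            SAXParserFactory.newInstance().newSAXParser().parse(
                    new ByteArrayInputStream(XML.getBytes(StandardCharsets.UTF_8)),
                    new SubscriptionDispatcher(teams, scores));
        } catch (Exception e) {
            throw new IllegalStateException("parse failed", e);
        }
        return Arrays.asList(teams.values, scores.values);
    }

    public static void main(String[] args) {
        System.out.println(run()); // [[Arsenal, Chelsea, Leeds, Everton], [2-1, 0-0]]
    }
}
```

Because each handler subscribes to exactly one element name, text inside child elements of a subscribed element is routed to the children's subscribers (if any), not to the parent's handler.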
3. Reportable
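DOM offers convenient ways to extract specific information, but SAX handlers lack this ability: the information of interest stays hidden inside each handler.
ESAX's solution is the Collecting Parameter pattern: a single result map is passed to every handler, and each handler adds what it has collected.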
a). The Reportable interface:
public interface Reportable {
    void report(Map resultSet);
}
b). Default support:
public abstract class EnhancedHandler extends DefaultHandler implements Reportable, Stoppable, Subscribable {
    ... ...
}
public class CompositeEnhancedHandler extends DefaultHandler implements Reportable {
    ... ...
    // The same map travels through every handler: this is the collecting parameter.
    @Override
    public void report(Map resultSet) {
        for (EnhancedHandler handler : handlers) { handler.report(resultSet); }
    }
}
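The Collecting Parameter pattern can be tried out with a small runnable sketch: the caller creates one map, and after parsing each handler adds its own entries to it. ResultReporter, ReportingHandler, MatchCounter, HomeTeams, ReportingComposite and the sample XML are hypothetical stand-ins for Reportable, EnhancedHandler and CompositeEnhancedHandler; they use a typed Map<String, Object> instead of the raw Map shown above, and for brevity the handlers check element names themselves instead of relying on subscriptions, and only start tags are forwarded.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical stand-in for Reportable, with a typed map instead of a raw Map.
interface ResultReporter {
    void report(Map<String, Object> resultSet);
}

// Hypothetical stand-in for EnhancedHandler (reporting only).
abstract class ReportingHandler extends DefaultHandler implements ResultReporter {
}

// Counts <match> elements and reports the total.
class MatchCounter extends ReportingHandler {
    private int matches;

    @Override public void startElement(String uri, String localName, String qName, Attributes atts) {
        if (qName.equals("match")) matches++;
    }

    @Override public void report(Map<String, Object> resultSet) { resultSet.put("matches", matches); }
}

// Collects the "home" attribute of every <match>.
class HomeTeams extends ReportingHandler {
    private final List<String> teams = new ArrayList<>();

    @Override public void startElement(String uri, String localName, String qName, Attributes atts) {
        if (qName.equals("match")) teams.add(atts.getValue("home"));
    }

    @Override public void report(Map<String, Object> resultSet) { resultSet.put("homeTeams", teams); }
}

// Forwards start tags to every handler (all this demo needs) and passes one map through all of them.
class ReportingComposite extends DefaultHandler implements ResultReporter {
    private final ReportingHandler[] handlers;

    ReportingComposite(ReportingHandler... handlers) { this.handlers = handlers; }

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) throws SAXException {
        for (ReportingHandler h : handlers) h.startElement(uri, localName, qName, atts);
    }

    @Override
    public void report(Map<String, Object> resultSet) {
        for (ReportingHandler h : handlers) h.report(resultSet); // the collecting parameter
    }
}

class ReportDemo {
    static final String XML = "<cup><match home=\"Arsenal\" away=\"Chelsea\"/>"
            + "<match home=\"Leeds\" away=\"Everton\"/></cup>";

    static Map<String, Object> run() {
        ReportingComposite composite = new ReportingComposite(new MatchCounter(), new HomeTeams());
        try {
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new ByteArrayInputStream(XML.getBytes(StandardCharsets.UTF_8)), composite);
        } catch (Exception e) {
            throw new IllegalStateException("parse failed", e);
        }
        Map<String, Object> resultSet = new LinkedHashMap<>(); // created by the caller...
        composite.report(resultSet);                          // ...and filled in by each handler
        return resultSet;
    }

    public static void main(String[] args) {
        System.out.println(run()); // {matches=2, homeTeams=[Arsenal, Leeds]}
    }
}
```

The caller owns the map and decides what to do with it; each handler only contributes entries. That is what distinguishes a collecting parameter from having each handler return its own result object.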
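In the end, ESAX gives the plain SAX handler three abilities it lacked: stopping, subscribing, and reporting. This makes it faster than a plain SAX handler (parsing can end as soon as every handler has what it needs) and simpler and easier to program against than the DOM interface.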
For a simple example, see:
http://jade-stone-suite.googlecode.com/svn/trunk/JS.ESax/test/jade/stone/esax/sample/FACupHandler.java
For the test cases, see:
http://jade-stone-suite.googlecode.com/svn/trunk/JS.ESax/test/jade/stone/esax/test/CompositeEnhancedHandlerTest.java
For the final default implementations, see:
http://jade-stone-suite.googlecode.com/svn/trunk/JS.ESax/src/jade/stone/esax/support/EnhancedHandler.java
http://jade-stone-suite.googlecode.com/svn/trunk/JS.ESax/src/jade/stone/esax/support/CompositeEnhancedHandler.java
Project home page:
http://jade-stone-suite.googlecode.com/svn/trunk/JS.ESax/