This problem is hard to describe in the abstract, so let me reconstruct the scenario.
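Take the email below. It is a reply to an earlier message, so its body also carries the content of that earlier message. What if we want only the content of the reply itself?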
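I could not find a good solution in either the JavaMail API or the mail protocols. Readers who know those protocols well are welcome to suggest one. By analyzing the content and its structure, this article "derives" a set of heuristics, but the approach is not perfect.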
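1. The original content is wrapped in a blockquote tag
Implemented as remove1 in the code.
2. The original content is wrapped in an includetail tag
Implemented as remove2 in the code.
3. The email begins with plain text that contains no HTML tags
Everything from the first tag onward is dropped. Implemented as remove0 in the code; it only runs when remove1 and remove2 changed nothing.
4. A keyword at the junction with the original content
Locate the junction with a keyword such as "發件人" (the Chinese "From" label) or "From", then remove what follows it. Implemented as remove3 in the code.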
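To try cases 1 to 3 without the rest of the mail class, here is a minimal JDK-only sketch. The class and method names (`QuoteCutter`, `cutTagBlock`, `keepPlainPrefix`, `strip`) are hypothetical. Like the article's own checks, it assumes lowercase tag names, and it matches the full closing tag `</tag>` so that an unclosed tag leaves the body untouched.

```java
/**
 * Hypothetical standalone sketch of cases 1-3: cut a quoted block wrapped in
 * a given tag, or keep only the plain-text prefix before the first tag.
 */
final class QuoteCutter {

    private QuoteCutter() {}

    /** Removes everything from the first "<tag" to the end of the last "</tag>". */
    static String cutTagBlock(String content, String tag) {
        int start = content.indexOf("<" + tag);
        String close = "</" + tag + ">";
        int end = content.lastIndexOf(close);
        // Only cut when a closing tag exists after the opening tag.
        if (start == -1 || end == -1 || end < start) {
            return content;
        }
        return content.substring(0, start) + content.substring(end + close.length());
    }

    /** If the body does not start with a tag, keeps only the text before the first '<'. */
    static String keepPlainPrefix(String content) {
        int firstTag = content.indexOf('<');
        if (content.trim().startsWith("<") || firstTag == -1) {
            return content;
        }
        return content.substring(0, firstTag);
    }

    /** Tag-based cuts first; the plain-text prefix rule is only a fallback. */
    static String strip(String content) {
        String result = cutTagBlock(content, "blockquote");
        result = cutTagBlock(result, "includetail");
        if (result.equals(content)) {
            result = keepPlainPrefix(result);
        }
        return result;
    }

    public static void main(String[] args) {
        String reply = "<div>Thanks, got it.</div><blockquote><div>From: Alice</div>Original text</blockquote>";
        System.out.println(strip(reply)); // prints <div>Thanks, got it.</div>
        System.out.println(strip("See you then.<div>From: Bob</div><div>Earlier mail</div>")); // prints See you then.
    }
}
```

The order mirrors the article's `remove` method: the blockquote and includetail cuts run first, and the plain-text prefix rule applies only when neither of them changed the body.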
The complete code follows. Case 4 uses the htmlparser library (org.htmlparser) to walk the div elements.
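```java
// Excerpt from the mail wrapper class. Requires the htmlparser library (org.htmlparser).
// `logger` is assumed to be a static logger field of the enclosing class (e.g. SLF4J
// or Log4j), and `bodyText` holds the HTML body of the message.
import org.htmlparser.Node;
import org.htmlparser.NodeFilter;
import org.htmlparser.Parser;
import org.htmlparser.filters.TagNameFilter;
import org.htmlparser.util.NodeList;
import org.htmlparser.util.ParserException;
import org.htmlparser.util.SimpleNodeIterator;

/** Returns the body with the quoted original message removed (null stays null). */
public String getSimpleBodyText() {
    if (this.bodyText != null) {
        return remove(bodyText);
    }
    return bodyText;
}

/** Applies the heuristics in order; remove0 runs only if remove1/remove2 changed nothing. */
public static String remove(final String content) {
    String content0 = content;
    content0 = remove1(content0);
    content0 = remove2(content0);
    if (content.equals(content0)) {
        content0 = remove0(content0);
    }
    content0 = remove3(content0);
    return content0;
}

/** Case 1: the quoted original is wrapped in <blockquote>...</blockquote>. */
public static String remove1(String content) {
    int index1 = content.indexOf("<blockquote");
    int index2 = content.lastIndexOf("</blockquote>");
    // Match the full closing tag so an unclosed <blockquote> is left alone.
    if (index1 != -1 && index2 > index1) {
        logger.debug("remove1-blockquote:" + index1 + "," + index2);
        return content.substring(0, index1) + content.substring(index2 + "</blockquote>".length());
    }
    return content;
}

/** Case 3: the reply starts as plain text; keep only the text before the first tag. */
public static String remove0(String content) {
    int firstTag = content.indexOf("<");
    // Guard against bodies with no tag at all (indexOf returns -1).
    if (!content.trim().startsWith("<") && firstTag != -1) {
        logger.debug("remove0:");
        return content.substring(0, firstTag);
    }
    return content;
}

/** Case 2: the quoted original is wrapped in <includetail>...</includetail>. */
public static String remove2(String content) {
    int index1 = content.indexOf("<includetail");
    int index2 = content.lastIndexOf("</includetail>");
    if (index1 != -1 && index2 > index1) {
        logger.debug("remove2-includetail:" + index1 + "," + index2);
        return content.substring(0, index1) + content.substring(index2 + "</includetail>".length());
    }
    return content;
}

/** Case 4: cut from the div holding the reply header ("發件人"/"From") to the end of the quote. */
public static String remove3(String content) {
    int index1 = -1; // start of the cut: the reply-header div
    int index2 = -1; // end of the cut: end of a known container div, or of the header's last sibling
    try {
        // Parser.createParser always treats the string as HTML, whereas new Parser(String)
        // treats input that does not start with '<' as a URL or file name.
        // The charset argument is assumed to be UTF-8.
        Parser parser = Parser.createParser(content, "UTF-8");
        NodeFilter divFilter = new TagNameFilter("div");
        NodeList nodeList = parser.parse(divFilter);
        SimpleNodeIterator elements = nodeList.elements();
        while (elements.hasMoreNodes()) {
            Node node = elements.nextNode();
            String html = node.toHtml();
            String tag = node.toString();
            // Known body containers ("Section1" also matches Word/Outlook's "WordSection1").
            if (tag.contains("Section1") || tag.contains("mailContentContainer")) {
                index2 = node.getStartPosition() + html.length();
                continue;
            }
            // Reply header: simplified or traditional Chinese label, or English "From".
            if (html.contains("发件人") || html.contains("發件人") || html.contains("From")) {
                if (node.getStartPosition() > 0) {
                    index1 = node.getStartPosition();
                    if (index2 == -1) {
                        // No known container: cut up to the end of the parent's last child.
                        Node parent = node.getParent();
                        if (parent != null && parent.getLastChild() != null) {
                            Node lastChild = parent.getLastChild();
                            index2 = lastChild.getStartPosition() + lastChild.toHtml().length();
                        }
                    }
                    break;
                }
            }
        }
    } catch (ParserException e) {
        logger.warn("remove3: could not parse body", e);
    }
    // Only cut a valid, non-empty range inside the string.
    if (index1 != -1 && index2 > index1 && index2 <= content.length()) {
        logger.debug("remove3-發件人/From:" + index1 + "," + index2);
        return content.substring(0, index1) + content.substring(index2);
    }
    return content;
}
```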
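The keyword idea behind remove3 can also be tried on `text/plain` bodies, where there are no div elements to parse. Below is a hedged, JDK-only sketch (Java 11+) that cuts at the first line that looks like a reply header. The class name `ReplyHeaderCutter` and the marker list (an English `From:` line, the Chinese 发件人/發件人 label, and an Outlook-style `-----Original Message-----` separator) are assumptions. Clients vary by locale, so the list is not exhaustive.

```java
import java.util.List;
import java.util.regex.Pattern;

/** Hypothetical sketch: cut a plain-text body at the first reply-header line. */
final class ReplyHeaderCutter {

    // Assumed marker set. The Chinese label is written with unicode escapes so the
    // source compiles under any default encoding; it accepts an ASCII or full-width colon.
    private static final List<Pattern> MARKERS = List.of(
            Pattern.compile("^\\s*From:", Pattern.CASE_INSENSITIVE),
            Pattern.compile("^\\s*(\u53d1|\u767c)\u4ef6\u4eba[:\uff1a]"),
            Pattern.compile("^\\s*-{2,}\\s*Original Message\\s*-{2,}", Pattern.CASE_INSENSITIVE));

    private ReplyHeaderCutter() {}

    /** Returns the text above the first header line, or the body unchanged if none is found. */
    static String cut(String body) {
        String[] lines = body.split("\r?\n", -1);
        StringBuilder kept = new StringBuilder();
        for (String line : lines) {
            for (Pattern marker : MARKERS) {
                if (marker.matcher(line).find()) {
                    return kept.toString().stripTrailing();
                }
            }
            kept.append(line).append('\n');
        }
        return body;
    }

    public static void main(String[] args) {
        String body = "Sounds good, see you Monday.\n\nFrom: Alice <alice@example.com>\nSent: Friday\n\nCan we meet?";
        System.out.println(cut(body)); // prints Sounds good, see you Monday.
    }
}
```

Like remove3, this misfires when the new reply itself contains a line that begins with `From:`.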