Notes on Building Aspose CHM Documentation

Feel free to get in touch about technical questions. Email: [email protected] cnblogs: http://www.cnblogs.com/jiangxinnju GitHub: https://github.com/jiangxincode Zhihu: https://www.zhihu.com/people/jiangxinnju

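My company recently needed to develop with the Aspose components, but I could not find good reference documentation online, and the official site is very slow to access. So I decided to build a CHM document myself. I ran into quite a few difficulties along the way and am recording them here. The first step was to crawl the javadoc from the Aspose website; the tool I used was TeleportPro. The URLs crawled were: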

http://www.aspose.com/api/java/pdf
http://www.aspose.com/api/java/cells

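After some experimenting, a crawl depth of 7 worked best. The crawl produced a lot of content, over 1 GB, much of it clutter. A javadoc is normally just HTML plus CSS and needs neither JavaScript nor images, so I kept only the HTML documents under the relevant directories and the api-reference-ui.css file, and deleted everything else.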
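
The pruning step above could also be scripted rather than done by hand. The following is only a minimal sketch, not what I actually ran: the class name `PruneMirror` and the default directory are hypothetical, and it simplifies "the relevant directories" to "every HTML file anywhere under the root", so run it on a copy of the mirror first.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class PruneMirror {

    // Keep only HTML pages and the single stylesheet the pages reference.
    static boolean shouldKeep(Path file) {
        String name = file.getFileName().toString().toLowerCase();
        return name.endsWith(".html") || name.endsWith(".htm")
                || name.equals("api-reference-ui.css");
    }

    // Deletes every regular file under root that shouldKeep rejects
    // and returns how many files were deleted. Empty directories are left in place.
    static int prune(Path root) throws IOException {
        List<Path> toDelete;
        try (Stream<Path> walk = Files.walk(root)) {
            toDelete = walk.filter(Files::isRegularFile)
                    .filter(p -> !shouldKeep(p))
                    .collect(Collectors.toList());
        }
        for (Path p : toDelete) {
            Files.delete(p);
        }
        return toDelete.size();
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical default path; pass the real mirror directory as the first argument.
        Path root = Paths.get(args.length > 0 ? args[0] : "C:/asposebak");
        System.out.println("Deleted " + prune(root) + " files");
    }
}
```

Collecting the paths before deleting keeps the directory walk from being disturbed by the deletions.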

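At this point I found that deleting those files had broken the references to api-reference-ui.css in the HTML files, so I used Notepad++ to batch-replace the reference path (../../../apireference.dynabic.com/doc/resources/css/api-reference-ui.css -> api-reference-ui.css). That got the CSS file loading correctly, but the CHM generated from these files was still large and contained some useless buttons that could not be clicked, which needed to be removed. So I wrote a Java program to do it. If you need the latest version of the program or something is unclear, feel free to contact me: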

package edu.jiangxin.tools;

import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.util.ArrayList;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

import edu.jiangxin.common.FileFilterWrapper;

public class RemoveHtmlElement {

    static final String charsetName = "UTF-8";
    static final String[] divClassNames = { "Header", "aspNetHidden", "Search", "clearAll" };
    static final String[] divIds = { "Header", "leftmenu" };

    public static void main(String[] args) throws IOException {
        ArrayList<File> files = new FileFilterWrapper().list("C:/asposebak", "htm");
        for (File file : files) {
            Document doc = Jsoup.parse(file, charsetName);
            for (int i = 0; i < divClassNames.length; i++) {
                Elements eles = doc.getElementsByClass(divClassNames[i]); // never null; empty if nothing matches
                eles.remove();
            }
            for (int i = 0; i < divIds.length; i++) {
                Element ele = doc.getElementById(divIds[i]);
                if (ele != null) {
                    ele.remove();
                }

            }

            Elements eles = doc.getElementsByTag("script");
            for (int i = 0; i < eles.size(); i++) {
                Element ele = eles.get(i);
                if (ele.attr("language").equals("javascript") && ele.attr("type").equals("text/javascript")) {
                    ele.remove();
                }
            }

            // Overwrite the file in place; try-with-resources closes the writer even if write() throws
            try (OutputStreamWriter osw = new OutputStreamWriter(new FileOutputStream(file, false), charsetName)) {
                osw.write(doc.html());
            }
            System.out.println(file.getAbsolutePath());
        }
    }

}
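
The listing depends on `edu.jiangxin.common.FileFilterWrapper`, which is not shown here. If you want to run it without that class, something like the following could stand in for it. This is a hypothetical replacement: the name `HtmlFileLister` is made up, and the behaviour (a recursive, case-insensitive match on the file extension) is my assumption about what `list(dir, "htm")` is meant to do, so the real helper may differ.

```java
import java.io.File;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.stream.Stream;

public class HtmlFileLister {

    // Recursively collects regular files under dir whose name ends with
    // "." + extension (case-insensitive). Assumed behaviour, not the real helper's.
    static ArrayList<File> list(String dir, String extension) throws IOException {
        String suffix = "." + extension.toLowerCase();
        ArrayList<File> result = new ArrayList<>();
        try (Stream<Path> walk = Files.walk(Paths.get(dir))) {
            walk.filter(Files::isRegularFile)
                .filter(p -> p.getFileName().toString().toLowerCase().endsWith(suffix))
                .forEach(p -> result.add(p.toFile()));
        }
        return result;
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical default path; pass the real directory as the first argument.
        String dir = args.length > 0 ? args[0] : "C:/asposebak";
        for (File f : list(dir, "htm")) {
            System.out.println(f.getAbsolutePath());
        }
    }
}
```

Note that with this matching rule `"htm"` does not pick up `.html` files; call it once per extension if the mirror contains both.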

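After the program strips these elements the pages are basically clean, though a few simple batch text replacements in Notepad++ are still needed. The last step is to generate the CHM document with easychm. I used the trial version; as far as I can tell the only difference is that it adds an advertisement, which does not affect using the generated CHM.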
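
The Notepad++ batch replacements, such as the stylesheet path rewrite described earlier, could also be scripted. Below is a rough sketch for that one replacement only. The class name `FixCssPath` and the default directory are hypothetical, and it assumes the pages are UTF-8 encoded (the same charset the program above uses).

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class FixCssPath {

    // The broken reference left by the crawl, and the replacement used in this article.
    static final String OLD_REF =
            "../../../apireference.dynabic.com/doc/resources/css/api-reference-ui.css";
    static final String NEW_REF = "api-reference-ui.css";

    // Plain (non-regex) substring replacement on one page's text.
    static String fix(String html) {
        return html.replace(OLD_REF, NEW_REF);
    }

    // Rewrites every .htm/.html file under root that contains the old reference
    // and returns how many files were changed.
    static int fixAll(Path root) throws IOException {
        List<Path> pages;
        try (Stream<Path> walk = Files.walk(root)) {
            pages = walk.filter(Files::isRegularFile)
                    .filter(p -> {
                        String n = p.getFileName().toString().toLowerCase();
                        return n.endsWith(".htm") || n.endsWith(".html");
                    })
                    .collect(Collectors.toList());
        }
        int changed = 0;
        for (Path page : pages) {
            String before = new String(Files.readAllBytes(page), StandardCharsets.UTF_8);
            String after = fix(before);
            if (!after.equals(before)) {
                Files.write(page, after.getBytes(StandardCharsets.UTF_8));
                changed++;
            }
        }
        return changed;
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical default path; pass the real directory as the first argument.
        Path root = Paths.get(args.length > 0 ? args[0] : "C:/asposebak");
        System.out.println("Updated " + fixAll(root) + " files");
    }
}
```

`String.replace` treats the path as literal text, so the dots in it need no escaping, unlike a regex-based replace.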
