Java site search with Lucene: making the IKAnalyzer tokenizer pick up dictionary updates without restarting the service

Original

静艺

2020-02-21 14:49

The IKAnalyzer tokenizer supports custom vocabulary. The jar used here is the 2012 release.

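Problems:

1. In use, the first line of the custom dictionary, that is, the first word, is never matched by the tokenizer. I treated this as an IKAnalyzer bug and did not fix it (a byte order mark at the start of a UTF-8 dictionary file could also produce this symptom); leaving the first line blank is enough.

2. After adding new words, the service has to be restarted before they take effect.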
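The blank-first-line workaround can be applied automatically whenever a word is appended to the custom dictionary. The following is a minimal sketch, not part of IKAnalyzer: the class name `CustomDictFile` and the method `addWord` are hypothetical, and it assumes the dictionary is a UTF-8 text file with one word per line. It also strips a leading byte order mark, on the assumption that such a mark is one possible reason the first entry is not matched.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of IKAnalyzer: maintains a custom dictionary file
// (assumed to be UTF-8, one word per line) whose first line is always blank.
class CustomDictFile {

    // Adds a word to the dictionary file, creating the file if needed.
    // Returns true if the word was added, false if it was blank or already present.
    static boolean addWord(Path dictFile, String word) throws IOException {
        List<String> lines = new ArrayList<>();
        if (Files.exists(dictFile)) {
            lines.addAll(Files.readAllLines(dictFile, StandardCharsets.UTF_8));
        }
        // Drop a leading byte order mark so it cannot become part of the first entry.
        if (!lines.isEmpty() && lines.get(0).startsWith("\uFEFF")) {
            lines.set(0, lines.get(0).substring(1));
        }
        // Keep the first line blank, as the workaround above suggests.
        if (lines.isEmpty() || !lines.get(0).trim().isEmpty()) {
            lines.add(0, "");
        }
        String trimmed = word.trim();
        boolean added = !trimmed.isEmpty() && !lines.contains(trimmed);
        if (added) {
            lines.add(trimmed);
        }
        // Rewrite the file so the blank first line and the new word are persisted.
        Files.write(dictFile, lines, StandardCharsets.UTF_8);
        return added;
    }
}
```

Writing the file does not by itself make the running tokenizer see the new word; the dictionary still has to be reloaded.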

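The fix below solved the problem for me; I did not investigate any further.

Decompiling the jar shows that the Dictionary class in the org.wltea.analyzer.dic package loads the dictionary only once: initial() builds the singleton on the first call and returns the cached instance on every later call.

Resetting singleton to null is therefore enough to force a reload.

Source code:

http://download.csdn.net/detail/wangzhiqiang123456/9427001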

public static Dictionary initial(Configuration cfg)
{
    // The dictionary is built only when no instance is cached yet.
    if (singleton == null) {
        synchronized (Dictionary.class) {
            if (singleton == null) {
                singleton = new Dictionary(cfg);
                return singleton;
            }
        }
    }
    // Already initialized: return the cached instance without reloading.
    return singleton;
}

Fix: add a method to Dictionary

public static void singletonrefresh()
{
    // Drop the cached instance so that the next call to initial() reloads the dictionary.
    singleton = null;
}
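To see why clearing the field is sufficient, here is a minimal self-contained sketch of the same pattern. `DemoDictionary` is a stand-in written for illustration, not the real `org.wltea.analyzer.dic.Dictionary`: its constructor only increments a counter where the real class would read the dictionary files. The `volatile` modifier and the `synchronized` block in the reset method are additions of this sketch, made on the assumption that the reset may be called from a different thread than the one tokenizing.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Stand-in for illustration only; this is NOT the real IKAnalyzer Dictionary class.
class DemoDictionary {
    // Counts how many times the (simulated) dictionary load has run.
    static final AtomicInteger LOADS = new AtomicInteger();

    // volatile (an addition in this sketch) makes a reset visible to other threads.
    private static volatile DemoDictionary singleton;

    private DemoDictionary() {
        LOADS.incrementAndGet(); // the real class would read the dictionary files here
    }

    // Double-checked locking: builds the instance only when none is cached.
    static DemoDictionary initial() {
        DemoDictionary local = singleton;
        if (local == null) {
            synchronized (DemoDictionary.class) {
                local = singleton;
                if (local == null) {
                    local = new DemoDictionary();
                    singleton = local;
                }
            }
        }
        return local;
    }

    // Counterpart of singletonrefresh(): dropping the cached instance forces
    // the next initial() call to construct, and therefore load, a new one.
    static void singletonrefresh() {
        synchronized (DemoDictionary.class) {
            singleton = null;
        }
    }
}
```

Calling `initial()` twice returns the same object and loads once; after `singletonrefresh()`, the next `initial()` call constructs a fresh instance. Code that still holds a reference to the old instance keeps using the old data.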

Test code:

import java.io.IOException;
import java.io.StringReader;
import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;
import org.apache.lucene.analysis.tokenattributes.OffsetAttribute;
import org.wltea.analyzer.dic.Dictionary;
import org.wltea.analyzer.lucene.IKAnalyzer;

public class TestAnalyzer {

    public static void testIkAnalyzer(String testString) throws IOException {
        // Clear the cached dictionary so this run reloads the dictionary files.
        Dictionary.singletonrefresh();
        // true selects IKAnalyzer's smart segmentation mode.
        Analyzer ikanalyzer = new IKAnalyzer(true);
        StringReader reader = new StringReader(testString);
        TokenStream ts = ikanalyzer.tokenStream("", reader);
        System.out.println("=====IKAnalyzer analyzer====");
        System.out.println("Tokens:");
        CharTermAttribute term = ts.addAttribute(CharTermAttribute.class);
        OffsetAttribute offAttr = ts.addAttribute(OffsetAttribute.class);
        try {
            ts.reset();
            // Print each token with its start and end offsets in the input.
            while (ts.incrementToken()) {
                System.out.println(term + " (" + offAttr.startOffset() + "," + offAttr.endOffset() + ")");
            }
            ts.end();
        } finally {
            // Release the stream's resources even if tokenizing fails.
            ts.close();
        }
    }

    public static void main(String[] args) throws Exception {
        String testString = "海賊王景品巴里a股貓貼";

        System.out.println(testString);
        testIkAnalyzer(testString);
    }

}
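The test above resets the dictionary on every call, which is fine for a test but means a full reload per request in a running service. One possible refinement, sketched here under assumptions, is to reset only when the dictionary file has actually changed. `DictChangeChecker` is a hypothetical helper, not part of IKAnalyzer; it takes the reload action as a `Runnable` so the sketch stays self-contained.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.attribute.FileTime;

// Hypothetical helper: runs a reload action only when the dictionary file's
// last-modified time differs from the one seen on the previous check.
class DictChangeChecker {
    private final Path dictFile;
    private final Runnable reloadAction;
    private FileTime lastSeen;

    DictChangeChecker(Path dictFile, Runnable reloadAction) throws IOException {
        this.dictFile = dictFile;
        this.reloadAction = reloadAction;
        this.lastSeen = Files.getLastModifiedTime(dictFile);
    }

    // Returns true if a change was detected and the reload action was run.
    synchronized boolean refreshIfChanged() throws IOException {
        FileTime current = Files.getLastModifiedTime(dictFile);
        if (current.equals(lastSeen)) {
            return false;
        }
        lastSeen = current;
        reloadAction.run();
        return true;
    }
}
```

With the method added above, the checker could be created as `new DictChangeChecker(path, Dictionary::singletonrefresh)`, where `path` points at the custom dictionary file, and `refreshIfChanged()` called before tokenizing or from a periodic task.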
