Near-real-time (NRT) search makes content searchable before the IndexWriter has committed it.
How the index is refreshed:
Only a commit on the IndexWriter fully syncs the in-memory (RAM Directory) data to files on disk.
IndexWriter provides an API for obtaining a reader in (near) real time. The call triggers a flush, which produces a new segment but does not commit (no fsync), so it saves I/O. The new segment is added to the newly created reader, and the returned reader therefore sees the updates.
So as long as each new search obtains a fresh reader from the IndexWriter, it can see the latest content. The cost of this operation is only a flush, which is small compared with a commit.
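The "fresh reader from the writer" idea can be sketched with Lucene's DirectoryReader API (a minimal sketch assuming a Lucene 4.x-style API; `directory`, `config`, and `doc` are assumed to be set up as in the examples further below, and the snippet needs the Lucene jars to compile):

```java
// Open a near-real-time reader directly from the writer: this flushes new
// segments to the directory (no commit, no fsync), and the returned reader
// sees the still-uncommitted documents.
IndexWriter writer = new IndexWriter(directory, config);
writer.addDocument(doc); // buffered; not yet committed

DirectoryReader nrtReader = DirectoryReader.open(writer, true); // true = apply deletes
IndexSearcher nrtSearcher = new IndexSearcher(nrtReader);
// nrtSearcher already sees `doc`, although writer.commit() was never called.

// For later searches, reopen cheaply; openIfChanged returns null if nothing changed.
DirectoryReader newer = DirectoryReader.openIfChanged(nrtReader, writer, true);
if (newer != null) {
    nrtReader.close();
    nrtReader = newer;
}
```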
Lucene organizes an index as multiple segments under a single index directory. New documents go into new segments, and these small new segments are merged together periodically. Because of merging, the total number of segments stays small, so overall search speed remains fast.
To avoid read/write conflicts, Lucene only ever creates new segments; for any active reader, old segments are deleted only after they are no longer in use.
A flush writes data into the operating system's buffers; as long as the buffers are not full, no disk I/O occurs.
A commit writes all in-memory buffered data to disk; it is entirely disk I/O and therefore a heavyweight operation. This is because the most important structure in a Lucene index, the postings (inverted lists), is stored tightly packed in VInt and delta-encoded form. When segments are merged, the postings for the same term must be merge-sorted: a read-out, merge, and regenerate process.
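The delta-plus-VInt packing mentioned above can be illustrated in plain Java. This is a self-contained sketch of the encoding idea, not Lucene's actual implementation (the class name and sample docIDs are made up for the demo):

```java
import java.io.ByteArrayOutputStream;
import java.util.ArrayList;
import java.util.List;

public class VIntDeltaDemo {
    // Write one value as a VInt: 7 payload bits per byte,
    // high bit set means "more bytes follow".
    static void writeVInt(ByteArrayOutputStream out, int v) {
        while ((v & ~0x7F) != 0) {
            out.write((v & 0x7F) | 0x80);
            v >>>= 7;
        }
        out.write(v);
    }

    // Encode a sorted docID list as gaps (deltas), each gap as a VInt.
    static byte[] encode(int[] docIds) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        int prev = 0;
        for (int id : docIds) {
            writeVInt(out, id - prev); // small gaps -> very few bytes
            prev = id;
        }
        return out.toByteArray();
    }

    // Decode the packed bytes back to absolute docIDs.
    static List<Integer> decode(byte[] bytes) {
        List<Integer> ids = new ArrayList<>();
        int prev = 0, i = 0;
        while (i < bytes.length) {
            int v = 0, shift = 0, b;
            do {
                b = bytes[i++] & 0xFF;
                v |= (b & 0x7F) << shift;
                shift += 7;
            } while ((b & 0x80) != 0);
            prev += v;
            ids.add(prev);
        }
        return ids;
    }

    public static void main(String[] args) {
        int[] docIds = {3, 128, 130, 100000};
        byte[] packed = encode(docIds);
        System.out.println(packed.length);  // 6 bytes instead of 4 * 4 = 16
        System.out.println(decode(packed)); // [3, 128, 130, 100000]
    }
}
```

Because consecutive docIDs for a frequent term tend to be close together, the gaps are small and most of them fit in a single byte, which is why the postings pack so densely and why rewriting them on merge is expensive.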
How SearcherManager-based near-real-time search is implemented:
Lucene implements near-real-time search with the NRTManager class. "Near real time" means that changes to the index are tracked by a thread and reflected to the calling program within a very short time. NRTManager manages an IndexWriter object and exposes its add/update/delete methods (addDocument, deleteDocument, and so on) to the client. All of these operations happen in memory, so if you never call IndexWriter's commit method, the index on disk will not change. Remember to commit after each round of updates so that the changes are written to disk together.
After an index update, a client can obtain an up-to-date IndexSearcher in two ways.
The first is to use an NRTManagerReopenThread. This thread tracks changes to the in-memory index and calls maybeReopen on each change, keeping the index current and opening a new IndexSearcher. The client obtains a SearcherManager from the NRTManager via getSearcherManager, acquires an IndexSearcher from that SearcherManager, and releases it with SearcherManager's release when done. Finally, remember to close the NRTManagerReopenThread.
The second is to skip NRTManagerReopenThread and call NRTManager's maybeReopen directly to obtain the latest IndexSearcher. (Note: in later Lucene 4.x versions, NRTManager and NRTManagerReopenThread were replaced by TrackingIndexWriter and ControlledRealTimeReopenThread, which the code below uses.)
public void testSearch() throws IOException {
    Directory directory = FSDirectory.open(new File("/root/data/03"));
    SearcherManager sm = new SearcherManager(directory, null);
    IndexSearcher searcher = sm.acquire();
    // Alternative without SearcherManager:
    // IndexReader reader = DirectoryReader.open(directory);
    // IndexSearcher searcher = new IndexSearcher(reader);
    Query query = new TermQuery(new Term("title", "test"));
    TopDocs results = searcher.search(query, null, 100);
    System.out.println(results.totalHits);
    ScoreDoc[] docs = results.scoreDocs;
    for (ScoreDoc doc : docs) {
        System.out.println("doc internal id:" + doc.doc + " ,doc score:" + doc.score);
        Document document = searcher.doc(doc.doc);
        System.out.println("id:" + document.get("id") + " ,title:" + document.get("title"));
    }
    sm.release(searcher);
    sm.close();
}
public void testUpdateAndSearch() throws IOException, InterruptedException {
    Directory directory = FSDirectory.open(new File("/root/data/03"));
    Analyzer analyzer = new StandardAnalyzer();
    IndexWriterConfig config = new IndexWriterConfig(Version.LATEST, analyzer);
    config.setOpenMode(OpenMode.CREATE_OR_APPEND);
    IndexWriter writer = new IndexWriter(directory, config);
    TrackingIndexWriter trackingWriter = new TrackingIndexWriter(writer);
    SearcherManager sm = new SearcherManager(writer, true, null);
    ControlledRealTimeReopenThread<IndexSearcher> thread =
            new ControlledRealTimeReopenThread<>(trackingWriter, sm, 60, 1);
    thread.setDaemon(true);
    thread.setName("NRT Index Manager Thread");
    thread.start();

    Document doc = new Document();
    Field idField = new StringField("id", "3", Store.YES);
    Field titleField = new TextField("title", "test for 3", Store.YES);
    doc.add(idField);
    doc.add(titleField);

    // The update happens only in memory; wait until the reopen thread has
    // opened a searcher that covers this generation of changes.
    long generation = trackingWriter.updateDocument(new Term("id", "2"), doc);
    thread.waitForGeneration(generation);

    IndexSearcher searcher = sm.acquire();
    Query query = new TermQuery(new Term("title", "test"));
    TopDocs results = searcher.search(query, null, 100);
    System.out.println(results.totalHits);
    ScoreDoc[] docs = results.scoreDocs;
    for (ScoreDoc scoreDoc : docs) {
        System.out.println("doc internal id:" + scoreDoc.doc + " ,doc score:" + scoreDoc.score);
        Document document = searcher.doc(scoreDoc.doc);
        System.out.println("id:" + document.get("id") + " ,title:" + document.get("title"));
    }
    sm.release(searcher);
    sm.close();
    thread.close();
    writer.close();
}
Building the index:
public void testBuildIndex() throws IOException {
    Directory directory = FSDirectory.open(new File("/root/data/03"));
    // Directory directory = new RAMDirectory();
    Analyzer analyzer = new StandardAnalyzer();
    IndexWriterConfig config = new IndexWriterConfig(Version.LATEST, analyzer);
    config.setOpenMode(OpenMode.CREATE);
    IndexWriter writer = new IndexWriter(directory, config);

    Document doc1 = new Document();
    doc1.add(new StringField("id", "1", Store.YES));
    doc1.add(new TextField("title", "test for 1", Store.YES));
    writer.addDocument(doc1);

    Document doc2 = new Document();
    doc2.add(new StringField("id", "2", Store.YES));
    doc2.add(new TextField("title", "test for 2", Store.YES));
    writer.addDocument(doc2);

    writer.commit();
    writer.close();
}