Near-Real-Time Search in Lucene with SearcherManager

Near-real-time (NRT) search makes it possible to search content that the IndexWriter has not yet committed.

How the index is refreshed:

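Only a commit on the IndexWriter guarantees that the data buffered in memory is fully synced to the index files on disk.
IndexWriter also provides an API for obtaining a reader in real time. That call triggers a flush, which writes a new segment, but it does not commit (fsync), so it costs less IO. The new segment is included in the newly created reader, and the updates are visible through the reader that is returned.
So as long as each new search obtains a fresh reader from the IndexWriter, it sees the latest content. The only cost is a flush, which is small compared with a commit.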

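Lucene organizes an index as multiple segments inside one index directory. New documents go into new segments, and these small new segments are merged at intervals. Because of merging, the total number of segments stays small and overall search speed remains high.
To avoid read/write conflicts, Lucene only ever creates new segments; an old segment is deleted only once no active reader is still using it.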
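
The rule that an old segment survives until its last reader lets go can be pictured with reference counting. The sketch below is a simplified model written for this article, not Lucene's own code: the class `RefCountedSegment` and its methods are hypothetical names, and "deleting" is reduced to setting a flag.

```java
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical model of a segment that is deleted only when nobody references it any more. */
class RefCountedSegment {
    private final String name;
    // Starts at 1: the index itself references the segment while it is part of the index.
    private final AtomicInteger refCount = new AtomicInteger(1);
    private volatile boolean deleted = false;

    RefCountedSegment(String name) {
        this.name = name;
    }

    /** Called when a reader starts using the segment (simplified, not race-free). */
    void incRef() {
        if (deleted) {
            throw new IllegalStateException(name + " is already deleted");
        }
        refCount.incrementAndGet();
    }

    /** Called when a reader closes, or when a merge removes the segment from the index. */
    void decRef() {
        if (refCount.decrementAndGet() == 0) {
            deleted = true; // a real implementation would delete the segment files here
        }
    }

    boolean isDeleted() {
        return deleted;
    }

    public static void main(String[] args) {
        RefCountedSegment old = new RefCountedSegment("_0");
        old.incRef();                          // a reader opens the segment
        old.decRef();                          // a merge replaces _0; the index drops its reference
        System.out.println(old.isDeleted());   // false: the reader is still active
        old.decRef();                          // the reader is closed
        System.out.println(old.isDeleted());   // true: nobody uses the segment any more
    }
}
```

The point of the design is that writers never have to wait for readers: a search that started on the old segment keeps a consistent view until it finishes.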
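A flush writes data into the operating system's buffer cache; as long as that cache is not full, no disk operation is forced.
A commit writes all buffered data through to disk. It is a full disk operation and therefore heavyweight. Merging segments is costly as well, because postings (the inverted lists), the main structure in a Lucene index, are stored tightly packed as VInt-encoded deltas. A merge has to merge-sort the postings of the same term, which means reading them, merging them and writing them out again.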
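
To make the "VInt plus delta" layout and the read, merge, rewrite cycle concrete, here is a minimal sketch in plain Java. It is an illustration under simplifying assumptions, not Lucene's actual codec: the class `PostingsSketch` is a hypothetical name, doc IDs are assumed to be non-negative and already comparable across the two lists, and only doc IDs (no frequencies or positions) are stored.

```java
import java.io.ByteArrayOutputStream;
import java.util.ArrayList;
import java.util.List;

/** Simplified model of delta + variable-length-int postings storage. */
class PostingsSketch {

    /** Encodes ascending doc IDs as gaps; each gap uses 7 data bits per byte, high bit = more bytes follow. */
    static byte[] encode(int[] sortedDocIds) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        int previous = 0;
        for (int docId : sortedDocIds) {
            int gap = docId - previous; // delta against the previous doc ID
            previous = docId;
            while ((gap & ~0x7F) != 0) {
                out.write((gap & 0x7F) | 0x80);
                gap >>>= 7;
            }
            out.write(gap);
        }
        return out.toByteArray();
    }

    /** Reverses encode(): reads each variable-length gap and sums the gaps back into doc IDs. */
    static List<Integer> decode(byte[] bytes) {
        List<Integer> docIds = new ArrayList<>();
        int previous = 0;
        int i = 0;
        while (i < bytes.length) {
            int gap = 0;
            int shift = 0;
            byte b;
            do {
                b = bytes[i++];
                gap |= (b & 0x7F) << shift;
                shift += 7;
            } while ((b & 0x80) != 0);
            previous += gap;
            docIds.add(previous);
        }
        return docIds;
    }

    /** Merges two postings lists of the same term: decode both, merge in order, re-encode. */
    static byte[] merge(byte[] a, byte[] b) {
        List<Integer> left = decode(a);
        List<Integer> right = decode(b);
        int[] merged = new int[left.size() + right.size()];
        int i = 0, j = 0, k = 0;
        while (i < left.size() && j < right.size()) {
            merged[k++] = left.get(i) <= right.get(j) ? left.get(i++) : right.get(j++);
        }
        while (i < left.size()) merged[k++] = left.get(i++);
        while (j < right.size()) merged[k++] = right.get(j++);
        return encode(merged);
    }

    public static void main(String[] args) {
        byte[] segmentA = encode(new int[] {3, 7, 300});
        byte[] segmentB = encode(new int[] {5, 1000});
        System.out.println(segmentA.length);                  // 4
        System.out.println(decode(merge(segmentA, segmentB))); // [3, 5, 7, 300, 1000]
    }
}
```

Because every entry depends on the one before it, a merged list cannot be produced by appending bytes; the lists have to be decoded and written again, which is where the cost comes from.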

How SearcherManager-based near-real-time search works:

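Lucene implements near-real-time search through the NRTManager class. Near-real-time means that when the index changes, a tracking thread makes the change visible to the application within a relatively short time. NRTManager manages the IndexWriter object and exposes its add, delete and update methods, such as addDocument and deleteDocument, to client code. These operations are buffered rather than committed, so unless you call commit on the IndexWriter, the committed index on disk does not change. Remember to commit after updating the index so that the changes are written to disk. Note that the code further down uses TrackingIndexWriter and ControlledRealTimeReopenThread in the roles described here for NRTManager and NRTManagerReopenThread.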

After the index is updated, there are two ways to make sure each caller gets an IndexSearcher on the latest index:

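The first is to use an NRTManagerReopenThread. This thread tracks in-memory changes to the index and calls maybeReopen whenever something changes, keeping the newest index generation available by opening a new IndexSearcher. The application gets its IndexSearcher by calling getSearcherManager on the NRTManager to obtain a SearcherManager and then acquiring an IndexSearcher from it. When it is finished, it calls release on the SearcherManager to give the IndexSearcher back. Finally, remember to close the NRTManagerReopenThread.
The second is to skip the NRTManagerReopenThread and call maybeReopen on the NRTManager directly, which yields an IndexSearcher on the latest index.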
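
The difference between the two approaches can be pictured with a small generation counter. This is a sketch of the idea only, not Lucene's implementation: `GenerationTracker` and its methods are hypothetical names, and a "refresh" here just copies a number instead of reopening a searcher.

```java
/** Hypothetical model of generation tracking between a writer and its searchers. */
class GenerationTracker {
    private long indexingGen = 0;  // bumped by every write
    private long searchingGen = 0; // newest generation the current searcher can see

    /** Records a write and returns the generation that contains it. */
    synchronized long update() {
        return ++indexingGen;
    }

    /** Models a reopen: everything written so far becomes visible to searches. */
    synchronized void refresh() {
        searchingGen = indexingGen;
        notifyAll(); // wake callers blocked in waitForGeneration
    }

    synchronized boolean isVisible(long generation) {
        return searchingGen >= generation;
    }

    /** Blocks until a refresh has made the given generation visible. */
    synchronized void waitForGeneration(long generation) throws InterruptedException {
        while (searchingGen < generation) {
            wait();
        }
    }

    public static void main(String[] args) throws InterruptedException {
        GenerationTracker tracker = new GenerationTracker();

        // Approach 1: a background thread refreshes periodically.
        Thread reopenThread = new Thread(() -> {
            try {
                while (true) {
                    Thread.sleep(50);
                    tracker.refresh();
                }
            } catch (InterruptedException e) {
                // interrupted: stop refreshing
            }
        });
        reopenThread.setDaemon(true);
        reopenThread.start();
        long first = tracker.update();
        tracker.waitForGeneration(first);               // returns once the thread has refreshed
        System.out.println(tracker.isVisible(first));   // true
        reopenThread.interrupt();
        reopenThread.join();

        // Approach 2: no thread; the caller refreshes explicitly.
        long second = tracker.update();
        System.out.println(tracker.isVisible(second));  // false: nothing has refreshed yet
        tracker.refresh();
        System.out.println(tracker.isVisible(second));  // true
    }
}
```

With a background thread the caller only waits; without one, the caller decides when to pay for the refresh.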

    // Searches the committed index on disk through a SearcherManager.
    public void testSearch() throws IOException {
        Directory directory = FSDirectory.open(new File("/root/data/03"));
        // Opened from a Directory, the manager sees committed data only;
        // null selects the default SearcherFactory.
        SearcherManager sm = new SearcherManager(directory, null);
        IndexSearcher searcher = sm.acquire();
        try {
            Query query = new TermQuery(new Term("title", "test"));
            TopDocs results = searcher.search(query, null, 100);
            System.out.println(results.totalHits);
            for (ScoreDoc doc : results.scoreDocs) {
                System.out.println("doc internal id:" + doc.doc + " ,doc score:" + doc.score);
                Document document = searcher.doc(doc.doc);
                System.out.println("id:" + document.get("id") + " ,title:" + document.get("title"));
            }
        } finally {
            // Always release an acquired searcher, even if the search throws.
            sm.release(searcher);
        }
        sm.close();
    }

    // Updates a document and searches for it before anything has been committed.
    public void testUpdateAndSearch() throws IOException, InterruptedException {
        Directory directory = FSDirectory.open(new File("/root/data/03"));

        Analyzer analyzer = new StandardAnalyzer();
        IndexWriterConfig config = new IndexWriterConfig(Version.LATEST, analyzer);
        config.setOpenMode(OpenMode.CREATE_OR_APPEND);
        IndexWriter writer = new IndexWriter(directory, config);
        // TrackingIndexWriter returns a generation number for every change.
        TrackingIndexWriter trackingWriter = new TrackingIndexWriter(writer);
        // Opened from the writer, the manager hands out NRT searchers; true = apply all deletes.
        SearcherManager sm = new SearcherManager(writer, true, null);
        // targetMaxStaleSec = 60: longest wait before a reopen when nobody is waiting.
        // targetMinStaleSec = 1: shortest interval between reopens when a caller waits for a generation.
        ControlledRealTimeReopenThread<IndexSearcher> thread =
                new ControlledRealTimeReopenThread<IndexSearcher>(trackingWriter, sm, 60, 1);
        thread.setDaemon(true);
        thread.setName("NRT Index Manager Thread");
        thread.start();

        Document doc = new Document();
        doc.add(new StringField("id", "3", Store.YES));
        doc.add(new TextField("title", "test for 3", Store.YES));
        // Replace the document whose id is "2" with the new document.
        long generation = trackingWriter.updateDocument(new Term("id", "2"), doc);
        // Block until a searcher that includes this update is available.
        thread.waitForGeneration(generation);

        IndexSearcher searcher = sm.acquire();
        try {
            Query query = new TermQuery(new Term("title", "test"));
            TopDocs results = searcher.search(query, null, 100);
            System.out.println(results.totalHits);
            for (ScoreDoc scoreDoc : results.scoreDocs) {
                System.out.println("doc internal id:" + scoreDoc.doc + " ,doc score:" + scoreDoc.score);
                Document document = searcher.doc(scoreDoc.doc);
                System.out.println("id:" + document.get("id") + " ,title:" + document.get("title"));
            }
        } finally {
            sm.release(searcher);
        }

        // Shut down in order: reopen thread, searcher manager, then the writer.
        thread.close();
        sm.close();
        writer.close(); // closing the writer also commits the pending update
    }

Building the index:

    // Creates a fresh index with two documents and commits it to disk.
    public void testBuildIndex() throws IOException {
        Directory directory = FSDirectory.open(new File("/root/data/03"));
        // Directory directory = new RAMDirectory(); // in-memory alternative
        Analyzer analyzer = new StandardAnalyzer();
        IndexWriterConfig config = new IndexWriterConfig(Version.LATEST, analyzer);
        // CREATE overwrites any index that already exists in the directory.
        config.setOpenMode(OpenMode.CREATE);
        IndexWriter writer = new IndexWriter(directory, config);

        // StringField is indexed as a single token; TextField is analyzed.
        Document doc1 = new Document();
        doc1.add(new StringField("id", "1", Store.YES));
        doc1.add(new TextField("title", "test for 1", Store.YES));
        writer.addDocument(doc1);

        Document doc2 = new Document();
        doc2.add(new StringField("id", "2", Store.YES));
        doc2.add(new TextField("title", "test for 2", Store.YES));
        writer.addDocument(doc2);

        // Make both documents durable on disk.
        writer.commit();
        writer.close();
    }