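Lucene's operations fall into two groups, indexing and searching, and together they form the complete loop. Let's start with indexing.
IndexWriter is the class Lucene exposes to the application layer for indexing. Applications use it to open the door to Lucene's index.
The member variables of this class show what it is responsible for. A few of the more important ones: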
private final Directory directory; // where this index resides
private final Analyzer analyzer; // how to analyze text
private final DocumentsWriter docWriter;
private final MergeScheduler mergeScheduler;
private LinkedList<MergePolicy.OneMerge> pendingMerges = new LinkedList<MergePolicy.OneMerge>();
private Set<MergePolicy.OneMerge> runningMerges = new HashSet<MergePolicy.OneMerge>();
private List<MergePolicy.OneMerge> mergeExceptions = new ArrayList<MergePolicy.OneMerge>();
private long mergeGen;
private boolean stopMerges;
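Together these cover the directory, segment information, the merge strategy between segments, the analyzer, and the DocumentsWriter that does the actual writing.
The constructor does roughly the following:
1. Acquire the write lock
2. Load the configuration
3. Initialize the flush policy (flushing from RAM to disk)
4. Initialize the DocumentsWriter
5. Initialize the IndexFileDeleter (it deletes index files that are no longer needed, and keeps a reference count for every file)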
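The reference counting mentioned in step 5 can be sketched as follows. This is only a minimal illustration of the idea, not Lucene's own deleter: the class name FileRefCounter and its methods are hypothetical, and the assumption is that a file becomes deletable once its count drops to zero.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper (not a Lucene class): keeps one reference count per index file.
class FileRefCounter {
    private final Map<String, Integer> refCounts = new HashMap<>();

    // Record one more reference to the file, e.g. from a commit point that uses it.
    void incRef(String fileName) {
        refCounts.merge(fileName, 1, Integer::sum);
    }

    // Drop one reference. Returns true when nothing references the file any more,
    // which is the moment it could be deleted safely.
    boolean decRef(String fileName) {
        Integer count = refCounts.get(fileName);
        if (count == null) {
            throw new IllegalStateException("unknown file: " + fileName);
        }
        if (count == 1) {
            refCounts.remove(fileName);
            return true;
        }
        refCounts.put(fileName, count - 1);
        return false;
    }

    // Current number of references; 0 for files that are not tracked.
    int refCount(String fileName) {
        return refCounts.getOrDefault(fileName, 0);
    }
}
```

A file shared by two commit points would be incRef'ed twice, and only the second decRef reports it as deletable.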
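DocumentsWriter
IndexWriter operates on the index by calling methods on DocumentsWriter.
Every document is handed to the DocConsumer inside DocumentsWriter. DocConsumer is the core of the whole indexing process and the head of the indexing chain.
DocumentsWriter has a synchronized method, getThreadState, that assigns a ThreadState to each thread. The thread then calls methods on that ThreadState, which is where most of the heavy lifting happens, and finally the synchronized finishDocument method flushes the changes.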
private final Directory directory;
private volatile boolean closed;
private final InfoStream infoStream;
private final LiveIndexWriterConfig config;
private final AtomicInteger numDocsInRAM = new AtomicInteger(0);
// TODO: cut over to BytesRefHash in BufferedDeletes
volatile DocumentsWriterDeleteQueue deleteQueue = new DocumentsWriterDeleteQueue();
private final DocumentsWriterFlushQueue ticketQueue = new DocumentsWriterFlushQueue();
/*
* we preserve changes during a full flush since IW might not checkout before
* we release all changes. NRT Readers otherwise suddenly return true from
* isCurrent while there are actually changes currently committed. See also
* #anyChanges() & #flushAllThreads
*/
private volatile boolean pendingChangesInCurrentFullFlush;
final DocumentsWriterPerThreadPool perThreadPool;
final FlushPolicy flushPolicy;
final DocumentsWriterFlushControl flushControl;
private final IndexWriter writer;
private final Queue<Event> events;
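The constructor shows that its main job is managing policies, and in particular managing the DocumentsWriterPerThreadPool.
The DocumentsWriterPerThread object creates the DocConsumer, that is, the indexing chain (the core of the whole indexing process). The next chapter covers it in detail.
ThreadState wraps a DocumentsWriterPerThread object and also holds the per-thread data that needs to be flushed. Each of its members and methods may be accessed by only one thread at a time, and the caller is responsible for locking and unlocking.
DocumentsWriterPerThreadPool controls how ThreadState instances are assigned during indexing. Each ThreadState holds a reference to a DocumentsWriterPerThread, and every thread must obtain a ThreadState before it can index.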
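The locking contract described above can be sketched roughly like this. It is a simplified illustration under stated assumptions, not Lucene's implementation: PerThreadWriter, SketchThreadState and SketchThreadPool are hypothetical names, the writer only counts documents, and a full pool simply waits on its first state.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical stand-in for a per-thread writer; it only counts buffered documents.
class PerThreadWriter {
    int bufferedDocs;

    void addDocument(String doc) {
        bufferedDocs++;
    }
}

// A ThreadState-like wrapper: the lock it extends guards the writer it references.
class SketchThreadState extends ReentrantLock {
    final PerThreadWriter writer = new PerThreadWriter();
}

// Hands out locked states; the caller must call unlock() when it is done.
class SketchThreadPool {
    private final List<SketchThreadState> states = new ArrayList<>();
    private final int maxStates;

    SketchThreadPool(int maxStates) {
        if (maxStates < 1) {
            throw new IllegalArgumentException("maxStates must be at least 1");
        }
        this.maxStates = maxStates;
    }

    synchronized SketchThreadState acquire() {
        // Prefer a state that is currently free.
        for (SketchThreadState state : states) {
            if (state.tryLock()) {
                return state;
            }
        }
        // No free state: create a new one while the pool has room.
        if (states.size() < maxStates) {
            SketchThreadState state = new SketchThreadState();
            state.lock();
            states.add(state);
            return state;
        }
        // Pool is full and busy: block until the first state is released.
        // Releasing only needs unlock(), so holding the pool monitor here cannot deadlock.
        SketchThreadState state = states.get(0);
        state.lock();
        return state;
    }
}
```

An indexing thread would call acquire(), use state.writer inside a try block, and call state.unlock() in the finally clause, so the wrapped writer is never touched by two threads at once.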
DocumentsWriterFlushControl controls the flush policy and tracks how much memory each DocumentsWriterPerThread consumes.
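To make the memory accounting concrete, here is a small sketch of the bookkeeping such a controller might do. It is not the real DocumentsWriterFlushControl: the class SketchFlushControl, its method names, the string writer ids, and the choice to flush the largest writer once a RAM budget is exceeded are all assumptions made for illustration.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch (not a Lucene class): tracks buffered bytes per writer.
class SketchFlushControl {
    private final long ramBudgetBytes;
    private final Map<String, Long> bytesPerWriter = new HashMap<>();
    private long activeBytes;

    SketchFlushControl(long ramBudgetBytes) {
        this.ramBudgetBytes = ramBudgetBytes;
    }

    // Record bytes newly buffered by a writer. Returns the id of the writer that
    // should be flushed (the one using the most memory) once the total exceeds
    // the budget, or null while the total is still within budget.
    synchronized String afterDocument(String writerId, long deltaBytes) {
        bytesPerWriter.merge(writerId, deltaBytes, Long::sum);
        activeBytes += deltaBytes;
        if (activeBytes <= ramBudgetBytes) {
            return null;
        }
        String largest = null;
        long max = -1;
        for (Map.Entry<String, Long> entry : bytesPerWriter.entrySet()) {
            if (entry.getValue() > max) {
                max = entry.getValue();
                largest = entry.getKey();
            }
        }
        return largest;
    }

    // Called once a writer's buffer has been written to disk: its bytes are released.
    synchronized void afterFlush(String writerId) {
        Long freed = bytesPerWriter.remove(writerId);
        if (freed != null) {
            activeBytes -= freed;
        }
    }

    synchronized long activeBytes() {
        return activeBytes;
    }
}
```

Keeping the per-writer totals in one synchronized object means the decision to flush is made in a single place, while the flush itself can run outside the lock.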