Fielddata



When you sort on a field, Elasticsearch needs access to the value of that field for every document that matches the query. The inverted index, which performs very well when searching, is not the ideal structure for sorting on field values:

  • When searching, we need to be able to map a term to a list of documents.
  • When sorting, we need to map a document to its terms. In other words, we need to “uninvert” the inverted index.

To make sorting efficient, Elasticsearch loads all the values for the field that you want to sort on into memory. This is referred to as fielddata.
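To make "uninverting" concrete, here is a minimal sketch in Python. It is a toy model, not Elasticsearch's actual implementation: the `inverted_index` data and the `uninvert` and `sort_hits` helpers are hypothetical names invented for illustration.

```python
from collections import defaultdict

# Toy inverted index (illustrative data): term -> IDs of documents containing it.
inverted_index = {
    "apple": [1, 3],
    "banana": [2],
    "cherry": [1, 2],
}


def uninvert(index):
    """Turn a term -> documents mapping into a document -> terms mapping."""
    doc_to_terms = defaultdict(list)
    for term, doc_ids in index.items():
        for doc_id in doc_ids:
            doc_to_terms[doc_id].append(term)
    # Sort each document's terms so the smallest term comes first.
    return {doc_id: sorted(terms) for doc_id, terms in doc_to_terms.items()}


def sort_hits(hits, fielddata):
    """Order matching document IDs by the smallest term each document holds."""
    return sorted(hits, key=lambda doc_id: fielddata[doc_id][0])


# Hypothetical stand-in for fielddata: every document's values, held in memory.
fielddata = uninvert(inverted_index)
ordered = sort_hits([2, 3, 1], fielddata)
```

Searching reads `inverted_index` directly (term to documents), while sorting needs the `fielddata` direction (document to terms), which is why the second structure has to be built.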

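Translator's notes:

  • On "uninverting" the index: an inverted index is organized around terms and records which documents contain each term. A search looks up terms and does not care how each document is structured. Sorting, by contrast, orders documents: you first find each matching document, then look up the value of its sort field, sort by those values, and apply the resulting order to the documents. This is similar to how a database sorts rows.
  • On fielddata: what is loaded into memory is presumably not only the values of the sort field but also the mapping between those values and their documents.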


Warning:

Elasticsearch doesn’t just load the values for the documents that matched a particular query. It loads the values from every document in your index, regardless of the document type.



The reason that Elasticsearch loads all values into memory is that uninverting the index from disk is slow. Even though you may need the values for only a few docs for the current request, you will probably need access to the values for other docs on the next request, so it makes sense to load all the values into memory at once, and to keep them there.
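The load-once-and-keep behavior described above can be sketched as a simple in-memory cache. This is an illustrative model only: `FielddataCache`, its `get` method, and `load_count` are hypothetical names, and the real loading logic in Elasticsearch is far more involved.

```python
class FielddataCache:
    """Toy cache: uninvert a field on first use, then serve it from memory."""

    def __init__(self, inverted_indexes):
        # field name -> {term: [doc IDs]}; stands in for the on-disk index.
        self._inverted_indexes = inverted_indexes
        self._loaded = {}
        self.load_count = 0  # how many times the slow uninversion ran

    def get(self, field):
        if field not in self._loaded:
            # Slow path: uninvert the values for every document, not just
            # the ones that matched the current request.
            self.load_count += 1
            doc_values = {}
            for term, doc_ids in self._inverted_indexes[field].items():
                for doc_id in doc_ids:
                    doc_values.setdefault(doc_id, []).append(term)
            self._loaded[field] = doc_values
        return self._loaded[field]


cache = FielddataCache({"tag": {"red": [1, 2], "blue": [3]}})
first = cache.get("tag")   # pays the uninversion cost
second = cache.get("tag")  # reuses the values already in memory
```

The second call does no extra work, which is the trade-off the paragraph describes: a larger up-front cost and more memory in exchange for fast later requests.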



Fielddata is used in several places in Elasticsearch:

  • Sorting on a field
  • Aggregations on a field
  • Certain filters (for example, geolocation filters)
  • Scripts that refer to fields
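As a rough illustration of why aggregations need the same document-to-values mapping as sorting, the sketch below counts field values across a set of matching documents. The `fielddata` dictionary and the `terms_aggregation` helper are hypothetical and only mimic the idea of a terms aggregation.

```python
from collections import Counter

# Hypothetical in-memory fielddata for one field: document ID -> its values.
fielddata = {
    1: ["red"],
    2: ["red", "blue"],
    3: ["blue"],
    4: ["green"],
}


def terms_aggregation(hits, fielddata):
    """Count how many matching documents contain each value."""
    counts = Counter()
    for doc_id in hits:
        # A document with no value for the field contributes nothing.
        counts.update(fielddata.get(doc_id, []))
    return dict(counts)


# Only documents 1, 2 and 3 matched the query in this example.
buckets = terms_aggregation([1, 2, 3], fielddata)
```

Each bucket count is derived by walking from matching documents to their values, which is the direction the inverted index cannot serve efficiently.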

Clearly, this can consume a lot of memory, especially for high-cardinality string fields (string fields that have many unique values), such as the body of an email. Fortunately, insufficient memory is a problem that can be solved by horizontal scaling, that is, by adding more nodes to your cluster.

For now, all you need to know is what fielddata is, and to be aware that it can be memory hungry. Later, we will show you how to determine the amount of memory that fielddata is using, how to limit the amount of memory that is available to it, and how to preload fielddata to improve the user experience.







