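Elasticsearch Compound Search and Filter APIs

This article is based on Elasticsearch 7.x.

During a full-text search, the input text is first analyzed into terms, and those terms are then looked up in the inverted index. With the default settings, a document is returned as a hit as long as it matches any one of the terms.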
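To make the "analyze, then look up in the inverted index" step concrete, here is a minimal in-memory sketch. The class name TinyInvertedIndex and its methods are hypothetical, and the analyzer is reduced to lowercasing and splitting on non-alphanumeric characters; it only illustrates the matching idea and is not how Elasticsearch is implemented.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical toy model of an inverted index; not Elasticsearch code.
class TinyInvertedIndex {
    // term -> ids of the documents that contain the term
    private final Map<String, Set<Integer>> postings = new HashMap<>();

    // Simplified stand-in for an analyzer: lowercase, split on non-alphanumerics.
    static List<String> analyze(String text) {
        List<String> terms = new ArrayList<>();
        for (String t : text.toLowerCase(Locale.ROOT).split("[^a-z0-9]+")) {
            if (!t.isEmpty()) {
                terms.add(t);
            }
        }
        return terms;
    }

    // Index a document: record its id under every term it contains.
    void add(int docId, String text) {
        for (String term : analyze(text)) {
            postings.computeIfAbsent(term, k -> new TreeSet<>()).add(docId);
        }
    }

    // A document is a hit if it contains ANY of the query terms (OR semantics).
    Set<Integer> search(String query) {
        Set<Integer> hits = new TreeSet<>();
        for (String term : analyze(query)) {
            hits.addAll(postings.getOrDefault(term, Collections.emptySet()));
        }
        return hits;
    }
}
```

Searching this toy index for "quick dogs" returns every document that contains either "quick" or "dogs", which is the any-term behavior described above.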

Before reading this post, it helps to be familiar with the earlier post on the basic syntax of the Elasticsearch full-text search APIs.

REST API

Add sample data for the searches

POST /blogs/_bulk
{"index": {}}
{"post_date": "2020-01-01", "title": "Quick brown rabbits", "content": "Brown rabbits are commonly seen.", "author_id": 11401}
{"index": {}}
{"post_date": "2020-01-02", "title": "Keeping pets healthy", "content": "My quick brown fox eats rabbits on a regular basis.", "author_id": 11402}
{"index": {}}
{"post_date": "2020-01-03", "title": "My dog barks", "content": "I see a lot of barking dogs on the road.", "author_id": 11403}

bool

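The examples in the basic matching API post each match a single search text, that is, they are single-condition searches. This section covers bool, a compound search built from several conditions.

(1) bool syntax

  • must
    Must match; contributes to the score.
  • must_not
    Must not match; does not contribute to the score (it runs in filter context).
  • should
    Optionally matches; contributes to the score.
  • filter
    Must match; does not contribute to the score.

Only must and should take part in relevance scoring. filter and must_not run in filter context: they skip scoring and their results can be cached, so they are usually cheaper. Combining these four clause types into one compound statement is what makes a bool search.

The match, match_phrase, dis_max, multi_match, and term queries covered in the basic matching API post are the basic search syntax; a bool search uses them as its clauses.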
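The clause semantics can be sketched with a small in-memory model. Everything here (BoolSketch, Post, the one-point-per-clause score) is a hypothetical illustration, not Elasticsearch code; real scoring uses BM25 rather than clause counting. The sketch does follow the documented rules that must and filter are required, must_not excludes, only must and should add to the score, and minimum_should_match defaults to 1 only when there are should clauses but no must or filter clauses.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

// Hypothetical in-memory model of bool clause semantics; not Elasticsearch code.
class BoolSketch {
    // Minimal stand-in for a document in the blogs index.
    static class Post {
        final String postDate;
        final String title;
        final int authorId;

        Post(String postDate, String title, int authorId) {
            this.postDate = postDate;
            this.title = title;
            this.authorId = authorId;
        }
    }

    final List<Predicate<Post>> must = new ArrayList<>();
    final List<Predicate<Post>> mustNot = new ArrayList<>();
    final List<Predicate<Post>> should = new ArrayList<>();
    final List<Predicate<Post>> filter = new ArrayList<>();
    Integer minimumShouldMatch = null; // null means "use the default"

    // Number of clauses in the list that the document satisfies.
    private long count(List<Predicate<Post>> clauses, Post doc) {
        return clauses.stream().filter(c -> c.test(doc)).count();
    }

    boolean matches(Post doc) {
        if (count(must, doc) < must.size()) return false;     // every must clause is required
        if (count(filter, doc) < filter.size()) return false; // every filter clause is required
        if (count(mustNot, doc) > 0) return false;            // any must_not match excludes
        // Default: 1 if there are should clauses and no must/filter clauses, else 0.
        int min = minimumShouldMatch != null
                ? minimumShouldMatch
                : (!should.isEmpty() && must.isEmpty() && filter.isEmpty() ? 1 : 0);
        return count(should, doc) >= min;
    }

    // Toy score: one point per matching must or should clause.
    // filter and must_not never add to the score.
    long score(Post doc) {
        return matches(doc) ? count(must, doc) + count(should, doc) : 0;
    }
}
```

For example, adding `p -> p.authorId == 11403` to must and `p -> p.postDate.compareTo("2020-01-02") <= 0` to mustNot keeps only posts by author 11403 published after 2020-01-02; ISO dates compare correctly as strings.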

(2) Examples

a. Basic usage

GET /blogs/_search
{
  "query": {
    "bool": {
      "must": [
        {
          "term": {
            "author_id": {
              "value": "11403"
            }
          }
        }
      ],
      "must_not": [
        {
          "range": {
            "post_date": {
              "lte": "2020-01-02"
            }
          }
        }
      ],
      "should": [
        {
          "term": {
            "title.keyword": {
              "value": "My dog barks"
            }
          }
        },
        {
          "term": {
            "content.keyword": {
              "value": "barking dogs"
            }
          }
        }
      ],
      "minimum_should_match": 1
    }
  }
}

b. Nested bool search

GET /blogs/_search
{
  "query": {
    "bool": {
      "must": [
        {
          "term": {
            "author_id": {
              "value": "11403"
            }
          }
        }
      ],
      "should": [
        {
          "bool": {
            "must_not": [
              {
                "term": {
                  "post_date": {
                    "value": "2020-01-02"
                  }
                }
              }
            ]
          }
        }
      ],
      "minimum_should_match": 1
    }
  }
}

c. Sorting and pagination

GET /blogs/_search
{
  "query": {
    "bool": {
      "must": [
        {
          "range": {
            "post_date": {
              "gte": "2020-01-01",
              "lte": "2020-01-03"
            }
          }
        }
      ]
    }
  },
  "sort": [
    {
      "author_id": {
        "order": "desc"
      }
    }
  ],
  "from": 0,
  "size": 2
}
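In the request above, from is the offset of the first hit to return and size is the number of hits per page. A common way to derive them from a 1-based page number is sketched below; PageWindow and its methods are hypothetical helper names, not part of any Elasticsearch API. The 10,000 limit reflects the default value of the index.max_result_window setting, which caps from + size.

```java
// Hypothetical helper for computing from/size; not part of any Elasticsearch API.
class PageWindow {
    // Default value of the index.max_result_window index setting.
    static final int DEFAULT_MAX_RESULT_WINDOW = 10_000;

    final int from; // offset of the first hit to return
    final int size; // number of hits to return

    private PageWindow(int from, int size) {
        this.from = from;
        this.size = size;
    }

    // pageNumber is 1-based: page 1 starts at offset 0.
    static PageWindow of(int pageNumber, int pageSize) {
        if (pageNumber < 1 || pageSize < 1) {
            throw new IllegalArgumentException("pageNumber and pageSize must be >= 1");
        }
        return new PageWindow((pageNumber - 1) * pageSize, pageSize);
    }

    // from + size must not exceed index.max_result_window (10,000 by default).
    boolean fitsDefaultWindow() {
        return (long) from + size <= DEFAULT_MAX_RESULT_WINDOW;
    }
}
```

With a page size of 2, page 1 maps to from 0 (the request shown above) and page 3 maps to from 4.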

filter

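The match, match_phrase, dis_max, multi_match, and term queries covered in the basic matching API post are the basic search syntax, and filters are built from them as well. A filter does not compute a relevance score and can make effective use of caching, so it is more efficient.

(1) Syntax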

  • constant_score
  • bool

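(2) Examples

a. constant_score syntax

constant_score wraps a filter and gives every matching document the same fixed score. The score equals the boost parameter, which defaults to 1.0.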

GET /blogs/_search
{
  "query": {
    "constant_score": {
      "filter": {
        "range": {
          "post_date": {
            "gte": "2020-01-01",
            "lte": "2020-01-03"
          }
        }
      }
    }
  },
  "sort": [
    {
      "author_id": {
        "order": "desc"
      }
    }
  ],
  "from": 0,
  "size": 2
}

b. bool syntax

GET /blogs/_search
{
  "query": {
    "bool": {
      "filter": {
        "term": {
          "post_date": "2020-01-03"
        }
      },
      "should": [
        {
          "term": {
            "title.keyword": {
              "value": "My dog barks"
            }
          }
        },
        {
          "term": {
            "content.keyword": {
              "value": "barking dogs"
            }
          }
        }
      ],
      "minimum_should_match": 1
    }
  }
}

Java API

This section shows how to use the Elasticsearch Java client (RestHighLevelClient) by converting the REST examples above to Java.

(1) main method

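// Imports used by the methods in this section. XContentType is in
// org.elasticsearch.common.xcontent in earlier 7.x releases; newer 7.x
// releases moved it to org.elasticsearch.xcontent.
import java.io.IOException;
import org.apache.http.HttpHost;
import org.elasticsearch.action.bulk.BulkRequest;
import org.elasticsearch.action.index.IndexRequest;
import org.elasticsearch.action.search.SearchRequest;
import org.elasticsearch.action.search.SearchResponse;
import org.elasticsearch.client.RequestOptions;
import org.elasticsearch.client.RestClient;
import org.elasticsearch.client.RestHighLevelClient;
import org.elasticsearch.common.xcontent.XContentType;
import org.elasticsearch.index.query.*;
import org.elasticsearch.search.SearchHit;
import org.elasticsearch.search.builder.SearchSourceBuilder;
import org.elasticsearch.search.sort.SortOrder;

public static void main(String[] args) throws IOException {
    // try-with-resources closes the client even if a request throws.
    try (RestHighLevelClient client = new RestHighLevelClient(
            RestClient.builder(
                    new HttpHost("localhost", 9200, "http")))) {
        bulkIndex(client);

        baseApi(client);
        boolNestApi(client);
        boolPaginationApi(client);
        constantScoreApi(client);
        boolFilterApi(client);
    }
}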

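Do not run the indexing request and the search requests back to back, or the searches may find nothing. Newly indexed documents become searchable only after the next index refresh, which by default happens once per second; this is why Elasticsearch is described as near real-time. Either run the indexing step first and wait for a refresh, or set a refresh policy on the write request so that it returns only once the documents are searchable.

(2) Add the search data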

private static void bulkIndex(RestHighLevelClient client) throws IOException {
    BulkRequest bulkRequest = new BulkRequest();

    // Fixed ids make the method safe to re-run: documents are overwritten, not duplicated.
    bulkRequest.add(new IndexRequest("blogs").id("1")
            .source(XContentType.JSON, "post_date", "2020-01-01", "title", "Quick brown rabbits", "content", "Brown rabbits are commonly seen.", "author_id", 11401));
    bulkRequest.add(new IndexRequest("blogs").id("2")
            .source(XContentType.JSON, "post_date", "2020-01-02", "title", "Keeping pets healthy", "content", "My quick brown fox eats rabbits on a regular basis.", "author_id", 11402));
    bulkRequest.add(new IndexRequest("blogs").id("3")
            .source(XContentType.JSON, "post_date", "2020-01-03", "title", "My dog barks", "content", "I see a lot of barking dogs on the road.", "author_id", 11403));

    // Block until a refresh has made the documents searchable, so the searches
    // that follow in main can see them.
    // WriteRequest is org.elasticsearch.action.support.WriteRequest.
    bulkRequest.setRefreshPolicy(WriteRequest.RefreshPolicy.WAIT_UNTIL);

    client.bulk(bulkRequest, RequestOptions.DEFAULT);
}

(3) Basic bool search

private static void baseApi(RestHighLevelClient client) throws IOException {
    SearchRequest searchRequest = new SearchRequest("blogs");

    SearchSourceBuilder searchSourceBuilder = new SearchSourceBuilder();
    BoolQueryBuilder boolQueryBuilder = new BoolQueryBuilder();
    boolQueryBuilder.must(new TermQueryBuilder("author_id", "11403"));
    boolQueryBuilder.mustNot(new RangeQueryBuilder("post_date").lte("2020-01-02"));

    boolQueryBuilder.should(new TermQueryBuilder("title.keyword", "My dog barks"));
    boolQueryBuilder.should(new TermQueryBuilder("content.keyword", "barking dogs"));
    boolQueryBuilder.minimumShouldMatch(1);
    searchSourceBuilder.query(boolQueryBuilder);
    searchRequest.source(searchSourceBuilder);

    SearchResponse searchResponse = client.search(searchRequest, RequestOptions.DEFAULT);
    SearchHit[] hits = searchResponse.getHits().getHits();
    for (SearchHit hit : hits) {
        System.out.println(hit.getSourceAsString());
    }
}

(4) Nested bool search

private static void boolNestApi(RestHighLevelClient client) throws IOException {
    SearchRequest searchRequest = new SearchRequest("blogs");

    SearchSourceBuilder searchSourceBuilder = new SearchSourceBuilder();
    BoolQueryBuilder boolQueryBuilder = new BoolQueryBuilder();
    boolQueryBuilder.must(new TermQueryBuilder("author_id", "11403"));

    BoolQueryBuilder boolQueryBuilder2 = new BoolQueryBuilder();
    boolQueryBuilder2.mustNot(new TermQueryBuilder("post_date", "2020-01-02"));
    boolQueryBuilder.should(boolQueryBuilder2);

    boolQueryBuilder.minimumShouldMatch(1);
    searchSourceBuilder.query(boolQueryBuilder);
    searchRequest.source(searchSourceBuilder);

    SearchResponse searchResponse = client.search(searchRequest, RequestOptions.DEFAULT);
    SearchHit[] hits = searchResponse.getHits().getHits();
    for (SearchHit hit : hits) {
        System.out.println(hit.getSourceAsString());
    }
}

(5) Sorting and pagination

private static void boolPaginationApi(RestHighLevelClient client) throws IOException {
    SearchRequest searchRequest = new SearchRequest("blogs");

    SearchSourceBuilder searchSourceBuilder = new SearchSourceBuilder();
    BoolQueryBuilder boolQueryBuilder = new BoolQueryBuilder();
    boolQueryBuilder.must(new RangeQueryBuilder("post_date").gte("2020-01-01").lte("2020-01-03"));
    searchSourceBuilder.query(boolQueryBuilder);
    searchSourceBuilder.sort("author_id", SortOrder.DESC);
    searchSourceBuilder.from(0);
    searchSourceBuilder.size(2);
    searchRequest.source(searchSourceBuilder);

    SearchResponse searchResponse = client.search(searchRequest, RequestOptions.DEFAULT);
    SearchHit[] hits = searchResponse.getHits().getHits();
    for (SearchHit hit : hits) {
        System.out.println(hit.getSourceAsString());
    }
}

(6) constant_score

private static void constantScoreApi(RestHighLevelClient client) throws IOException {
    SearchRequest searchRequest = new SearchRequest("blogs");

    SearchSourceBuilder searchSourceBuilder = new SearchSourceBuilder();

    RangeQueryBuilder rangeQueryBuilder = new RangeQueryBuilder("post_date").gte("2020-01-01").lte("2020-01-03");
    ConstantScoreQueryBuilder constantScoreQueryBuilder = new ConstantScoreQueryBuilder(rangeQueryBuilder);
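    // Use the constant_score query as the main query, as in the REST example
    // (postFilter would instead apply it as a post_filter).
    searchSourceBuilder.query(constantScoreQueryBuilder);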
    searchSourceBuilder.sort("author_id", SortOrder.DESC);
    searchSourceBuilder.from(0);
    searchSourceBuilder.size(2);

    searchRequest.source(searchSourceBuilder);

    SearchResponse searchResponse = client.search(searchRequest, RequestOptions.DEFAULT);
    SearchHit[] hits = searchResponse.getHits().getHits();
    for (SearchHit hit : hits) {
        System.out.println(hit.getSourceAsString());
    }
}

(7) filter

private static void boolFilterApi(RestHighLevelClient client) throws IOException {
    SearchRequest searchRequest = new SearchRequest("blogs");

    SearchSourceBuilder searchSourceBuilder = new SearchSourceBuilder();

    BoolQueryBuilder boolQueryBuilder = new BoolQueryBuilder();
    // filter clause: must match but is not scored; a term filter on post_date, as in the REST example.
    boolQueryBuilder.filter(new TermQueryBuilder("post_date", "2020-01-03"));

    // term queries (single value), matching the REST example, rather than terms queries.
    TermQueryBuilder termQueryBuilder1 = new TermQueryBuilder("title.keyword", "My dog barks");
    TermQueryBuilder termQueryBuilder2 = new TermQueryBuilder("content.keyword", "barking dogs");
    boolQueryBuilder.should(termQueryBuilder1);
    boolQueryBuilder.should(termQueryBuilder2);
    boolQueryBuilder.minimumShouldMatch(1);

    searchSourceBuilder.query(boolQueryBuilder);
    searchRequest.source(searchSourceBuilder);

    SearchResponse searchResponse = client.search(searchRequest, RequestOptions.DEFAULT);
    SearchHit[] hits = searchResponse.getHits().getHits();
    for (SearchHit hit : hits) {
        System.out.println(hit.getSourceAsString());
    }
}