Elasticsearch REST API

Index operations

1. Create an index

PUT test
{
  "settings": {
    "number_of_replicas": 1,
    "number_of_shards": 3
  }
}

2. Create the _mapping and type

PUT test/_mapping/goods
{
  "properties":{
    "title":{
      "type":"text",
      "analyzer":"ik_max_word"
    },
    "images":{
      "type":"keyword",
      "index":"false"
    },
    "price":{
      "type":"float"
    }
  }
}

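text fields are analyzed (tokenized); keyword fields are not. Every field's store option (which keeps an extra stored copy of the field) defaults to false, because the original document is already kept in _source, which you can see in the response to a GET request.

3. View the index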

GET test

The response looks like this:

{
  "test": {
    "aliases": {},
    "mappings": {
      "goods": {
        "properties": {
          "images": {
            "type": "keyword",
            "index": false
          },
          "price": {
            "type": "float"
          },
          "title": {
            "type": "text",
            "analyzer": "ik_max_word"
          }
        }
      }
    },
    "settings": {
      "index": {
        "creation_date": "1574692112859",
        "number_of_shards": "3",
        "number_of_replicas": "1",
        "uuid": "IQHfSd6cR3W67Iijo5DJFg",
        "version": {
          "created": "6030099"
        },
        "provided_name": "test"
      }
    }
  }
}

4. Delete the index

DELETE test

Adding data

1. Insert a document

POST /test/goods/
{
    "title":"小米手機",
    "images":"http://image.leyou.com/12479122.jpg",
    "price":2699.00
}

The response JSON:

{
  "_index": "test",
  "_type": "goods",
  "_id": "faABo24BJOl0nN5bp2Uy",
  "_version": 1,
  "result": "created",
  "_shards": {
    "total": 2,
    "successful": 1,
    "failed": 0
  },
  "_seq_no": 0,
  "_primary_term": 1
}

Here the id, faABo24BJOl0nN5bp2Uy, was generated automatically.

2. Custom id

POST /test/goods/2
{
    "title":"大米手機",
    "images":"http://image.leyou.com/12479122.jpg",
    "price":2899.00
}

Response JSON:

{
  "_index": "test",
  "_type": "goods",
  "_id": "2",
  "_version": 1,
  "result": "created",
  "_shards": {
    "total": 2,
    "successful": 1,
    "failed": 0
  },
  "_seq_no": 0,
  "_primary_term": 1
}

3. Dynamic mapping: field types are inferred automatically

Starting from the mapping created in the example above:

POST /test/goods/3
{
    "title":"大米手機",
    "images":"http://image.leyou.com/12479122.jpg",
    "price":2899.00,
    "stock":200,
    "saleable":true,
    "testString":"測試"
}

This document adds three fields that are not in the mapping: stock, saleable, and testString. Viewing the mapping again now shows:

{
  "test": {
    "aliases": {},
    "mappings": {
      "goods": {
        "properties": {
          "images": {
            "type": "keyword",
            "index": false
          },
          "price": {
            "type": "float"
          },
          "saleable": {
            "type": "boolean"
          },
          "stock": {
            "type": "long"
          },
          "testString": {
            "type": "text",
            "fields": {
              "keyword": {
                "type": "keyword",
                "ignore_above": 256
              }
            }
          },
          "title": {
            "type": "text",
            "analyzer": "ik_max_word"
          }
        }
      }
    },
    "settings": {
      "index": {
        "creation_date": "1574692804519",
        "number_of_shards": "3",
        "number_of_replicas": "1",
        "uuid": "PThJABTjRQ-WEfL5slB1TQ",
        "version": {
          "created": "6030099"
        },
        "provided_name": "test"
      }
    }
  }
}
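
The type inference seen in this mapping can be sketched in a few lines of Python. This is only an illustration of the rules visible in the output above (boolean, long, float, and text with a keyword sub-field); the function name infer_es_type is made up for this example, and date detection and other dynamic-mapping rules are deliberately ignored.

```python
def infer_es_type(value):
    """Guess the field type dynamic mapping would pick for a JSON value."""
    # bool must be tested before int: in Python, bool is a subclass of int.
    if isinstance(value, bool):
        return "boolean"
    if isinstance(value, int):
        return "long"
    if isinstance(value, float):
        return "float"
    if isinstance(value, str):
        # Strings become text (with a keyword sub-field in the real mapping).
        return "text"
    raise TypeError(f"unsupported value: {value!r}")


# The three fields that were not part of the explicit mapping.
new_fields = {"stock": 200, "saleable": True, "testString": "測試"}
inferred = {name: infer_es_type(value) for name, value in new_fields.items()}
print(inferred)
```

Note that price stays float only because it was mapped explicitly beforehand; fields that already exist in the mapping are not re-inferred.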

Updating data

1. Update a document

PUT /test/goods/3
{
    "title":"超大米手機",
    "images":"http://image.leyou.com/12479122.jpg",
    "price":3899.00,
    "stock": 100,
    "saleable":true
}

Fetching the document afterwards returns JSON like this (note that _version is now 2):

{
  "_index": "test",
  "_type": "goods",
  "_id": "3",
  "_version": 2,
  "found": true,
  "_source": {
    "title": "超大米手機",
    "images": "http://image.leyou.com/12479122.jpg",
    "price": 3899,
    "stock": 100,
    "saleable": true
  }
}

Deleting data

DELETE test/goods/3

Basic queries

1. Match everything (match_all)

GET /test/_search
{
    "query":{
        "match_all": {}
    }
}

Response JSON:

{
  "took": 3,
  "timed_out": false,
  "_shards": {
    "total": 3,
    "successful": 3,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": 2,
    "max_score": 1,
    "hits": [
      {
        "_index": "test",
        "_type": "goods",
        "_id": "2",
        "_score": 1,
        "_source": {
          "title": "大米手機",
          "images": "http://image.leyou.com/12479122.jpg",
          "price": 2899
        }
      },
      {
        "_index": "test",
        "_type": "goods",
        "_id": "faABo24BJOl0nN5bp2Uy",
        "_score": 1,
        "_source": {
          "title": "小米手機",
          "images": "http://image.leyou.com/12479122.jpg",
          "price": 2699
        }
      }
    ]
  }
}

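Fields in the response JSON

- took: time the query took, in milliseconds
- timed_out: whether the query timed out
- _shards: shard information
- hits: summary object for the search results
  - total: total number of matching documents
  - max_score: the highest score among all matching documents
  - hits: array of matching documents; each element describes one hit
    - _index: the index
    - _type: the document type
    - _id: the document id
    - _score: the document's relevance score
    - _source: the original source of the document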
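
As a rough illustration of how these fields are read from client code, the sketch below walks a trimmed copy of the response above. The helper name summarize is made up for this example, and it assumes the 6.x response shape shown here, in which hits.total is a plain number.

```python
# Trimmed copy of the match_all response shown above.
response = {
    "took": 3,
    "timed_out": False,
    "hits": {
        "total": 2,
        "max_score": 1,
        "hits": [
            {"_id": "2", "_score": 1,
             "_source": {"title": "大米手機", "price": 2899}},
            {"_id": "faABo24BJOl0nN5bp2Uy", "_score": 1,
             "_source": {"title": "小米手機", "price": 2699}},
        ],
    },
}


def summarize(resp):
    """Return the total hit count and one (id, score, title) row per hit."""
    hits = resp["hits"]
    rows = [(h["_id"], h["_score"], h["_source"]["title"]) for h in hits["hits"]]
    return hits["total"], rows


total, rows = summarize(response)
print(total)
for row in rows:
    print(row)
```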

2. Match query

First, the documents currently in the index:

 "_index": "test",
        "_type": "goods",
        "_id": "2",
        "_score": 1,
        "_source": {
          "title": "大米手機",
          "images": "http://image.leyou.com/12479122.jpg",
          "price": 2899
        }
      },
      {
        "_index": "test",
        "_type": "goods",
        "_id": "faABo24BJOl0nN5bp2Uy",
        "_score": 1,
        "_source": {
          "title": "小米手機",
          "images": "http://image.leyou.com/12479122.jpg",
          "price": 2699
        }
      },
      {
        "_index": "test",
        "_type": "goods",
        "_id": "3",
        "_score": 1,
        "_source": {
          "title": "小米電視4A",
          "images": "http://image.leyou.com/12479122.jpg",
          "price": 3899
        }
      }

OR relationship

A match query analyzes the query string into terms before searching; by default the terms are combined with OR.

GET /test/_search
{
    "query":{
        "match":{
            "title":"小米電視"
        }
    }
}

Hits:

"hits": [
      {
        "_index": "test",
        "_type": "goods",
        "_id": "3",
        "_score": 0.5753642,
        "_source": {
          "title": "小米電視4A",
          "images": "http://image.leyou.com/12479122.jpg",
          "price": 3899
        }
      },
      {
        "_index": "test",
        "_type": "goods",
        "_id": "faABo24BJOl0nN5bp2Uy",
        "_score": 0.2876821,
        "_source": {
          "title": "小米手機",
          "images": "http://image.leyou.com/12479122.jpg",
          "price": 2699
        }
      }

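The query string 小米電視 was split into the terms 小米 and 電視, which are combined with OR, so a document containing either term is a hit.

AND relationship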

GET test/goods/_search
{
  "query": {
    "match": {
      "title": {
        "query": "小米電視",
        "operator": "and"
      }
      
    }
  }
}

This hits only:

 {
        "_index": "test",
        "_type": "goods",
        "_id": "3",
        "_score": 0.5753642,
        "_source": {
          "title": "小米電視4A",
          "images": "http://image.leyou.com/12479122.jpg",
          "price": 3899
        }
      }
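
The difference between the two operators can be mimicked with plain sets. This is a sketch only: the token sets below are an assumption about what the ik_max_word analyzer produces for each title, and the function name match is made up; no scoring is modelled.

```python
# Assumed tokenization of each title; the real tokens come from ik_max_word.
DOC_TOKENS = {
    "2": {"大米", "手機"},
    "faABo24BJOl0nN5bp2Uy": {"小米", "手機"},
    "3": {"小米", "電視", "4a"},
}


def match(query_tokens, operator="or"):
    """Return the ids of documents matching the tokens under the operator."""
    # "and" requires every query token; "or" requires at least one.
    combine = all if operator == "and" else any
    return sorted(
        doc_id
        for doc_id, tokens in DOC_TOKENS.items()
        if combine(token in tokens for token in query_tokens)
    )


query = ["小米", "電視"]  # assumed analysis of the query string 小米電視
print(match(query, "or"))
print(match(query, "and"))
```

With OR, both the TV and the 小米 phone match; with AND, only the document containing both terms remains.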

Minimum match (minimum_should_match)

GET /test/_search
{
    "query":{
        "match":{
            "title":{
            	"query":"小米曲面電視",
            	"minimum_should_match": "75%"
            }
        }
    }
}
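
A percentage value for minimum_should_match is converted into a number of terms by rounding down. The sketch below shows that arithmetic; the function name required_matches is made up, and the assumption that 小米曲面電視 is analyzed into exactly three terms (小米, 曲面, 電視) may not hold for the real ik_max_word output.

```python
def required_matches(num_terms, percent):
    """Number of query terms a document must contain, rounded down."""
    # Integer division implements the round-down step.
    return num_terms * percent // 100


# Assuming three terms: 3 * 75% = 2.25, so 2 terms must match.
print(required_matches(3, 75))
```

Under that assumption, a document containing any two of the three terms would be returned.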

3. Multi-field query

GET /test/_search
{
    "query":{
        "multi_match": {
            "query":    "小米",
            "fields":   [ "title", "subTitle" ]
        }
	}
}

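4. Exact term matching (term, terms)

The term query is for exact-value matching. The exact value may be a number, a date, a boolean, or a string that is not analyzed.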

GET /test/_search
{
    "query":{
        "term":{
            "price":2699.00
        }
    }
}

GET /test/_search
{
    "query":{
        "terms":{
            "price":[2699.00,2899.00,3899.00]
        }
    }
}

Result filtering

GET /test/_search
{
  "_source": ["title","price"],
  "query": {
    "term": {
      "price": 2699
    }
  }
}

{
  "took": 8,
  "timed_out": false,
  "_shards": {
    "total": 3,
    "successful": 3,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": 1,
    "max_score": 1,
    "hits": [
      {
        "_index": "test",
        "_type": "goods",
        "_id": "faABo24BJOl0nN5bp2Uy",
        "_score": 1,
        "_source": {
          "price": 2699,
          "title": "小米手機"
        }
      }
    ]
  }
}

_source now contains only price and title.

We can also use:

includes: the fields to return
excludes: the fields to leave out

GET /test/_search
{
  "_source": {
    "includes":["title","price"]
  },
  "query": {
    "term": {
      "price": 2699
    }
  }
}
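
The effect of includes and excludes on a single document can be sketched as follows. The function name filter_source is made up for this example, and it only handles exact field names, not the wildcard patterns Elasticsearch also accepts.

```python
def filter_source(source, includes=None, excludes=None):
    """Keep the included fields (all if None), then drop the excluded ones."""
    kept = {k: v for k, v in source.items() if includes is None or k in includes}
    return {k: v for k, v in kept.items() if not excludes or k not in excludes}


doc = {
    "title": "小米手機",
    "images": "http://image.leyou.com/12479122.jpg",
    "price": 2699,
}
print(filter_source(doc, includes=["title", "price"]))
print(filter_source(doc, excludes=["images"]))
```

For this document both calls leave the same two fields, title and price.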

Advanced queries

1. Boolean combination

must (AND), must_not (NOT), should (OR)

GET /test/_search
{
    "query":{
        "bool":{
        	"must":     { "match": { "title": "小米" }},
        	"must_not": { "match": { "title":  "電視" }},
        	"should":   { "match": { "title": "手機" }}
        }
    }
}

Result:

{
  "took": 4,
  "timed_out": false,
  "_shards": {
    "total": 3,
    "successful": 3,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": 1,
    "max_score": 0.5753642,
    "hits": [
      {
        "_index": "test",
        "_type": "goods",
        "_id": "2",
        "_score": 0.5753642,
        "_source": {
          "title": "大米手機",
          "images": "http://image.leyou.com/12479122.jpg",
          "price": 2899
        }
      }
    ]
  }
}

The 小米電視 document was not matched.

2. Range query (range)

GET test/_search
{
  "query": {
    "range": {
      "price": {
        "gte": 2699,
        "lte": 3000
      }
    }
  }
}

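This finds documents whose price is between 2699 and 3000 inclusive, that is, greater than or equal to 2699 and less than or equal to 3000.

Operator	Meaning
gt	greater than
gte	greater than or equal to
lt	less than
lte	less than or equal to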
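
The four operators map directly onto ordinary comparisons. As a small sketch (the names OPS and in_range are made up for this example), the following applies the bounds of the query above to the prices used in this article:

```python
import operator

# Each range operator and the comparison it stands for.
OPS = {
    "gt": operator.gt,
    "gte": operator.ge,
    "lt": operator.lt,
    "lte": operator.le,
}


def in_range(value, bounds):
    """True if value satisfies every bound, e.g. {"gte": 2699, "lte": 3000}."""
    return all(OPS[name](value, limit) for name, limit in bounds.items())


prices = [2699, 2899, 3899]
matched = [p for p in prices if in_range(p, {"gte": 2699, "lte": 3000})]
print(matched)
```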

3. Fuzzy query (fuzzy)

Add a document first:

POST /test/goods/4
{
    "title":"apple手機",
    "images":"http://image.leyou.com/12479122.jpg",
    "price":6899.00
}

GET test/goods/_search
{
  "query": {
    "fuzzy": {
      "title": {
        "value": "appla"
      }
    }
  }
}

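The fuzzy query is the fuzzy equivalent of the term query. It tolerates spelling differences between the search term and the indexed term, as long as the edit distance does not exceed 2.
The query above matches: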

 {
        "_index": "test",
        "_type": "goods",
        "_id": "4",
        "_score": 0.55451775,
        "_source": {
          "title": "apple手機",
          "images": "http://image.leyou.com/12479122.jpg",
          "price": 6899
        }

We can set the allowed edit distance with fuzziness:

GET test/goods/_search
{
  "query": {
    "fuzzy": {
      "title": {
        "value": "apalc",
        "fuzziness": 2
      }
    }
  }
}

fuzziness cannot be greater than 2.
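
To make "edit distance" concrete, here is a minimal sketch of the classic Levenshtein distance, applied to the two misspellings used above. The function name edit_distance is made up, and this is plain Levenshtein: by default Elasticsearch also counts a swap of two adjacent characters as a single edit, which this sketch does not model.

```python
def edit_distance(a, b):
    """Levenshtein distance: minimum insertions, deletions, substitutions."""
    # prev[j] holds the distance between the processed prefix of a and b[:j].
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(
                prev[j] + 1,         # delete ca
                curr[j - 1] + 1,     # insert cb
                prev[j - 1] + cost,  # substitute ca with cb (or keep it)
            ))
        prev = curr
    return prev[-1]


print(edit_distance("appla", "apple"))
print(edit_distance("apalc", "apple"))
```

The first misspelling is one edit away from apple and the second is two edits away, which is why the second query sets fuzziness to 2.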

Sorting

1. Sort by a single field

GET /test/_search
{
  "query": {
    "match": {
      "title": "小米手機"
    }
  },
  "sort": [
    {
      "price": {
        "order": "desc"
      }
    }
  ]
}

2. Sort by multiple fields

GET /test/_search
{
    "query":{
        "bool":{
        	"must":{ "match": { "title": "小米手機" }},
        	"filter":{
                "range":{"price":{"gt":200000,"lt":300000}}
        	}
        }
    },
    "sort": [
      { "price": { "order": "desc" }},
      { "_score": { "order": "desc" }}
    ]
}

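Aggregations

Elasticsearch offers many kinds of aggregations. The two most commonly used are buckets and metrics.
Elasticsearch provides many ways to divide documents into buckets:

Date Histogram Aggregation: groups by date interval; for example, with an interval of one week, each week becomes a group
Histogram Aggregation: groups by numeric interval, similar to the date histogram
Terms Aggregation: groups by term; documents with exactly the same term fall into the same group
Range Aggregation: groups by numeric or date ranges; you specify the start and end of each range
...
Bucket aggregations only group the data and do not compute anything, so another kind of aggregation is usually nested inside a bucket: metrics aggregations.

Some commonly used metrics aggregations:

Avg Aggregation: average
Max Aggregation: maximum
Min Aggregation: minimum
Percentiles Aggregation: percentiles
Stats Aggregation: returns avg, max, min, sum, and count together
Sum Aggregation: sum
Top hits Aggregation: the top few documents
Value Count Aggregation: number of values

In ES, fields used for aggregating, sorting, or exact filtering should not be analyzed, so string fields used this way should be mapped as keyword.

To try this out, load some test data.
Create the index: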

PUT /cars
{
  "settings": {
    "number_of_shards": 1,
    "number_of_replicas": 0
  },
  "mappings": {
    "transactions": {
      "properties": {
        "color": {
          "type": "keyword"
        },
        "make": {
          "type": "keyword"
        }
      }
    }
  }
}

Load the data:

POST /cars/transactions/_bulk
{ "index": {}}
{ "price" : 10000, "color" : "red", "make" : "honda", "sold" : "2014-10-28" }
{ "index": {}}
{ "price" : 20000, "color" : "red", "make" : "honda", "sold" : "2014-11-05" }
{ "index": {}}
{ "price" : 30000, "color" : "green", "make" : "ford", "sold" : "2014-05-18" }
{ "index": {}}
{ "price" : 15000, "color" : "blue", "make" : "toyota", "sold" : "2014-07-02" }
{ "index": {}}
{ "price" : 12000, "color" : "green", "make" : "toyota", "sold" : "2014-08-19" }
{ "index": {}}
{ "price" : 20000, "color" : "red", "make" : "honda", "sold" : "2014-11-05" }
{ "index": {}}
{ "price" : 80000, "color" : "red", "make" : "bmw", "sold" : "2014-01-01" }
{ "index": {}}
{ "price" : 25000, "color" : "blue", "make" : "ford", "sold" : "2014-02-12" }

1. Aggregating into buckets

First, we divide the cars into buckets by color:

GET /cars/_search
{
    "size" : 0,
    "aggs" : { 
        "popular_colors" : { 
            "terms" : { 
              "field" : "color"
            }
        }
    }
}

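size: the number of hits to return. It is set to 0 here because we only care about the aggregation results, not the matching documents, which makes the request more efficient
aggs: declares an aggregation; short for aggregations
- popular_colors: a name for this aggregation; it can be anything
  - terms: how buckets are formed, here by term
    - field: the field to bucket on

Result: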

   {
  "took": 3,
  "timed_out": false,
  "_shards": {
    "total": 1,
    "successful": 1,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": 8,
    "max_score": 0,
    "hits": []
  },
  "aggregations": {
    "popular_colors": {
      "doc_count_error_upper_bound": 0,
      "sum_other_doc_count": 0,
      "buckets": [
        {
          "key": "red",
          "doc_count": 4
        },
        {
          "key": "blue",
          "doc_count": 2
        },
        {
          "key": "green",
          "doc_count": 2
        }
      ]
    }
  }
}

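hits: the list of hits is empty because size is 0
aggregations: the aggregation results
popular_colors: the aggregation name we defined
buckets: the buckets found; each distinct value of the color field forms one bucket
- key: the value of the color field for this bucket
- doc_count: the number of documents in this bucket

2. Metrics inside buckets

Next, compute the average price within each of the buckets above: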

GET /cars/_search
{
    "size" : 0,
    "aggs" : { 
        "popular_colors" : { 
            "terms" : { 
              "field" : "color"
            },
            "aggs":{
                "avg_price": { 
                   "avg": {
                      "field": "price" 
                   }
                }
            }
        }
    }
}

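aggs: a new aggs nested inside the previous aggregation (popular_colors), which shows that a metric is also an aggregation
avg_price: the name of the aggregation
avg: the metric type, here an average
field: the field the metric is computed on

Result: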

{
  "took": 10,
  "timed_out": false,
  "_shards": {
    "total": 1,
    "successful": 1,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": 8,
    "max_score": 0,
    "hits": []
  },
  "aggregations": {
    "popular_colors": {
      "doc_count_error_upper_bound": 0,
      "sum_other_doc_count": 0,
      "buckets": [
        {
          "key": "red",
          "doc_count": 4,
          "avg_price": {
            "value": 32500
          }
        },
        {
          "key": "blue",
          "doc_count": 2,
          "avg_price": {
            "value": 20000
          }
        },
        {
          "key": "green",
          "doc_count": 2,
          "avg_price": {
            "value": 21000
          }
        }
      ]
    }
  }
}
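
The numbers in this result can be checked by hand. The sketch below repeats the terms-plus-avg computation in plain Python over the eight cars loaded earlier; the function name avg_price_by_color is made up, and the sort order (largest bucket first, ties broken by key) is chosen to mirror the output above.

```python
from collections import defaultdict

# (price, color, make) for the eight documents loaded with _bulk.
CARS = [
    (10000, "red", "honda"),
    (20000, "red", "honda"),
    (30000, "green", "ford"),
    (15000, "blue", "toyota"),
    (12000, "green", "toyota"),
    (20000, "red", "honda"),
    (80000, "red", "bmw"),
    (25000, "blue", "ford"),
]


def avg_price_by_color(cars):
    """Bucket cars by color, then compute the average price per bucket."""
    prices = defaultdict(list)
    for price, color, _make in cars:
        prices[color].append(price)
    buckets = [
        {"key": color, "doc_count": len(vals), "avg_price": sum(vals) / len(vals)}
        for color, vals in prices.items()
    ]
    # Largest buckets first; ties broken alphabetically by key.
    return sorted(buckets, key=lambda b: (-b["doc_count"], b["key"]))


for bucket in avg_price_by_color(CARS):
    print(bucket)
```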

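3. Buckets nested inside buckets

In the previous example, could the avg metric be replaced with terms, so that instead of computing a metric we group again? Yes. A bucket can contain not only metrics but also other buckets; in other words, each group can be divided into further groups.

For example, to find out which manufacturers make the cars of each color, we bucket again on the make field: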

GET /cars/_search
{
    "size" : 0,
    "aggs" : { 
        "popular_colors" : { 
            "terms" : { 
              "field" : "color"
            },
            "aggs":{
                "avg_price": { 
                   "avg": {
                      "field": "price" 
                   }
                },
                "maker":{
                    "terms":{
                        "field":"make"
                    }
                }
            }
        }
    }
}

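The original color buckets and the avg metric are unchanged
maker: a new bucket aggregation added under the nested aggs, named maker
terms: the bucketing type is again terms
field: buckets are formed on the make field

Partial result: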
...
{"aggregations": {
    "popular_colors": {
      "doc_count_error_upper_bound": 0,
      "sum_other_doc_count": 0,
      "buckets": [
        {
          "key": "red",
          "doc_count": 4,
          "maker": {
            "doc_count_error_upper_bound": 0,
            "sum_other_doc_count": 0,
            "buckets": [
              {
                "key": "honda",
                "doc_count": 3
              },
              {
                "key": "bmw",
                "doc_count": 1
              }
            ]
          },
          "avg_price": {
            "value": 32500
          }
        },
        {
          "key": "blue",
          "doc_count": 2,
          "maker": {
            "doc_count_error_upper_bound": 0,
            "sum_other_doc_count": 0,
            "buckets": [
              {
                "key": "ford",
                "doc_count": 1
              },
              {
                "key": "toyota",
                "doc_count": 1
              }
            ]
          },
          "avg_price": {
            "value": 20000
          }
        },
        {
          "key": "green",
          "doc_count": 2,
          "maker": {
            "doc_count_error_upper_bound": 0,
            "sum_other_doc_count": 0,
            "buckets": [
              {
                "key": "ford",
                "doc_count": 1
              },
              {
                "key": "toyota",
                "doc_count": 1
              }
            ]
          },
          "avg_price": {
            "value": 21000
          }
        }
      ]
    }
  }
}
...

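We can see that the new maker aggregation is nested inside each of the original color buckets.
Within each color, the documents are grouped by the make field.
What we can read from the result:
- There are 4 red cars
- The average price of a red car is $32,500.
- 3 of them are made by Honda and 1 by BMW.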
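
The nesting can be pictured as a dictionary of counters. The following sketch recomputes the per-color manufacturer counts from the loaded data; the function name makes_by_color is made up for this example.

```python
from collections import Counter, defaultdict

# (color, make) for the eight documents loaded with _bulk.
CARS = [
    ("red", "honda"), ("red", "honda"), ("green", "ford"), ("blue", "toyota"),
    ("green", "toyota"), ("red", "honda"), ("red", "bmw"), ("blue", "ford"),
]


def makes_by_color(cars):
    """Outer bucket per color; inner bucket per make with a document count."""
    nested = defaultdict(Counter)
    for color, make in cars:
        nested[color][make] += 1
    return {color: dict(counts) for color, counts in nested.items()}


for color, makes in makes_by_color(CARS).items():
    print(color, makes)
```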

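4. Histogram buckets (histogram)

If you have a price field and set interval to 200, the steps are
0, 200, 400, 600, ...

(Background only; no need to memorize.)
If an item costs 450, which step does it fall into? The bucket key is computed as follows: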

    bucket_key = Math.floor((value - offset) / interval) * interval + offset
    

value: the value of the current document, 450 in this example

offset: the starting offset, 0 by default

interval: the step size, for example 200

So the key is Math.floor((450 - 0) / 200) * 200 + 0 = 400
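
The same formula, written as a small Python function so it can be tried with other values. The function name bucket_key is made up for this example; the parameter names follow the formula above.

```python
import math


def bucket_key(value, interval, offset=0):
    """Key of the histogram bucket that value falls into."""
    return math.floor((value - offset) / interval) * interval + offset


print(bucket_key(450, 200))      # the worked example above
print(bucket_key(12000, 5000))   # a 12000 car with interval 5000
```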

Now we group the cars by price with an interval of 5000:

GET /cars/_search
{
  "size":0,
  "aggs":{
    "price":{
      "histogram": {
        "field": "price",
        "interval": 5000
      }
    }
  }
}

{
  "took": 21,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": 8,
    "max_score": 0,
    "hits": []
  },
  "aggregations": {
    "price": {
      "buckets": [
        {
          "key": 10000,
          "doc_count": 2
        },
        {
          "key": 15000,
          "doc_count": 1
        },
        {
          "key": 20000,
          "doc_count": 2
        },
        {
          "key": 25000,
          "doc_count": 1
        },
        {
          "key": 30000,
          "doc_count": 1
        },
        {
          "key": 35000,
          "doc_count": 0
        },
        {
          "key": 40000,
          "doc_count": 0
        },
        {
          "key": 45000,
          "doc_count": 0
        },
        {
          "key": 50000,
          "doc_count": 0
        },
        {
          "key": 55000,
          "doc_count": 0
        },
        {
          "key": 60000,
          "doc_count": 0
        },
        {
          "key": 65000,
          "doc_count": 0
        },
        {
          "key": 70000,
          "doc_count": 0
        },
        {
          "key": 75000,
          "doc_count": 0
        },
        {
          "key": 80000,
          "doc_count": 1
        }
      ]
    }
  }
}

We can add the parameter min_doc_count with a value of 1 to require at least one document per bucket, so buckets with no documents are filtered out:

GET /cars/_search
{
  "size":0,
  "aggs":{
    "price":{
      "histogram": {
        "field": "price",
        "interval": 5000,
        "min_doc_count": 1
      }
    }
  }
}

{
  "took": 15,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": 8,
    "max_score": 0,
    "hits": []
  },
  "aggregations": {
    "price": {
      "buckets": [
        {
          "key": 10000,
          "doc_count": 2
        },
        {
          "key": 15000,
          "doc_count": 1
        },
        {
          "key": 20000,
          "doc_count": 2
        },
        {
          "key": 25000,
          "doc_count": 1
        },
        {
          "key": 30000,
          "doc_count": 1
        },
        {
          "key": 80000,
          "doc_count": 1
        }
      ]
    }
  }
}
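
Both histogram results can be reproduced with a short sketch over the eight car prices. The function name histogram is made up, offset is assumed to be 0, and empty buckets are filled in only between the smallest and largest key, which is what the first response above shows.

```python
from collections import Counter

# Prices of the eight documents loaded with _bulk.
PRICES = [10000, 20000, 30000, 15000, 12000, 20000, 80000, 25000]


def histogram(values, interval, min_doc_count=0):
    """Return (key, doc_count) pairs, dropping buckets below min_doc_count."""
    # Floor each value to the start of its step (offset assumed to be 0).
    counts = Counter(v // interval * interval for v in values)
    keys = range(min(counts), max(counts) + interval, interval)
    return [(k, counts[k]) for k in keys if counts[k] >= min_doc_count]


print(len(histogram(PRICES, 5000)))               # includes empty buckets
print(histogram(PRICES, 5000, min_doc_count=1))   # empty buckets removed
```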

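5. Range buckets (range)

Range bucketing is similar to histogram bucketing in that it also groups numbers into intervals, except that with range you specify the start and end of each group yourself.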

GET cars/_search
{
  "size": 0,
  "aggs": {
    "rangAggs": {
      "range": {
        "field": "price",
        "ranges": [
          {
            "from": 10000,
            "to": 20000
          },
          {
            "from": 20000,
            "to": 40000
          }
        ]
      }
      
    }
  }
}
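
The article does not show the response for this request, so as a rough guide the sketch below counts the loaded car prices per range by hand. It relies on the range aggregation treating from as inclusive and to as exclusive; the function name range_buckets is made up, and the output is not the exact response format.

```python
# Prices of the eight documents loaded with _bulk.
PRICES = [10000, 20000, 30000, 15000, 12000, 20000, 80000, 25000]
RANGES = [(10000, 20000), (20000, 40000)]


def range_buckets(values, ranges):
    """Count values per range; from is inclusive, to is exclusive."""
    return [
        {"from": lo, "to": hi, "doc_count": sum(lo <= v < hi for v in values)}
        for lo, hi in ranges
    ]


for bucket in range_buckets(PRICES, RANGES):
    print(bucket)
```

A car priced at exactly 20000 lands in the second range, and the 80000 car falls outside both ranges, so it is not counted in either bucket.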
