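Getting started with Elasticsearch: installation and basic usage

For an overview of what Elasticsearch (ES) is, see the introduction in the official ES documentation, Elasticsearch Introduction; a Chinese version is available under the title ElasticSearch功能簡介和系統介紹. This article assumes you already know what ES can do. It covers how to install ES and how to perform some basic operations with it (storing, searching, and analyzing data through the REST APIs), as an entry point to ES.

This article covers the following steps:

Set up a local Elasticsearch cluster and run ES
Import sample data into ES (single document and bulk import)
Search data with the Elasticsearch query language
Analyze results with bucket and metrics aggregations

The content is based on the official Elasticsearch documentation: Getting started with Elasticsearch.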

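Setting up and running an Elasticsearch environment

Running ES on Elastic Cloud

You can use Elastic Cloud to set up an ES environment. When you create a deployment on Elasticsearch Service, the service provisions a three-node Elasticsearch cluster together with Kibana and APM. However, the ES services currently offered by Alibaba Cloud, AWS, and Tencent Cloud are all paid, so this article does not use that approach. If you do want to use Elastic Cloud, see Run Elasticsearch on Elastic Cloud for how to create a deployment.

If you create the ES environment this way, you can skip the local setup below and go straight to importing data and development.

Setting up a local Elasticsearch environment

When you create a deployment on Elasticsearch Service, a master node and two data nodes are set up automatically. By installing from the tar or zip archive as described below, you can start multiple Elasticsearch instances locally to form a multi-node cluster. The following steps install a local three-node ES environment.

Steps for setting up the local environment

1. Download the installation archive for your operating system: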

Linux: elasticsearch-7.6.2-linux-x86_64.tar.gz

curl -L -O https://artifacts.elastic.co/downloads/elasticsearch/elasticsearch-7.6.2-linux-x86_64.tar.gz

macOS: elasticsearch-7.6.2-darwin-x86_64.tar.gz

curl -L -O https://artifacts.elastic.co/downloads/elasticsearch/elasticsearch-7.6.2-darwin-x86_64.tar.gz

Windows: elasticsearch-7.6.2-windows-x86_64.zip

2. Extract the archive:

Linux:

tar -xvf elasticsearch-7.6.2-linux-x86_64.tar.gz

macOS:

tar -xvf elasticsearch-7.6.2-darwin-x86_64.tar.gz

Windows PowerShell:

Expand-Archive elasticsearch-7.6.2-windows-x86_64.zip

3. Start Elasticsearch from the bin directory of the extracted folder:

Linux and macOS:

cd elasticsearch-7.6.2/bin
./elasticsearch

Windows:

cd elasticsearch-7.6.2\bin
.\elasticsearch.bat

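Once startup completes, a single ES instance is up and running.

4. Start two more Elasticsearch instances to form a typical multi-node cluster. When you run multiple ES instances, you must specify unique data and log paths for each node. Each instance runs in the foreground, so start each one in its own terminal.

Linux and macOS: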

./elasticsearch -Epath.data=data2 -Epath.logs=log2
./elasticsearch -Epath.data=data3 -Epath.logs=log3

Windows:

.\elasticsearch.bat -E path.data=data2 -E path.logs=log2
.\elasticsearch.bat -E path.data=data3 -E path.logs=log3

The additional nodes are assigned unique IDs. Because all three nodes run on the same local host, they automatically join the cluster with the first node.

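Verifying cluster status with the cat health API

After completing the installation and startup above, you can use the cat health API provided by Elasticsearch to verify that the three-node cluster is running. The cat APIs return information about the cluster and indices in a format that is easier to read than raw JSON. You can interact with the cluster directly by submitting HTTP requests to the Elasticsearch REST API. If Kibana is installed and running, you can also open Kibana and submit requests through the Dev Console.

Run the following command to call the cat health API: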

curl -X GET "localhost:9200/_cat/health?v&pretty"

The response looks like the following. It shows that the cluster named elasticsearch has a green status and three nodes:

epoch      timestamp cluster       status node.total node.data shards pri relo init unassign pending_tasks max_task_wait_time active_shards_percent
1565052807 00:53:27  elasticsearch green           3         3      6   3    0    0        0             0                  -                100.0%

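Note: If you run only a single Elasticsearch instance, the cluster status is yellow. A single-node cluster is fully functional, but it cannot replicate data to another node to provide resiliency. Replica shards must be available for the cluster status to be green. A red status indicates that some data is unavailable.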
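The table returned by the cat health API is plain text, so a script has to split it into columns before it can check the status. The following is a minimal Python sketch, assuming the two-line layout shown above (a header row followed by one row of values); the function names parse_cat_health and describe_status are illustrative and not part of Elasticsearch.

```python
def parse_cat_health(text):
    """Turn the header row and value row of `_cat/health?v` into a dict."""
    lines = [line for line in text.strip().splitlines() if line.strip()]
    header, values = lines[0].split(), lines[1].split()
    if len(header) != len(values):
        raise ValueError("header and value rows have different column counts")
    return dict(zip(header, values))


def describe_status(status):
    """Map a status color to the meaning given in the note above."""
    meanings = {
        "green": "replica shards are available",
        "yellow": "fully functional, but data is not replicated",
        "red": "some data is unavailable",
    }
    return meanings.get(status, "unknown status")


# Sample output copied from the response shown above.
sample = """
epoch      timestamp cluster       status node.total node.data shards pri relo init unassign pending_tasks max_task_wait_time active_shards_percent
1565052807 00:53:27  elasticsearch green           3         3      6   3    0    0        0             0                  -                100.0%
"""

health = parse_cat_health(sample)
print(health["status"], health["node.total"], describe_status(health["status"]))
```

All values stay strings here; convert fields such as node.total with int() if you need to compare them numerically.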

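Submitting requests to Elasticsearch with cURL

The following sections use cURL commands to send requests to the local Elasticsearch instance. An Elasticsearch request has the same structure as any HTTP request and consists of the following parts: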

curl -X<VERB> '<PROTOCOL>://<HOST>:<PORT>/<PATH>?<QUERY_STRING>' -d '<BODY>'

The variables are:

<VERB>
    The appropriate HTTP method or verb. For example, GET, POST, PUT, HEAD, or DELETE.
<PROTOCOL>
    Either http or https. Use the latter if you have an HTTPS proxy in front of Elasticsearch or you use Elasticsearch security features to encrypt HTTP communications.
<HOST>
    The hostname of any node in your Elasticsearch cluster. Alternatively, use localhost for a node on your local machine.
<PORT>
    The port running the Elasticsearch HTTP service, which defaults to 9200.
<PATH>
    The API endpoint, which can contain multiple components, such as _cluster/stats or _nodes/stats/jvm.
<QUERY_STRING>
    Any optional query-string parameters. For example, ?pretty will pretty-print the JSON response to make it easier to read.
<BODY>
    A JSON-encoded request body (if necessary).

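If the Elasticsearch security features are enabled, you must also provide a valid user name (and password) that has authority to run the API. For example, use the -u or --user cURL command parameter. For details about the security privileges required to run each API, see REST APIs.

Elasticsearch responds to each API request with an HTTP status code such as 200 OK. With the exception of HEAD requests, it also returns a JSON-encoded response body.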
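The same request parts (verb, protocol, host, port, path, query string, body) can be assembled in a script. The sketch below uses only Python's standard urllib; the helper name build_request is an assumption made for illustration. It builds the request object without sending it, so it runs even when no node is up.

```python
import json
import urllib.request
from urllib.parse import urlencode


def build_request(verb, path, query=None, body=None,
                  protocol="http", host="localhost", port=9200):
    """Assemble <VERB> <PROTOCOL>://<HOST>:<PORT>/<PATH>?<QUERY_STRING> plus an optional JSON body."""
    url = f"{protocol}://{host}:{port}/{path.lstrip('/')}"
    if query:
        url += "?" + urlencode(query)
    data = None
    headers = {}
    if body is not None:
        # A request body must be JSON-encoded and declared as such.
        data = json.dumps(body).encode("utf-8")
        headers["Content-Type"] = "application/json"
    return urllib.request.Request(url, data=data, headers=headers, method=verb)


req = build_request("PUT", "/customer/_doc/1",
                    query={"pretty": "true"}, body={"name": "John Doe"})
print(req.get_method(), req.full_url)

# To actually send it (requires a running node on localhost:9200):
# with urllib.request.urlopen(req) as resp:
#     print(resp.status, resp.read().decode("utf-8"))
```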

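Other ES installation options

For other Elasticsearch installation options and configuration details, see Installing Elasticsearch.

Importing sample data into ES

With the cluster installed and running, you can start importing data into ES and indexing it. Elasticsearch offers several ingest options, but in the end they all do the same thing: put JSON documents into an Elasticsearch index.

Indexing a single document

You can do this directly with a simple PUT request that specifies the index to add the document to, a unique document ID, and one or more "field": "value" pairs in the request body: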

PUT /customer/_doc/1
{
  "name": "John Doe"
}

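customer: the ES index
_doc: the type within the index (the document type)
1: the ID of the document to create in the index

The form above is the shorthand notation that the Elasticsearch documentation uses in place of a full cURL command. The complete command is: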

curl -X PUT "localhost:9200/customer/_doc/1?pretty" -H 'Content-Type: application/json' -d '
{
  "name": "John Doe"
}
'

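Note: the examples below mainly use the shorthand form; the full command can be assembled by following this example.

If the index does not already exist, this request automatically creates it, adds a new document with an ID of 1, and stores and indexes the name field. Because this is a new document, the response shows that the operation created version 1 of the document: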

{
  "_index" : "customer",
  "_type" : "_doc",
  "_id" : "1",
  "_version" : 1,
  "result" : "created",
  "_shards" : {
    "total" : 2,
    "successful" : 2,
    "failed" : 0
  },
  "_seq_no" : 26,
  "_primary_term" : 4
}

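Once created, the new document is available from any node in the cluster. You can retrieve it with a GET request that specifies its document ID: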

GET /customer/_doc/1

-- curl
curl -X GET "localhost:9200/customer/_doc/1?pretty"

The response indicates that a document with the specified ID was found and shows the source fields that were indexed:

{
  "_index" : "customer",
  "_type" : "_doc",
  "_id" : "1",
  "_version" : 1,
  "_seq_no" : 26,
  "_primary_term" : 4,
  "found" : true,
  "_source" : {
    "name": "John Doe"
  }
}

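Indexing documents in bulk

If you have many documents to index, you can submit them in batches with the bulk API. Using bulk requests is significantly faster than submitting requests individually because it minimizes network round trips.

The optimal batch size depends on a number of factors: document size and complexity, the indexing and search load, and the resources available to the cluster. A good starting point is batches of 1,000 to 5,000 documents with a total payload between 5MB and 15MB.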
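To make the batching advice concrete, here is a minimal Python sketch that splits a list of documents into batches and serializes each batch in the newline-delimited JSON format that the bulk API expects: one action line, then one source line per document, with a trailing newline. The helper names chunked and bulk_body and the generated documents are made up for illustration; the batch size of 1,000 is simply the lower bound suggested above.

```python
import json


def chunked(items, batch_size):
    """Yield consecutive slices of at most batch_size items."""
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]


def bulk_body(docs, id_field=None):
    """Serialize docs as bulk 'index' actions in newline-delimited JSON."""
    lines = []
    for doc in docs:
        action = {"index": {}}
        if id_field is not None:
            action["index"]["_id"] = str(doc[id_field])
        lines.append(json.dumps(action))  # action line
        lines.append(json.dumps(doc))     # source line
    # The bulk API requires the body to end with a newline.
    return "\n".join(lines) + "\n"


# Hypothetical documents, only to exercise the helpers.
docs = [{"account_number": n, "balance": 1000 + n} for n in range(2500)]
bodies = [bulk_body(batch, id_field="account_number")
          for batch in chunked(docs, 1000)]

# Number of batches and the payload size of each, in bytes.
print(len(bodies), [len(body.encode("utf-8")) for body in bodies])
```

Each body can then be sent as the payload of a POST request to the _bulk endpoint with a JSON content type; checking the byte size per batch is one way to stay within the 5MB to 15MB range.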

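To import multiple documents into Elasticsearch in bulk, proceed as follows:

1. Download the accounts.json sample data set. The documents in this randomly generated data set represent user accounts with the following information: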

{
    "account_number": 0,
    "balance": 16623,
    "firstname": "Bradshaw",
    "lastname": "Mckenzie",
    "age": 29,
    "gender": "F",
    "address": "244 Columbus Place",
    "employer": "Euron",
    "email": "bradshawmckenzie@euron.com",
    "city": "Hobucken",
    "state": "CO"
}

2. Index the account data into the bank index with the following _bulk request, then list the indices to check the result:

curl -H "Content-Type: application/json" -XPOST "localhost:9200/bank/_bulk?pretty&refresh" --data-binary "@accounts.json"
curl "localhost:9200/_cat/indices?v"

3. The following response indicates that 1,000 documents were indexed successfully:

health status index uuid                   pri rep docs.count docs.deleted store.size pri.store.size
yellow open   bank  l7sSYV2cQXmu6_4rJWVIww   5   1       1000            0    128.6kb        128.6kb

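Searching data with the Elasticsearch query language

Once you have imported some data into an Elasticsearch index, you can search it by sending requests to the _search endpoint. To access the full set of search capabilities, use the Elasticsearch Query DSL to specify the search criteria in the request body. You specify the name of the index to search in the request URI.

For example, the following request retrieves all documents in the bank index sorted by account number: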

GET /bank/_search
{
  "query": { "match_all": {} },
  "sort": [
    { "account_number": "asc" }
  ]
}

-- curl
curl -X GET "localhost:9200/bank/_search?pretty" -H 'Content-Type: application/json' -d '
{
  "query": { "match_all": {} },
  "sort": [
    { "account_number": "asc" }
  ]
}
'

By default, the hits section of the response includes the first 10 documents that match the search criteria:

{
  "took" : 63,
  "timed_out" : false,
  "_shards" : {
    "total" : 5,
    "successful" : 5,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
        "value": 1000,
        "relation": "eq"
    },
    "max_score" : null,
    "hits" : [ {
      "_index" : "bank",
      "_type" : "_doc",
      "_id" : "0",
      "sort": [0],
      "_score" : null,
      "_source" : {"account_number":0,"balance":16623,"firstname":"Bradshaw","lastname":"Mckenzie","age":29,"gender":"F","address":"244 Columbus Place","employer":"Euron","email":"bradshawmckenzie@euron.com","city":"Hobucken","state":"CO"}
    }, {
      "_index" : "bank",
      "_type" : "_doc",
      "_id" : "1",
      "sort": [1],
      "_score" : null,
      "_source" : {"account_number":1,"balance":39225,"firstname":"Amber","lastname":"Duke","age":32,"gender":"M","address":"880 Holmes Lane","employer":"Pyrami","email":"amberduke@pyrami.com","city":"Brogan","state":"IL"}
    }, ...
    ]
  }
}

The response also provides the following information about the search request:

took - how long it took Elasticsearch to run the query, in milliseconds
timed_out - whether or not the search request timed out
_shards - how many shards were searched and a breakdown of how many shards succeeded, failed, or were skipped
max_score - the score of the most relevant document found
hits.total.value - how many matching documents were found
hits.sort - the document's sort position (when not sorting by relevance score)
hits._score - the document's relevance score (not applicable when using match_all)
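The fields listed above can be read straight out of the decoded JSON response. The following Python sketch pulls them into a small summary; the function name summarize_search is illustrative, and the response dictionary is a trimmed copy of the sample shown earlier, with each _source reduced to one field.

```python
def summarize_search(response):
    """Collect the request-level fields described above from a _search response."""
    hits = response["hits"]
    return {
        "took_ms": response["took"],
        "timed_out": response["timed_out"],
        "shards_failed": response["_shards"]["failed"],
        "total": hits["total"]["value"],
        "total_is_exact": hits["total"]["relation"] == "eq",
        "ids": [hit["_id"] for hit in hits["hits"]],
    }


# Trimmed copy of the sample response (JSON null becomes None in Python).
response = {
    "took": 63,
    "timed_out": False,
    "_shards": {"total": 5, "successful": 5, "skipped": 0, "failed": 0},
    "hits": {
        "total": {"value": 1000, "relation": "eq"},
        "max_score": None,
        "hits": [
            {"_index": "bank", "_id": "0", "sort": [0], "_score": None,
             "_source": {"account_number": 0}},
            {"_index": "bank", "_id": "1", "sort": [1], "_score": None,
             "_source": {"account_number": 1}},
        ],
    },
}

summary = summarize_search(response)
print(summary)
```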

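Each search request is self-contained: Elasticsearch does not maintain any state information across requests. To page through the search hits, specify the from and size parameters in your request. For example, the following request gets hits 10 through 19 (from is a zero-based offset, and size is 10):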

GET /bank/_search
{
  "query": { "match_all": {} },
  "sort": [
    { "account_number": "asc" }
  ],
  "from": 10,
  "size": 10
}

-- curl
curl -X GET "localhost:9200/bank/_search?pretty" -H 'Content-Type: application/json' -d'
{
  "query": { "match_all": {} },
  "sort": [
    { "account_number": "asc" }
  ],
  "from": 10,
  "size": 10
}
'
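Because no state is kept between requests, the client has to compute the offset for each page itself. A minimal Python sketch of that arithmetic follows, assuming zero-based page numbers; the helper name page_body is an assumption made for illustration.

```python
def page_body(page, size=10):
    """Build a match_all search body for a zero-based page number."""
    if page < 0 or size < 1:
        raise ValueError("page must be >= 0 and size must be >= 1")
    return {
        "query": {"match_all": {}},
        "sort": [{"account_number": "asc"}],
        "from": page * size,  # offset of the first hit on this page
        "size": size,         # number of hits per page
    }


body = page_body(1)
first, last = body["from"], body["from"] + body["size"] - 1
print(f"page 1 covers hits {first} through {last}")
```

With the default size of 10, page 1 reproduces the request above (from 10, size 10).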

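Now that you have seen how to submit a basic search request, you can start building queries that are more interesting than match_all. To search for specific terms within a field, use a match query. For example, the following request searches the address field to find customers whose addresses contain mill or lane: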

GET /bank/_search
{
  "query": { "match": { "address": "mill lane" } }
}

-- curl
curl -X GET "localhost:9200/bank/_search?pretty" -H 'Content-Type: application/json' -d'
{
  "query": { "match": { "address": "mill lane" } }
}
'

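To perform a phrase search rather than matching individual terms, use match_phrase instead of match. For example, the following request only matches addresses that contain the phrase mill lane: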

GET /bank/_search
{
  "query": { "match_phrase": { "address": "mill lane" } }
}

-- curl
curl -X GET "localhost:9200/bank/_search?pretty" -H 'Content-Type: application/json' -d'
{
  "query": { "match_phrase": { "address": "mill lane" } }
}
'
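The difference between the two queries can be illustrated without a cluster. The Python sketch below is a deliberately simplified simulation, not how Elasticsearch is implemented: it assumes lowercase whitespace tokenization in place of the real analyzer and ignores scoring entirely. The addresses are illustrative values.

```python
def tokenize(text):
    """Simplified stand-in for text analysis: lowercase, split on whitespace."""
    return text.lower().split()


def matches_any_term(field_value, query):
    """match-style check: true if the field contains at least one query term."""
    terms = set(tokenize(query))
    return any(token in terms for token in tokenize(field_value))


def matches_phrase(field_value, query):
    """match_phrase-style check: true if the query terms appear adjacent and in order."""
    tokens, phrase = tokenize(field_value), tokenize(query)
    n = len(phrase)
    return any(tokens[i:i + n] == phrase for i in range(len(tokens) - n + 1))


addresses = ["990 Mill Road", "880 Holmes Lane", "198 Mill Lane"]
for address in addresses:
    print(address,
          matches_any_term(address, "mill lane"),
          matches_phrase(address, "mill lane"))
```

In this simulation all three addresses satisfy the match-style check, but only the last one contains the two words next to each other.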

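To construct more complex queries, use a bool query to combine multiple query criteria. You can designate criteria as required (must match), desirable (should match), or undesirable (must not match). For example, the following request searches the bank index for accounts belonging to customers who are 40 years old, but excludes anyone who lives in Idaho (ID):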

GET /bank/_search
{
  "query": {
    "bool": {
      "must": [
        { "match": { "age": "40" } }
      ],
      "must_not": [
        { "match": { "state": "ID" } }
      ]
    }
  }
}

-- curl
curl -X GET "localhost:9200/bank/_search?pretty" -H 'Content-Type: application/json' -d'
{
  "query": {
    "bool": {
      "must": [
        { "match": { "age": "40" } }
      ],
      "must_not": [
        { "match": { "state": "ID" } }
      ]
    }
  }
}
'

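Each must, should, and must_not element in a Boolean query is referred to as a query clause. How well a document meets the criteria in each must or should clause contributes to the document's relevance score. The higher the score, the better the document matches your search criteria. By default, Elasticsearch returns documents ranked by these relevance scores.

The criteria in a must_not clause are treated as a filter. They affect whether or not the document is included in the results, but not how documents are scored. You can also explicitly specify arbitrary filters to include or exclude documents based on structured data.

For example, the following request uses a range filter to limit the results to accounts with a balance between $20,000 and $30,000 (inclusive).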

GET /bank/_search
{
  "query": {
    "bool": {
      "must": { "match_all": {} },
      "filter": {
        "range": {
          "balance": {
            "gte": 20000,
            "lte": 30000
          }
        }
      }
    }
  }
}

-- curl
curl -X GET "localhost:9200/bank/_search?pretty" -H 'Content-Type: application/json' -d'
{
  "query": {
    "bool": {
      "must": { "match_all": {} },
      "filter": {
        "range": {
          "balance": {
            "gte": 20000,
            "lte": 30000
          }
        }
      }
    }
  }
}
'
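When queries are generated by a program, the bool clauses can be assembled from lists. The following Python sketch builds a request body that combines the age condition, the state exclusion, and the balance range from the examples above; the helper name bool_query and its parameter names are assumptions made for illustration.

```python
import json


def bool_query(must=None, should=None, must_not=None, filters=None):
    """Build a search body with a bool query, omitting empty clause types."""
    clauses = {
        "must": must,          # required, contributes to the score
        "should": should,      # optional, contributes to the score
        "must_not": must_not,  # excludes documents, no effect on the score
        "filter": filters,     # required, no effect on the score
    }
    return {"query": {"bool": {name: list(value)
                               for name, value in clauses.items() if value}}}


body = bool_query(
    must=[{"match": {"age": "40"}}],
    must_not=[{"match": {"state": "ID"}}],
    filters=[{"range": {"balance": {"gte": 20000, "lte": 30000}}}],
)
print(json.dumps(body, indent=2))
```

The printed JSON can be used as the request body of a search against the bank index.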

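Analyzing results with bucket and metrics aggregations

Elasticsearch aggregations let you get meta-information about your search results and answer questions like "How many account holders are in Texas?" or "What is the average account balance in Tennessee?" You can search documents, filter hits, and use aggregations to analyze the results, all in one request.

For example, the following request uses a terms aggregation to group all of the accounts in the bank index by state, and returns the ten states with the most accounts in descending order: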

GET /bank/_search
{
  "size": 0,
  "aggs": {
    "group_by_state": {
      "terms": {
        "field": "state.keyword"
      }
    }
  }
}

-- curl
curl -X GET "localhost:9200/bank/_search?pretty" -H 'Content-Type: application/json' -d'
{
  "size": 0,
  "aggs": {
    "group_by_state": {
      "terms": {
        "field": "state.keyword"
      }
    }
  }
}
'

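The response is shown below. The buckets in the response are the values of the state field, and doc_count shows the number of accounts in each state. For example, you can see that there are 27 accounts in ID (Idaho). Because the request set size=0, the response contains only the aggregation results.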

{
  "took": 29,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "skipped" : 0,
    "failed": 0
  },
  "hits" : {
     "total" : {
        "value": 1000,
        "relation": "eq"
     },
    "max_score" : null,
    "hits" : [ ]
  },
  "aggregations" : {
    "group_by_state" : {
      "doc_count_error_upper_bound": 20,
      "sum_other_doc_count": 770,
      "buckets" : [ {
        "key" : "ID",
        "doc_count" : 27
      }, {
        "key" : "TX",
        "doc_count" : 27
      }, {
        "key" : "AL",
        "doc_count" : 25
      }, {
        "key" : "MD",
        "doc_count" : 25
      }, {
        "key" : "TN",
        "doc_count" : 23
      }, {
        "key" : "MA",
        "doc_count" : 21
      }, {
        "key" : "NC",
        "doc_count" : 21
      }, {
        "key" : "ND",
        "doc_count" : 21
      }, {
        "key" : "ME",
        "doc_count" : 20
      }, {
        "key" : "MO",
        "doc_count" : 20
      } ]
    }
  }
}
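The numbers in this response can be cross-checked with a little arithmetic: the counts in the ten returned buckets plus sum_other_doc_count should add up to the total number of hits. A short Python sketch using the values copied from the response above:

```python
# Values copied from the aggregation response above.
aggregation = {
    "doc_count_error_upper_bound": 20,
    "sum_other_doc_count": 770,
    "buckets": [
        {"key": "ID", "doc_count": 27}, {"key": "TX", "doc_count": 27},
        {"key": "AL", "doc_count": 25}, {"key": "MD", "doc_count": 25},
        {"key": "TN", "doc_count": 23}, {"key": "MA", "doc_count": 21},
        {"key": "NC", "doc_count": 21}, {"key": "ND", "doc_count": 21},
        {"key": "ME", "doc_count": 20}, {"key": "MO", "doc_count": 20},
    ],
}

# Map each state to its account count.
counts = {bucket["key"]: bucket["doc_count"] for bucket in aggregation["buckets"]}

in_buckets = sum(counts.values())                         # accounts in the ten returned states
total = in_buckets + aggregation["sum_other_doc_count"]   # plus all remaining states

print(counts["ID"], in_buckets, total)  # prints: 27 230 1000
```

The result matches hits.total.value (1000), which confirms that the 770 documents not shown belong to states outside the top ten.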

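You can combine aggregations to build more complex summaries of your data. For example, the following request nests an avg aggregation within the previous group_by_state aggregation to calculate the average account balance for each state.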

GET /bank/_search
{
  "size": 0,
  "aggs": {
    "group_by_state": {
      "terms": {
        "field": "state.keyword"
      },
      "aggs": {
        "average_balance": {
          "avg": {
            "field": "balance"
          }
        }
      }
    }
  }
}

-- curl
curl -X GET "localhost:9200/bank/_search?pretty" -H 'Content-Type: application/json' -d'
{
  "size": 0,
  "aggs": {
    "group_by_state": {
      "terms": {
        "field": "state.keyword"
      },
      "aggs": {
        "average_balance": {
          "avg": {
            "field": "balance"
          }
        }
      }
    }
  }
}
'

Instead of sorting the results by count, you can sort by the result of the nested aggregation by specifying the order within the terms aggregation:

GET /bank/_search
{
  "size": 0,
  "aggs": {
    "group_by_state": {
      "terms": {
        "field": "state.keyword",
        "order": {
          "average_balance": "desc"
        }
      },
      "aggs": {
        "average_balance": {
          "avg": {
            "field": "balance"
          }
        }
      }
    }
  }
}

-- curl
curl -X GET "localhost:9200/bank/_search?pretty" -H 'Content-Type: application/json' -d'
{
  "size": 0,
  "aggs": {
    "group_by_state": {
      "terms": {
        "field": "state.keyword",
        "order": {
          "average_balance": "desc"
        }
      },
      "aggs": {
        "average_balance": {
          "avg": {
            "field": "balance"
          }
        }
      }
    }
  }
}
'
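The nested aggregation bodies above differ only in the optional order entry, so they are easy to generate. A minimal Python sketch follows; the function name terms_with_avg and its parameters are assumptions made for illustration, while the aggregation names group_by_state and average_balance are the ones used in the requests above.

```python
import json


def terms_with_avg(group_field, value_field, order=None):
    """Build a terms aggregation with a nested avg, optionally ordered by that avg."""
    if order not in (None, "asc", "desc"):
        raise ValueError("order must be 'asc', 'desc', or None")
    terms = {"field": group_field}
    if order is not None:
        # Sort buckets by the nested metric instead of by document count.
        terms["order"] = {"average_balance": order}
    return {
        "size": 0,  # return aggregation results only, no hits
        "aggs": {
            "group_by_state": {
                "terms": terms,
                "aggs": {"average_balance": {"avg": {"field": value_field}}},
            }
        },
    }


body = terms_with_avg("state.keyword", "balance", order="desc")
print(json.dumps(body, indent=2))
```

Calling the function without order yields the unsorted variant from the previous example.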

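In addition to basic bucket and metrics aggregations like these, Elasticsearch provides specialized aggregations for operating on multiple fields and analyzing particular types of data such as dates, IP addresses, and geo data. You can also feed the results of individual aggregations into pipeline aggregations for further analysis.

The core analysis capabilities provided by aggregations also enable advanced features such as using machine learning to detect anomalies.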
