My Elasticsearch Usage Notes

The latest version of these notes is maintained at https://github.com/vector4wang/elasticsearch-quick

Everything below is based on Elasticsearch 5.4.

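Deployment

Docker is used for deployment here.

Pull the image: docker pull elasticsearch:5.4
Start a container: docker run -d -p 9200:9200 -p 9100:9100 elasticsearch:5.4
(ES listens on 9200 for HTTP and 9300 for the transport protocol; publish 9300 as well if you connect with the Java TransportClient.)

Note: docker ps shows whether ES is running. If it is not, check the logs with docker logs <container-id> (docker logs takes a container ID or name, not the image name; docker ps -a lists containers that have already exited). The usual error is Cannot allocate memory. In that case add -e ES_JAVA_OPTS="-Xms512m -Xmx512m" to cap the JVM heap, so the full command becomes: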

docker run -d -p 9200:9200 -p 9100:9100 -e ES_JAVA_OPTS="-Xms512m -Xmx512m" elasticsearch:5.4

Using ES

Check cluster health

GET _cat/health?v

List nodes

GET _cat/nodes?v

Create an index

PUT index?pretty

List indices

GET _cat/indices?v

Delete an index

DELETE index?pretty

Create a mapping

POST index/type/_mapping

{
  "student": {
    "properties": {
      "name": {
        "type": "text",
        "analyzer": "jieba_index",
        "search_analyzer": "jieba_search"
      },
      "class": {
        "type": "keyword"
      },
      "age": {
        "type": "integer"
      },
      "sex": {
        "type": "integer"
      },
      "ranking": {
        "type": "integer"
      }
    }
  }
}

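Because the jieba analysis plugin was not installed, this request failed. (With the plugin installed, the type in the URL must also be student so that it matches the top-level key of the body.)

View the mapping

GET index/_mapping?pretty

Index a document

POST index/type/{id}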

{"age":23,"class":"一年級2班","name":"小黑","ranking":13,"sex":1}


Delete all documents

POST index/type/_delete_by_query?conflicts=proceed

{
  "query": {"match_all": {}}
}

Key points

Field types supported by ES

These are the types available in 5.4.

Category	Types
String	text, keyword
Number	long, integer, short, byte, double, float, half_float, scaled_float
Date	date
Boolean	boolean
Binary	binary
Range	integer_range, float_range, long_range, double_range, date_range

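Official documentation: https://www.elastic.co/guide/en/elasticsearch/reference/5.4/mapping-types.html

My understanding of the two String types, text and keyword:

Use text if the field needs to be analyzed (split into terms for full-text search); use keyword if the value should be indexed and matched as a single exact term.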
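To make the difference concrete, here is a minimal sketch in plain Java. It is not Elasticsearch code: the class FieldTypeSketch and its two methods are made up for illustration, and the tokenizer (lowercase, split on anything that is not a letter or digit) is only an assumed stand-in for a real analyzer.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Locale;

// Toy model only: this is NOT Elasticsearch code, and the tokenizer below is a
// simplified stand-in for a real analyzer.
class FieldTypeSketch {

    // A keyword field is indexed as one exact, unmodified term.
    static List<String> keywordTerms(String value) {
        return Collections.singletonList(value);
    }

    // A text field is analyzed: here, lowercased and split on runs of
    // characters that are neither letters nor digits.
    static List<String> textTerms(String value) {
        List<String> terms = new ArrayList<>();
        for (String token : value.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            if (!token.isEmpty()) {
                terms.add(token);
            }
        }
        return terms;
    }

    public static void main(String[] args) {
        String value = "Grade 1, Class 2";
        System.out.println("keyword terms: " + keywordTerms(value));
        System.out.println("text terms:    " + textTerms(value));
    }
}
```

In this model a search for the single term class can only hit the text version of the field, while the keyword version only matches the complete original string.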

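Analysis (tokenization)

Analysis matters a great deal in ES, because the quality of tokenization directly determines how accurate the search results are.
A tokenizer takes a string as input, splits it into individual words or tokens (possibly discarding some characters such as punctuation), and outputs a token stream.
https://www.elastic.co/guide/cn/elasticsearch/guide/current/standard-tokenizer.html

There are two kinds of analysis: index-time analysis and search-time analysis.
Both are declared when creating the mapping, as above: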

"name": {
    "type": "text",
    "analyzer": "jieba_index",
    "search_analyzer": "jieba_search"
  }

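(What follows is my personal view; corrections are welcome.)
An example:
"我愛吃腸粉" ("I love eating rice noodle rolls") could be segmented in several ways:
- 我，愛，吃，腸，粉
- 我愛，吃，腸粉
- 我愛吃，腸粉
- ...

So how is analysis applied in ES?
When "我愛吃腸粉" is indexed (assuming the first segmentation above), ES stores it as the terms "我", "愛", "吃", "腸", "粉". This is index-time analysis.

When a user searches for "我", ES returns the documents whose indexed terms include "我", so "我愛吃腸粉" is found. If the user searches for "我愛" and search-time analysis also outputs the single term "我愛", then "我愛吃腸粉" is not found, because no term "我愛" was ever indexed.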
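The matching behaviour described above can be sketched with a toy inverted index in plain Java. This is not Elasticsearch code: AnalysisSketch and its methods are hypothetical names, and the index-time analyzer is assumed to emit one token per character, as in the first segmentation of the example.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Toy model only: NOT Elasticsearch code. It mimics an index-time analyzer
// that emits one token per character.
class AnalysisSketch {

    // term -> ids of the documents that contain that term
    private final Map<String, Set<Integer>> postings = new HashMap<>();

    // Index-time analysis (assumed): one token per Unicode character.
    static List<String> perCharacterTokens(String text) {
        List<String> tokens = new ArrayList<>();
        for (int i = 0; i < text.length(); ) {
            int codePoint = text.codePointAt(i);
            tokens.add(new String(Character.toChars(codePoint)));
            i += Character.charCount(codePoint);
        }
        return tokens;
    }

    void index(int docId, String text) {
        for (String token : perCharacterTokens(text)) {
            postings.computeIfAbsent(token, k -> new HashSet<>()).add(docId);
        }
    }

    // Looks up one term exactly as search-time analysis produced it.
    Set<Integer> search(String searchTerm) {
        return postings.getOrDefault(searchTerm, new HashSet<>());
    }

    public static void main(String[] args) {
        AnalysisSketch index = new AnalysisSketch();
        index.index(1, "我愛吃腸粉");
        System.out.println(index.search("我"));   // document 1 is found
        System.out.println(index.search("我愛")); // nothing: no such term was indexed
    }
}
```

The point of the sketch is that a search term only matches if index-time analysis produced exactly the same term, which is why the index analyzer and the search analyzer have to be chosen together.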

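The default analyzer in ES is the standard analyzer. It works well for most European languages but splits Chinese text into single characters, which is the first segmentation of "我愛吃腸粉" above. That rarely fits typical use cases, so a third-party plugin is needed, such as jieba or IK:
IK: https://github.com/medcl/elasticsearch-analysis-ik
Jieba: https://github.com/sing1ee/elasticsearch-jieba-plugin

Choose the analysis plugin that fits your own search scenario.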

Using term and match

This blog post covers the topic thoroughly and in detail: http://www.cnblogs.com/yjf512/p/4897294.html

Updated March 5, 2018

Update

PUT /index/type/id

{
  "title": "My first blog entry",
  "text":  "I am starting to get the hang of this...",
  "date":  "2014/01/02"
}

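The id here is the document's own id in ES (you can set it yourself when indexing).
Java implementation

import org.elasticsearch.action.update.UpdateRequest;
import org.elasticsearch.action.update.UpdateResponse;
import static org.elasticsearch.common.xcontent.XContentFactory.jsonBuilder;
// Partial update: set one field on the document identified by index/type/id.
UpdateRequest updateRequest = new UpdateRequest();
updateRequest.index(index);
updateRequest.type(document_type);
updateRequest.id(resumeId);
updateRequest.doc(jsonBuilder().startObject().field(fileName, fileValue).endObject());
// elasticSearchClient is my own wrapper; getClient() returns the ES client.
UpdateResponse updateResponse = elasticSearchClient.getClient().update(updateRequest).get();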

https://www.elastic.co/guide/cn/elasticsearch/guide/current/update-doc.html
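One thing worth keeping in mind: a PUT to index/type/id stores the request body as the whole document, while an UpdateRequest with doc(...) merges the given fields into the document that is already stored. A minimal sketch of the difference, using plain Java maps instead of real ES documents (UpdateSemanticsSketch and its method names are made up, and only top-level fields are handled):

```java
import java.util.HashMap;
import java.util.Map;

// Toy model only: documents are plain maps, not Elasticsearch objects.
class UpdateSemanticsSketch {

    // PUT index/type/id: the stored document is replaced by the new body.
    static Map<String, Object> replace(Map<String, Object> stored, Map<String, Object> body) {
        return new HashMap<>(body);
    }

    // UpdateRequest.doc(...): the partial document is merged into the stored one.
    // Only top-level fields are merged in this sketch.
    static Map<String, Object> partialUpdate(Map<String, Object> stored, Map<String, Object> partial) {
        Map<String, Object> merged = new HashMap<>(stored);
        merged.putAll(partial);
        return merged;
    }

    public static void main(String[] args) {
        Map<String, Object> stored = new HashMap<>();
        stored.put("title", "My first blog entry");
        stored.put("date", "2014/01/02");

        Map<String, Object> change = new HashMap<>();
        change.put("date", "2014/01/03");

        System.out.println(replace(stored, change).containsKey("title"));       // false
        System.out.println(partialUpdate(stored, change).containsKey("title")); // true
    }
}
```

So if only one field changes, a partial update avoids having to resend every other field of the document.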

updateByQuery

POST index/_update_by_query

{
  "script": {
    "inline": "ctx._source.likes++",
    "lang": "painless"
  },
  "query": {
    "term": {
      "user": "kimchy"
    }
  }
}

ctx._source is the source of one matched document.
The request above increments likes by 1 on every document matched by the query.
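As a rough mental model of what this request does, the sketch below runs the same logic over an in-memory list in plain Java. It is not Elasticsearch code: UpdateByQuerySketch, incrementLikes and doc are hypothetical names, each map plays the role of ctx._source, and every document is assumed to have an integer likes field.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy model only: NOT Elasticsearch code. Each map plays the role of ctx._source.
class UpdateByQuerySketch {

    // Applies "likes++" to every document whose "user" field equals the given
    // value and returns how many documents were updated.
    static int incrementLikes(List<Map<String, Object>> docs, String user) {
        int updated = 0;
        for (Map<String, Object> source : docs) {
            if (user.equals(source.get("user"))) {        // the term query
                int likes = (Integer) source.get("likes"); // assumes likes is present
                source.put("likes", likes + 1);            // ctx._source.likes++
                updated++;
            }
        }
        return updated;
    }

    // Builds a small sample document.
    static Map<String, Object> doc(String user, int likes) {
        Map<String, Object> source = new HashMap<>();
        source.put("user", user);
        source.put("likes", likes);
        return source;
    }

    public static void main(String[] args) {
        List<Map<String, Object>> docs = new ArrayList<>();
        docs.add(doc("kimchy", 3));
        docs.add(doc("someone_else", 5)); // hypothetical second user
        System.out.println(incrementLikes(docs, "kimchy") + " updated: " + docs);
    }
}
```

The query selects the documents and the script is then run once per selected document.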

Java implementation

TransportClient client = elasticSearchClient.getClient();
UpdateByQueryRequestBuilder updateByQueryRequestBuilder = UpdateByQueryAction.INSTANCE.newRequestBuilder(client);
// Painless script: set is_buy to 1 when the switch is on, otherwise to 0.
String script = (1 == switchValue) ? "ctx._source.is_buy = 1" : "ctx._source.is_buy = 0";
Script scriptObj = new Script(script);
// Update every document in the index whose owner_id matches; on a version
// conflict, skip that document and carry on instead of aborting.
BulkByScrollResponse bulkByScrollResponse = updateByQueryRequestBuilder.source(index)
        .script(scriptObj)
        .filter(QueryBuilders.termQuery("owner_id", ownId))
        .abortOnVersionConflict(false)
        .get();
// Log any per-document failures reported by the bulk operation.
List<BulkItemResponse.Failure> bulkFailures = bulkByScrollResponse.getBulkFailures();
for (BulkItemResponse.Failure bulkFailure : bulkFailures) {
    logger.error(bulkFailure.getMessage());
}

https://www.elastic.co/guide/en/elasticsearch/reference/5.4/docs-update-by-query.html

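That covers what I have picked up from using ES at work; it should be enough to get from first steps to everyday use. I will keep learning and updating these notes.

CSDN: http://blog.csdn.net/qqhjqs?viewmode=list
Blog: http://vector4wang.tk/
Jianshu: https://www.jianshu.com/u/223a1314e818
Github: https://github.com/vector4wang
Gitee: https://gitee.com/backwxc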
