Installing the IK Chinese Analyzer for Elasticsearch

1. Install the IK analyzer. Download the plugin release matching your Elasticsearch version: the elasticsearch-analysis-ik plugin is actively maintained and its releases track Elasticsearch versions, so just pick the one that matches yours. The original IKAnalyzer project, by contrast, is no longer maintained, while Lucene keeps evolving, so integrating stock IKAnalyzer with a newer Lucene would require patching IKAnalyzer yourself.

Download: https://github.com/medcl/elasticsearch-analysis-ik/releases

Upload the downloaded analyzer to your server, or fetch it directly with wget, whichever you prefer. My IK version matches my Elasticsearch version. Since mine is a pseudo-distributed cluster on a single machine, I only need to install it once; in a real distributed cluster, the analyzer must be installed on every machine.
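The download URL can be derived from the Elasticsearch version. A minimal sketch, assuming the release archives follow the repository's usual `v<version>/elasticsearch-analysis-ik-<version>.zip` naming (check the releases page if your version differs):

```shell
# Build the release URL for the IK version that matches the ES version.
ES_VERSION=5.4.3
PLUGIN_ZIP="elasticsearch-analysis-ik-${ES_VERSION}.zip"
PLUGIN_URL="https://github.com/medcl/elasticsearch-analysis-ik/releases/download/v${ES_VERSION}/${PLUGIN_ZIP}"
echo "$PLUGIN_URL"
# wget "$PLUGIN_URL"   # uncomment to actually download
```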

2. Unpack it: copy elasticsearch-analysis-ik-5.4.3.zip into a directory of its own and unzip it there to install the IK analyzer.

[root@slaver4 package]# mkdir elasticsearch-analysis-ik
[root@slaver4 package]# cp elasticsearch-analysis-ik-5.4.3.zip elasticsearch-analysis-ik
[root@slaver4 elasticsearch-analysis-ik]# unzip elasticsearch-analysis-ik-5.4.3.zip 
Archive:  elasticsearch-analysis-ik-5.4.3.zip
  inflating: elasticsearch-analysis-ik-5.4.3.jar  
  inflating: httpclient-4.5.2.jar    
  inflating: httpcore-4.4.4.jar      
  inflating: commons-logging-1.2.jar  
  inflating: commons-codec-1.9.jar   
   creating: config/
   creating: config/custom/
  inflating: config/surname.dic      
  inflating: config/preposition.dic  
  inflating: config/custom/mydict.dic  
  inflating: config/custom/single_word_full.dic  
  inflating: config/custom/sougou.dic  
  inflating: config/custom/ext_stopword.dic  
  inflating: config/custom/single_word.dic  
  inflating: config/custom/single_word_low_freq.dic  
  inflating: config/main.dic         
  inflating: config/IKAnalyzer.cfg.xml  
  inflating: config/quantifier.dic   
  inflating: config/stopword.dic     
  inflating: config/suffix.dic       
  inflating: plugin-descriptor.properties  
[root@slaver4 elasticsearch-analysis-ik]# ls
commons-codec-1.9.jar  commons-logging-1.2.jar  config  elasticsearch-analysis-ik-5.4.3.jar  elasticsearch-analysis-ik-5.4.3.zip  httpclient-4.5.2.jar  httpcore-4.4.4.jar  plugin-descriptor.properties
[root@slaver4 elasticsearch-analysis-ik]# 

Since unzip extracts into the current directory, the elasticsearch-analysis-ik-5.4.3.zip archive can now be deleted.

[root@slaver4 elasticsearch-analysis-ik]# ls
commons-codec-1.9.jar  commons-logging-1.2.jar  config  elasticsearch-analysis-ik-5.4.3.jar  elasticsearch-analysis-ik-5.4.3.zip  httpclient-4.5.2.jar  httpcore-4.4.4.jar  plugin-descriptor.properties
[root@slaver4 elasticsearch-analysis-ik]# rm -rf elasticsearch-analysis-ik-5.4.3.zip 
[root@slaver4 elasticsearch-analysis-ik]# ls
commons-codec-1.9.jar  commons-logging-1.2.jar  config  elasticsearch-analysis-ik-5.4.3.jar  httpclient-4.5.2.jar  httpcore-4.4.4.jar  plugin-descriptor.properties
[root@slaver4 elasticsearch-analysis-ik]#

Then move the unpacked IK directory into Elasticsearch's plugins directory. Remember: it must go into the plugins directory on all three nodes. For example:

Note: the IK plugin must live in its own subdirectory inside plugins, not as loose files dropped directly into plugins. In my case, the elasticsearch-analysis-ik directory holds the unpacked IK files.

[root@slaver4 package]# mv elasticsearch-analysis-ik/ /home/hadoop/soft/elasticsearch-5.4.3/plugins
[root@slaver4 package]# cd /home/hadoop/soft/elasticsearch-5.4.3/plugins
[root@slaver4 plugins]# ls
elasticsearch-analysis-ik
[root@slaver4 plugins]#

Elasticsearch provides no shutdown command, and `kill -9 <pid>` is taboo in production. If you must use kill in production, use plain `kill <pid>` so the process can finish its in-flight work before exiting. If your ES is running, restart it with the command below; if it is not running, just start it.

-- In a clustered setup, with one node per machine, run the following on each node to stop es
[root@slaver4 elasticsearch-analysis-ik]# kill `ps -ef | grep Elasticsearch | grep -v grep | awk '{print $2}'`
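The pipeline above finds the Elasticsearch process lines, drops the grep line itself, and extracts the PID as the second whitespace-separated field with awk. Here is that extraction demonstrated on a canned sample line (the process line is made up for illustration, not a live ps call):

```shell
# A fabricated ps -ef style line: owner, PID, PPID, then the command.
PS_LINE="elsearch  8388     1  2 10:00 pts/0    00:01:00 java org.elasticsearch.bootstrap.Elasticsearch"

# Field 2 of ps -ef output is the PID.
PID=$(echo "$PS_LINE" | awk '{print $2}')
echo "would run: kill $PID"
```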

My pseudo-cluster was not running, so I just started it. With a pseudo-cluster, it is best to start all the nodes at roughly the same time; otherwise you will see some errors while each node tries to discover the others in the cluster, though nothing serious.

When starting the head plugin, it reported an error that was mildly annoying: Local Npm module "grunt-contrib-jasmine" not found. Is it installed?

[elsearch@slaver4 soft]$ cd elasticsearch-head-master/
[elsearch@slaver4 elasticsearch-head-master]$ ls
crx         Dockerfile-alpine                   Gruntfile.js       index.html  node_modules  package-lock.json             proxy           _site  test
Dockerfile  elasticsearch-head.sublime-project  grunt_fileSets.js  LICENCE     package.json  plugin-descriptor.properties  README.textile  src
[elsearch@slaver4 elasticsearch-head-master]$ cd node_modules/grunt
[elsearch@slaver4 grunt]$ ls
bin  CHANGELOG  lib  LICENSE  node_modules  package.json  README.md
[elsearch@slaver4 grunt]$ cd bin/
[elsearch@slaver4 bin]$ ls
grunt
[elsearch@slaver4 bin]$ ./grunt -server &
[1] 8602
[elsearch@slaver4 bin]$ >> Local Npm module "grunt-contrib-jasmine" not found. Is it installed?
Warning: Task "jasmine" not found. Use --force to continue.

Aborted due to warnings.

[1]+  Exit 3                  ./grunt -server
[elsearch@slaver4 bin]$ jps
8388 Elasticsearch
8457 Elasticsearch
8527 Elasticsearch
8623 Jps
[elsearch@slaver4 bin]$ ls
grunt
[elsearch@slaver4 bin]$ ./grunt server &
[1] 8633
[elsearch@slaver4 bin]$ >> Local Npm module "grunt-contrib-jasmine" not found. Is it installed?

Running "connect:server" (connect) task
Waiting forever...
Started connect web server on http://192.168.110.133:9100

The fix is to install the missing package: cd into the head directory and run npm install grunt-contrib-jasmine.

If more of these packages are missing, install them all at once:

[elsearch@slaver4 elasticsearch-head-master]$  npm install grunt-contrib-clean grunt-contrib-concat grunt-contrib-watch grunt-contrib-connect grunt-contrib-copy grunt-contrib-jasmine

Or use the Taobao npm mirror, which downloads noticeably faster:

[elsearch@slaver4 elasticsearch-head-master]$  npm install grunt-contrib-clean grunt-contrib-concat grunt-contrib-watch grunt-contrib-connect grunt-contrib-copy grunt-contrib-jasmine --registry=https://registry.npm.taobao.org
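To avoid passing --registry on every command, the mirror can also be set persistently. A sketch of the equivalent per-user npm config entry (assuming npm's standard ~/.npmrc file; the registry URL is the one from the command above):

```ini
; ~/.npmrc — equivalent to passing --registry on every npm command
registry=https://registry.npm.taobao.org
```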

[elsearch@slaver4 elasticsearch-head-master]$ ls
crx         Dockerfile-alpine                   Gruntfile.js       index.html  node_modules  package-lock.json             proxy           _site  test
Dockerfile  elasticsearch-head.sublime-project  grunt_fileSets.js  LICENCE     package.json  plugin-descriptor.properties  README.textile  src
[elsearch@slaver4 elasticsearch-head-master]$ npm install grunt-contrib-jasmine
npm WARN [email protected] license should be a valid SPDX license expression
npm WARN optional SKIPPING OPTIONAL DEPENDENCY: [email protected] (node_modules/fsevents):
npm WARN notsup SKIPPING OPTIONAL DEPENDENCY: Unsupported platform for [email protected]: wanted {"os":"darwin","arch":"any"} (current: {"os":"linux","arch":"x64"})

+ [email protected]
added 9 packages from 13 contributors in 30.811s
[elsearch@slaver4 elasticsearch-head-master]$

After fiddling with it, I managed to break the head plugin so it would not start at all, which was frustrating. The startup error:

[elsearch@slaver4 bin]$ >> No "clean" targets found.
Warning: Task "clean" failed. Use --force to continue.

Aborted due to warnings.

[1]+  Exit 3                  ./grunt -server

Since it would not start, I guessed the dependency packages were the problem, so I updated all of them to the latest versions.

Note: the actual mistake was the command ./grunt -server & (the stray dash before server), so the dependency updates below turned out to be unnecessary; you can skip them and save the time.

[elsearch@slaver4 bin]$ npm install grunt@latest
stall grunt-contrib-connect@latest
npm ERR! code ENOSELF
npm ERR! Refusing to install package with name "grunt" under a package
npm ERR! also called "grunt". Did you name your project the same
npm ERR! as the dependency you're installing?
npm ERR! 
npm ERR! For more information, see:
npm ERR!     <https://docs.npmjs.com/cli/install#limitations-of-npms-install-algorithm>

npm ERR! A complete log of this run can be found in:
npm ERR!     /home/elsearch/.npm/_logs/2019-10-20T09_08_08_039Z-debug.log
[elsearch@slaver4 bin]$ npm install grunt-cli@latest
npm notice created a lockfile as package-lock.json. You should commit this file.
+ [email protected]
added 148 packages from 121 contributors, updated 2 packages and audited 984 packages in 180.566s
found 0 vulnerabilities

[elsearch@slaver4 bin]$ npm install grunt-contrib-copy@latest
+ [email protected]
added 9 packages from 3 contributors and audited 994 packages in 36.523s
found 0 vulnerabilities

[elsearch@slaver4 bin]$ npm install grunt-contrib-concat@latest
npm WARN [email protected] requires a peer of grunt@>=0.4.0 but none is installed. You must install peer dependencies yourself.

+ [email protected]
added 1 package from 1 contributor and audited 1004 packages in 5.282s
found 0 vulnerabilities

[elsearch@slaver4 bin]$ npm install grunt-contrib-uglify@latest
npm WARN [email protected] requires a peer of grunt@>=0.4.0 but none is installed. You must install peer dependencies yourself.

+ [email protected]
added 18 packages from 44 contributors and audited 1032 packages in 84.271s
found 0 vulnerabilities

[elsearch@slaver4 bin]$ npm install grunt-contrib-clean@latest
npm WARN [email protected] requires a peer of grunt@>=0.4.0 but none is installed. You must install peer dependencies yourself.
npm WARN [email protected] requires a peer of grunt@>=0.4.5 but none is installed. You must install peer dependencies yourself.

+ [email protected]
added 15 packages from 8 contributors and audited 1050 packages in 16.732s
found 0 vulnerabilities

[elsearch@slaver4 bin]$ npm install grunt-contrib-watch@latest
npm WARN [email protected] requires a peer of grunt@>=0.4.5 but none is installed. You must install peer dependencies yourself.
npm WARN [email protected] requires a peer of grunt@>=0.4.0 but none is installed. You must install peer dependencies yourself.

+ [email protected]
added 24 packages from 28 contributors in 46.459s
[elsearch@slaver4 bin]$ npm install grunt-contrib-connect@latest
npm WARN [email protected] requires a peer of grunt@>=0.4.5 but none is installed. You must install peer dependencies yourself.
npm WARN [email protected] requires a peer of grunt@>=0.4.0 but none is installed. You must install peer dependencies yourself.
npm WARN [email protected] requires a peer of grunt@>=0.4.0 but none is installed. You must install peer dependencies yourself.

+ [email protected]
added 69 packages from 69 contributors and audited 1226 packages in 77.331s
found 0 vulnerabilities

[elsearch@slaver4 bin]$ npm install grunt-contrib-jasmine@latest

> [email protected] install /home/hadoop/soft/elasticsearch-head-master/node_modules/grunt/node_modules/puppeteer
> node install.js

Downloading Chromium r686378 - 114 Mb [====================] 100% 0.0s 
Chromium downloaded to /home/hadoop/soft/elasticsearch-head-master/node_modules/grunt/node_modules/puppeteer/.local-chromium/linux-686378
npm WARN [email protected] requires a peer of grunt@>=0.4.5 but none is installed. You must install peer dependencies yourself.
npm WARN [email protected] requires a peer of grunt@>=0.4.0 but none is installed. You must install peer dependencies yourself.
npm WARN [email protected] requires a peer of grunt@>=0.4.0 but none is installed. You must install peer dependencies yourself.
npm WARN [email protected] requires a peer of grunt@>=1 but none is installed. You must install peer dependencies yourself.

+ [email protected]
added 244 packages from 120 contributors and audited 2725 packages in 424.187s
found 3 high severity vulnerabilities
  run `npm audit fix` to fix them, or `npm audit` for details
[elsearch@slaver4 bin]$ ls
grunt
[elsearch@slaver4 bin]$ ./grunt server &
[1] 8297
[elsearch@slaver4 bin]$ Running "connect:server" (connect) task
Waiting forever...
Started connect web server on http://192.168.110.133:9100

[elsearch@slaver4 bin]$

3. Back on topic: with the IK analyzer installed, let's verify that the installation actually works.

First create an index named news; once created, you can see the news index in the head plugin:

[elsearch@slaver4 soft]$ ls
elasticsearch-5.4.3  elasticsearch-head-master  el_slave  node-v8.16.2-linux-x64  nohup.out
[elsearch@slaver4 soft]$ curl -XPUT http://192.168.110.133:9200/news
{"acknowledged":true,"shards_acknowledged":true}[elsearch@slaver4 soft]$ 
[elsearch@slaver4 soft]$ 

With the index created, define a mapping (roughly the schema in a database: table name, field names, and field types) specifying which analyzer each field uses. Again, the IK analyzer must be present in the plugins directory on all three nodes.

Note: type text means the field is analyzed, stored, and indexed. analyzer sets the analyzer used when indexing (here the IK analyzer); search_analyzer sets the analyzer used when searching.

[elsearch@slaver4 soft]$ curl -XPOST http://192.168.110.133:9200/news/fulltext/_mapping -d'
{
        "properties": {
            "content": {
                "type": "text",
                "analyzer": "ik_max_word",
                "search_analyzer": "ik_max_word"
            }
        }
    
}'
{"acknowledged":true}[elsearch@slaver4 soft]$

Now test whether Chinese text actually gets segmented. Specify the ik_max_word analyzer; -d passes the text to analyze.

[elsearch@slaver4 soft]$ curl -XGET 'http://192.168.110.133:9200/_analyze?pretty&analyzer=ik_max_word' -d '我要成爲java高級工程師'
{
  "tokens" : [
    {
      "token" : "我",
      "start_offset" : 0,
      "end_offset" : 1,
      "type" : "CN_CHAR",
      "position" : 0
    },
    {
      "token" : "要",
      "start_offset" : 1,
      "end_offset" : 2,
      "type" : "CN_CHAR",
      "position" : 1
    },
    {
      "token" : "成爲",
      "start_offset" : 2,
      "end_offset" : 4,
      "type" : "CN_WORD",
      "position" : 2
    },
    {
      "token" : "java",
      "start_offset" : 4,
      "end_offset" : 8,
      "type" : "ENGLISH",
      "position" : 3
    },
    {
      "token" : "高級工程師",
      "start_offset" : 8,
      "end_offset" : 13,
      "type" : "CN_WORD",
      "position" : 4
    },
    {
      "token" : "高級工",
      "start_offset" : 8,
      "end_offset" : 11,
      "type" : "CN_WORD",
      "position" : 5
    },
    {
      "token" : "高級",
      "start_offset" : 8,
      "end_offset" : 10,
      "type" : "CN_WORD",
      "position" : 6
    },
    {
      "token" : "工程師",
      "start_offset" : 10,
      "end_offset" : 13,
      "type" : "CN_WORD",
      "position" : 7
    },
    {
      "token" : "工程",
      "start_offset" : 10,
      "end_offset" : 12,
      "type" : "CN_WORD",
      "position" : 8
    },
    {
      "token" : "師",
      "start_offset" : 12,
      "end_offset" : 13,
      "type" : "CN_CHAR",
      "position" : 9
    }
  ]
}
[elsearch@slaver4 soft]$ 
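When scripting against _analyze, the token values can be pulled out of the JSON response with grep and sed. A sketch against an inlined, trimmed sample of such a response (not a live call; a real script would capture the curl output instead):

```shell
# A trimmed, hand-written sample of an _analyze response.
RESPONSE='{"tokens":[{"token":"成爲","type":"CN_WORD"},{"token":"java","type":"ENGLISH"},{"token":"高級工程師","type":"CN_WORD"}]}'

# Pull out each "token":"..." pair, then strip the surrounding JSON syntax.
TOKENS=$(echo "$RESPONSE" | grep -o '"token":"[^"]*"' | sed 's/"token":"\(.*\)"/\1/')
echo "$TOKENS"
```

This is fine for quick checks; for anything serious, a real JSON parser (e.g. jq or python) is the safer choice.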

ik_smart produces a coarser, smarter segmentation and is the recommended choice; its output looks like this:

[elsearch@slaver4 soft]$ curl -XGET 'http://192.168.110.133:9200/_analyze?pretty&analyzer=ik_smart' -d '我要成爲Java高級工程師'
{
  "tokens" : [
    {
      "token" : "我",
      "start_offset" : 0,
      "end_offset" : 1,
      "type" : "CN_CHAR",
      "position" : 0
    },
    {
      "token" : "要",
      "start_offset" : 1,
      "end_offset" : 2,
      "type" : "CN_CHAR",
      "position" : 1
    },
    {
      "token" : "成爲",
      "start_offset" : 2,
      "end_offset" : 4,
      "type" : "CN_WORD",
      "position" : 2
    },
    {
      "token" : "java",
      "start_offset" : 4,
      "end_offset" : 8,
      "type" : "ENGLISH",
      "position" : 3
    },
    {
      "token" : "高級工程師",
      "start_offset" : 8,
      "end_offset" : 13,
      "type" : "CN_WORD",
      "position" : 4
    }
  ]
}
[elsearch@slaver4 soft]$

With that, the IK analyzer setup is complete.

Next, add some data to the type within the index; once inserted, the documents can be browsed in the head plugin:

[elsearch@slaver4 soft]$ curl -XPOST http://192.168.110.133:9200/news/fulltext/1 -d'
> {"content":"美國留給伊拉克的是個爛攤子嗎"}'
{"_index":"news","_type":"fulltext","_id":"1","_version":1,"result":"created","_shards":{"total":2,"successful":2,"failed":0},"created":true}[elsearch@slaver4 soft]$ 
[elsearch@slaver4 soft]$ curl -XPOST http://192.168.110.133:9200/news/fulltext/2 -d'
> {"content":"公安部:各地校車將享最高路權"}'
{"_index":"news","_type":"fulltext","_id":"2","_version":1,"result":"created","_shards":{"total":2,"successful":2,"failed":0},"created":true}[elsearch@slaver4 soft]$ 
[elsearch@slaver4 soft]$ curl -XPOST http://192.168.110.133:9200/news/fulltext/3 -d'
> {"content":"中韓漁警衝突調查:韓警平均每天扣1艘中國漁船"}'
{"_index":"news","_type":"fulltext","_id":"3","_version":1,"result":"created","_shards":{"total":2,"successful":2,"failed":0},"created":true}[elsearch@slaver4 soft]$ 
[elsearch@slaver4 soft]$ curl -XPOST http://192.168.110.133:9200/news/fulltext/4 -d'
> {"content":"中國駐洛杉磯領事館遭亞裔男子槍擊 嫌犯已自首"}'
{"_index":"news","_type":"fulltext","_id":"4","_version":1,"result":"created","_shards":{"total":2,"successful":2,"failed":0},"created":true}[elsearch@slaver4 soft]$ 
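Indexing many documents one curl at a time gets repetitive; a loop helps. The dry-run sketch below only prints the commands it would run (the document bodies here are illustrative placeholders), so you can eyeball them before pointing the loop at a live cluster:

```shell
# Dry run: print each indexing command instead of executing it.
ES=http://192.168.110.133:9200
ID=1
for CONTENT in "first test document" "second test document"; do
    echo curl -XPOST "$ES/news/fulltext/$ID" -d "{\"content\":\"$CONTENT\"}"
    ID=$((ID+1))
done
```

To actually index, drop the echo (and mind shell quoting if the content contains quotes).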

After the data is inserted, you can query it with highlighting:

[elsearch@slaver4 soft]$ curl -XPOST http://192.168.110.133:9200/news/fulltext/_search  -d'
> {
>     "query" : { "match" : { "content" : "中國" }},
>     "highlight" : {
>         "pre_tags" : ["<font color='red'>", "<tag2>"],
>         "post_tags" : ["</font>", "</tag2>"],
>         "fields" : {
>             "content" : {}
>         }
>     }
> }'
{"took":194,"timed_out":false,"_shards":{"total":5,"successful":5,"failed":0},"hits":{"total":2,"max_score":0.5347766,"hits":[{"_index":"news","_type":"fulltext","_id":"4","_score":0.5347766,"_source":
{"content":"中國駐洛杉磯領事館遭亞裔男子槍擊 嫌犯已自首"},"highlight":{"content":["<font color=red>中國</font>駐洛杉磯領事館遭亞裔男子槍擊 嫌犯已自首"]}},{"_index":"news","_type":"fulltext","_id":"3","_score":0.27638745,"_source":
{"content":"中韓漁警衝突調查:韓警平均每天扣1艘中國漁船"},"highlight":{"content":["中韓漁警衝突調查:韓警平均每天扣1艘<font color=red>中國</font>漁船"]}}]}}[elsearch@slaver4 soft]$ 
[elsearch@slaver4 soft]$ 

Author: 別先生

Blog: https://www.cnblogs.com/biehongli/
