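The Linux distribution used here is CentOS 6.5, and the Coreseek version is 4.1.
Sphinx is a full-text search engine designed to work with SQL databases. It can be combined with MySQL or PostgreSQL to provide more specialized search than the database itself offers, which makes full-text search easier to implement in applications. Sphinx provides search API bindings for scripting languages such as PHP, Python, Perl, and Ruby, and it also ships a storage engine plugin for MySQL.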
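The main features of Sphinx include:
- High indexing speed (close to 10 MB/s on modern CPUs)
- High search speed (average query time under 0.1 s on 2-4 GB of text)
- High scalability (up to 100 GB of text or 100 million documents on a single CPU)
- Good relevance ranking
- Distributed search
- Document excerpt generation
- Searching from within MySQL through a pluggable storage engine
- Boolean, phrase, and word-proximity queries
- Multiple full-text fields per document (up to 32 by default)
- Multiple attributes per document
- Stop words
- Both single-byte encodings and UTF-8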
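Coreseek is an open-source search engine based on Sphinx that provides a free Chinese full-text search system. It is often described as Sphinx with Chinese word segmentation: unlike stock Sphinx, Coreseek adds a dictionary for Chinese word segmentation.
Coreseek 4.1 is therefore installed directly here, since it already bundles the Sphinx source.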
I. Installing Coreseek
1. Installing mmseg
[root@test3 ~]# tar xf coreseek-4.1-beta.tar.gz -C /usr/src
[root@test3 ~]# cd /usr/src/coreseek-4.1-beta
[root@test3 coreseek-4.1-beta]# ls
csft-4.1  mmseg-3.2.14  README.txt  testpack
# csft-4.1 is the Sphinx source; mmseg-3.2.14 is the Chinese word segmentation package
[root@test3 coreseek-4.1-beta]# cd mmseg-3.2.14
[root@test3 mmseg-3.2.14]# ./bootstrap    # generate the configure script and build files
[root@test3 mmseg-3.2.14]# ./configure --prefix=/usr/local/mmseg3
[root@test3 mmseg-3.2.14]# make && make install
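Both ./bootstrap and ./configure rely on a working compiler and the autotools chain, so it can save time to check for them before building. The following is a minimal sketch: check_tools is a hypothetical helper, and the tool list is an assumption about what an autotools-based C++ build typically needs, not a list taken from the Coreseek documentation.

```shell
#!/bin/sh
# Print the name of every listed tool that is not found on PATH, one per line.
check_tools() {
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
    done
}

# Assumed tool list for an autotools-based C++ build; adjust as needed.
missing=$(check_tools gcc g++ make autoconf automake libtool)
if [ -n "$missing" ]; then
    # Unquoted on purpose so the names are printed on a single line.
    echo "missing build tools:" $missing
else
    echo "all build tools found"
fi
```

If anything is reported as missing, install it with the distribution's package manager before running ./bootstrap.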
2. Installing Coreseek (Sphinx)
[root@test3 mmseg-3.2.14]# cd ../csft-4.1/
[root@test3 csft-4.1]# ./buildconf.sh    # check the environment and generate the build files
[root@test3 csft-4.1]# ./configure --prefix=/usr/local/coreseek --without-unixodbc --with-mmseg --with-mmseg-includes=/usr/local/mmseg3/include/mmseg/ --with-mmseg-libs=/usr/local/mmseg3/lib/ --with-mysql
[root@test3 csft-4.1]# make && make install
II. Configuring Coreseek
[root@test3 csft-4.1]# cd /usr/local/coreseek/
[root@test3 coreseek]# ls
bin  etc  share  var
[root@test3 coreseek]# cd etc/ && ll
-rw-r--r--. 1 root root   903 Jul 28 10:08 example.sql            # sample data
-rw-r--r--. 1 root root 31081 Jul 28 09:25 sphinx.conf.dist       # full configuration file
-rw-r--r--. 1 root root  1163 Jun 12 00:40 sphinx-min.conf.dist   # minimal configuration file

# Create the test database in MySQL (if it does not already exist) and import example.sql
[root@test3 etc]# mysql -uroot -ppasswd -e 'create database if not exists test;'
[root@test3 etc]# mysql -uroot -ppasswd < example.sql

# Create the configuration file
[root@test3 etc]# cp sphinx-min.conf.dist csft.conf
[root@test3 etc]# vim csft.conf
#
# Minimal Sphinx configuration sample (clean, simple, functional)
#

source src1    # data source
{
    type            = mysql

    sql_host        = localhost
    sql_user        = root
    sql_pass        = passwd
    sql_db          = test
    sql_port        = 3306    # optional, default is 3306

    sql_query       = \
        SELECT id, group_id, UNIX_TIMESTAMP(date_added) AS date_added, title, content \
        FROM documents

    sql_attr_uint       = group_id
    sql_attr_timestamp  = date_added

    sql_query_info  = SELECT * FROM documents WHERE id=$id
}

index test1    # index
{
    source            = src1
    path              = /usr/local/coreseek/var/data/test1
    docinfo           = extern
    charset_dictpath  = /usr/local/mmseg3/etc/
    charset_type      = zh_cn.utf-8
}

index testrt
{
    type            = rt
    rt_mem_limit    = 32M

    path            = /usr/local/coreseek/var/data/testrt
    charset_type    = utf-8

    rt_field        = title
    rt_field        = content
    rt_attr_uint    = gid
}

indexer    # index-building tool
{
    mem_limit       = 32M
}

searchd    # search daemon
{
    listen          = 9312
    listen          = 9306:mysql41
    log             = /usr/local/coreseek/var/log/searchd.log
    query_log       = /usr/local/coreseek/var/log/query.log
    read_timeout    = 5
    max_children    = 30
    pid_file        = /usr/local/coreseek/var/log/searchd.pid
    max_matches     = 1000
    seamless_rotate = 1
    preopen_indexes = 1
    unlink_old      = 1
    workers         = threads    # for RT to work
}
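Each plain index in the configuration above points at a source block through its "source = ..." line, and a mistyped name there only surfaces later when indexer runs. The sketch below is one way to catch that early. check_sources is a hypothetical helper, not part of Coreseek, and it assumes the simple layout used in this article (block headers such as "source src1" or "source src1:host" on their own line).

```shell
#!/bin/sh
# Report every "source = NAME" reference that has no matching "source NAME" block.
# Exits non-zero if at least one reference is undefined.
check_sources() {
    awk '
        # Block header: "source src1" or "source src1:host"
        $1 == "source" && $2 != "=" { name = $2; sub(/:.*/, "", name); defined[name] = 1; next }
        # Reference inside an index block: "source = src1"
        $1 == "source" && $2 == "=" { used[$3] = 1 }
        END {
            status = 0
            for (s in used) if (!(s in defined)) { print "undefined source: " s; status = 1 }
            exit status
        }
    ' "$1"
}

# Path follows the --prefix used in this article; pass another file as $1 if needed.
conf=${1:-/usr/local/coreseek/etc/csft.conf}
if [ -r "$conf" ]; then
    if check_sources "$conf"; then
        echo "all index sources are defined"
    fi
fi
```

The check is purely textual, so it does not replace running indexer; it only flags names that do not line up.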
III. Starting Coreseek and testing it
[root@test3 etc]# /usr/local/coreseek/bin/searchd -c csft.conf    # start the service
Coreseek Fulltext 4.1 [ Sphinx 2.0.2-dev (r2922)]
Copyright (c) 2007-2011,
Beijing Choice Software Technologies Inc (http://www.coreseek.com)

using config file 'sphinx-min.conf.dist'...
WARNING: compat_sphinxql_magics=1 is deprecated; please update your application and config
listening on all interfaces, port=9312
listening on all interfaces, port=9306
precaching index 'test1'
precaching index 'testrt'
precached 2 indexes in 0.001 sec
[root@test3 etc]# ss -tnl | grep -e 9306 -e 9312    # both ports are listening
LISTEN     0      5            *:9306            *:*
LISTEN     0      5            *:9312            *:*

# Test
[root@test3 etc]# /usr/local/coreseek/bin/indexer --all --rotate    # build the indexes
[root@test3 etc]# /usr/local/coreseek/bin/search test               # search for the keyword "test"
Coreseek Fulltext 4.1 [ Sphinx 2.0.2-dev (r2922)]
Copyright (c) 2007-2011,
Beijing Choice Software Technologies Inc (http://www.coreseek.com)

using config file '/usr/local/coreseek/etc/csft.conf'...
index 'test1': query 'test ': returned 3 matches of 3 total in 0.000 sec

displaying matches:
1. document=1, weight=2421, group_id=1, date_added=Thu Jul 28 13:39:05 2016
    id=1
    group_id=1
    group_id2=5
    date_added=2016-07-28 13:39:05
    title=test one
    content=this is my test document number one. also checking search within phrases.
2. document=2, weight=2421, group_id=1, date_added=Thu Jul 28 13:39:05 2016
    id=2
    group_id=1
    group_id2=6
    date_added=2016-07-28 13:39:05
    title=test two
    content=this is my test document number two
3. document=4, weight=1442, group_id=2, date_added=Thu Jul 28 13:39:05 2016
    id=4
    group_id=2
    group_id2=8
    date_added=2016-07-28 13:39:05
    title=doc number four
    content=this is to test groups

words:
1. 'test': 3 documents, 5 hits

index 'testrt': search error: failed to open /usr/local/coreseek/var/data/testrt.sph: No such file or directory.
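Because the configuration contains "listen = 9306:mysql41", searchd also speaks the MySQL wire protocol on port 9306, so the same query can be sent through the running daemon with the ordinary mysql client (SphinxQL). The search utility reads index files from disk directly, which is the likely reason it reports an error for the RT index testrt; going through searchd avoids that. The sketch below is illustrative only: build_match_query and the RUN_QUERY switch are hypothetical names introduced here, and the keyword is assumed to contain no quote characters.

```shell
#!/bin/sh
# Build a SphinxQL full-text query for one index and one keyword.
# Sketch only: the keyword is assumed to contain no quote characters.
build_match_query() {
    printf "SELECT * FROM %s WHERE MATCH('%s') LIMIT 10;\n" "$1" "$2"
}

query=$(build_match_query test1 test)
echo "$query"

# Set RUN_QUERY=1 to actually send the query to a running searchd.
# 127.0.0.1 is used instead of localhost so the client connects over TCP
# to port 9306 rather than to MySQL's Unix socket.
if [ "${RUN_QUERY:-0}" = "1" ] && command -v mysql >/dev/null 2>&1; then
    mysql -h 127.0.0.1 -P 9306 -e "$query" || echo "searchd is not reachable on port 9306"
fi
```

For example, running the script as "RUN_QUERY=1 sh query.sh" (the file name is arbitrary) on the host where searchd is running sends the query to index test1.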
IV. Modifying the configuration to search across multiple tables
[root@test3 etc]# vim csft.conf
#
# Minimal Sphinx configuration sample (clean, simple, functional)
#

source host    # shared connection settings, inherited by src1 and src2
{
    type            = mysql

    sql_host        = localhost
    sql_user        = root
    sql_pass        = passwd
    sql_db          = test
    sql_port        = 3306    # optional, default is 3306
}

source src1:host
{
    sql_query       = \
        SELECT id, group_id, UNIX_TIMESTAMP(date_added) AS date_added, title, content \
        FROM documents

    # add the text of the tags table as an extra full-text field named "tags",
    # joined to documents by id
    sql_joined_field = tags from query; SELECT id, content FROM tags ORDER BY id ASC

    sql_attr_uint       = group_id
    sql_attr_timestamp  = date_added

    sql_query_info  = SELECT * FROM documents WHERE id=$id
}

source src2:host
{
    sql_query       = SELECT id, content FROM tags ORDER BY id ASC

    sql_attr_uint   = id

    sql_query_info  = SELECT * FROM tags WHERE id=$id
}

index test1
{
    source            = src1
    path              = /usr/local/coreseek/var/data/test1
    docinfo           = extern
    charset_dictpath  = /usr/local/mmseg3/etc/
    charset_type      = zh_cn.utf-8
}

index test2
{
    source            = src2
    path              = /usr/local/coreseek/var/data/test2
    docinfo           = extern
    charset_dictpath  = /usr/local/mmseg3/etc/
    charset_type      = zh_cn.utf-8
}

index testrt
{
    type            = rt
    rt_mem_limit    = 32M

    path            = /usr/local/coreseek/var/data/testrt
    charset_type    = utf-8

    rt_field        = title
    rt_field        = content
    rt_attr_uint    = gid
}

indexer
{
    mem_limit       = 32M
}

searchd
{
    listen          = 9312
    listen          = 9306:mysql41
    log             = /usr/local/coreseek/var/log/searchd.log
    query_log       = /usr/local/coreseek/var/log/query.log
    read_timeout    = 5
    max_children    = 30
    pid_file        = /usr/local/coreseek/var/log/searchd.pid
    max_matches     = 1000
    seamless_rotate = 1
    preopen_indexes = 1
    unlink_old      = 1
    workers         = threads    # for RT to work
}
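After the configuration changes, the indexes have to be rebuilt before the new sources show up in search results. The sketch below prints, and where Coreseek is installed also runs, the indexer command for that step. reindex_cmd is a hypothetical helper; the paths follow the --prefix used earlier in this article, and --rotate is included on the assumption that searchd is still running.

```shell
#!/bin/sh
# Paths follow the --prefix used earlier in this article.
prefix=/usr/local/coreseek
conf="$prefix/etc/csft.conf"

# Print the indexer command for the given indexes; with no arguments, rebuild all.
reindex_cmd() {
    if [ "$#" -eq 0 ]; then
        echo "$prefix/bin/indexer -c $conf --all --rotate"
    else
        echo "$prefix/bin/indexer -c $conf --rotate $*"
    fi
}

cmd=$(reindex_cmd test1 test2)
echo "$cmd"

# Only run the command where Coreseek is actually installed.
if [ -x "$prefix/bin/indexer" ]; then
    $cmd
fi
```

Naming the indexes explicitly rebuilds only test1 and test2; calling reindex_cmd with no arguments produces the --all form used earlier.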
Database contents
mysql> select * from tags;
+----+------------------+
| id | content          |
+----+------------------+
|  1 | test one time    |
|  2 | test two times   |
|  3 | test three times |
|  4 | test four times  |
+----+------------------+
4 rows in set (0.00 sec)

mysql> select * from documents;
+----+----------+-----------+---------------------+-----------------+-----------------------+
| id | group_id | group_id2 | date_added          | title           | content               |
+----+----------+-----------+---------------------+-----------------+-----------------------+
|  1 |        1 |         5 | 2016-07-28 13:57:00 | test one        | 第一個測試文檔        |
|  2 |        1 |         6 | 2016-07-28 13:57:00 | test two        | 第二個測試文檔        |
|  3 |        2 |         7 | 2016-07-28 13:57:00 | another doc     | 另一個文檔            |
|  4 |        2 |         8 | 2016-07-28 13:57:00 | doc number four | 測試組                |
+----+----------+-----------+---------------------+-----------------+-----------------------+
4 rows in set (0.00 sec)
Query results
[root@test3 etc]# /usr/local/coreseek/bin/search test
Coreseek Fulltext 4.1 [ Sphinx 2.0.2-dev (r2922)]
Copyright (c) 2007-2011,
Beijing Choice Software Technologies Inc (http://www.coreseek.com)

using config file '/usr/local/coreseek/etc/csft.conf'...
index 'test1': query 'test ': returned 4 matches of 4 total in 0.000 sec

displaying matches:
1. document=1, weight=2230, group_id=1, date_added=Thu Jul 28 13:57:00 2016
    id=1
    group_id=1
    group_id2=5
    date_added=2016-07-28 13:57:00
    title=test one
    content=第一個測試文檔
2. document=2, weight=2230, group_id=1, date_added=Thu Jul 28 13:57:00 2016
    id=2
    group_id=1
    group_id2=6
    date_added=2016-07-28 13:57:00
    title=test two
    content=第二個測試文檔
3. document=3, weight=1304, group_id=2, date_added=Thu Jul 28 13:57:00 2016
    id=3
    group_id=2
    group_id2=7
    date_added=2016-07-28 13:57:00
    title=another doc
    content=另一個文檔
4. document=4, weight=1304, group_id=2, date_added=Thu Jul 28 13:57:00 2016
    id=4
    group_id=2
    group_id2=8
    date_added=2016-07-28 13:57:00
    title=doc number four
    content=測試組

words:
1. 'test': 4 documents, 6 hits

index 'testrt': search error: failed to open /usr/local/coreseek/var/data/testrt.sph: No such file or directory.