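Introduction: I was recently asked why people still pair software such as Sphinx with MySQL databases, now that MySQL 5.6 supports full-text indexes on InnoDB.
First, many companies are still running MySQL 5.5, where InnoDB has no full-text index, so external full-text search software is needed.
Second, MySQL's built-in full-text parser splits text on whitespace and punctuation, so it cannot segment Chinese properly. That brings us to the subject of this post: Coreseek, a Chinese full-text search engine built on Sphinx.
Some readers were confused about installation and basic usage, so I have written down the steps I use. If you have questions, please leave a comment.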
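Installation
Stable version
wget http://219.232.239.243/uploads/csft/3.2/coreseek-3.2.14.tar.gz
curl -O -L http://mirrors.kernel.org/gnu/autoconf/autoconf-2.13.tar.gz
The older release needs correspondingly older build dependencies. If you later hit an error like "must contain _cv_ to be cached", the autoconf version in use is too new.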
Beta version
wget http://219.232.239.243/uploads/csft/4.0/coreseek-4.1-beta.tar.gz
Dependencies
[root@localhost etc]# yum -y install gcc make gcc-c++ libtool autoconf automake imake mysql-devel libxml2-devel expat-devel
[root@localhost mmseg-3.2.14]# history |grep yum
   58  yum -y install gcc make libiconv python
   75  yum -y install gcc-c++
[root@localhost src]# tar -xf coreseek-4.1-beta.tar.gz
[root@localhost src]# curl -O -L http://mirrors.kernel.org/gnu/m4/m4-1.4.13.tar.gz
[root@localhost src]# curl -O -L http://mirrors.kernel.org/gnu/autoconf/autoconf-2.65.tar.gz
[root@localhost src]# curl -O -L http://mirrors.kernel.org/gnu/automake/automake-1.11.tar.gz
[root@localhost src]# curl -O -L http://mirrors.kernel.org/gnu/libtool/libtool-2.2.6b.tar.gz
Build and install all of them (extract each tarball and run the following in its source directory):
./configure --prefix=/usr/local
make && make install
cd ..
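Since the same three steps are repeated for every tarball, they can be generated rather than typed. The following is only a sketch: gen_build_script is a hypothetical helper name, it assumes each PKG.tar.gz unpacks into a directory named PKG, and it prints commands instead of running them.

```shell
#!/bin/sh
# gen_build_script PKG...: print the build commands for each source package.
# Hypothetical helper: it only prints text, nothing is extracted or compiled.
# Assumes PKG.tar.gz unpacks into a directory named PKG.
gen_build_script() {
    for pkg in "$@"; do
        printf 'tar -xf %s.tar.gz || exit 1\n' "$pkg"
        printf '( cd %s && ./configure --prefix=/usr/local && make && make install ) || exit 1\n' "$pkg"
    done
}

# Same order as the downloads above: m4 first, since autoconf needs it.
gen_build_script m4-1.4.13 autoconf-2.65 automake-1.11 libtool-2.2.6b
```

Review the printed commands first. To run them, pipe the output into sh from the directory that holds the tarballs; each step stops the run if it fails.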
[root@localhost bin]# iconv --version
Chinese word segmentation depends on iconv.
Environment changes
[root@localhost src]# locale
LANG=en_US.UTF-8
[root@localhost src]# vim /etc/sysconfig/i18n
[root@localhost etc]# cat /etc/sysconfig/i18n
#LANG="en_US.UTF-8"
LANG="zh_CN.UTF-8"
#SYSFONT="latarcyrheb-sun16"
SYSFONT="latarcyrheb-sun16"
[root@localhost ~]# su - root    # or reboot; the change only takes effect in a new session
[root@localhost src]# tar -xf coreseek-4.1-beta.tar.gz
[root@localhost var]# cd /usr/local/src/coreseek-4.1-beta/testpack/var
[root@localhost var]# cat test/test.xml
<?xml version="1.0" encoding="utf-8"?>
<sphinx:docset>
<sphinx:schema>
<sphinx:field name="subject"/>
<sphinx:field name="content"/>
<sphinx:attr name="published" type="timestamp"/>
<sphinx:attr name="author_id" type="int" bits="16" default="1"/>
</sphinx:schema>
<sphinx:document id="1">
<subject>愚人節最佳蠱惑爆料 谷歌300億美元收購百度</subject>
<published>1270131607</published>
<content>據國外媒體報道,谷歌將巨資收購百度,涉及金額高達300億美元。谷歌藉此重返大陸市場。
該報道稱,目前谷歌與百度已經達成了收購協議,將擇機對外公佈。百度的管理層將100%保留,但會將項目縮減,包括有啊商城,以及目前實施不力的鳳巢計劃。正在進行測試階段的視頻網站qiyi.com將輸入更多的Youtube資源。(YouTube在大陸區因內容審查暫不能訪問)。
(rest of the file omitted)
As you can see, the bundled test data contains Chinese text.
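To confirm that the locale change from the environment step took effect in the current session, a small check like the one below can help. It is only a sketch: is_utf8_locale is a hypothetical helper name, it inspects the locale string rather than querying the system, and the patterns cover the common spellings of the codeset, not every possible one.

```shell
#!/bin/sh
# is_utf8_locale NAME: succeed if the locale name ends in a UTF-8 codeset.
# Hypothetical helper; it looks at the string only.
is_utf8_locale() {
    case "$1" in
        *.UTF-8 | *.utf-8 | *.UTF8 | *.utf8) return 0 ;;
        *) return 1 ;;
    esac
}

# Effective locale of this shell: LC_ALL wins, then LC_CTYPE, then LANG.
current="${LC_ALL:-${LC_CTYPE:-${LANG:-C}}}"
if is_utf8_locale "$current"; then
    echo "locale $current is UTF-8"
else
    echo "locale $current is not UTF-8; edit /etc/sysconfig/i18n and log in again"
fi
```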
Compile and install
[root@localhost src]# cd coreseek-4.1-beta/mmseg-3.2.14/
[root@localhost mmseg-3.2.14]# ./bootstrap
[root@localhost mmseg-3.2.14]# ./configure --prefix=/usr/local/mmseg/
[root@localhost mmseg-3.2.14]# make && make install
[root@localhost bin]# /usr/local/mmseg/bin/mmseg -d /usr/local/mmseg/etc/
[root@localhost coreseek-3.2.14]# cd csft-3.2.14/
[root@localhost csft-3.2.14]# ls
acinclude.m4  configure.ac  INSTALL       pymmseg         sphinx-min.conf.in
aclocal.m4    contrib       libexpat      python.m4       sphinx.spec
api           COPYING       libstemmer_c  smoke.sh        sphinx.workspace
buildconf.sh  csft.doc      Makefile.am   sphinx03.sln    src
codeblocks    csft.pytest   Makefile.in   sphinx05.sln    test
config        doc           misc          sphinx08.sln    win
configure     example.sql   mysqlse       sphinx.conf.in
[root@localhost csft-3.2.14]# sh buildconf.sh
[root@localhost csft-4.1]# ./configure --prefix=/usr/local/coreseek/ --without-python --without-unixodbc --with-mmseg --with-mmseg-includes=/usr/local/mmseg/include/mmseg/ --with-mmseg-libs=/usr/local/mmseg/lib/ --with-mysql
[root@localhost csft-4.1]# make && make install
The prompts above show both csft-3.2.14 and csft-4.1; use the directory that matches the release you downloaded.
At this point the build with XML data source support is complete.
MySQL data source support
[root@localhost csft-4.1]# yum -y install mysql-devel libxml2-devel expat-devel
make clean    # clear the previous build
./configure --prefix=/usr/local/coreseek --without-unixodbc --with-mmseg --with-mmseg-includes=/usr/local/mmseg/include/mmseg/ --with-mmseg-libs=/usr/local/mmseg/lib/ --with-mysql
make && make install
At this point the build with MySQL data source support is complete.
Example:
A run-through using the default SQL file and configuration file.
mysql> GRANT ALL PRIVILEGES ON *.* TO 'test'@'%' WITH GRANT OPTION;
mysql> GRANT PROXY ON ''@'' TO 'test'@'%' WITH GRANT OPTION;
mysql> flush privileges;
[root@localhost etc]# mysql -utest < example.sql
[root@localhost etc]# pwd
/usr/local/coreseek/etc
[root@localhost etc]# cp sphinx.conf.dist csft.conf
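[root@localhost etc]# vi csft.conf
In csft.conf, uncomment line 32 and set it to the actual location of your MySQL socket:
sql_sock = /tmp/mysql.sock
Make sure this line is enabled as well:
sql_query_pre = SET NAMES utf8
Then find the line
sql_query_info = SELECT * FROM documents WHERE id=$id
and add the following line directly above it, so that Chinese text is displayed correctly:
sql_query_info_pre = SET NAMES utf8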
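The last edit above (adding sql_query_info_pre above sql_query_info) can also be scripted instead of done by hand in vi. This is a minimal sketch, not part of Coreseek: add_info_pre is a hypothetical helper name, it assumes the sql_query_info line is not commented out, and it rewrites the file in place, so keep a backup of csft.conf first.

```shell
#!/bin/sh
# add_info_pre FILE: insert "sql_query_info_pre = SET NAMES utf8" directly
# above the first active sql_query_info line of FILE.
# Hypothetical helper; does nothing if the option is already present.
add_info_pre() {
    conf="$1"
    if grep -q '^[[:space:]]*sql_query_info_pre[[:space:]]*=' "$conf"; then
        return 0
    fi
    # Print the new line once, just before the matching line, then copy
    # every input line through unchanged.
    awk '
        !done && /^[[:space:]]*sql_query_info[[:space:]]*=/ {
            print "\tsql_query_info_pre\t= SET NAMES utf8"
            done = 1
        }
        { print }
    ' "$conf" > "$conf.tmp" && mv "$conf.tmp" "$conf"
}

# Example call (path taken from the steps above):
# add_info_pre /usr/local/coreseek/etc/csft.conf
```

Running it a second time leaves the file unchanged, because the function returns early once the option exists.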
[root@localhost etc]# /usr/local/coreseek/bin/indexer -c /usr/local/coreseek/etc/csft.conf --all
Coreseek Fulltext 4.1 [ Sphinx 2.0.2-dev (r2922)]
Copyright (c) 2007-2011,
Beijing Choice Software Technologies Inc (http://www.coreseek.com)
using config file '/usr/local/coreseek/etc/csft.conf'...
indexing index 'test1'...
collected 4 docs, 0.0 MB
sorted 0.0 Mhits, 100.0% done
total 4 docs, 193 bytes
total 0.007 sec, 24823 bytes/sec, 514.46 docs/sec
indexing index 'test1stemmed'...
collected 4 docs, 0.0 MB
sorted 0.0 Mhits, 100.0% done
total 4 docs, 193 bytes
total 0.003 sec, 49411 bytes/sec, 1024.06 docs/sec
skipping non-plain index 'dist1'...
skipping non-plain index 'rt'...
total 6 reads, 0.000 sec, 0.1 kb/call avg, 0.0 msec/call avg
total 18 writes, 0.000 sec, 0.1 kb/call avg, 0.0 msec/call avg
[root@localhost data]# pwd
/usr/local/coreseek/var/data
[root@localhost data]# ls    # these files were generated by the indexer command above
test1.spa  test1.sph  test1.spk  test1.spp  test1stemmed.spa  test1stemmed.sph  test1stemmed.spk  test1stemmed.spp
test1.spd  test1.spi  test1.spm  test1.sps  test1stemmed.spd  test1stemmed.spi  test1stemmed.spm  test1stemmed.sps
[root@localhost data]# /usr/local/coreseek/bin/search test    # this now returns rows from the indexed database table
1. document=1, weight=2421, group_id=1, date_added=Wed Aug 13 01:29:11 2014
        id=1
        group_id=1
        group_id2=5
        date_added=2014-08-13 01:29:11
        title=test one
        content=this is my test document number one. also checking search within phrases.
2. document=2, weight=2421, group_id=1, date_added=Wed Aug 13 01:29:11 2014
        id=2
        group_id=1
        group_id2=6
        date_added=2014-08-13 01:29:11
        title=test two
        content=this is my test document number two
3. document=4, weight=1442, group_id=2, date_added=Wed Aug 13 01:29:11 2014
        id=4
        group_id=2
        group_id2=8
        date_added=2014-08-13 01:29:11
        title=doc number four
        content=this is to test groups
Now change the content column to Chinese text and search again. The index has not been rebuilt, so the same documents match; the displayed fields come from the sql_query_info query, which search runs against MySQL for each match, so the new Chinese content appears immediately.
mysql> update test.documents set content='草泥馬';
[root@localhost etc]# /usr/local/coreseek/bin/search test
Coreseek Fulltext 4.1 [ Sphinx 2.0.2-dev (r2922)]
Copyright (c) 2007-2011,
Beijing Choice Software Technologies Inc (http://www.coreseek.com)
using config file '/usr/local/coreseek/etc/csft.conf'...
index 'test1': query 'test ': returned 3 matches of 3 total in 0.000 sec
displaying matches:
1. document=1, weight=2421, group_id=1, date_added=Wed Aug 13 01:29:11 2014
        id=1
        group_id=1
        group_id2=5
        date_added=2014-08-13 01:29:11
        title=test one
        content=草泥馬
2. document=2, weight=2421, group_id=1, date_added=Wed Aug 13 01:29:11 2014
        id=2
        group_id=1
        group_id2=6
        date_added=2014-08-13 01:29:11
        title=test two
        content=草泥馬
3. document=4, weight=1442, group_id=2, date_added=Wed Aug 13 01:29:11 2014
        id=4
        group_id=2
        group_id2=8
        date_added=2014-08-13 01:29:11
        title=doc number four
        content=草泥馬
For detailed usage, see http://www.coreseek.cn/