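1. Sphinx features
- Fast indexing (peak throughput of up to 10 MB/s on a modern CPU);
- Fast searching (average response time under 0.1 s per query on 2-4 GB of text);
- Handles large data sets (known to cope with more than 100 GB of text, or 100 M documents on a single-CPU system);
- Good relevance ranking that combines phrase proximity with statistical (BM25) ranking;
- Distributed search;
- Can serve searches as a MySQL storage engine;
- Multiple query modes, including boolean, phrase, and word proximity;
- Multiple full-text fields per document (at most 32);
- Multiple additional attributes per document (for example group IDs and timestamps);
- Stop-word support;
- Single-byte encodings and UTF-8;
- Native MySQL support (both MyISAM and InnoDB);
- Native PostgreSQL support.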
2. Install the mmseg Chinese word segmenter
- Download mmseg
wget http://www.coreseek.com/uploads/sources/mmseg-0.7.3.tar.gz
- Build and install
tar zxf mmseg-0.7.3.tar.gz && cd mmseg-0.7.3
./configure --prefix=/usr/local/mmseg
make && make install
ln -s /usr/local/mmseg/bin/mmseg /usr/bin/
- If the build fails with
make[2]: *** [UnigramCorpusReader.lo] Error 1
fix it as follows:
vim src/css/UnigramCorpusReader.cpp
Add #include <string.h> at line 23, then rebuild.
- A successful installation prints the following
[root@iZ28bak61f3Z bin]#mmseg
Coreseek COS(tm) MM Segment 1.0
Copyright By Coreseek.com All Right Reserved.
Usage: ./mmseg <option> <file>
-u <unidict> Unigram Dictionary
-r Combine with -u, used a plain text build Unigram Dictionary, default Off
-b <Synonyms> Synonyms Dictionary
-h print this help and exit
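The string.h fix from the build step above can also be applied without opening an editor. The sketch below is one possible way to do it, assuming the include belongs on line 23 as described; the function name add_string_h is made up for illustration and is not part of mmseg.

```shell
#!/bin/sh
# Hypothetical helper: make "#include <string.h>" line N of a source file
# (N defaults to 23), unless the file already contains that include.
add_string_h() {
    file=$1
    line=${2:-23}
    # Do nothing if the header is already included, so re-running is safe.
    if grep -q '^#include <string\.h>' "$file"; then
        return 0
    fi
    tmp=$(mktemp) || return 1
    # Print the include just before the current line N, then every original line.
    awk -v n="$line" 'NR == n { print "#include <string.h>" } { print }' "$file" > "$tmp" &&
        cat "$tmp" > "$file"
    status=$?
    rm -f "$tmp"
    return $status
}
```

From the mmseg-0.7.3 directory this would be used as `add_string_h src/css/UnigramCorpusReader.cpp && make && make install`. A file with fewer than N lines is left unchanged.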
3. Compile, install, and configure Sphinx (MySQL 5.6.24)
- Download Sphinx
wget http://pkgs.fedoraproject.org/repo/pkgs/sphinx/sphinx-0.9.9.tar.gz/7b9b618cb9b378f949bb1b91ddcc4f54/sphinx-0.9.9.tar.gz
- Build and install Sphinx
tar zxf sphinx-0.9.9.tar.gz
cd sphinx-0.9.9
yum install python-devel
./configure --prefix=/usr/local/sphinx --with-mysql=/usr/local/mysql/ --with-mysql-includes=/usr/local/mysql/include/ --with-mysql-libs=/usr/local/mysql/lib/ --with-mmseg-includes=/usr/local/mmseg/include/mmseg --with-mmseg-libs=/usr/local/mmseg/lib/ --with-mmseg
make && make install
- Configure Sphinx
cd /usr/local/sphinx/etc/
cp sphinx.conf.dist sphinx.conf
Edit the following settings in sphinx.conf:
type = mysql
sql_host = localhost
sql_user = root
sql_pass = ****
sql_db = test
sql_port = 3306
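After editing, it can help to confirm which connection settings are actually active. The sketch below is a hypothetical helper (show_sql_settings is an illustrative name, not a Sphinx tool) that lists the uncommented settings shown above and deliberately omits sql_pass so the password is not echoed to the terminal.

```shell
#!/bin/sh
# Hypothetical helper: list the active (uncommented) connection settings in a
# Sphinx config file, leaving out sql_pass so the password is not echoed.
show_sql_settings() {
    grep -E '^[[:space:]]*(type|sql_host|sql_user|sql_db|sql_port)[[:space:]]*=' "$1"
}
```

For example: `show_sql_settings /usr/local/sphinx/etc/sphinx.conf`. Since the file may contain several source and index blocks, a setting such as type can appear more than once in the output.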
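4. Build the index
Create a database named test and import example.sql into it for testing:
mysql -uroot -p < /usr/local/sphinx/etc/example.sql
Build the index with:
/usr/local/sphinx/bin/indexer --config /usr/local/sphinx/etc/sphinx.conf test1
Here test1 is the name of the index to build; to build every index defined in the configuration, pass --all instead of a name. Output like the following means the index was built: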
Sphinx 0.9.9-release (r2117)
Copyright (c) 2001-2009, Andrew Aksyonoff
using config file '/usr/local/sphinx/etc/sphinx.conf'...
indexing index 'test1'...
collected 4 docs, 0.0 MB
sorted 0.0 Mhits, 100.0% done
total 4 docs, 193 bytes
total 0.030 sec, 6256 bytes/sec, 129.67 docs/sec
total 2 reads, 0.000 sec, 0.1 kb/call avg, 0.0 msec/call avg
total 7 writes, 0.000 sec, 0.1 kb/call avg, 0.0 msec/call avg
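The indexer call above is usually repeated whenever the data changes. The sketch below assembles that command line under two assumptions: the installation paths used in this article, and a made-up function name, indexer_cmd. The --all and --rotate options belong to indexer itself; --rotate is the one to add when searchd is already serving the index being rebuilt.

```shell
#!/bin/sh
# Hypothetical helper: print the indexer command line for the given index
# names, or for all indexes when none are given. Set ROTATE=1 to add --rotate,
# which is needed when searchd is already serving the indexes.
indexer_cmd() {
    cmd="/usr/local/sphinx/bin/indexer --config /usr/local/sphinx/etc/sphinx.conf"
    if [ "${ROTATE:-0}" = 1 ]; then
        cmd="$cmd --rotate"
    fi
    if [ $# -eq 0 ]; then
        cmd="$cmd --all"
    else
        cmd="$cmd $*"
    fi
    printf '%s\n' "$cmd"
}
```

The function only prints the command, so it can be reviewed first and then executed, for instance with `sh -c "$(ROTATE=1 indexer_cmd test1)"`.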
5. Start the server
- Start
/usr/local/sphinx/bin/searchd --config /usr/local/sphinx/etc/sphinx.conf
- Stop
/usr/local/sphinx/bin/searchd --config /usr/local/sphinx/etc/sphinx.conf --stop
- Query with the search tool
/usr/local/sphinx/bin/search --config /usr/local/sphinx/etc/sphinx.conf test
Sphinx 0.9.9-release (r2117)
Copyright (c) 2001-2009, Andrew Aksyonoff
using config file '/usr/local/sphinx/etc/sphinx.conf'...
index 'test1': query 'test ': returned 3 matches of 3 total in 0.000 sec
displaying matches:
1. document=1, weight=2, group_id=1, date_added=Wed Oct 14 15:01:39 2015
id=1
group_id=1
group_id2=5
date_added=2015-10-14 15:01:39
title=test one
content=this is my test document number one. also checking search within phrases.
2. document=2, weight=2, group_id=1, date_added=Wed Oct 14 15:01:39 2015
id=2
group_id=1
group_id2=6
date_added=2015-10-14 15:01:39
title=test two
content=this is my test document number two
3. document=4, weight=1, group_id=2, date_added=Wed Oct 14 15:01:39 2015
id=4
group_id=2
group_id2=8
date_added=2015-10-14 15:01:39
title=doc number four
content=this is to test groups
words:
1. 'test': 3 documents, 5 hits
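When the search tool is used in a quick smoke test, the match count is often the only figure that matters. The sketch below pulls it out of the "returned N matches of M total" summary line shown above; match_count is a hypothetical name, and the pattern assumes the output format printed by this Sphinx version.

```shell
#!/bin/sh
# Hypothetical helper: read `search` output on stdin and print the number of
# matches returned, taken from the "returned N matches of M total" summary line.
match_count() {
    sed -n 's/.*returned \([0-9][0-9]*\) matches of [0-9][0-9]* total.*/\1/p'
}
```

Usage would look like `/usr/local/sphinx/bin/search --config /usr/local/sphinx/etc/sphinx.conf test | match_count`. If the query runs against several indexes, one number is printed per summary line.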
6. Sphinx init script
vim /etc/init.d/sphinx
#!/bin/bash
# sphinx: Startup script for Sphinx search
#
# chkconfig: 345 86 14
# description: This is a daemon for high performance full text \
#              search of MySQL and PostgreSQL databases. \
#              See http://www.sphinxsearch.com/ for more info.
#
# processname: searchd
# pidfile: /usr/local/sphinx/var/log/searchd.pid

# Source function library.
. /etc/rc.d/init.d/functions

processname=searchd
servicename=sphinx
username=sphinx    # account searchd runs as; it must already exist
sphinxlocation=/usr/local/sphinx
pidfile=$sphinxlocation/var/log/searchd.pid
searchd=$sphinxlocation/bin/searchd
RETVAL=0
PATH=$PATH:$sphinxlocation/bin

# Start searchd as $username and create the subsys lock file on success.
start() {
    echo -n $"Starting Sphinx daemon: "
    daemon --user=$username --check $servicename $processname
    RETVAL=$?
    echo
    [ $RETVAL -eq 0 ] && touch /var/lock/subsys/$servicename
}

# Ask searchd to shut down, then remove the lock and pid files on success.
stop() {
    echo -n $"Stopping Sphinx daemon: "
    $searchd --stop
    #killproc -p $pidfile $servicename -TERM
    RETVAL=$?
    echo
    if [ $RETVAL -eq 0 ]; then
        rm -f /var/lock/subsys/$servicename
        rm -f $pidfile
    fi
}

# See how we were called.
case "$1" in
    start)
        start
        ;;
    stop)
        stop
        ;;
    status)
        status $processname
        RETVAL=$?
        ;;
    restart)
        stop
        sleep 3
        start
        ;;
    condrestart)
        # Restart only if the service was started through this script.
        if [ -f /var/lock/subsys/$servicename ]; then
            stop
            sleep 3
            start
        fi
        ;;
    *)
        echo $"Usage: $0 {start|stop|status|restart|condrestart}"
        RETVAL=2    # report an invalid argument instead of success
        ;;
esac
exit $RETVAL
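Make the script executable, register it, and manage the service:
chmod 755 /etc/init.d/sphinx
chkconfig --add sphinx
chkconfig --level 345 sphinx on   # start at boot
service sphinx start              # start
service sphinx stop               # stop
service sphinx restart            # restart
service sphinx status             # check whether it is running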
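For monitoring scripts, the pid file can also be checked directly instead of going through the service command. This is a minimal sketch, assuming the pid file path used by the init script above; searchd_running is a hypothetical function name.

```shell
#!/bin/sh
# Hypothetical helper: succeed if the PID stored in a pid file belongs to a
# live process. The default path is the pidfile used by the init script.
searchd_running() {
    pidfile=${1:-/usr/local/sphinx/var/log/searchd.pid}
    [ -r "$pidfile" ] || return 1
    pid=$(cat "$pidfile")
    # Reject empty or non-numeric content before calling kill.
    case $pid in
        ''|*[!0-9]*) return 1 ;;
    esac
    # Signal 0 only tests whether the process can be signalled; run this as
    # root or as the searchd user, otherwise a live process may be reported as down.
    kill -0 "$pid" 2>/dev/null
}
```

A cron job could then run `searchd_running || service sphinx start`. The check only confirms that some process with that PID exists, not that it is answering queries.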