Chinese full-site search with sphinx + mysql + mmseg: installation and configuration guide


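1. Features of Sphinx

  • Fast indexing (peak throughput of up to 10 MB/s on a modern CPU);
  • Fast searching (average query response time under 0.1 s on 2 to 4 GB of text);
  • Scales to large data sets (known to handle more than 100 GB of text, or 100 M documents on a single-CPU system);
  • Good relevance ranking, combining phrase proximity with statistical (BM25) ranking;
  • Distributed search;
  • Can serve searches as a MySQL storage engine;
  • Several query modes, including boolean, phrase, and word proximity;
  • Multiple full-text fields per document (up to 32);
  • Multiple additional attributes per document (for example group IDs and timestamps);
  • Stop word support;
  • Single-byte encodings and UTF-8;
  • Native MySQL support (both MyISAM and InnoDB);
  • Native PostgreSQL support.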

2. Install the Chinese word segmenter mmseg

  • Download mmseg
    wget http://www.coreseek.com/uploads/sources/mmseg-0.7.3.tar.gz
  • Compile and install
tar zxf mmseg-0.7.3.tar.gz && cd mmseg-0.7.3  
./configure --prefix=/usr/local/mmseg
make && make install
ln -s /usr/local/mmseg/bin/mmseg /usr/bin/  
  • If make fails with "make[2]: *** [UnigramCorpusReader.lo] Error 1", fix it as follows:
    vim src/css/UnigramCorpusReader.cpp
    Add #include <string.h> at line 23, then recompile
  • To test the installation, run mmseg with no arguments; a successful install prints the following
[root@iZ28bak61f3Z bin]#mmseg 
Coreseek COS(tm) MM Segment 1.0  
Copyright By Coreseek.com All Right Reserved.  
Usage: ./mmseg <option> <file>  
-u <unidict>           Unigram Dictionary
-r           Combine with -u, used a plain text build Unigram Dictionary, default Off
-b <Synonyms>           Synonyms Dictionary
-h            print this help and exit
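
The manual fix for the UnigramCorpusReader.lo error described above can also be scripted. The following is a minimal sketch, not part of mmseg itself: the function name add_string_include is hypothetical, and it assumes line 23 is the right place for the include, as stated above.

```shell
# Hypothetical helper: insert '#include <string.h>' as line 23 of a source file,
# unless the file already contains that include.
add_string_include() {
    src="$1"
    # Nothing to do if the include is already present.
    if grep -q '^#include <string\.h>' "$src"; then
        return 0
    fi
    tmp="$(mktemp)" || return 1
    # Print the new include just before the original line 23, then every original line.
    awk 'NR == 23 { print "#include <string.h>" } { print }' "$src" > "$tmp" || return 1
    # Write back in place so the file keeps its original permissions.
    cat "$tmp" > "$src" && rm -f "$tmp"
}

# Usage, from inside the mmseg-0.7.3 source directory:
#   add_string_include src/css/UnigramCorpusReader.cpp && make && make install
```

Running the helper a second time leaves the file unchanged, so it is safe to call before every rebuild.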

3. Compile, install, and configure Sphinx (MySQL 5.6.24)

  • Download sphinx
    wget http://pkgs.fedoraproject.org/repo/pkgs/sphinx/sphinx-0.9.9.tar.gz/7b9b618cb9b378f949bb1b91ddcc4f54/sphinx-0.9.9.tar.gz

  • Compile and install sphinx

tar zxf sphinx-0.9.9.tar.gz  
cd sphinx-0.9.9  
yum install python-devel  
./configure --prefix=/usr/local/sphinx --with-mysql=/usr/local/mysql/ --with-mysql-includes=/usr/local/mysql/include/ --with-mysql-libs=/usr/local/mysql/lib/ --with-mmseg-includes=/usr/local/mmseg/include/mmseg --with-mmseg-libs=/usr/local/mmseg/lib/ --with-mmseg
make && make install  
  • Configure sphinx
cd /usr/local/sphinx/etc/  
cp sphinx.conf.dist sphinx.conf  

Change the following settings in sphinx.conf to match your MySQL server:

type                                    = mysql  
sql_host                                = localhost  
sql_user                                = root  
sql_pass                                = ****  
sql_db                                  = test  
sql_port                                = 3306  
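
After editing, it can help to review the connection settings in one place. The sketch below is an illustration only: show_sql_settings is a hypothetical helper, not a Sphinx tool, and it simply greps the settings listed above out of the file.

```shell
# Hypothetical helper: print the active (uncommented) connection settings from a
# sphinx.conf file. sql_pass is deliberately left out so the password is not
# echoed to the terminal. Note that index-level "type = ..." lines match as well.
show_sql_settings() {
    grep -E '^[[:space:]]*(type|sql_host|sql_user|sql_db|sql_port)[[:space:]]*=' "$1"
}

# Usage:
#   show_sql_settings /usr/local/sphinx/etc/sphinx.conf
```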

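4. Build the index

Create a database named test and import example.sql for testing
mysql -uroot -p < /usr/local/sphinx/etc/example.sql
To build the index:
/usr/local/sphinx/bin/indexer --config /usr/local/sphinx/etc/sphinx.conf test1

test1 is the name of the index to build; to build every index defined in the configuration, pass --all instead of a name. Output like the following means the index was built successfully  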
Sphinx 0.9.9-release (r2117)  
Copyright (c) 2001-2009, Andrew Aksyonoff

using config file '/usr/local/sphinx/etc/sphinx.conf'...  
indexing index 'test1'...  
collected 4 docs, 0.0 MB  
sorted 0.0 Mhits, 100.0% done  
total 4 docs, 193 bytes  
total 0.030 sec, 6256 bytes/sec, 129.67 docs/sec  
total 2 reads, 0.000 sec, 0.1 kb/call avg, 0.0 msec/call avg  
total 7 writes, 0.000 sec, 0.1 kb/call avg, 0.0 msec/call avg  
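
Since the indexer command is long and gets rerun whenever the data changes, a small wrapper can save typing. This is a sketch under the paths used in this guide: the reindex function and the SPHINX_HOME variable are names made up for this example, while --config and --all are the indexer options already used above.

```shell
# Assumed install prefix from this guide; override SPHINX_HOME if yours differs.
SPHINX_HOME="${SPHINX_HOME:-/usr/local/sphinx}"

# Hypothetical wrapper: rebuild the named indexes, or all of them when no name is given.
reindex() {
    if [ "$#" -eq 0 ]; then
        # No index names supplied: ask indexer to rebuild everything.
        set -- --all
    fi
    "$SPHINX_HOME/bin/indexer" --config "$SPHINX_HOME/etc/sphinx.conf" "$@"
}

# Usage:
#   reindex test1    # rebuild only test1
#   reindex          # rebuild every index in sphinx.conf
```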

5. Start the server

  • Start
    /usr/local/sphinx/bin/searchd --config /usr/local/sphinx/etc/sphinx.conf
  • Stop
    /usr/local/sphinx/bin/searchd --config /usr/local/sphinx/etc/sphinx.conf --stop
  • Query with the search tool
    /usr/local/sphinx/bin/search --config /usr/local/sphinx/etc/sphinx.conf test
Sphinx 0.9.9-release (r2117)  
Copyright (c) 2001-2009, Andrew Aksyonoff

using config file '/usr/local/sphinx/etc/sphinx.conf'...  
index 'test1': query 'test ': returned 3 matches of 3 total in 0.000 sec

displaying matches:  
1. document=1, weight=2, group_id=1, date_added=Wed Oct 14 15:01:39 2015  
        id=1
        group_id=1
        group_id2=5
        date_added=2015-10-14 15:01:39
        title=test one
        content=this is my test document number one. also checking search within phrases.
2. document=2, weight=2, group_id=1, date_added=Wed Oct 14 15:01:39 2015  
        id=2
        group_id=1
        group_id2=6
        date_added=2015-10-14 15:01:39
        title=test two
        content=this is my test document number two
3. document=4, weight=1, group_id=2, date_added=Wed Oct 14 15:01:39 2015  
        id=4
        group_id=2
        group_id2=8
        date_added=2015-10-14 15:01:39
        title=doc number four
        content=this is to test groups

words:  
1. 'test': 3 documents, 5 hits  
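
For a scripted smoke test, the match count can be pulled out of the search output shown above. This is only a sketch: match_count is a hypothetical helper, and it assumes the summary line keeps the "returned N matches of M total" wording seen in the 0.9.9 output above.

```shell
# Hypothetical helper: read `search` output on stdin and print the number of
# returned matches, one line per index summary found.
match_count() {
    sed -n 's/.*returned \([0-9][0-9]*\) matches of.*/\1/p'
}

# Usage:
#   /usr/local/sphinx/bin/search --config /usr/local/sphinx/etc/sphinx.conf test | match_count
```

If the output contains no summary line, for example because the query failed, the helper prints nothing.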

6. Sphinx startup script

vim /etc/init.d/sphinx

#!/bin/bash 
# sphinx: Startup script for Sphinx search 
# 
# chkconfig: 345 86 14 
# description:  This is a daemon for high performance full text \
#               search of MySQL and PostgreSQL databases. \
#               See http://www.sphinxsearch.com/ for more info. 
# 
# processname: searchd 
# pidfile: $sphinxlocation/var/log/searchd.pid 

# Source function library. 
. /etc/rc.d/init.d/functions 

processname=searchd  
servicename=sphinx  
username=sphinx  
sphinxlocation=/usr/local/sphinx  
pidfile=$sphinxlocation/var/log/searchd.pid  
searchd=$sphinxlocation/bin/searchd 

RETVAL=0 

PATH=$PATH:$sphinxlocation/bin 

start() {  
    echo -n $"Starting Sphinx daemon: " 
    daemon --user=$username --check $servicename $processname 
    RETVAL=$? 
    echo 
    [ $RETVAL -eq 0 ] && touch /var/lock/subsys/$servicename 
} 

stop() {  
    echo -n $"Stopping Sphinx daemon: " 

    $searchd --stop 
    #killproc -p $pidfile $servicename -TERM 
    RETVAL=$? 
    echo 
    if [ $RETVAL -eq 0 ]; then 
        rm -f /var/lock/subsys/$servicename 
        rm -f $pidfile 
    fi 
} 

# See how we were called. 
case "$1" in  
    start) 
        start 
        ;; 
    stop) 
        stop 
        ;; 
    status) 
        status $processname 
        RETVAL=$? 
        ;; 
    restart) 
        stop 
        sleep 3
        start 
        ;; 
    condrestart) 
        if [ -f /var/lock/subsys/$servicename ]; then 
            stop 
            sleep 3
            start 
        fi 
        ;; 
    *) 
        echo $"Usage: $0 {start|stop|status|restart|condrestart}" 
        ;; 
esac  
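exit $RETVAL  

Save the file, then make it executable and register it as a service:
chmod 755 /etc/init.d/sphinx  
chkconfig --add sphinx  
chkconfig --level 345 sphinx on   # start at boot  
service sphinx start              # start  
service sphinx stop               # stop  
service sphinx restart            # restart  
service sphinx status             # check whether it is running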
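
The init script above refers to a pid file at /usr/local/sphinx/var/log/searchd.pid. As an independent check that does not go through the service command, a sketch like the following can test whether the process recorded there is alive. The searchd_running function is hypothetical, and the default path is simply the one used in the script above.

```shell
# Hypothetical helper: succeed if the pid file exists and names a live process.
# Run it as root or as the user that owns searchd, since `kill -0` fails for
# processes you are not allowed to signal.
searchd_running() {
    pidfile="${1:-/usr/local/sphinx/var/log/searchd.pid}"
    [ -r "$pidfile" ] || return 1
    # Take the first line and strip whitespace.
    pid="$(head -n 1 "$pidfile" | tr -d '[:space:]')"
    # Reject empty or non-numeric content.
    case "$pid" in
        ''|*[!0-9]*) return 1 ;;
    esac
    # Signal 0 only checks that the process exists; nothing is delivered.
    kill -0 "$pid" 2>/dev/null
}

# Usage:
#   if searchd_running; then echo "searchd is up"; else echo "searchd is down"; fi
```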