[solr] - suggestion

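The previous post (Solr SpellCheck) simulated auto-complete with SpellCheck, using the first SpellCheck approach, in which the suggestion content was built dynamically from code. The approach below builds the suggestion content by reading a file instead, and sorts the suggestions by click-through rate.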

 

1. Create a file named dictionary.txt (UTF-8 encoded) in the mycore/conf directory with the following content:

# sample dict 
cpu intel I7    1.0
cpu AMD 5000+    2.0
中央處理器 英特爾    1.0
中央處理器 AMD    2.0
中央空調 海爾 1匹    1.0
中央空調 海爾 1.5匹    2.0
中央空調 海爾 2匹    3.0
中央空調 格力 1匹    4.0
中央空調 格力 1.5匹    5.0
中央空調 格力 2匹    6.0
中央空調 美的 1匹    7.0
中央空調 美的 1.5匹    8.0
中央空調 美的 2匹    9.0
中國中央政府    1.0
中國中央銀行    2.0
中國中央人民銀行    3.0
啓信有限公司    1.0
啓信科技有限公司    2.0

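Note the trailing values "1.0", "2.0", "3.0" above: these are the click-through rates (weights). Each one must be separated from the preceding text by a Tab character (\t); otherwise it is treated as part of the suggestion text.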
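As a concrete reading of this rule, the Java sketch below splits one dictionary.txt line into suggestion text and weight. It is only an illustration, not Solr's own file-reading code: the class name DictionaryLine is hypothetical, and the default weight of 1.0 for a line without a Tab is an assumption.

```java
// Hypothetical helper, not Solr code: splits one dictionary.txt line into text and weight.
public class DictionaryLine {
    final String text;
    final double weight;

    DictionaryLine(String text, double weight) {
        this.text = text;
        this.weight = weight;
    }

    /** Parses "text + Tab + weight"; without a Tab the whole line is text and the weight is assumed to be 1.0. */
    static DictionaryLine parse(String line) {
        String[] fields = line.split("\t");
        if (fields.length < 2) {
            return new DictionaryLine(line.trim(), 1.0);
        }
        return new DictionaryLine(fields[0].trim(), Double.parseDouble(fields[1].trim()));
    }

    public static void main(String[] args) {
        DictionaryLine tabbed = parse("cpu AMD 5000+\t2.0");
        System.out.println(tabbed.text + " | " + tabbed.weight); // cpu AMD 5000+ | 2.0
        // Spaces instead of a Tab: "2.0" stays part of the suggestion text
        DictionaryLine spaced = parse("cpu AMD 5000+    2.0");
        System.out.println(spaced.text + " | " + spaced.weight); // cpu AMD 5000+    2.0 | 1.0
    }
}
```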

 

2. Open solrconfig.xml and add the following elements inside <config />:

    <searchComponent name="spellcheck" class="solr.SpellCheckComponent">
      <lst name="spellchecker">
        <str name="name">file</str>
        <str name="classname">org.apache.solr.spelling.suggest.Suggester</str>
        <str name="lookupImpl">org.apache.solr.spelling.suggest.tst.TSTLookup</str>
        <!-- The field the spell check is based on, i.e. the field against which user input is checked. -->
        <str name="field">content</str>
        <str name="combineWords">true</str>
        <str name="breakWords">true</str>
        <!-- File containing the auto-complete suggestions -->
        <str name="sourceLocation">dictionary.txt</str>
        <!-- Directory for the auto-complete index; if omitted, the in-memory RAMDirectory is used -->
        <str name="spellcheckIndexDir">./spellchecker</str>
        <!-- When to build the spelling index: buildOnCommit or buildOnOptimize -->
        <str name="buildOnCommit">true</str>
      </lst>
    </searchComponent>
    <requestHandler name="/spellcheck" class="org.apache.solr.handler.component.SearchHandler">
      <lst name="defaults">
        <str name="spellcheck">true</str>
        <str name="spellcheck.dictionary">file</str>
        <!-- Maximum number of suggestions to return -->
        <str name="spellcheck.count">20</str>
        <!-- Sort suggestions by click-through rate (weight) -->
        <str name="spellcheck.onlyMorePopular">true</str>
      </lst>
      <arr name="last-components">
        <str>spellcheck</str>
      </arr>
    </requestHandler>

The key line in <searchComponent /> is:

<str name="sourceLocation">dictionary.txt</str>
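The ordering requested by spellcheck.onlyMorePopular can be pictured as: collect every dictionary entry that starts with the typed prefix, sort the matches by weight, highest first, and keep at most spellcheck.count of them. The Java sketch below shows that idea with a TreeMap range scan. Solr's TSTLookup uses a ternary search tree instead, and the class WeightedPrefixLookup is hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical weighted prefix lookup; Solr's TSTLookup uses a ternary search tree instead of a TreeMap.
public class WeightedPrefixLookup {
    private final TreeMap<String, Double> entries = new TreeMap<>();

    /** Adds one suggestion with its weight (click-through rate). */
    void add(String text, double weight) {
        entries.put(text, weight);
    }

    /** Returns at most count entries that start with prefix, highest weight first. */
    List<String> lookup(String prefix, int count) {
        // Keys in [prefix, prefix + U+FFFF) start with prefix (assuming no key contains U+FFFF)
        Map<String, Double> matches = entries.subMap(prefix, true, prefix + Character.MAX_VALUE, false);
        List<Map.Entry<String, Double>> sorted = new ArrayList<>(matches.entrySet());
        sorted.sort((a, b) -> Double.compare(b.getValue(), a.getValue()));
        List<String> result = new ArrayList<>();
        for (Map.Entry<String, Double> e : sorted) {
            if (result.size() == count) {
                break;
            }
            result.add(e.getKey());
        }
        return result;
    }

    public static void main(String[] args) {
        WeightedPrefixLookup lookup = new WeightedPrefixLookup();
        lookup.add("cpu intel I7", 1.0);
        lookup.add("cpu AMD 5000+", 2.0);
        System.out.println(lookup.lookup("cpu", 20)); // [cpu AMD 5000+, cpu intel I7]
    }
}
```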

 

3. Enter the following in the browser's address bar to build the suggestion dictionary:

http://localhost:8899/solr/mycore/spellcheck?spellcheck.build=true

The result:

 

4. Test in the browser by entering:

http://localhost:8899/solr/mycore/spellcheck?q=中央&rows=0
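When the URLs from steps 3 and 4 are built in a program instead of typed into a browser, the Chinese query text has to be percent-encoded as UTF-8. A minimal JDK-only sketch, assuming the host, port and core name used in this post (the class SpellcheckUrls is hypothetical):

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

// Hypothetical helper that builds the /spellcheck URLs used in this post.
public class SpellcheckUrls {
    static final String BASE = "http://localhost:8899/solr/mycore/spellcheck";

    /** URL that (re)builds the suggestion dictionary, as in step 3. */
    static String buildUrl() {
        return BASE + "?spellcheck.build=true";
    }

    /** URL that asks for suggestions for the given prefix without returning documents (rows=0). */
    static String suggestUrl(String prefix) {
        try {
            return BASE + "?q=" + URLEncoder.encode(prefix, "UTF-8") + "&rows=0";
        } catch (UnsupportedEncodingException e) {
            // Every standard JVM supports UTF-8, so this should never happen
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(buildUrl());
        // The two characters U+4E2D U+592E are the query used in step 4
        System.out.println(suggestUrl("\u4e2d\u592e")); // ...spellcheck?q=%E4%B8%AD%E5%A4%AE&rows=0
    }
}
```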

 

5. Test with code:

package com.my.solr;

import java.io.IOException;
import java.util.List;
import java.util.Map;

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.SolrServerException;
import org.apache.solr.client.solrj.impl.HttpSolrServer;
import org.apache.solr.client.solrj.impl.XMLResponseParser;
import org.apache.solr.client.solrj.response.QueryResponse;
import org.apache.solr.client.solrj.response.SpellCheckResponse;
import org.apache.solr.client.solrj.response.SpellCheckResponse.Collation;
import org.apache.solr.client.solrj.response.SpellCheckResponse.Correction;
import org.apache.solr.client.solrj.response.SpellCheckResponse.Suggestion;

public class TestSolr {

    public static void main(String[] args) throws IOException, SolrServerException {
        String url = "http://localhost:8899/solr/mycore";
        HttpSolrServer core = new HttpSolrServer(url);
        core.setMaxRetries(1);
        core.setConnectionTimeout(5000);
        core.setParser(new XMLResponseParser()); // binary parser is used by default
        core.setSoTimeout(1000); // socket read timeout
        core.setDefaultMaxConnectionsPerHost(100);
        core.setMaxTotalConnections(100);
        core.setFollowRedirects(false); // defaults to false
        core.setAllowCompression(true);

        // ------------------------------------------------------
        // search
        // ------------------------------------------------------
        SolrQuery query = new SolrQuery();
        String token = "中央";
        query.set("qt", "/spellcheck");
        query.set("q", token);
        query.set("spellcheck", "on");
        // Rebuilds the suggestion dictionary on every request: convenient for this test, wasteful in production
        query.set("spellcheck.build", "true");
        // Sort suggestions by weight (click-through rate)
        query.set("spellcheck.onlyMorePopular", "true");

        query.set("spellcheck.count", "100");
        query.set("spellcheck.alternativeTermCount", "4");

        query.set("spellcheck.extendedResults", "true");
        query.set("spellcheck.maxResultsForSuggest", "5");

        query.set("spellcheck.collate", "true");
        query.set("spellcheck.collateExtendedResults", "true");
        query.set("spellcheck.maxCollationTries", "5");
        query.set("spellcheck.maxCollations", "3");

        QueryResponse response = null;

        try {
            response = core.query(query);
            System.out.println("Query time (ms): " + response.getQTime());
        } catch (SolrServerException e) {
            System.err.println(e.getMessage());
            e.printStackTrace();
        } catch (Exception e) {
            System.err.println(e.getMessage());
            e.printStackTrace();
        } finally {
            core.shutdown();
        }

        // Stop if the request failed; otherwise the calls below would throw a NullPointerException
        if (response == null) {
            return;
        }

        SpellCheckResponse spellCheckResponse = response.getSpellCheckResponse();
        if (spellCheckResponse != null) {
            List<Suggestion> suggestionList = spellCheckResponse.getSuggestions();
            for (Suggestion suggestion : suggestionList) {
                System.out.println("Suggestions NumFound: " + suggestion.getNumFound());
                System.out.println("Token: " + suggestion.getToken());
                System.out.print("Suggested: ");
                List<String> suggestedWordList = suggestion.getAlternatives();
                for (String word : suggestedWordList) {
                    System.out.print(word + ", ");
                }
                System.out.println();
            }
            System.out.println();
            Map<String, Suggestion> suggestedMap = spellCheckResponse.getSuggestionMap();
            for (Map.Entry<String, Suggestion> entry : suggestedMap.entrySet()) {
                System.out.println("suggestionName: " + entry.getKey());
                Suggestion suggestion = entry.getValue();
                System.out.println("NumFound: " + suggestion.getNumFound());
                System.out.println("Token: " + suggestion.getToken());
                System.out.print("suggested: ");

                List<String> suggestedList = suggestion.getAlternatives();
                for (String suggestedWord : suggestedList) {
                    System.out.print(suggestedWord + ", ");
                }
                System.out.println("\n\n");
            }

            Suggestion suggestion = spellCheckResponse.getSuggestion(token);
            // getSuggestion returns null when there are no suggestions for this token
            if (suggestion != null) {
                System.out.println("NumFound: " + suggestion.getNumFound());
                System.out.println("Token: " + suggestion.getToken());
                System.out.print("suggested: ");
                List<String> suggestedList = suggestion.getAlternatives();
                for (String suggestedWord : suggestedList) {
                    System.out.print(suggestedWord + ", ");
                }
                System.out.println("\n\n");
            }

            System.out.println("The first suggested word for " + token + " is: " + spellCheckResponse.getFirstSuggestion(token));
            System.out.println("\n\n");

            List<Collation> collatedList = spellCheckResponse.getCollatedResults();
            if (collatedList != null) {
                for (Collation collation : collatedList) {
                    System.out.println("collated query String: " + collation.getCollationQueryString());
                    System.out.println("collation Num: " + collation.getNumberOfHits());
                    List<Correction> correctionList = collation.getMisspellingsAndCorrections();
                    for (Correction correction : correctionList) {
                        System.out.println("original: " + correction.getOriginal());
                        System.out.println("correction: " + correction.getCorrection());
                    }
                    System.out.println();
                }
            }
            System.out.println();
            System.out.println("The Collated word: " + spellCheckResponse.getCollatedResult());
            System.out.println();
        }

        System.out.println("Query time (ms): " + response.getQTime());
    }
}

Output:

The suggestions are already sorted by click-through rate.


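The dictionary.txt above contains "啓信", which the analyzer does not recognize as a word, so a query for the single character "啓" returns no suggestions.

To fix this, add a user-defined word to the IK Analyzer dictionary:

1. In the Solr web application directory webapps\solr\WEB-INF\classes, create a text file named ext.dic with the following content: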

啓信

Edit IKAnalyzer.cfg.xml:

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE properties SYSTEM "http://java.sun.com/dtd/properties.dtd">  
<properties>  
    <comment>IK Analyzer extension configuration</comment>
    <!-- Configure your own extension dictionaries here -->
    <entry key="ext_dict">ext.dic;</entry>

    <!-- Configure your own extension stopword dictionaries here -->
    <entry key="ext_stopwords">stopword.dic;</entry>
    
</properties>
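IKAnalyzer.cfg.xml is written in the standard java.util.Properties XML format, and the trailing ";" in ext_dict suggests that several dictionary files can be listed, separated by semicolons. The sketch below reads such a file with Properties.loadFromXML and splits an entry on ";". It only illustrates how the file can be interpreted; it is not IK Analyzer's own loading code, and IkConfigReader is a hypothetical name.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.Properties;

// Hypothetical reader for an IKAnalyzer.cfg.xml-style file; not IK Analyzer's own code.
public class IkConfigReader {

    /** Loads the properties XML and returns the ';'-separated file names stored under key. */
    static List<String> dictionaries(InputStream xml, String key) {
        Properties props = new Properties();
        try {
            props.loadFromXML(xml); // also closes the stream
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        List<String> files = new ArrayList<>();
        for (String part : props.getProperty(key, "").split(";")) {
            if (!part.trim().isEmpty()) {
                files.add(part.trim());
            }
        }
        return files;
    }

    public static void main(String[] args) {
        String xml = "<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n"
                + "<!DOCTYPE properties SYSTEM \"http://java.sun.com/dtd/properties.dtd\">\n"
                + "<properties>\n"
                + "  <comment>IK Analyzer extension configuration</comment>\n"
                + "  <entry key=\"ext_dict\">ext.dic;</entry>\n"
                + "  <entry key=\"ext_stopwords\">stopword.dic;</entry>\n"
                + "</properties>\n";
        InputStream in = new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8));
        System.out.println(dictionaries(in, "ext_dict")); // [ext.dic]
    }
}
```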

Save the file and restart Tomcat.

Enter in the address bar:

http://localhost:8899/solr/mycore/spellcheck?q=啓&rows=0

Result:



The same works when querying from code.
