前言
前2天,要預研一下clucene, 準備和java程序對接(clucene來寫, java-lucene來讀[查詢]).
通過實驗和查資料,clucene-core-2.3.3.4產生的數據無法被新版的java-lucene6.x和java-lucene7.x讀取。可能是格式不兼容。
clucene雖然版本和java-lucene不兼容,也有自己的用場(比自己做索引查詢強多了).
如果java程序不介意用低版的lucene(能和clucene-core-2.3.3.4兼容的那種)或同意使用C封裝的jniDLL來進行查詢, clucene-core-2.3.3.4還是有用場的.
在查資料的過程中,發現能用的clucene編程資料特別的少。
官方文檔只給出一些API接口說明,沒給具體的API用法。
官方工程自帶的cl_demo中寫的那點代碼不實用(沒寫文件, demo自己寫,自己查詢時查不到), 失望.
非官方的clucene編程資料,大多都是基於clucene-core-2.3.3.4之前的版本, 參考價值很小。
最終,我是在clucene-core-2.3.3.4自帶的測試用例中扒出來的代碼.
如果沒有官方代碼做參照,誰知道怎麼玩:)
開源是好,在工程中可以各種找參考代碼,只是要用一些時間來讀代碼和做實驗。
開源工程的測試用例工程做的特別好,各種API的使用情況都測試到了,我們用的場景基本是測試用例中的一種或幾種的組合.
實驗工程和實驗材料下載點
實驗
實驗目標
安裝clucene庫和測試clucene用法
實驗材料
debian8.8(安裝了make, gcc, g++)
zlib-1.2.11.tar.gz
cmake-3.12.3.tar.gz
clucene-core-2.3.3.4.tar.gz
自產測試工程 test_clucene\src\case1
編譯過程
-
切到root用戶
-
安裝cmake
tar -xzvf ./cmake-3.12.3.tar.gz
cd cmake-3.12.3/
./configure
make
make install -
安裝zlib
tar -xzvf ./zlib-1.2.11.tar.gz
cd zlib-1.2.11/
cmake ./CMakeLists.txt
make
make install -
安裝clucene
tar -xzvf ./clucene-core-2.3.3.4.tar.gz
cd clucene-core-2.3.3.4
cmake ./CMakeLists.txt
make
make install
編譯官方測試工程
cd clucene-core-2.3.3.4
make cl_test
編譯好的程序爲./bin/cl_test
測試工程爲./src/test/
如果有需求,就在./src/test/中找參考代碼吧:)
自產的測試工程-運行效果
MAKE_FILE_MACRO__BIN_NAME = [test_clucene]
>> fn_test()
>> write_to_clucene()
create file system writer
close file system writer
================================================================================
>> query_from_clucene(k1, *)
================================================================================
lucene_hit cnt = 3
--------------------------------------------------------------------------------
lucene_doc->toString() = Document< stored/uncompressed,indexed,tokenized<k1:v1 is the value, k1 is the key>>
0. query = *, key = k1, value = v1 is the value, k1 is the key, score = 1.000000
--------------------------------------------------------------------------------
lucene_doc->toString() = Document< stored/uncompressed,indexed,tokenized<k1:v1_1>>
1. query = *, key = k1, value = v1_1, score = 0.042404
--------------------------------------------------------------------------------
lucene_doc->toString() = Document< stored/uncompressed,indexed,tokenized<k1:v1_2>>
2. query = *, key = k1, value = v1_2, score = 0.042404
================================================================================
>> query_from_clucene(url, *bing*)
================================================================================
lucene_hit cnt = 1
--------------------------------------------------------------------------------
lucene_doc->toString() = Document< stored/uncompressed,indexed,tokenized<url:http://cn.bing.com>>
0. query = *bing*, key = url, value = http://cn.bing.com, score = 1.000000
================================================================================
>> query_from_clucene(url, csdn.)
================================================================================
lucene_hit cnt = 0
================================================================================
>> query_from_clucene(url, csdn)
================================================================================
lucene_hit cnt = 3
--------------------------------------------------------------------------------
lucene_doc->toString() = Document< stored/uncompressed,indexed,tokenized<url:https://me.csdn.net>>
0. query = csdn, key = url, value = https://me.csdn.net, score = 0.846574
--------------------------------------------------------------------------------
lucene_doc->toString() = Document< stored/uncompressed,indexed,tokenized<url:http://www.csdn.net/>>
1. query = csdn, key = url, value = http://www.csdn.net/, score = 0.846574
--------------------------------------------------------------------------------
lucene_doc->toString() = Document< stored/uncompressed,indexed,tokenized<url:https://blog.csdn.net>>
2. query = csdn, key = url, value = https://blog.csdn.net, score = 0.846574
================================================================================
>> query_from_clucene(url, b??g)
================================================================================
lucene_hit cnt = 2
--------------------------------------------------------------------------------
lucene_doc->toString() = Document< stored/uncompressed,indexed,tokenized<url:https://blog.csdn.net>>
0. query = b??g, key = url, value = https://blog.csdn.net, score = 0.421841
--------------------------------------------------------------------------------
lucene_doc->toString() = Document< stored/uncompressed,indexed,tokenized<url:http://cn.bing.com>>
1. query = b??g, key = url, value = http://cn.bing.com, score = 0.421841
THE END
自產的測試工程-預覽
// @file main.cpp
// @brief 測試clucene文件的讀寫, 確定自己寫進入clucene的key-value可以自己讀出來
// @ref http://clucene.sourceforge.net/doc/html/
// @note 實驗環境 clucene-core-2.3.3.4
#include <stdlib.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>
#include <signal.h>
#include "CLucene.h" // clucene-core-2.3.3.4 只需要包含總的頭文件
#include "config/repl_tchar.h" // for linux tchar, clucene 用的是w_char
using namespace lucene::index;
using namespace lucene::analysis;
using namespace lucene::util;
using namespace lucene::document;
using namespace lucene::store;
using namespace lucene::queryParser;
using namespace lucene::search;
#ifndef SAFE_DELETE
#define SAFE_DELETE(p) \
if (NULL != (p)) { \
delete (p); \
(p) = NULL; \
}
#endif // #ifndef SAFE_DELETE
#define TITLE_LINE80 "================================================================================"
#define LINE80 "--------------------------------------------------------------------------------"
#if not defined(MYLOG_D)
#define MYLOG_D printf
#endif
#define MY_LUCENE_DATA_DIR "/home/dev/lu_dir/"
#define SAFE_DEL_CLUCENE_POINTER _CLDELETE
void init(const char* psz_log_owner_name);
void uninit();
void proc_sig_term(int num);
int fn_test();
bool add_doc_to_writer(IndexWriter* lucene_writer, const TCHAR* psz_key, const TCHAR* psz_value);
void write_to_clucene();
void query_from_clucene(const TCHAR* psz_key, const TCHAR* psz_query);
int main(int argc, char** argv)
{
char sz_buf[1024] = {'\0'};
#ifdef MAKE_FILE_MACRO__BIN_NAME
sprintf(sz_buf, "%s", MAKE_FILE_MACRO__BIN_NAME);
init(sz_buf);
MYLOG_D("MAKE_FILE_MACRO__BIN_NAME = [%s]\n", MAKE_FILE_MACRO__BIN_NAME);
#else
init(NULL);
#endif // #ifdef MAKE_FILE_MACRO__BIN_NAME
fn_test();
uninit();
MYLOG_D("THE END\n");
return EXIT_SUCCESS;
}
void uninit()
{
}
void proc_sig_term(int num)
{
MYLOG_D("SIGTERM = %d, num = %d\n", SIGTERM, num);
MYLOG_D("maybe can do some clean task before quit\n");
exit(1);
}
void init(const char* psz_log_owner_name)
{
int i = 0;
// daemon(0, 0);
// clear screen (print 25 empty line)
for (i = 0; i < 25; i++) {
MYLOG_D("\n");
}
signal(SIGTERM, proc_sig_term);
}
int fn_test()
{
MYLOG_D(">> fn_test()\n");
write_to_clucene();
// clucene的查詢統配符號只支持'?', '*'
// 查詢的內容只支持英文字母(a~z, A~Z)
// 查詢字符串中不能帶非英文字母(e.g. '.', '//', ':'), 否則查不到內容
query_from_clucene(_T("k1"), _T("*")); // 查詢條件不能爲空, 如果要查所有符合條件的vlaue, 輸入*
query_from_clucene(_T("url"), _T("*bing*"));
query_from_clucene(_T("url"), _T("csdn.")); // 這種不帶英文字母的查詢語句,是查不到內容的
query_from_clucene(_T("url"), _T("csdn"));
query_from_clucene(_T("url"), _T("b??g"));
/** run result
*/
return 0;
}
void write_to_clucene()
{
SimpleAnalyzer lucene_analyzer;
FSDirectory* lucene_fs_dir = NULL;
IndexWriter* lucene_writer = NULL;
MYLOG_D(">> write_to_clucene()\n");
try {
lucene_fs_dir = FSDirectory::getDirectory(MY_LUCENE_DATA_DIR);
MYLOG_D("create file system writer\n");
lucene_writer = _CLNEW IndexWriter(lucene_fs_dir, &lucene_analyzer, true);
lucene_writer->setMaxFieldLength(IndexWriter::DEFAULT_MAX_FIELD_LENGTH);
add_doc_to_writer(lucene_writer, _T("url"), _T("https://me.csdn.net"));
add_doc_to_writer(lucene_writer, _T("url"), _T("http://www.csdn.net/"));
add_doc_to_writer(lucene_writer, _T("url"), _T("https://blog.csdn.net"));
add_doc_to_writer(lucene_writer, _T("url"), _T("https://www.baidu.com/"));
add_doc_to_writer(lucene_writer, _T("url"), _T("http://cn.bing.com"));
add_doc_to_writer(lucene_writer, _T("k1"), _T("v1 is the value, k1 is the key"));
add_doc_to_writer(lucene_writer, _T("k1"), _T("v1_1"));
add_doc_to_writer(lucene_writer, _T("k1"), _T("v1_2"));
lucene_writer->close();
MYLOG_D("close file system writer\n");
}
catch (CLuceneError e) {
MYLOG_D("catch clucene error : %s\n", e.what());
}
SAFE_DEL_CLUCENE_POINTER(lucene_writer);
SAFE_DEL_CLUCENE_POINTER(lucene_fs_dir);
}
bool add_doc_to_writer(IndexWriter* lucene_writer, const TCHAR* psz_key, const TCHAR* psz_value)
{
bool b_rc = false;
Document* lucene_doc = NULL;
do {
if ((NULL == lucene_writer) || (NULL == psz_key) || (NULL == psz_value)) {
break;
}
lucene_doc = _CLNEW Document();
if (NULL == lucene_doc) {
break;
}
lucene_doc->add(*_CLNEW Field(psz_key, psz_value, Field::STORE_YES | Field::INDEX_TOKENIZED));
lucene_writer->addDocument(lucene_doc);
SAFE_DEL_CLUCENE_POINTER(lucene_doc);
b_rc = true;
} while (0);
return b_rc;
}
void query_from_clucene(const TCHAR* psz_key, const TCHAR* psz_query)
{
std::wstring str1 = L"";
float f_score = 0.0f;
uint32_t i = 0;
Term* lucene_term = NULL;
Query* lucene_query = NULL;
Hits* lucene_hits = NULL;
FSDirectory* clucene_dir = NULL;
IndexReader* clucene_reader = NULL;
IndexSearcher* clucene_searcher = NULL;
MYLOG_D("\n\n");
MYLOG_D("%s\n", TITLE_LINE80);
MYLOG_D(">> query_from_clucene(%ls, %ls)\n", psz_key, psz_query);
MYLOG_D("%s\n", TITLE_LINE80);
try {
lucene_term = _CLNEW Term(psz_key, psz_query);
lucene_query = _CLNEW WildcardQuery(lucene_term);
clucene_dir = FSDirectory::getDirectory(MY_LUCENE_DATA_DIR);
clucene_reader = IndexReader::open(clucene_dir);
clucene_searcher = _CLNEW IndexSearcher(clucene_reader);
lucene_hits = clucene_searcher->search(lucene_query);
MYLOG_D("lucene_hit cnt = %lu\n", lucene_hits->length());
for (i = 0; i < lucene_hits->length(); i++ ) {
MYLOG_D("%s\n", LINE80);
Document& lucene_doc = lucene_hits->doc(i);
MYLOG_D("lucene_doc->toString() = %ls\n", lucene_doc.toString());
str1 = lucene_doc.get(psz_key);
f_score = lucene_hits->score(i);
MYLOG_D("%d. query = %ls, key = %ls, value = %ls, score = %f\n",
i, psz_query, psz_key, str1.c_str(), f_score); // 打印寬字符的例子(使用printf + %ls)
}
}
catch (CLuceneError e) {
MYLOG_D("catch clucene error : %s\n", e.what());
}
//free resource
SAFE_DEL_CLUCENE_POINTER(lucene_term);
SAFE_DEL_CLUCENE_POINTER(lucene_hits);
SAFE_DEL_CLUCENE_POINTER(lucene_query);
SAFE_DEL_CLUCENE_POINTER(clucene_searcher);
SAFE_DEL_CLUCENE_POINTER(clucene_reader);
SAFE_DEL_CLUCENE_POINTER(clucene_dir);
}
# ==============================================================================
# @file makefile
# ==============================================================================
# @note
# howto build project
# make BIN_NAME="bin_name_by_you_want" rebuild
MY_MAKE_FILE_PATH_NAME = $(MAKEFILE_LIST)
# macro from Makefile command line
# BIN_NAME
# macro to C project
MAKE_FILE_MACRO__BIN_NAME="make_file_macro__bin_name"
# var define on Makefile
BIN = output_not_give_bin_name
IS_BUILD_TYPE_VALID = 0
ifdef BIN_NAME
IS_BUILD_TYPE_VALID = 1
BIN = $(BIN_NAME)
MAKE_FILE_MACRO__BIN_NAME=$(BIN_NAME)
else
IS_BUILD_TYPE_VALID = 0
endif
LINE80 = --------------------------------------------------------------------------------
# CC = g++ -std=c++98
CC = g++
# -Werror is "warning as error"
CFLAGS = -Wall -Werror -g
INC = -I. -I./clucene-core-2.3.3.4/src/core/ -I./clucene-core-2.3.3.4/src/shared/CLucene/
LIBPATH = -L/usr/lib/ -L/usr/local/lib/
ifeq (1, $(IS_BUILD_TYPE_VALID))
LIBS = -lstdc++ -pthread -lclucene-core -lclucene-shared
else
LIBS =
endif
DEPEND_CODE_DIR = ../common/ \
DEPEND_CODE_SRC = $(shell find $(DEPEND_CODE_DIR) -name '*.cpp')
DEPEND_CODE_OBJ = $(DEPEND_CODE_SRC:.cpp=.o)
ROOT_CODE_SRC = $(shell find ./ -name '*.cpp')
ROOT_CODE_OBJ = $(ROOT_CODE_SRC:.cpp=.o)
SUB_CODE_DIR = ./empty_dir
SUB_CODE_SRC = $(shell find $(SUB_CODE_DIR) -name '*.cpp')
SUB_CODE_OBJ = $(SUB_CODE_SRC:.cpp=.o)
.PHONY: help
help:
clear
@echo "usage:"
@echo
@echo "build project by given bin name"
@echo "make BIN_NAME=\"bin_name_by_you_want\" rebuild"
@echo
.PHONY: clean
clean:
clear
@echo
@echo
@echo
@echo
@echo
@echo
@echo
@echo
@echo
@echo
@echo
@echo
@echo
@echo
@echo
@echo
@echo
@echo
@echo
@echo
@echo
@echo
@echo
@echo
@echo
@echo "make clean begin"
@echo $(LINE80)
@echo "@file $(MY_MAKE_FILE_PATH_NAME)"
@echo "IS_BUILD_TYPE_VALID = $(IS_BUILD_TYPE_VALID)"
@echo "BIN = $(BIN)"
@echo $(LINE80)
rm -f $(ROOT_CODE_OBJ) $(DEPEND_CODE_OBJ) $(SUB_CODE_OBJ)
ifeq (1, $(IS_BUILD_TYPE_VALID))
rm -f ./$(BIN)
endif
@echo "make clean over"
.PHONY: all
all:$(BIN)
@echo $(LINE80)
@echo make all
chmod 777 ./$(BIN)
find . -name "$(BIN)"
$(BIN) : $(ROOT_CODE_OBJ) $(DEPEND_CODE_OBJ) $(SUB_CODE_OBJ)
$(CC) $(CFLAGS) -o $@ $^ $(SHLIBS) $(INC) $(LIBPATH) $(LIBS)
.cpp.o:
$(CC) -c $(CFLAGS) -DMAKE_FILE_MACRO__BIN_NAME="\"$(MAKE_FILE_MACRO__BIN_NAME)\"" $^ -o $@ $(INC) $(LIBPATH) $(LIBS)
.PHONY: rebuild
rebuild:
make -f $(MY_MAKE_FILE_PATH_NAME) clean
ifeq (1, $(IS_BUILD_TYPE_VALID))
@echo $(LINE80)
make -f $(MY_MAKE_FILE_PATH_NAME) all
chmod 775 ./$(BIN)
ldd ./$(BIN)
else
@echo $(LINE80)
@echo "error : Makefile command line input error, please see help"
@echo "please run => make help"
@echo $(LINE80)
endif
編譯自產測試工程用的shell腳本
#!/bin/bash
# ==============================================================================
# @file build_all_project.sh
# ==============================================================================
make BIN_NAME="test_clucene" rebuild