Linux - Building CLucene and basic usage

Preface

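Two days ago I had to do a feasibility study of CLucene, with the aim of pairing it with a Java program (CLucene writes the index, Java Lucene reads and queries it).
From experiments and from reading around, the data produced by clucene-core-2.3.3.4 cannot be read by the newer Java Lucene 6.x and 7.x, probably because the index format is incompatible.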

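Although this CLucene version is not compatible with Java Lucene, it still has its uses (it is far better than writing your own indexing and search).
If the Java program does not mind using an older Lucene (one that is compatible with clucene-core-2.3.3.4), or accepts querying through a C wrapper exposed as a JNI DLL, clucene-core-2.3.3.4 is still useful.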

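While looking for material, I found that usable CLucene programming references are very scarce.
The official documentation only describes the API interfaces and gives no concrete usage.
The small amount of code in the cl_demo bundled with the official project is not practical (it does not write the index to files, and a query run by the demo against what it has just written finds nothing), which was disappointing.
Unofficial CLucene programming material is mostly based on versions older than clucene-core-2.3.3.4 and is of little reference value.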

In the end I dug the code out of the test cases bundled with clucene-core-2.3.3.4.
Without official code as a reference, who would know how to use it :)

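Open source is good: you can find all kinds of reference code inside the project, it just takes some time to read the code and run experiments.
The test-case project here is particularly well done: a wide range of API usage is exercised, and the scenario we need is usually one of the test cases or a combination of several.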

Download: experiment project and materials

src_test_clucene.7z

Experiment

Goal

Install the CLucene library and test how to use it

Materials

debian8.8 (with make, gcc and g++ installed)
zlib-1.2.11.tar.gz
cmake-3.12.3.tar.gz
clucene-core-2.3.3.4.tar.gz
Self-made test project test_clucene\src\case1

Build steps

  • Switch to the root user

  • Install cmake
    tar -xzvf ./cmake-3.12.3.tar.gz
    cd cmake-3.12.3/
    ./configure
    make
    make install

  • Install zlib
    tar -xzvf ./zlib-1.2.11.tar.gz
    cd zlib-1.2.11/
    cmake ./CMakeLists.txt
    make
    make install

  • Install clucene
    tar -xzvf ./clucene-core-2.3.3.4.tar.gz
    cd clucene-core-2.3.3.4
    cmake ./CMakeLists.txt
    make
    make install

Building the official test project

cd clucene-core-2.3.3.4
make cl_test
The compiled program is ./bin/cl_test
The test project source is in ./src/test/
If you need reference code, look for it in ./src/test/ :)

Self-made test project: run output

MAKE_FILE_MACRO__BIN_NAME = [test_clucene]
>> fn_test()
>> write_to_clucene()
create file system writer
close file system writer


================================================================================
>> query_from_clucene(k1, *)
================================================================================
lucene_hit cnt = 3
--------------------------------------------------------------------------------
lucene_doc->toString() = Document< stored/uncompressed,indexed,tokenized<k1:v1 is the value, k1 is the key>>
0. query = *, key = k1, value = v1 is the value, k1 is the key, score = 1.000000
--------------------------------------------------------------------------------
lucene_doc->toString() = Document< stored/uncompressed,indexed,tokenized<k1:v1_1>>
1. query = *, key = k1, value = v1_1, score = 0.042404
--------------------------------------------------------------------------------
lucene_doc->toString() = Document< stored/uncompressed,indexed,tokenized<k1:v1_2>>
2. query = *, key = k1, value = v1_2, score = 0.042404


================================================================================
>> query_from_clucene(url, *bing*)
================================================================================
lucene_hit cnt = 1
--------------------------------------------------------------------------------
lucene_doc->toString() = Document< stored/uncompressed,indexed,tokenized<url:http://cn.bing.com>>
0. query = *bing*, key = url, value = http://cn.bing.com, score = 1.000000


================================================================================
>> query_from_clucene(url, csdn.)
================================================================================
lucene_hit cnt = 0


================================================================================
>> query_from_clucene(url, csdn)
================================================================================
lucene_hit cnt = 3
--------------------------------------------------------------------------------
lucene_doc->toString() = Document< stored/uncompressed,indexed,tokenized<url:https://me.csdn.net>>
0. query = csdn, key = url, value = https://me.csdn.net, score = 0.846574
--------------------------------------------------------------------------------
lucene_doc->toString() = Document< stored/uncompressed,indexed,tokenized<url:http://www.csdn.net/>>
1. query = csdn, key = url, value = http://www.csdn.net/, score = 0.846574
--------------------------------------------------------------------------------
lucene_doc->toString() = Document< stored/uncompressed,indexed,tokenized<url:https://blog.csdn.net>>
2. query = csdn, key = url, value = https://blog.csdn.net, score = 0.846574


================================================================================
>> query_from_clucene(url, b??g)
================================================================================
lucene_hit cnt = 2
--------------------------------------------------------------------------------
lucene_doc->toString() = Document< stored/uncompressed,indexed,tokenized<url:https://blog.csdn.net>>
0. query = b??g, key = url, value = https://blog.csdn.net, score = 0.421841
--------------------------------------------------------------------------------
lucene_doc->toString() = Document< stored/uncompressed,indexed,tokenized<url:http://cn.bing.com>>
1. query = b??g, key = url, value = http://cn.bing.com, score = 0.421841
THE END
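
The zero-hit result for `csdn.` is consistent with how the index was built. The test project indexes with SimpleAnalyzer, which splits text at non-letter characters and lowercases each piece, so the index holds terms such as `https`, `blog`, `csdn` and `net`, and no indexed term contains a `.`. A wildcard pattern is compared against whole indexed terms. The sketch below imitates that tokenization using only the C++ standard library. `letter_tokenize` and `has_term` are hypothetical helpers written for illustration, not CLucene APIs, and they handle ASCII letters only.

```cpp
#include <algorithm>
#include <cassert>
#include <cctype>
#include <string>
#include <vector>

// Hypothetical helper (not a CLucene API): split text into lowercase runs of
// ASCII letters, imitating a letter-based, lowercasing analyzer at index time.
std::vector<std::string> letter_tokenize(const std::string& text)
{
    std::vector<std::string> tokens;
    std::string current;

    for (std::string::size_type i = 0; i < text.size(); ++i) {
        unsigned char ch = static_cast<unsigned char>(text[i]);
        if (std::isalpha(ch)) {
            // Letters extend the current token, stored in lowercase.
            current.push_back(static_cast<char>(std::tolower(ch)));
        } else if (!current.empty()) {
            // Any non-letter character ends the current token.
            tokens.push_back(current);
            current.clear();
        }
    }
    if (!current.empty()) {
        tokens.push_back(current);
    }
    return tokens;
}

// Hypothetical helper: true if one of the tokens equals the term exactly.
bool has_term(const std::vector<std::string>& tokens, const std::string& term)
{
    return std::find(tokens.begin(), tokens.end(), term) != tokens.end();
}
```

Under these assumptions, `letter_tokenize("https://blog.csdn.net")` yields `https`, `blog`, `csdn`, `net`. The query text `csdn` equals one of those terms, while `csdn.` cannot equal any of them because the dot was discarded when the text was split.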

Self-made test project: source listing

// @file main.cpp
// @brief Test reading and writing CLucene index files; confirm that key-value pairs written into CLucene can be read back
// @ref http://clucene.sourceforge.net/doc/html/
// @note test environment: clucene-core-2.3.3.4

#include <stdlib.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>
#include <signal.h>

#include "CLucene.h" // with clucene-core-2.3.3.4 only the umbrella header is needed
#include "config/repl_tchar.h" // TCHAR support on Linux; CLucene uses wchar_t

using namespace lucene::index;
using namespace lucene::analysis;
using namespace lucene::util;
using namespace lucene::document;

using namespace lucene::store;

using namespace lucene::queryParser;
using namespace lucene::search;

#ifndef SAFE_DELETE
#define SAFE_DELETE(p) \
	if (NULL != (p)) { \
		delete (p); \
		(p) = NULL; \
	}
#endif // #ifndef SAFE_DELETE

#define TITLE_LINE80 "================================================================================"
#define LINE80 "--------------------------------------------------------------------------------"

#if not defined(MYLOG_D)
#define MYLOG_D printf
#endif

#define MY_LUCENE_DATA_DIR "/home/dev/lu_dir/"
#define SAFE_DEL_CLUCENE_POINTER _CLDELETE

void init(const char* psz_log_owner_name);
void uninit();
void proc_sig_term(int num);
int fn_test();

bool add_doc_to_writer(IndexWriter* lucene_writer, const TCHAR* psz_key, const TCHAR* psz_value);
void write_to_clucene();
void query_from_clucene(const TCHAR* psz_key, const TCHAR* psz_query);

int main(int argc, char** argv)
{
	char sz_buf[1024] = {'\0'};
	
#ifdef MAKE_FILE_MACRO__BIN_NAME
	sprintf(sz_buf, "%s", MAKE_FILE_MACRO__BIN_NAME);
	init(sz_buf);
	MYLOG_D("MAKE_FILE_MACRO__BIN_NAME = [%s]\n", MAKE_FILE_MACRO__BIN_NAME);
#else
	init(NULL);
#endif // #ifdef MAKE_FILE_MACRO__BIN_NAME
	
	fn_test();
	uninit();

	MYLOG_D("THE END\n");
    return EXIT_SUCCESS;
}

void uninit()
{
}

void proc_sig_term(int num)
{
	MYLOG_D("SIGTERM = %d, num = %d\n", SIGTERM, num);
	MYLOG_D("maybe can do some clean task before quit\n");
	exit(1);	
}

void init(const char* psz_log_owner_name)
{
	int i = 0;

	// daemon(0, 0);

	// clear screen (print 25 empty lines)
	for (i = 0; i < 25; i++) {
		MYLOG_D("\n");
	}

	signal(SIGTERM, proc_sig_term);
}

int fn_test()
{
	MYLOG_D(">> fn_test()\n");

	write_to_clucene();

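	// CLucene wildcard queries support only the wildcards '?' and '*'
	// In this experiment the query text could contain only English letters (a~z, A~Z)
	// A query string containing non-letter characters (e.g. '.', '//', ':') finds nothing
	query_from_clucene(_T("k1"), _T("*")); // the query must not be empty; to match every value under the key, pass *
	query_from_clucene(_T("url"), _T("*bing*"));
	query_from_clucene(_T("url"), _T("csdn.")); // contains a non-letter character ('.'), so nothing is found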
	query_from_clucene(_T("url"), _T("csdn"));
	query_from_clucene(_T("url"), _T("b??g"));

	/** run result
	*/

    return 0;
}

void write_to_clucene()
{
	SimpleAnalyzer lucene_analyzer;
	FSDirectory* lucene_fs_dir = NULL;
	IndexWriter* lucene_writer = NULL;

	MYLOG_D(">> write_to_clucene()\n");
	try {
		lucene_fs_dir = FSDirectory::getDirectory(MY_LUCENE_DATA_DIR);
		
		MYLOG_D("create file system writer\n");
		lucene_writer = _CLNEW IndexWriter(lucene_fs_dir, &lucene_analyzer, true);
		lucene_writer->setMaxFieldLength(IndexWriter::DEFAULT_MAX_FIELD_LENGTH);

		add_doc_to_writer(lucene_writer, _T("url"), _T("https://me.csdn.net"));
		add_doc_to_writer(lucene_writer, _T("url"), _T("http://www.csdn.net/"));
		add_doc_to_writer(lucene_writer, _T("url"), _T("https://blog.csdn.net"));
		add_doc_to_writer(lucene_writer, _T("url"), _T("https://www.baidu.com/"));
		add_doc_to_writer(lucene_writer, _T("url"), _T("http://cn.bing.com"));
		
		add_doc_to_writer(lucene_writer, _T("k1"), _T("v1 is the value, k1 is the key"));
		add_doc_to_writer(lucene_writer, _T("k1"), _T("v1_1"));
		add_doc_to_writer(lucene_writer, _T("k1"), _T("v1_2"));

		lucene_writer->close();
		MYLOG_D("close file system writer\n");
	}

	catch (CLuceneError& e) { // catch by reference to avoid copying the exception object
		MYLOG_D("catch clucene error : %s\n", e.what());
	}

	SAFE_DEL_CLUCENE_POINTER(lucene_writer);
	SAFE_DEL_CLUCENE_POINTER(lucene_fs_dir);
}

bool add_doc_to_writer(IndexWriter* lucene_writer, const TCHAR* psz_key, const TCHAR* psz_value)
{
	bool b_rc = false;
	Document* lucene_doc = NULL;

	do {
		if ((NULL == lucene_writer) || (NULL == psz_key) || (NULL == psz_value)) {
			break;
		}

		lucene_doc = _CLNEW Document();
		if (NULL == lucene_doc) {
			break;
		}
		
		lucene_doc->add(*_CLNEW Field(psz_key, psz_value, Field::STORE_YES | Field::INDEX_TOKENIZED));
		lucene_writer->addDocument(lucene_doc);
		SAFE_DEL_CLUCENE_POINTER(lucene_doc);

		b_rc = true;
	} while (0);

	return b_rc;
}

void query_from_clucene(const TCHAR* psz_key, const TCHAR* psz_query)
{
	std::wstring str1 = L"";
	float f_score = 0.0f;
	uint32_t i = 0;
	Term* lucene_term = NULL;
	Query* lucene_query = NULL;
	Hits* lucene_hits = NULL;

	FSDirectory* clucene_dir = NULL;
	IndexReader* clucene_reader = NULL;
	IndexSearcher* clucene_searcher = NULL;

	MYLOG_D("\n\n");
	MYLOG_D("%s\n", TITLE_LINE80);
	MYLOG_D(">> query_from_clucene(%ls, %ls)\n", psz_key, psz_query);
	MYLOG_D("%s\n", TITLE_LINE80);

	try {
		lucene_term = _CLNEW Term(psz_key, psz_query);
    	lucene_query = _CLNEW WildcardQuery(lucene_term);

		clucene_dir = FSDirectory::getDirectory(MY_LUCENE_DATA_DIR);
		clucene_reader = IndexReader::open(clucene_dir);
    	clucene_searcher = _CLNEW IndexSearcher(clucene_reader);

		lucene_hits = clucene_searcher->search(lucene_query);
	
		MYLOG_D("lucene_hit cnt = %lu\n", lucene_hits->length());
		for (i = 0; i < lucene_hits->length(); i++ ) {
			MYLOG_D("%s\n", LINE80);
			Document& lucene_doc = lucene_hits->doc(i);
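			TCHAR* psz_doc = lucene_doc.toString(); // toString() returns a newly allocated buffer owned by the caller
			MYLOG_D("lucene_doc->toString() = %ls\n", psz_doc);
			_CLDELETE_CARRAY(psz_doc); // release the buffer to avoid leaking it on every hit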
			
			str1 = lucene_doc.get(psz_key);
			f_score = lucene_hits->score(i);
			MYLOG_D("%u. query = %ls, key = %ls, value = %ls, score = %f\n",
				i, psz_query, psz_key, str1.c_str(), f_score); // example of printing wide strings (printf with %ls)
		}
	}

	catch (CLuceneError& e) { // catch by reference to avoid copying the exception object
		MYLOG_D("catch clucene error : %s\n", e.what());
	}

	// free resources
	_CLDECDELETE(lucene_term); // Term is reference-counted and the query also holds a reference, so drop ours instead of deleting outright
	SAFE_DEL_CLUCENE_POINTER(lucene_hits);
	SAFE_DEL_CLUCENE_POINTER(lucene_query);
	SAFE_DEL_CLUCENE_POINTER(clucene_searcher);
	SAFE_DEL_CLUCENE_POINTER(clucene_reader);
	SAFE_DEL_CLUCENE_POINTER(clucene_dir);
}
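
query_from_clucene() wraps the pattern in a WildcardQuery. In Lucene-style wildcard patterns `?` stands for exactly one character and `*` for any run of characters, including none, and the pattern has to match a whole indexed term. The matching rule can be sketched with the standard library alone. `wildcard_match` is a hypothetical helper for illustration and is not how CLucene implements the query internally.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper (not a CLucene API): true if the whole term matches
// the pattern, where '?' matches one character and '*' matches zero or more.
bool wildcard_match(const std::string& pattern, const std::string& term)
{
    const std::string::size_type npos = std::string::npos;
    std::string::size_type p = 0;       // current position in pattern
    std::string::size_type t = 0;       // current position in term
    std::string::size_type star = npos; // position of the most recent '*'
    std::string::size_type mark = 0;    // term position when that '*' was seen

    while (t < term.size()) {
        if (p < pattern.size() && pattern[p] == '*') {
            // Remember the star and first try to let it match nothing.
            star = p++;
            mark = t;
        } else if (p < pattern.size() &&
                   (pattern[p] == '?' || pattern[p] == term[t])) {
            // Single-character match: advance both positions.
            ++p;
            ++t;
        } else if (star != npos) {
            // Mismatch: let the last '*' absorb one more character and retry.
            p = star + 1;
            t = ++mark;
        } else {
            return false;
        }
    }

    // Only trailing '*' characters may remain once the term is consumed.
    while (p < pattern.size() && pattern[p] == '*') {
        ++p;
    }
    return p == pattern.size();
}
```

With this rule, `b??g` matches the terms `blog` and `bing`, `*bing*` matches `bing`, and `csdn.` does not match the term `csdn`, which lines up with the hit counts in the run output.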


# ==============================================================================
# @file makefile
# ==============================================================================
# @note 
# howto build project
# 		make BIN_NAME="bin_name_by_you_want" rebuild

MY_MAKE_FILE_PATH_NAME = $(MAKEFILE_LIST)

# macro from Makefile command line
# BIN_NAME

# macro to C project
MAKE_FILE_MACRO__BIN_NAME="make_file_macro__bin_name"

# var define on Makefile
BIN = output_not_give_bin_name
IS_BUILD_TYPE_VALID = 0

ifdef BIN_NAME
   	IS_BUILD_TYPE_VALID = 1
	BIN = $(BIN_NAME)
	MAKE_FILE_MACRO__BIN_NAME=$(BIN_NAME)
else
    IS_BUILD_TYPE_VALID = 0
endif

LINE80 = --------------------------------------------------------------------------------

# CC = g++ -std=c++98
CC = g++

# -Werror is "warning as error"
CFLAGS = -Wall -Werror -g

INC = -I. -I./clucene-core-2.3.3.4/src/core/ -I./clucene-core-2.3.3.4/src/shared/CLucene/
LIBPATH = -L/usr/lib/ -L/usr/local/lib/

ifeq (1, $(IS_BUILD_TYPE_VALID))
	LIBS = -lstdc++ -pthread -lclucene-core -lclucene-shared
else
	LIBS =
endif

DEPEND_CODE_DIR = ../common/ \

DEPEND_CODE_SRC = $(shell find $(DEPEND_CODE_DIR) -name '*.cpp')
DEPEND_CODE_OBJ = $(DEPEND_CODE_SRC:.cpp=.o)

ROOT_CODE_SRC = $(shell find ./ -name '*.cpp')
ROOT_CODE_OBJ = $(ROOT_CODE_SRC:.cpp=.o)

SUB_CODE_DIR = ./empty_dir
SUB_CODE_SRC = $(shell find $(SUB_CODE_DIR) -name '*.cpp')
SUB_CODE_OBJ = $(SUB_CODE_SRC:.cpp=.o)

.PHONY: help
help:
	clear
	@echo "usage:"
	@echo
	@echo "build project by given bin name"
	@echo "make BIN_NAME=\"bin_name_by_you_want\" rebuild"
	@echo

.PHONY: clean
clean:
	clear

	@echo
	@echo
	@echo
	@echo
	@echo
	@echo
	@echo
	@echo
	@echo
	@echo

	@echo
	@echo
	@echo
	@echo
	@echo
	@echo
	@echo
	@echo
	@echo
	@echo

	@echo
	@echo
	@echo
	@echo
	@echo
	
	@echo "make clean begin"
	@echo $(LINE80)

	@echo "@file $(MY_MAKE_FILE_PATH_NAME)"
	@echo "IS_BUILD_TYPE_VALID = $(IS_BUILD_TYPE_VALID)"
	@echo "BIN = $(BIN)"

	@echo $(LINE80)

	rm -f $(ROOT_CODE_OBJ) $(DEPEND_CODE_OBJ) $(SUB_CODE_OBJ)
	
ifeq (1, $(IS_BUILD_TYPE_VALID))
	rm -f ./$(BIN)
endif

	@echo "make clean over"

.PHONY: all
all:$(BIN)
	@echo $(LINE80)
	@echo make all
	chmod 777 ./$(BIN)
	find . -name "$(BIN)"

$(BIN) : $(ROOT_CODE_OBJ) $(DEPEND_CODE_OBJ) $(SUB_CODE_OBJ)
	$(CC) $(CFLAGS) -o $@ $^ $(SHLIBS) $(INC) $(LIBPATH) $(LIBS)

.cpp.o:
	$(CC) -c $(CFLAGS) -DMAKE_FILE_MACRO__BIN_NAME="\"$(MAKE_FILE_MACRO__BIN_NAME)\"" $^ -o $@ $(INC) $(LIBPATH) $(LIBS)

.PHONY: rebuild
rebuild:
	make -f $(MY_MAKE_FILE_PATH_NAME) clean

ifeq (1, $(IS_BUILD_TYPE_VALID))
	@echo $(LINE80)
	make -f $(MY_MAKE_FILE_PATH_NAME) all
	chmod 775 ./$(BIN)
	ldd ./$(BIN)
else
	@echo $(LINE80)
	@echo "error : Makefile command line input error, please see help"	
	@echo "please run => make help"	
	@echo $(LINE80)
endif
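
The `.cpp.o` rule hands the binary name to the compiler as a quoted string macro through `-DMAKE_FILE_MACRO__BIN_NAME=...`, and main.cpp guards its use with `#ifdef`. One alternative, shown here only as a sketch, is to give the macro a fallback value so that calling code needs no `#ifdef`. The fallback text `"unnamed"` and the helper name `build_bin_name` are assumptions made for illustration and are not part of the original project.

```cpp
#include <cassert>
#include <cstring>

// If the makefile did not pass -DMAKE_FILE_MACRO__BIN_NAME="\"...\"",
// fall back to a placeholder (assumed value, chosen for this sketch).
#ifndef MAKE_FILE_MACRO__BIN_NAME
#define MAKE_FILE_MACRO__BIN_NAME "unnamed"
#endif

// Hypothetical helper: the binary name baked in at compile time.
const char* build_bin_name()
{
    return MAKE_FILE_MACRO__BIN_NAME;
}
```

Code such as init() could then call `build_bin_name()` unconditionally, whether or not the makefile supplied a name.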
	


Shell script for building the self-made test project

#!/bin/bash
# ==============================================================================
# @file build_all_project.sh
# ==============================================================================

make BIN_NAME="test_clucene" rebuild

