Elasticsearch Notes (14): An Elasticsearch Utility Class with Tree-Structure Support

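1. Preface

Several of my recent projects use ES as the database. One of them uses the open-source Jest client as its ES tool; it works well enough, but it has not been updated in a long time. Another project has a hand-written utility class that is quite rough, and my boss wants it to support both ES 5.6 and ES 6.8, so I implemented the common CRUD methods with the ES 5.6 low-level Java API. Later I tried Spring Data Elasticsearch and found the experience excellent: the API is very rich, as you would expect from Spring. And since I use Spring Data JPA a lot, it felt familiar right away.

I am not a strong engineer. In several years of Java development I have mostly written CRUD, so reading the Spring Data ES source is hard going and I honestly cannot follow it. So I decided to write a simple ES utility class of my own, purely to get familiar with the ES Java API. Compared with the elegant Spring Data ES, it is miles behind.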

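2. Goals

As it happens, the first Java feature I ever wrote was an ORM utility class that generated CRUD methods from entity classes, so I still remember a little about Java generics and reflection, and writing this ES ORM tool brought back that familiar feeling. Here are the goals for this round:

2.1 Goal: entity-based CRUD

I have read plenty of "JPA vs. MyBatis" articles online, and I find the argument pointless: whatever suits you is the right choice. Personally I like JPA's object-oriented flavor, and it also lets you write SQL queries by hand. So the interface I want to implement looks roughly like this: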

T save(T t);

T findById(String id);

long count(QueryBuilder queryBuilder);

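2.2 Goal: support querying tree-structured data in ES

The tree structure I mean here is not ES's built-in parent-child documents. I find the ES parent syntax awkward to use, though maybe that is just my own lack of skill with it.
The tree structure I mean follows my previous post, "Storing Tree Structures in ES with Spring Data Elasticsearch".
A side note: in a tree, each node should ideally have exactly one parent. Nodes with multiple parents do come up at work, but they are full of pitfalls and painful to maintain, so I personally refuse to support multi-parent trees.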

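3. Problems and Solutions

3.1 Problem: get the class of generic type T without passing the class explicitly

The generic type T is already supplied, so deriving T's class from it makes the code more elegant.
Otherwise you end up like much of the code found online, passing an extra clazz argument to every method, which is rather annoying: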

T getById(M id, Class<T> clazz);

boolean exists(M id, Class<T> clazz);

long count(QueryBuilder queryBuilder, Class<T> clazz);

List<T> searchMore(QueryBuilder queryBuilder, int limitSize, Class<T> clazz);

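3.1 Solution: copy Spring Data's homework

Spring Data ES has a piece of code that looks impressive even though I do not fully follow it. As far as I understand, a subclass (AbstractElasticsearchRepository<T, ID>) implements the interface (ElasticsearchRepository<T, ID>), and the type of T declared on the parent is resolved from within the subclass. I only copied half of this homework.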

private ParameterizedType resolveReturnedClassFromGenericType(Class<?> clazz) {

	Object genericSuperclass = clazz.getGenericSuperclass();
	if (genericSuperclass instanceof ParameterizedType) {
		ParameterizedType parameterizedType = (ParameterizedType) genericSuperclass;
		Type rawtype = parameterizedType.getRawType();
		if (SimpleElasticsearchRepository.class.equals(rawtype)) {
			return parameterizedType;
		}
	}

	return resolveReturnedClassFromGenericType(clazz.getSuperclass());
}
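
To make the idea concrete, here is a minimal, self-contained sketch of the same technique using only java.lang.reflect. The names GenericRepo, Book, BookRepo and SpecialBookRepo are hypothetical and exist only for illustration; this is not the Spring Data code, and it assumes the entity type is written as a concrete class directly in the extends clause.

```java
import java.lang.reflect.ParameterizedType;
import java.lang.reflect.Type;

// Hypothetical base class: resolves T once, from the subclass declaration.
abstract class GenericRepo<T> {

    private final Class<T> entityClass;

    @SuppressWarnings("unchecked")
    protected GenericRepo() {
        this.entityClass = (Class<T>) resolveEntityClass(getClass());
    }

    // Walk up the hierarchy until a class whose generic superclass is GenericRepo<...>.
    private static Class<?> resolveEntityClass(Class<?> clazz) {
        for (Class<?> c = clazz; c != null && c != Object.class; c = c.getSuperclass()) {
            Type generic = c.getGenericSuperclass();
            if (generic instanceof ParameterizedType) {
                ParameterizedType pt = (ParameterizedType) generic;
                if (GenericRepo.class.equals(pt.getRawType())) {
                    Type arg = pt.getActualTypeArguments()[0];
                    if (arg instanceof Class) {
                        return (Class<?>) arg;
                    }
                }
            }
        }
        // Stop with a clear error instead of recursing past Object.
        throw new IllegalStateException("Cannot resolve entity type for " + clazz.getName());
    }

    public Class<T> getEntityClass() {
        return entityClass;
    }
}

// Hypothetical entity and repositories used only to exercise the lookup.
class Book {
}

class BookRepo extends GenericRepo<Book> {
}

class SpecialBookRepo extends BookRepo {
}
```

With this in place, new BookRepo().getEntityClass() yields Book.class without any clazz parameter, and the same holds for SpecialBookRepo, which inherits the declaration. A subclass that forwards its own type variable (for example Mid<X> extends GenericRepo<X>) is not handled by this sketch and ends in the IllegalStateException.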

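3.2 Problem: how to design the tree structure

For tree-structured data, the common scenarios are:

  • Given an id, get its direct children
  • Given an id, get all of its descendants, for example the total number of descendants
  • Given an id, get all of its ancestors
  • When a node changes parent, update the path information of all of its descendants
  • When deleting a node, check whether it has descendants, and refuse the deletion if it does

3.2 Solution: use the ES nested type to record ancestor IDs

Following "Storing Tree Structures in ES with Spring Data Elasticsearch", here are the ES mapping and a sample document: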

PUT /pigg_tree/_mapping/_doc
{
    "properties":{
        "id":{
            "type":"keyword"
        },
        "level":{
            "type":"keyword"
        },
        "name":{
            "type":"keyword"
        },
        "parentId":{
            "type":"keyword"
        },
        "path":{
            "type":"nested",
            "properties":{
                "id":{
                    "type":"keyword"
                },
                "level":{
                    "type":"keyword"
                }
            }
        }
    }
}

A sample hit returned by a search on this index:

 {
        "_index" : "pigg_tree",
        "_type" : "_doc",
        "_id" : "5ebdf2a8551fa08956079179",
        "_score" : null,
        "_source" : {
          "parentId" : "5ebdf263551fd81d52158964",
          "level" : 3,
          "path" : [
            {
              "level" : 1,
              "id" : "5ebdf241551f9ae2328fa452"
            },
            {
              "level" : 2,
              "id" : "5ebdf263551fd81d52158964"
            }
          ],
          "id" : "5ebdf2a8551fa08956079179",
          "name" : "夏夏夏"
        },
        "sort" : [
          "夏夏夏",
          "3"
        ]
      }
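
The sample document above shows the invariant behind the design: a node's path is its parent's path plus the parent itself, and its level is the path length plus one. The following is a minimal in-memory sketch of that rule and of the "change parent" scenario. PathEntry, PathNode and TreePaths are hypothetical names, and no ES call is involved; in the real tool the same rebuild would have to be written back to the index.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical in-memory stand-in for one entry of the nested "path" field.
class PathEntry {
    final String id;
    final int level;

    PathEntry(String id, int level) {
        this.id = id;
        this.level = level;
    }
}

// Hypothetical in-memory stand-in for a tree document.
class PathNode {
    final String id;
    String parentId;
    int level;
    List<PathEntry> path = new ArrayList<>();

    PathNode(String id) {
        this.id = id;
    }
}

class TreePaths {

    // Place node under parent (null means root): path = parent's path + the parent itself.
    static void attach(PathNode node, PathNode parent) {
        List<PathEntry> path = new ArrayList<>();
        if (parent != null) {
            path.addAll(parent.path);
            path.add(new PathEntry(parent.id, parent.level));
        }
        node.parentId = parent == null ? null : parent.id;
        node.level = path.size() + 1;
        node.path = path;
    }

    // A node is a descendant of ancestorId when that id appears anywhere in its path.
    static boolean isDescendantOf(PathNode node, String ancestorId) {
        return node.path.stream().anyMatch(e -> e.id.equals(ancestorId));
    }

    // Move node under newParent and rebuild the path of every descendant found in all.
    static void move(PathNode node, PathNode newParent, List<PathNode> all) {
        attach(node, newParent);
        for (PathNode d : all) {
            if (!isDescendantOf(d, node.id)) {
                continue;
            }
            // Find the moved node inside the old path; everything after it is kept.
            int cut = 0;
            while (!d.path.get(cut).id.equals(node.id)) {
                cut++;
            }
            List<PathEntry> rebuilt = new ArrayList<>(node.path);
            rebuilt.add(new PathEntry(node.id, node.level));
            for (PathEntry old : d.path.subList(cut + 1, d.path.size())) {
                rebuilt.add(new PathEntry(old.id, rebuilt.size() + 1));
            }
            d.path = rebuilt;
            d.level = rebuilt.size() + 1;
        }
    }
}
```

Attaching three nodes in a chain reproduces the sample document: the leaf ends up at level 3 with a two-entry path. The sketch does not guard against moving a node underneath one of its own descendants; a real implementation should reject that case before calling move.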

4. Code Structure Design

4.1 Plain-structure interface: EsRepository

@NoRepositoryBean
public interface EsRepository<T> {

    T save(T t);

    T saveWithoutRefresh(T t);

    Iterable<T> saveAll(Iterable<T> entities);

    boolean deleteById(String id);

    void deleteByQuery(QueryBuilder query);

    boolean updateById(String id, Map<String, Object> doc);

    void updateAllById(Iterable<String> ids, Map<String, Object> doc);

    void updateByQuery(QueryBuilder query, Script script);

    boolean existsById(String id);

    Optional<T> findById(String id);

    Optional<T> findById(String id, SourceFilter sourceFilter);

    List<T> findAllById(Iterable<String> ids);

    List<T> findAllById(Iterable<String> ids, SourceFilter sourceFilter);

    List<T> findByQuery(QueryBuilder query);

    List<T> findByQuery(QueryBuilder query, SourceFilter sourceFilter);

    List<T> findByQuery(SearchQuery searchQuery);

    PageInfo<T> pageQuery(SearchQuery searchQuery);

    Long count(QueryBuilder query);

    Map<String, Long> countGroupBy(String field, QueryBuilder query, Integer resultSize);

    Class<T> getEntityClass();
}

4.2 Tree-structure interface: EsTreeRepository

@NoRepositoryBean
public interface EsTreeRepository<T extends TreeNode> extends EsRepository<T> {

    T saveNode(T t);

    Iterable<T> saveAllNodeOfParent(String parentId, Iterable<T> entities);

    boolean deleteNodeById(String id);

    List<T> findChildrenByParentId(String parentId, boolean onlyNextLevel, SearchQuery searchQuery);

    List<T> findForefathersById(String id, SourceFilter sourceFilter);

    Long countByParentId(String parentId, boolean onlyNextLevel, QueryBuilder query);

    Map<String, Long> countByParentId(List<String> parentIds, boolean onlyNextLevel, QueryBuilder query);
}

4.3 Plain-structure abstract class: AbstractEsRepository

public abstract class AbstractEsRepository<T> implements EsRepository<T> {
    // ... implementation methods omitted ...
}

4.4 Tree-structure abstract class: AbstractEsTreeRepository

@Component
public class AbstractEsTreeRepository<T extends TreeNode> extends AbstractEsRepository<T> implements EsTreeRepository<T> {
    // ... implementation methods omitted ...
}

4.5 Tree-structure base class: TreeNode

@Data
public class TreeNode {

    @EsNodeParentId
    private String parentId;

    @EsNodeLevel
    private int level;

    @EsPath
    private List<ParentNode> path;
}
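
Marker annotations such as @EsNodeParentId only help if the tool can find the annotated field at runtime, including when it is declared on a parent class. The following is a minimal sketch of one way to do that lookup. ParentIdMarker, BaseNode, Dept and AnnotatedFields are hypothetical names made up for this example, and this is not the author's MetadataUtils.

```java
import java.lang.annotation.Annotation;
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Field;
import java.util.Optional;

// Hypothetical marker annotation; RUNTIME retention is required for reflection to see it.
@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.FIELD)
@interface ParentIdMarker {
}

// Hypothetical base class that carries the annotated field.
class BaseNode {
    @ParentIdMarker
    private String parentId;
}

// Hypothetical entity that inherits the annotated field.
class Dept extends BaseNode {
    private String name;
}

class AnnotatedFields {

    // Search the class and then its superclasses for the first field carrying the marker.
    static Optional<Field> find(Class<?> type, Class<? extends Annotation> marker) {
        for (Class<?> c = type; c != null && c != Object.class; c = c.getSuperclass()) {
            for (Field field : c.getDeclaredFields()) {
                if (field.isAnnotationPresent(marker)) {
                    return Optional.of(field);
                }
            }
        }
        return Optional.empty();
    }
}
```

Calling AnnotatedFields.find(Dept.class, ParentIdMarker.class) returns the private parentId field declared on BaseNode, which a tool could then read or write after calling setAccessible(true).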

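4.6 Test entity class: TestTreeEntity

Note the @ToString(callSuper=true) below. Because I use the @Data annotation, the deserialized object appeared to be missing the properties of its parent class TreeNode. After some digging, the cause turned out to be the toString() that Lombok generates, which by default does not include superclass fields. So either add @ToString(callSuper=true) or simply do not use Lombok.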

@Data
@NoArgsConstructor
@AllArgsConstructor
@ToString(callSuper=true)
@EsDocument(indexName = "pigg_tree", type = "_doc")
public class TestTreeEntity extends TreeNode {

    @EsId
    private String id;

    private String name;
}
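
To see why the parent fields only look missing, here is a small sketch without Lombok. The toString() methods are hand-written approximations of what Lombok generates with and without callSuper; ParentFields, WithoutCallSuper and WithCallSuper are hypothetical names used only here.

```java
// Hypothetical parent class with its own toString().
class ParentFields {
    String parentId = "p1";
    int level = 2;

    @Override
    public String toString() {
        return "ParentFields(parentId=" + parentId + ", level=" + level + ")";
    }
}

// Approximates the default behavior: only the subclass's own fields are printed.
class WithoutCallSuper extends ParentFields {
    String name = "demo";

    @Override
    public String toString() {
        return "WithoutCallSuper(name=" + name + ")";
    }
}

// Approximates callSuper = true: the parent's toString() is embedded in the output.
class WithCallSuper extends ParentFields {
    String name = "demo";

    @Override
    public String toString() {
        return "WithCallSuper(super=" + super.toString() + ", name=" + name + ")";
    }
}
```

In both subclasses the inherited parentId and level fields are populated; the only difference is whether they show up when the object is printed or logged.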

5. Implementation

There is far too much code to paste all of it into a blog post, so here are a few of the methods I consider most important.

5.1 saveAll

public Iterable<T> saveAll(Iterable<T> entities) {
    Assert.notNull(entities, "entities can't be null.");
    Iterator<T> iterator = entities.iterator();
    if (!iterator.hasNext()) {
        // Nothing to index, so skip the bulk call entirely.
        return entities;
    }
    // Every entity has the same type, so resolve the metadata once from the first one.
    Metadata finalMetadataOfClass = MetadataUtils.getMetadata(iterator.next().getClass());
    BulkRequest bulkRequest = new BulkRequest();
    entities.forEach(t -> {
        IndexRequest indexRequest = prepareIndex(t, finalMetadataOfClass);
        if (indexRequest != null) {
            bulkRequest.add(indexRequest);
        }
    });
    try {
        checkForBulkUpdateFailure(client.bulk(bulkRequest, RequestOptions.DEFAULT));
    } catch (IOException e) {
        throw new ElasticsearchException("Error while bulk for request: " + bulkRequest.toString(), e);
    }
    return entities;
}

5.2 deleteById

public boolean deleteById(String id) {
    if (StringUtils.isEmpty(id)) {
        throw new ElasticsearchException("ID cannot be empty");
    }
    Metadata metadata = MetadataUtils.getMetadata(getEntityClass());
    DeleteRequest request = new DeleteRequest(
            metadata.getIndexName(),
            metadata.getTypeName(),
            id);
    request.setRefreshPolicy(WriteRequest.RefreshPolicy.NONE);
    try {
        DeleteResponse deleteResponse = client.delete(request, RequestOptions.DEFAULT);
        if (deleteResponse.getResult() == DocWriteResponse.Result.DELETED) {
            return true;
        }
    } catch (IOException e) {
        throw new ElasticsearchException("Error while deleting item request: " + request.toString(), e);
    }
    return false;
}

5.3 deleteByQuery

public void deleteByQuery(QueryBuilder query) {
    if (query == null) {
        throw new ElasticsearchException("query cannot be empty");
    }
    Metadata metadata = MetadataUtils.getMetadata(getEntityClass());
    DeleteByQueryRequest deleteByQueryRequest = new DeleteByQueryRequest(metadata.getIndexName())
            .setDocTypes(metadata.getTypeName())
            .setQuery(query)
            .setAbortOnVersionConflict(false)
            .setRefresh(true);
    deleteByQueryRequest.setConflicts("proceed");
    try {
        client.deleteByQuery(deleteByQueryRequest, RequestOptions.DEFAULT);
    } catch (IOException e) {
        throw new ElasticsearchException("Error for delete request: " + deleteByQueryRequest.toString(), e);
    }
}

5.4 updateAllById

public void updateAllById(Iterable<String> ids, Map<String, Object> doc){
    Assert.notNull(ids, "ids can't be null.");
    List<String> idList = stringIdsRepresentation(ids);
    Metadata metadata = MetadataUtils.getMetadata(getEntityClass());
    BulkRequest bulkRequest = new BulkRequest();
    idList.forEach(id -> {
        UpdateRequest request = new UpdateRequest(metadata.getIndexName(), metadata.getTypeName(), id);
        request.doc(doc);
        bulkRequest.add(request);
    });
    try {
        checkForBulkUpdateFailure(client.bulk(bulkRequest, RequestOptions.DEFAULT));
    } catch (IOException e) {
        throw new ElasticsearchException("Error while bulk for request: " + bulkRequest.toString(), e);
    }
}

5.5 existsById

public boolean existsById(String id) {
    String thisId = stringIdRepresentation(id);
    if (StringUtils.isEmpty(thisId)) {
        throw new ElasticsearchException("ID cannot be empty");
    }
    Metadata metadata = MetadataUtils.getMetadata(getEntityClass());
    GetRequest getRequest = new GetRequest(
            metadata.getIndexName(),
            metadata.getTypeName(),
            thisId);
    getRequest.fetchSourceContext(new FetchSourceContext(false));
    getRequest.storedFields("_none_");
    try {
        return client.exists(getRequest, RequestOptions.DEFAULT);
    } catch (IOException e) {
        throw new ElasticsearchException("Error for existsById request: " + getRequest.toString(), e);
    }
}

5.6 Fetch a List of documents by a collection of ids

    public List<T> findAllById(Iterable<String> ids, SourceFilter sourceFilter) {
        Assert.notNull(ids, "ids can't be null.");
        List<String> idList = stringIdsRepresentation(ids);

        Metadata metadata = MetadataUtils.getMetadata(getEntityClass());

        if (metadata != null) {

            MultiGetRequest request = new MultiGetRequest();
            for (String id : idList) {

                MultiGetRequest.Item item = new MultiGetRequest.Item(metadata.getIndexName(), metadata.getTypeName(), id);
                if (sourceFilter != null && !(sourceFilter.getIncludes() == null && sourceFilter.getExcludes() == null)) {
                    item.fetchSourceContext(new FetchSourceContext(true, sourceFilter.getIncludes(), sourceFilter.getExcludes()));
                }
                request.add(item);
            }

            try {
                MultiGetResponse response = client.mget(request, RequestOptions.DEFAULT);
                return EsResponseUtils.multiGetResponse2Obj(response, this.entityClass);
            } catch (IOException e) {
                throw new ElasticsearchException("Error for findAllById request: " + request.toString(), e);
            }
        }
        return null;
    }

5.7 countGroupBy

public Map<String, Long> countGroupBy(String field, QueryBuilder query, Integer resultSize){
    if (StringUtils.isEmpty(field)) {
        throw new ElasticsearchException("field cannot be empty");
    }
    if (resultSize == null || resultSize <= 0){
        resultSize = 1000;
    }
    Map<String, Long> groupMap = new LinkedHashMap<>();
    Metadata metadata = MetadataUtils.getMetadata(getEntityClass());
    AggregationBuilder agg = AggregationBuilders.terms("agg")
            .field(field)
            .size(resultSize)
            // A second order(...) call replaces the first, so only the count ordering is kept.
            .order(BucketOrder.count(false));
    BoolQueryBuilder boolQueryBuilder = QueryBuilders.boolQuery();
    if (null != query) {
        boolQueryBuilder.filter(query);
    }
    SearchSourceBuilder searchSourceBuilder = new SearchSourceBuilder();
    searchSourceBuilder.query(boolQueryBuilder);
    searchSourceBuilder.size(0);
    searchSourceBuilder.aggregation(agg);
    SearchRequest request = new SearchRequest(metadata.getIndexName());
    request.types(metadata.getTypeName());
    request.source(searchSourceBuilder);
    try {
        SearchResponse searchResponse = client.search(request, RequestOptions.DEFAULT);
        Terms groups = searchResponse.getAggregations().get("agg");
        for (Terms.Bucket entry : groups.getBuckets()) {
            groupMap.put(entry.getKey().toString(), entry.getDocCount());
        }
    } catch (IOException e) {
        throw new ElasticsearchException("Error for countGroupBy request: " + request.toString(), e);
    }
    return groupMap;
}

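5.8 countByParentId for tree structures

This method takes a group of nodes and counts, for each of them, the number of nodes beneath it: either its direct children only or all of its descendants, depending on onlyNextLevel.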

public Map<String, Long> countByParentId(List<String> parentIds, boolean onlyNextLevel, QueryBuilder query) {
        if (CollectionUtils.isEmpty(parentIds)) {
            throw new ElasticsearchException("parentIds cannot be empty");
        }

        Map<String, Long> result = new HashMap<>();
        Metadata metadata = MetadataUtils.getMetadata(getEntityClass());

        BoolQueryBuilder boolQueryBuilder = QueryBuilders.boolQuery();
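        if (onlyNextLevel){
            // Apply the caller's extra filter here too, as the descendant branch does.
            if (query != null){
                boolQueryBuilder.filter(query);
            }
            boolQueryBuilder.filter(QueryBuilders.termsQuery("parentId", parentIds));
            return countGroupBy("parentId", boolQueryBuilder, parentIds.size());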
        }else {
            if (query != null){
                boolQueryBuilder.filter(query);
            }

            BoolQueryBuilder boolQueryBuilderForNested = QueryBuilders.boolQuery();
            boolQueryBuilderForNested.filter(QueryBuilders.termsQuery("path.id", parentIds));
            boolQueryBuilder.filter(QueryBuilders.nestedQuery("path", boolQueryBuilderForNested, ScoreMode.None));

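            NestedAggregationBuilder nestedAggregationBuilder = AggregationBuilders.nested("group_by_path", "path");
            // Bucket only the requested ancestor ids; otherwise the top buckets can be other ancestors such as the root.
            nestedAggregationBuilder.subAggregation(AggregationBuilders.terms("terms_by_path").field("path.id")
                    .includeExclude(new IncludeExclude(parentIds.toArray(new String[0]), null))
                    .size(parentIds.size()));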

            SearchSourceBuilder searchSourceBuilder = new SearchSourceBuilder();
            searchSourceBuilder.query(boolQueryBuilder);
            searchSourceBuilder.size(0);
            searchSourceBuilder.aggregation(nestedAggregationBuilder);

            SearchRequest request = new SearchRequest(metadata.getIndexName());
            request.types(metadata.getTypeName());
            request.source(searchSourceBuilder);

            try {
                SearchResponse searchResponse = client.search(request, RequestOptions.DEFAULT);
                Aggregations aggregations = searchResponse.getAggregations();
                if (aggregations != null) {
                    Map<String, Aggregation> aggregationMap = aggregations.asMap();
                    if (aggregationMap != null && !aggregationMap.isEmpty()) {
                        Aggregation groupByAncestorId = aggregationMap.get("group_by_path");
                        if (groupByAncestorId != null) {
                            ParsedNested parsedNested = (ParsedNested) groupByAncestorId;
                            // Sub-aggregations inside the nested aggregation
                            Aggregations subAggregations = parsedNested.getAggregations();
                            Map<String, Aggregation> subAggregationsMap = subAggregations.getAsMap();
                            Aggregation termsByAncestorId = subAggregationsMap.get("terms_by_path");

                            ParsedStringTerms parsedStringTerms = (ParsedStringTerms) termsByAncestorId;
                            // All term buckets, one per ancestor id
                            List<? extends Terms.Bucket> buckets = parsedStringTerms.getBuckets();
                            if (!CollectionUtils.isEmpty(buckets)) {
                                buckets.stream().forEach(bucket ->
                                {
                                    result.put(bucket.getKeyAsString(), bucket.getDocCount());
                                });
                            }
                        }
                    }
                }
            } catch (IOException e) {
                throw new ElasticsearchException("Error for countByParentId request: " + request.toString(), e);
            }
        }
        return result;
}
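
As a reference for what the method is meant to return, here is a minimal in-memory sketch of the same counting rule, with no ES involved. CountNode and TreeCounts are hypothetical names, and ancestorIds plays the role of the ids stored in the nested path field.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical in-memory stand-in for a tree document: its parent plus all ancestor ids.
class CountNode {
    final String id;
    final String parentId;
    final List<String> ancestorIds;

    CountNode(String id, String parentId, String... ancestorIds) {
        this.id = id;
        this.parentId = parentId;
        this.ancestorIds = Arrays.asList(ancestorIds);
    }
}

class TreeCounts {

    // For each requested id, count its direct children or all of its descendants.
    static Map<String, Long> countByParentId(List<CountNode> nodes, List<String> parentIds,
                                             boolean onlyNextLevel) {
        Set<String> wanted = new HashSet<>(parentIds);
        Map<String, Long> counts = new HashMap<>();
        for (CountNode node : nodes) {
            if (onlyNextLevel) {
                // Direct children: group by parentId.
                if (wanted.contains(node.parentId)) {
                    counts.merge(node.parentId, 1L, Long::sum);
                }
            } else {
                // Descendants: every requested id found in the path counts this node once.
                for (String ancestorId : node.ancestorIds) {
                    if (wanted.contains(ancestorId)) {
                        counts.merge(ancestorId, 1L, Long::sum);
                    }
                }
            }
        }
        return counts;
    }
}
```

Note that only the requested ids appear as keys. A node's path also contains ancestors that were not asked for, and those must not leak into the result.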

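Summary

  • When learning a technology, go for breadth first and depth later. Do not expect to reach some great height in one step; finish the simple things first.
  • For example, this ORM does not yet handle ES index and mapping settings, the version field, multi-index operations and so on. These can be added gradually later.
  • When using reflection, distinguish getDeclaredFields() from getFields(): getDeclaredFields() returns every field declared directly on the class regardless of visibility, while getFields() returns only public fields, including inherited ones. To also get the non-public fields of parent classes, you can use Hutool's ReflectUtil.getFieldsDirectly(clazz, true).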
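
The difference between the two reflection calls is easy to check with plain JDK code. This is a small sketch with hypothetical classes FieldParent, FieldChild and FieldLookup; the allNames helper walks the superclass chain by hand and is not Hutool's implementation.

```java
import java.lang.reflect.Field;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical parent with one private and one public field.
class FieldParent {
    private String parentId;
    public int level;
}

// Hypothetical child with one private and one public field of its own.
class FieldChild extends FieldParent {
    private String name;
    public String id;
}

class FieldLookup {

    // Collect field names, skipping synthetic fields that tools may inject.
    static Set<String> names(Field[] fields) {
        Set<String> names = new TreeSet<>();
        for (Field field : fields) {
            if (!field.isSynthetic()) {
                names.add(field.getName());
            }
        }
        return names;
    }

    // All fields of the class and its superclasses, whatever their visibility.
    static Set<String> allNames(Class<?> type) {
        Set<String> names = new TreeSet<>();
        for (Class<?> c = type; c != null && c != Object.class; c = c.getSuperclass()) {
            names.addAll(names(c.getDeclaredFields()));
        }
        return names;
    }
}
```

For FieldChild, getDeclaredFields() covers id and name, getFields() covers id and level, and only the hierarchy walk also reaches the private parentId declared on the parent.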