Apache Kylin Releases 4.0.0-beta, a Relatively Stable Test Release

The Apache Kylin community recently released version 4.0.0-beta. Apache Kylin is an open source distributed analytics engine that provides a SQL query interface and multidimensional analysis (OLAP) on top of Hadoop/Spark, and supports sub-second queries over extremely large datasets.

Apache Kylin 4.0.0-beta is an important release following Kylin 4.0.0-alpha. It is the second test release of Kylin 4.x. It fixes a number of bugs found in 4.0.0-alpha and restores support for several features that existed in Kylin 3.x, including System Cube, Hadoop 3 support, some advanced functions, and Cube Planner Phase 1.

Kylin 4 replaces HBase storage with Parquet, a truly columnar storage format, which improves file scan performance. Kylin 4 also reimplements the build engine and the query engine on top of Spark, which makes it possible to separate compute from storage and fits the cloud-native trend better.

Typical usage scenarios at an internet company, and the corresponding performance test results, are as follows:

Scenario 1 (simple queries)

Figure: https://static001.geekbang.org/infoq/80/80fdc1d650a3ada0f5a4a141bf939672.png

These results show that, without any query tuning, Kylin 4.0 responds more slowly than Kylin 3.0 on simple queries. Preliminary discussion and testing show that applying the ShardBy Column optimization reduces the response time of simple queries by 1~2 s, which brings it closer to Kylin 3.0.

Analysis points to the following main reasons why Kylin 4.0 is slower than Kylin 3.0 on simple queries:

- HBase indexes rowkeys quickly and its Regions have a caching mechanism, so sub-second responses are easy to achieve;
- The Spark 3 used in this test had AE (adaptive execution) enabled, which also has some negative impact on small queries;
- SparkSQL execution needs preparation time for node allocation and closure distribution, and it has to go through the Hadoop API to access files. This extra time can be reduced by later optimizations, for example by adding Alluxio/JuiceFS as a cache.

Scenario 2 (queries returning many rows)

Figure: https://static001.geekbang.org/infoq/ad/ad14a9f2ecb37bc3eae5763d60c4ef68.png

These SQL statements mainly follow two patterns:

- Pattern 1: SELECT DISTINCT product FROM table WHERE version IN ( ...so many versions... )
- Pattern 2: SELECT package, SUM(occur), count(distinct device) FROM table WHERE version IN ( ...so many versions... ) group by package

Because the IN clause contains too many candidate values, and package is a high-cardinality dimension, Kylin is under heavy pressure in both scanning and secondary aggregation. Kylin 3.0 is not suited to this kind of "export-style" query. Kylin 4.0, by drawing on the Spark compute engine, can return the required results in an acceptable amount of time. Here is an example:

Figure: https://static001.geekbang.org/infoq/31/316ff786114d2cc4d7b944d6eb06e102.png

Scenario 3 (queries over a long time span)

Figure: https://static001.geekbang.org/infoq/eb/ebc6091fbce03e9eb9c646bb72989fe9.png

The four queries use identical SQL except for the time range. The data shows that when the time span doubles, the amount of data Kylin 3.0 has to scan and its query response time also nearly double. Kylin 4.0 uses Spark distributed computing, and the advantage of its higher parallelism shows here.

Kylin 4.0.0-beta is now a relatively stable release. Testing by several early adopters has confirmed that the build and query functionality is largely complete, but there is still considerable room for improvement in both performance and features.

This release adds 25 new features and improvements and fixes 14 issues. For details, see:
https://kylin.apache.org/docs/release_notes.html

Important updates in this release:
[KYLIN-4857] - Refactor System Cube for Kylin 4
[KYLIN-4842] - Support the grouping sets function in Kylin 4
[KYLIN-4829] - Support thread-level Spark parameter configuration in the query engine
[KYLIN-4813] - Rework the logging system for the build engine
[KYLIN-4858] - Support deploying Kylin 4 on CDH 6.X
[KYLIN-4818] - Support Cuboid row count statistics in Kylin 4
[KYLIN-4817] - Refactor the Cube migration tool for Kylin 4

To download the Apache Kylin 4.0.0-beta source code and binary packages, visit the download page:
https://kylin.apache.org/cn/download/

Kylin official wiki:
https://cwiki.apache.org/confluence/display/KYLIN/User+Manual+4.X
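
As an illustration of the two SQL patterns described under Scenario 2 (queries returning many rows), the following minimal Python sketch renders both query shapes with a long IN list. The table name `app_events`, the helper function names, and the generated version strings are hypothetical placeholders, not values from the test; the column names (`product`, `package`, `occur`, `device`, `version`) are taken from the patterns in the article.

```python
def build_in_list(values):
    """Render a parenthesized SQL IN list, escaping single quotes in each value."""
    quoted = ", ".join("'" + v.replace("'", "''") + "'" for v in values)
    return "(" + quoted + ")"


def pattern_one(table, versions):
    """Pattern 1: distinct products filtered by a long list of versions."""
    return (
        f"SELECT DISTINCT product FROM {table} "
        f"WHERE version IN {build_in_list(versions)}"
    )


def pattern_two(table, versions):
    """Pattern 2: per-package aggregation over a high-cardinality dimension."""
    return (
        "SELECT package, SUM(occur), COUNT(DISTINCT device) "
        f"FROM {table} "
        f"WHERE version IN {build_in_list(versions)} "
        "GROUP BY package"
    )


# Hypothetical input: 500 version strings stand in for "so many versions".
versions = [f"1.0.{i}" for i in range(500)]
sql_one = pattern_one("app_events", versions)
sql_two = pattern_two("app_events", versions)
```

Both statements combine a very wide filter with a result set that is not small, which is the "export-style" shape the article says stresses scanning and secondary aggregation.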