Kylin at Beike: Performance Challenges and HBase Optimization Practices

## How Kylin Is Used at Beike

![](https://static001.infoq.cn/resource/image/eb/0b/ebb1c80c84c1c260dc40eef84753b90b.png)

Kylin has served as Beike's company-wide OLAP engine since 2017. **We currently run 100+ Kylin instances and 800+ Cubes with 300+ TB of single-replica storage; Kylin at Beike sits on two HBase clusters of 30+ nodes, and query volume peaks at over 20 million per day.**

My colleague 張如松, who works on Kylin, [shared Kylin's practice at Beike](http://mp.weixin.qq.com/s?__biz=MzAwODE3ODU5MA==&mid=2653078369&idx=1&sn=255f18ed718912fda53cabdd50afdd7d&chksm=80a4bd90b7d33486036c8dbe3a84df5cb638eb532057940cca7358a3f6f8893696b6d0680a6b&scene=21#wechat_redirect) at the 2018 Kylin Meetup. Back then the daily peak was just over one million requests, so volume has grown 19-fold in two years. Our commitment to users is that 99.7% of queries respond within 3 seconds, and we have reached as high as 99.8%. At 20+ million queries per day, Kylin faces many challenges. Below I will walk through some of the problems we ran into, in the hope that they offer the community a useful reference.

## Kylin HBase Optimization

### Inaccessible Tables/Regions

**1) Symptom:**

![](https://static001.infoq.cn/resource/image/b9/88/b9e5437203c8bd7f2ccfaa8718315088.png)

During nightly Cube builds, a Region of an important table would become inaccessible and fail the build; the upper-right screenshot shows logs of HBase's meta table being unreachable. During the day, some queries also timed out because a Region of a data table was inaccessible; the lower-right screenshot shows such a Region timeout. On top of that, the old Kylin cluster had accumulated 160,000+ Regions, more than 10,000 per machine on average, which made creating and dropping tables on the Kylin HBase cluster extremely slow: nightly builds would hang on table creation, and the cleanup job took three to four minutes to drop a single table. Faced with this, we made a series of improvements.

**2) Solutions:**

![](https://static001.infoq.cn/resource/image/c9/b6/c9d18f074c626038d078c4f189eyy1b6.png)

**Drop unused tables to cut the Region count.** As just described, the HBase cluster averaged 10,000+ Regions per machine, which is unreasonable for HBase; and because dropping one table took three to four minutes, the cleanup job also crawled, so in the end we had to resort to some unconventional means to delete 100,000+ Regions.

**Shorten the cleanup cycle,** from cleaning HBase tables once a week to once a day. In addition, Kylin merges Cubes weekly to reduce the number of HBase tables and therefore Regions. The Region count eventually fell from 160,000+ to under 60,000. This solved part of the problem, but Regions of critical tables could still become inaccessible during builds.

**Upgrade HBase from 1.2.6 to 1.4.9,** mainly to use RSGroup's ability to isolate critical tables from data tables at the compute level.

**Disable HBase's automatic balancer,** enabling it only for a few hours during the nightly off-peak window.

**Use HBase's built-in Canary to periodically check Region availability,** sending an alert immediately if any Region is found unavailable.

**Use RSGroup to isolate the critical tables and shield them from build-time interference.** These critical tables include the HBase meta table, the ACL table, the namespace table, and the kylin_metadata table.

After this series of improvements, the inaccessible table/Region problem was essentially solved, and it has hardly recurred. Fixing it took a long time, and after the upgrade, the restarts, and the mass table deletion, we ran into another problem.
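The Canary-style check above can be pictured as a periodic probe that issues a lightweight read against each Region and alerts on failures. Below is a minimal, self-contained Java sketch of that idea; the `probe` predicate and the Region names are hypothetical stand-ins for the real HBase Canary's per-Region reads and for actual cluster state, not HBase API calls:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

// Sketch of a Canary-style availability check: probe each Region with a
// lightweight read and collect the ones that fail, so an alert can be
// fired immediately. The probe is a stand-in for the real Canary's
// per-Region get/scan.
public class RegionCanary {
    private final Predicate<String> probe; // true if the Region responds

    public RegionCanary(Predicate<String> probe) {
        this.probe = probe;
    }

    /** Returns the names of Regions that failed the probe. */
    public List<String> findUnavailable(List<String> regionNames) {
        List<String> failed = new ArrayList<>();
        for (String region : regionNames) {
            if (!probe.test(region)) {
                failed.add(region);
            }
        }
        return failed;
    }

    public static void main(String[] args) {
        // Simulated cluster state: one Region does not respond.
        RegionCanary canary = new RegionCanary(r -> !r.equals("kylin_meta,region2"));
        List<String> bad = canary.findUnavailable(
                List.of("kylin_meta,region1", "kylin_meta,region2", "acl,region1"));
        if (!bad.isEmpty()) {
            System.out.println("ALERT: unavailable regions: " + bad);
        }
    }
}
```

In production the probe would be the Canary's actual read against each Region, and the alert would be routed to the monitoring system rather than stdout.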
### Improving RegionServer Data Locality

**1) Symptom**

![](https://static001.infoq.cn/resource/image/f6/4e/f60ec60a57a6b809aec55539ea66d44e.png)

**Data locality on the Kylin HBase RegionServers was very low, only about 20%, so HDFS short-circuit reads could not be used effectively, which hurt query response time,** and our within-3-seconds ratio dropped. As anyone familiar with HBase knows, one remedy for low RegionServer locality is to run compactions, pulling the data onto the DataNode co-located with each RegionServer; but large-scale compaction would badly impact queries, so we did not do that. After talking with the Kylin team, we found that most Cubes query the most recently built table each day and are unlikely to touch older tables, so it was enough to improve locality for each day's newly built tables. Here is how we did it.

**2) Solution**

![](https://static001.infoq.cn/resource/image/0b/c7/0b155ab3a4ac54a4e9989ee76018bdc7.png)

We found that Kylin uses HFileOutputFormat3, which differs slightly from HBase's HFileOutputFormat2, and we ported the HBASE-12596 feature into it: when generating HFiles, one replica of the data is written to the DataNode co-located with the Region's RegionServer. In terms of code, the job first looks up the machine hosting the Region, then passes that node's information along when obtaining a Writer, and finally writes one replica to the Region's DataNode when writing the data. **With this, our data locality gradually climbed back up and now sits at roughly 80%.**

### RegionServer IO Bottleneck

**1) Symptom**

We noticed that during the morning build peak, HBase's P99 response time rose with it; monitoring traced this to elevated IO wait on the RegionServer machines.

Another scenario is users choosing too large a build time range and saturating the network cards: one user built a full year of data in one go, another three or four months. Both cases bottleneck the RegionServer machines on IO and cause Kylin query timeouts.

![](https://static001.infoq.cn/resource/image/c3/6a/c30b8214d926e83d42401fa6633f186a.png)

The figure above shows the Cube build pipeline. The HBase cluster and the company's main Hadoop cluster are two independent HDFS clusters: the daily build reads Hive data from the main cluster's HDFS, the build job writes HFiles directly to the HBase HDFS cluster, and a bulk load runs at the end. Because the HBase HDFS cluster has relatively few machines, build jobs writing too fast drove up IO wait on the DataNode/RegionServer machines. How did we solve this?

**2) Solution**

![](https://static001.infoq.cn/resource/image/5e/08/5e9f5dd067d7350b7276ae938fe47308.png)

We reached for DistCp, a tool commonly used with HBase. The lower-left figure shows the improved flow: we point the build job's output at the main Hadoop cluster rather than the HBase HDFS cluster, copy the HFiles to the HBase HDFS cluster with DistCp under bandwidth throttling, and finish with the bulk load. As mentioned, we have 800+ Cubes, and not all of them need this flow, **because a throttled copy inevitably delays data availability, so we made the feature switchable per Project or per Cube;** we typically enable the throttled DistCp copy for Cubes with large data volumes, while the other Cubes keep the original flow.

The middle figure is a screenshot of a build job: step one generates the HFiles, step two is the DistCp, and the last step is the bulk load. We added several configuration options for this feature, such as whether DistCp is enabled, the bandwidth per map, and the maximum number of maps. With this in place, the rising IO wait during build peaks was essentially resolved.

### Slow-Query Governance: Shortening the Timeout-Diagnosis Chain

**1) Symptom**

![](https://static001.infoq.cn/resource/image/7f/5d/7f5bda792c6f08db277bc5326c428e5d.png)

The first problem in slow-query governance was that the chain for locating a timeout was extremely long. When a Kylin alert fires, the first things we want to know are:

- Which Cube timed out?
- Which HBase table does that Cube map to?
- Is a Region unavailable, or has the query pattern changed?

As mentioned earlier, for a while Regions were frequently unavailable. What did the diagnosis chain look like when a timeout occurred? We might first check the HBase logs for a "Deadline has passed" warning; if present, we took the query ID from it and looked up the corresponding Cube and SQL in ES or MySQL. Knowing that, we still had to go to the Kylin node that timed out and search its logs to find which Region of which HBase table the query was stuck on, and only then could we judge whether the Region was unavailable or the query pattern had changed. **The chain was very long, and every incident required the HBase and Kylin teams to investigate together.**

**2) Solution**

![](https://static001.infoq.cn/resource/image/4f/2d/4f1d23dc4c7e4c9d0065ee20661cbd2d.png)

**To address this pain point, we made the following improvement: we print the Cube and Region information directly in the HBase log.** The black portion in the middle is the HBase log; we can see at once that the query was terminated, the Cube's name, and the Region's name. The white portion below is the alert configured through our 天眼 (Tianyan) monitoring system, delivered straight to WeCom, so we immediately know which Cube and Region the deadline involved and can immediately check whether the table became unavailable or the query pattern changed, saving a great deal of diagnosis time.

![](https://static001.infoq.cn/resource/image/6e/34/6e0dc3efbf76833a48e2c46byy6a1134.png)

These are the Kylin code changes we made to shorten the timeout chain: we added a segmentName field to the Protobuf file, obtained the Region name in the coprocessor class, passed segmentName and regionName into the coprocessor's checkDeadLine check, and the log now prints the segment and Region information. **This feature has been contributed back to the community: [https://issues.apache.org/jira/browse/KYLIN-4788](https://issues.apache.org/jira/browse/KYLIN-4788)**
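The change can be sketched as follows. This is a simplified, self-contained rendering of the KYLIN-4788 idea rather than Kylin's actual coprocessor code; the exception type, method signature, and names are illustrative. The point is that when the deadline check fails, the raised error already carries the segment and Region names, so the HBase log line is self-describing:

```java
// Sketch of the KYLIN-4788 idea: the deadline check receives the segment
// (Cube) name and the Region name, and embeds both in the error message,
// so the HBase log alone identifies the culprit.
public class DeadlineChecker {
    static class DeadlineExceededException extends RuntimeException {
        DeadlineExceededException(String msg) { super(msg); }
    }

    /** Throws if the query has exceeded its deadline; the message names
     *  the segment and Region so the log line needs no cross-referencing. */
    public static void checkDeadline(long startMillis, long timeoutMillis,
                                     String segmentName, String regionName) {
        long elapsed = System.currentTimeMillis() - startMillis;
        if (elapsed > timeoutMillis) {
            throw new DeadlineExceededException(
                "Deadline has passed: segment=" + segmentName
                + ", region=" + regionName + ", elapsed=" + elapsed + "ms");
        }
    }

    public static void main(String[] args) {
        try {
            // Simulate a query that started 15s ago with a 10s deadline.
            checkDeadline(System.currentTimeMillis() - 15_000, 10_000,
                          "CUBE_X_SEGMENT_20201001", "table,key,12345.abcdef");
        } catch (DeadlineExceededException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

With the segment and Region in the message, an alert rule can match the log line directly and report both names to the on-call engineer.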
### Slow-Query Governance: Locating Queue Buildup

**1) Symptom**

![](https://static001.infoq.cn/resource/image/02/y0/02edd1f4901af485ebf1d11yya292yy0.png)

One day we found severe request-queue buildup on the Kylin HBase RegionServers, with RegionServer P99 response times reaching the ten-minute level. The upper right shows HBase's queue monitoring, with buildup on some machines approaching 30,000. We were quite puzzled: the RPC timeout between Kylin and HBase is 10 seconds, and after 10 seconds the Kylin-HBase connection is already closed, so what queries was HBase still processing? The lower right is a screenshot of the RegionServer UI, where we found queries that had already been running for nearly half an hour. What were they doing all that time?

**2) Solution**

![](https://static001.infoq.cn/resource/image/41/66/41ab16c1c803f38b4b15fccf687e7f66.png)

Our approach at the time was to go to the RegionServers with queue buildup and inspect the logs, compute the difference between each query's start and end times to find the ten longest-running queries, and match their query IDs back to the Cube and the exact SQL. It almost always turned out that these extremely long queries came from a change in query pattern that no longer matched the Cube's rowkey design, causing full-table scans. The final fix was for the Kylin team to adjust the Cube's rowkey settings and rebuild.

**This offline way of locating the problem is not ideal. We first considered real-time alerting on the logs, which would help us discover and locate issues faster, but that is still reactive: it only surfaces the problem without actually solving it.**

The idea we later settled on is to score each SQL statement before it executes and refuse to run those that score too low; this feature is not implemented yet. It came from the observation that once we find the SQL, the Kylin team can tell whether the query is unreasonable or inconsistent with the rowkey design. We want to build a feature that codifies this human judgment, defusing the risk before the SQL ever runs.

### Slow-Query Governance: Active Defense

![](https://static001.infoq.cn/resource/image/01/56/01ec3349e16dec165a1f7a3a5094b456.png)

Another measure in slow-query governance is Kylin's active defense. We found that large numbers of long-running queries would occupy the request queues and drag down the response time of other queries.

The solution is to collect Kylin's logs through Kafka, clean them in real time via the 天眼 (Tianyan) system, and write them into Druid for statistical analysis. If a business line or Cube exceeds a threshold of over-3-second queries within a given time window, the active-defense system lowers that business line's or Cube's query timeout to 1 second, making the slow queries time out quickly instead of interfering with normal ones. On the right is the architecture of the whole flow. Active defense helps with slow-query governance, but it still cannot completely prevent full-table scans.
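The thresholding rule just described can be sketched as a sliding-window counter per business line or Cube: count over-3-second queries inside a time window, and clamp the timeout to 1 second once the count crosses a threshold. A minimal, self-contained Java sketch; the window length, threshold, and timeout values are illustrative assumptions, not the production configuration:

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Sliding-window sketch of the active-defense rule: once a Cube logs too
// many over-3s queries within the window, its timeout drops to 1s so slow
// queries fail fast instead of clogging the request queues.
public class ActiveDefense {
    private final long windowMillis;
    private final int threshold;
    private final Deque<Long> slowQueryTimes = new ArrayDeque<>();

    public ActiveDefense(long windowMillis, int threshold) {
        this.windowMillis = windowMillis;
        this.threshold = threshold;
    }

    /** Record one query; returns the timeout (ms) to apply to this Cube now. */
    public long recordQuery(long nowMillis, long latencyMillis) {
        if (latencyMillis > 3000) {
            slowQueryTimes.addLast(nowMillis);
        }
        // Drop slow-query records that have left the window.
        while (!slowQueryTimes.isEmpty()
                && nowMillis - slowQueryTimes.peekFirst() > windowMillis) {
            slowQueryTimes.removeFirst();
        }
        // Too many slow queries in the window -> clamp timeout to 1s.
        return slowQueryTimes.size() >= threshold ? 1000 : 10_000;
    }

    public static void main(String[] args) {
        ActiveDefense defense = new ActiveDefense(60_000, 3);
        System.out.println(defense.recordQuery(0, 5000));    // prints 10000
        System.out.println(defense.recordQuery(1000, 4000)); // prints 10000
        System.out.println(defense.recordQuery(2000, 6000)); // prints 1000 (3rd slow query trips the threshold)
    }
}
```

In the real pipeline this decision runs on Druid aggregates of the Kafka-collected logs rather than in-process state, but the windowed-count-then-clamp logic is the same.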
### Guaranteeing Query Performance for Key Metrics

**1) Symptom**

![](https://static001.infoq.cn/resource/image/68/42/681d0c9bed73a55b27ea2a797f484042.png)

Another measure is guaranteeing query performance for key metrics. In the early days the HBase cluster had only HDD storage, so key metrics and ordinary metrics both lived on HDD, where they were highly susceptible to other queries and to HDD performance, and the response time of key metrics could not be guaranteed.

**2) Solution**

Our solution uses HDFS heterogeneous storage: we added SSDs to a subset of DataNodes and store the data of key Cubes on SSD, raising throughput while isolating their storage from ordinary metric data. This both shields them from other queries and lets SSD performance boost throughput. Introducing SSD only gives us storage isolation; RSGroup could add compute isolation on top, but because key metrics account for over 90% of the cluster's total request volume, a few isolated machines could not sustain that load, so in the end we did not do it.

Finally, our measurements: **SSD improves performance by roughly 40% for queries scanning 100,000+ rows, and roughly 20% for queries scanning 1,000,000+ rows.**

![](https://static001.infoq.cn/resource/image/cb/e2/cb3c31d8322ac732f9895e9ae1a713e2.png)

These are the changes we made for SSD. Whether data is stored on SSD is configurable per Cube: we can specify which Cubes live on SSD, and when the build job creates the HBase table it reads the Cube's configuration and sets the table's attributes and the storage policy of the table's HDFS path accordingly. Before the DistCp copy we also read the Cube configuration first; if it is ALL_SSD, the job sets the storage policy of the DistCp destination path to ALL_SSD before copying the data.

**The point of doing it in this order is to avoid having to move data from HDD to SSD after the bulk load. What would that movement cause?** We found that if the DistCp destination's storage policy is not set first, the data initially lands on HDD; after the bulk load, because the table's HDFS path carries the ALL_SSD policy, Hadoop's Mover migrates the blocks from HDD to SSD. Once all three replicas of a block have moved to SSD machines, the RegionServer can no longer read the block from the three DataNodes it has cached for that block; it then waits a random few seconds before asking the NameNode for the block's latest DataNode locations, which lengthens query response time. This is why the destination path's storage policy must be set before the DistCp copy.

### JVM GC Bottleneck

**1) Symptom**

![](https://static001.infoq.cn/resource/image/07/d2/07242fb1bb9647c0e7ea494343f707d2.png)

The next problem we hit was a JVM GC bottleneck on the RegionServers. During query peaks, Kylin HBase JVM-pause alerts were extremely frequent; as the chart shows, one day exceeded 1,200 of them. Kylin promises users that 99.7% of queries finish within three seconds, and we had already reached 99.8%. Asking what else to optimize to reach 99.9%, JVM pauses clearly stood out as the thing to improve. For anyone who works with Java, how do you tune the JVM?

**2) Solution**

![](https://static001.infoq.cn/resource/image/af/2a/af322fb83b6220d6b7fc4ddd91ba102a.png)

**The first thought is to tune parameters; the other is to switch GC algorithms. We chose the latter.** We had been running JDK 1.8 with the G1 collector, and then learned that JDK 11 introduced a new collector, ZGC. In the end we upgraded from JDK 1.8 to JDK 13 and replaced G1 with ZGC. The upper-right chart shows that after ZGC went live, JVM pauses on this cluster's RegionServers dropped to nearly zero, and the GC times in the lower right also fell dramatically. One of ZGC's design goals is a max JVM pause of a few milliseconds, and the effect was quite visible. The left chart is the alert trend from 天眼 (Tianyan): JVM-pause alerts dropped sharply after ZGC went live. I will publish an article this month on the ZGC algorithm and the changes we made to adapt it to JDK 13, so I won't go into detail here.

**About the author:**

馮亮 (Feng Liang), senior R&D engineer at 貝殼找房 (Beike).

**This article is reproduced from the WeChat account apachekylin (ID: ApacheKylin).**

**Original link:** [Kylin 在貝殼的性能挑戰和 HBase 優化實踐](https://mp.weixin.qq.com/s?__biz=MzAwODE3ODU5MA==&mid=2653081715&idx=1&sn=38e7a698feaa8889a37eb65615a0d69b&chksm=80a4ae82b7d3279489ba780ac2ce63f04938a7c95840a45d034648871333e03d734476f93ef3&token=1340822333&lang=zh_CN#rd)