HTAP Gives Wings to Real-Time Data Services

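{"type":"doc","content":[
{"type":"heading","attrs":{"align":null,"level":2},"content":[{"type":"text","text":"HTAP: Becoming a Mainstream Trend"}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"HTAP (Hybrid Transactional\/Analytical Processing) databases, a concept introduced by Gartner, offer a way forward. Built on an innovative compute and storage framework, an HTAP database can serve both OLTP and OLAP workloads on a single copy of the data. This avoids the heavy data movement between online and offline databases that traditional architectures require."}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong"}],"text":"HTAP databases are built on a distributed architecture and support elastic scaling. Throughput or storage can be expanded on demand, which makes high-concurrency, massive-data workloads easy to handle."},{"type":"text","text":" Today, the real-time analytics capability that HTAP databases provide has become a core competitive advantage for enterprises."}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"image","attrs":{"src":"https:\/\/static001.geekbang.org\/infoq\/4a\/4ac7b92fef6a052de2f625ec09708d8e.webp","alt":"Image","title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"heading","attrs":{"align":null,"level":2},"content":[{"type":"text","text":"Business Challenges"}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"PatSnap is a SaaS (Software as a Service) provider of technology innovation intelligence. It focuses on two areas: innovation intelligence and intellectual property information services. It provides big-data intelligence services to more than 10,000 technology companies, universities, research institutions, and financial institutions in over 50 countries. On the data side, PatSnap stores over 150 million global patent records and over 170 million chemical structure records. It also holds tens of millions of financial news articles, scientific publications, market reports, investment records, and other data."}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"As its business scenarios expanded and its user base grew rapidly, PatSnap's operations came to depend heavily on analyzing and presenting real-time data. It needs user behavior analysis, real-time dashboards, and operational data for specific scenarios, and analysis of traffic and services is also indispensable."}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"PatSnap originally used a Segment + Redshift analytics architecture that had only an ODS (Operational Data Store) layer. The rules and schema for writing data were not controlled. Complex ETL had to be written against the ODS to compute the various metrics needed to answer upper-layer business requests. The large volume of data landing in Redshift made computation slow (T+1 latency), which reduced the efficiency of external-facing services."}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"heading","attrs":{"align":null,"level":2},"content":[{"type":"text","text":"The TiDB + Flink Real-Time Data Warehouse Solution"}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"After several rounds of evaluation and comparative testing, PatSnap chose a TiDB + Flink real-time data warehouse solution to extend the capabilities of its data analytics architecture."}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong"}],"text":"TiDB HTAP is a scalable architecture that integrates row-based and columnar storage. The two stores can run on separate nodes, so OLTP and OLAP workloads do not interfere with each other, and real-time performance, consistency, and scalability are all well preserved."},{"type":"text","text":" Flink is a low-latency, high-throughput big data compute engine that unifies stream and batch processing. It is widely used for real-time computation in latency-sensitive scenarios and supports important features such as exactly-once semantics."}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Combining the strengths of TiDB HTAP and Flink gives the TiDB + Flink solution clear advantages. First, "},{"type":"text","marks":[{"type":"strong"}],"text":"performance is assured"},{"type":"text","text":", because both can add compute capacity by scaling out horizontally. Second, "},{"type":"text","marks":[{"type":"strong"}],"text":"TiDB is highly compatible with the MySQL protocol"},{"type":"text","text":", and Flink provides Flink SQL and powerful connectors for writing and submitting jobs, so learning and configuration costs are relatively low."}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"image","attrs":{"src":"https:\/\/static001.geekbang.org\/infoq\/2d\/2d07f508edf17c0155af1e3872536445.webp","alt":"Image","title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":"center","origin":null},"content":[{"type":"text","text":"Architecture of PatSnap's real-time data analytics platform"}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"After switching to a real-time data warehouse built on TiDB + Kinesis + Flink, an ODS layer is no longer needed. Flink serves as the upstream compute unit, and Flink ETL jobs are designed directly around business requirements. This gives full control over the rules for writing data and allows a custom schema. Only the metrics the business cares about are cleaned and written into TiDB for subsequent analytical queries, which greatly reduces the volume of data written."}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"DWD\/DWS\/ADS layers are built on TiDB along the dimensions the business cares about, such as user\/tenant, region, and business action. These layers combine time windows of different granularities (minute, hour, day) and directly serve business needs such as statistics and listings. Upper-layer applications consume the prepared data directly, with data freshness measured in seconds."}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"This real-time data analytics platform delivers "},{"type":"text","marks":[{"type":"strong"}],"text":"true Real Time Data as a Service"},{"type":"text","text":". It currently powers real-time analytics at PatSnap, including user behavior analysis and tracking and tenant behavior analysis. It also supplies real-time data to the business operations dashboard."}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"heading","attrs":{"align":null,"level":2},"content":[{"type":"text","text":"Business Value"}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"With the new architecture, the volume of ingested data, the ingestion rules, and the computational complexity have all dropped significantly. Data is already processed according to business requirements in Flink jobs before being written to TiDB. There is therefore no need for T+1 ETL over a full ODS layer in Redshift."}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Thanks to sensible data layering, the real-time data warehouse built on TiDB has a much simpler architecture and is easier to develop and maintain. It also delivers substantially better query, update, and write performance."}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Serving different ad hoc analysis needs no longer involves waiting for a Redshift-style query compilation step. Development is easier and scaling out is straightforward."}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"horizontalrule"},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Header image: Unsplash"}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Author: PingCAP"}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Original article: https:\/\/mp.weixin.qq.com\/s\/p_hhX_UG2AfOvStaP8Ht0w"}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Original title: TiDB X PatSnap | HTAP Gives Wings to Real-Time Data Services"}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Source: PingCAP WeChat Official Account [ID:pingcap2015]"}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Reprint notice: Copyright belongs to the author. For commercial reprints, please contact the author for permission; for non-commercial reprints, please credit the source."}]}
]}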
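The Flink-side step of the solution described above can be sketched in plain Python. Events are counted along dimensions such as tenant, region, and business action in minute, hour, and day windows, and the results are written into a TiDB summary table. This is a minimal, batch-style sketch under stated assumptions, not PatSnap's actual job. The `Event` fields, the `dws_action_stats` table, and all function names are hypothetical. The table is assumed to have a primary key on (granularity, window_start, tenant_id, region, action). A real Flink job would compute the same tumbling windows incrementally on event time, using watermarks.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Iterable

# Window sizes in seconds for the minute / hour / day granularities.
GRANULARITIES = {"minute": 60, "hour": 3600, "day": 86400}


@dataclass(frozen=True)
class Event:
    """Hypothetical user-behavior event, as it might arrive from Kinesis."""
    tenant_id: str
    region: str
    action: str
    ts: int  # event time, epoch seconds (UTC)


def tumbling_counts(events: Iterable[Event], granularity: str) -> Counter:
    """Count events per (window_start, tenant, region, action).

    Tumbling windows do not overlap: each event falls into exactly one
    window, whose start is the event time rounded down to a multiple of
    the window size.
    """
    size = GRANULARITIES[granularity]
    counts: Counter = Counter()
    for e in events:
        window_start = e.ts - e.ts % size
        counts[(window_start, e.tenant_id, e.region, e.action)] += 1
    return counts


# Hypothetical DWS table. Assuming its primary key is
# (granularity, window_start, tenant_id, region, action), re-writing the
# same window overwrites the existing row instead of duplicating it.
UPSERT_SQL = (
    "INSERT INTO dws_action_stats "
    "(granularity, window_start, tenant_id, region, action, cnt) "
    "VALUES (%s, %s, %s, %s, %s, %s) "
    "ON DUPLICATE KEY UPDATE cnt = VALUES(cnt)"
)


def to_rows(counts: Counter, granularity: str) -> list[tuple]:
    """Turn aggregated counts into parameter tuples for UPSERT_SQL."""
    return [
        (granularity, ws, tenant, region, action, cnt)
        for (ws, tenant, region, action), cnt in sorted(counts.items())
    ]
```

The `%s` placeholders match the parameter style of MySQL-compatible Python drivers such as PyMySQL, so the rows could be passed to `cursor.executemany(UPSERT_SQL, rows)`. Because TiDB speaks the MySQL protocol, the same `INSERT ... ON DUPLICATE KEY UPDATE` statement works against it. Only the small aggregated rows reach the database, which mirrors the article's point about writing far less data than a full ODS layer.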