The Logi-KafkaManager Open-Source Journey: A One-Stop Platform for Kafka Cluster Metrics Monitoring and Operations Management

{"type":"doc","content":[
{"type":"heading","attrs":{"align":null,"level":1},"content":[{"type":"text","text":"Introduction","attrs":{}}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"From the decision to open-source in April 2019 to the completed release on January 14, 2021, it took 22 months to get here. The road was not easy: with no front-end engineers, designers, or product managers of our own, we turned to interns, partners, and external resources for support. The Didi Kafka service team was reorganized several times and went through three major internal versions, but in the end we overcame the difficulties and made it. The project was well received by the community as soon as it was released: to date it has 1,140 stars, the DingTalk user group has passed 550 members, and the article ","attrs":{}},{"type":"link","attrs":{"href":"http://way.xiaojukeji.com/article-edit/%E6%BB%B4%E6%BB%B4%E5%BC%80%E6%BA%90Logi-KafkaManager%20%E4%B8%80%E7%AB%99%E5%BC%8FKafka%E7%9B%91%E6%8E%A7%E4%B8%8E%E7%AE%A1%E6%8E%A7%E5%B9%B3%E5%8F%B0","title":null},"content":[{"type":"text","text":"Didi Open-Sources Logi-KafkaManager: A One-Stop Kafka Monitoring and Management Platform","attrs":{}}]},{"type":"text","text":" has passed 10,000 UV.","attrs":{}}]},
{"type":"heading","attrs":{"align":null,"level":1},"content":[{"type":"text","text":"About Logi-KafkaManager","attrs":{}}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"As Didi's big-data message queue, Kafka carries the production and consumption of trillions of messages every day, handles peak ingestion traffic above 100 GB/s, serves nearly a thousand Kafka users inside the company, and hosts dozens of Kafka clusters and tens of thousands of Kafka topics, with a single cluster running more than 300 brokers. After four years of refinement, we have built Didi's Kafka platform service around Logi-KafkaManager, with an internal satisfaction score of 90.","attrs":{}}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Logi-KafkaManager is a shared, multi-tenant Kafka cloud platform built for Kafka users and Kafka operators. It focuses on core scenarios such as Kafka resource requests, operations and management, monitoring and alerting, and resource governance. A free demo is available at http://117.51.150.133:8080/kafka (account admin/admin). Feel free to give the project a ","attrs":{}},{"type":"link","attrs":{"href":"https://github.com/didi/Logi-KafkaManager","title":null},"content":[{"type":"text","text":"Star","attrs":{}}]},{"type":"text","text":".","attrs":{}}]},
{"type":"heading","attrs":{"align":null,"level":1},"content":[{"type":"text","text":"Why we built Logi-KafkaManager","attrs":{}}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Didi runs dozens of Kafka clusters with more than 450 nodes. Over 500 unique users each week need to create topics, request access, and view metrics, and the operations team performs a large volume of topic management, governance, and cluster maintenance every day. We therefore needed a Kafka management platform to carry these needs. We surveyed comparable community products, but none met our requirements well in terms of metric coverage, operational control, or service-operations philosophy.","attrs":{}}]},
{"type":"heading","attrs":{"align":null,"level":1},"content":[{"type":"text","text":"Logi-KafkaManager feature highlights","attrs":{}}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"1. Product design: separation of concerns","attrs":{}}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"The open-source KafkaManager used across the industry is positioned as a monitoring tool for operators. At Didi we position ours as a tooling platform product for a fully managed Kafka service, and we distinguish two audiences: Kafka users and Kafka operators.","attrs":{}}]},
{"type":"bulletedlist","content":[{"type":"listitem","content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Kafka users care about topic-level operations: requesting and expanding topic resources, topic metric monitoring, topic consumption alerts, topic message sampling, topic consumption resets, and so on.","attrs":{}}]}],"attrs":{}},{"type":"listitem","content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Kafka operators care about cluster-level operations: cluster monitoring, cluster installation, cluster upgrades, topic migration within a cluster, cluster capacity planning, and so on.","attrs":{}}]}],"attrs":{}}],"attrs":{}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"2. Making Kafka's runtime behavior measurable","attrs":{}}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"As message middleware, Kafka's core capability is producing and consuming messages, and the questions users raise most often all relate to it. As the service provider, we need detailed visibility into how long a topic's produce and consume requests spend in each server-side stage, so that we can quickly tell whether a problem lies on the server or the client and, if it is on the server, in which stage, as shown in the figure below.","attrs":{}}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/b1/b1e47e47046d0d0c3668b8dd61c4570a.png","alt":null,"title":"","style":[{"key":"width","value":"100%"},{"key":"bordertype","value":"none"}],"href":"","fromPaste":false,"pastePass":false}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"bulletedlist","content":[{"type":"listitem","content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Time spent waiting in the request queue (RequestQueueTimeMs)","attrs":{}}]}],"attrs":{}},{"type":"listitem","content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Local processing time on the broker (LocalTimeMs)","attrs":{}}]}],"attrs":{}},{"type":"listitem","content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Time spent waiting for remote completion (RemoteTimeMs)","attrs":{}}]}],"attrs":{}},{"type":"listitem","content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Time the request is throttled (ThrottleTimeMs)","attrs":{}}]}],"attrs":{}},{"type":"listitem","content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Time spent waiting in the response queue (ResponseQueueTimeMs)","attrs":{}}]}],"attrs":{}},{"type":"listitem","content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Time to send the response back to the client (ResponseSendTimeMs)","attrs":{}}]}],"attrs":{}},{"type":"listitem","content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Total time from receiving the request to completing it (TotalTimeMs)","attrs":{}}]}],"attrs":{}}],"attrs":{}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Presenting these server-side runtime metrics at topic granularity has significantly improved the efficiency of supporting users, as shown in the figure below:","attrs":{}}]},
{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/d8/d85d9778ef1562e3a835b5d37d0991cd.png","alt":null,"title":"","style":[{"key":"width","value":"100%"},{"key":"bordertype","value":"none"}],"href":"","fromPaste":false,"pastePass":false}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"3. Strong control for Kafka service assurance","attrs":{}}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Kafka has many client versions across many languages, and the official project only has the capacity to maintain the Java SDK. With limited staffing, Didi does not govern client languages or versions; instead, we extended the server so that it can exercise strong control over client metadata.","attrs":{}}]},
{"type":"bulletedlist","content":[{"type":"listitem","content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"We extended the server so that it reliably records each client's connection address and protocol type, which lets the engine observe and tightly control user behavior later on.","attrs":{}}]}],"attrs":{}}],"attrs":{}},
{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/8d/8dd3de9526752261d6c215be797483bd.png","alt":null,"title":"","style":[{"key":"width","value":"100%"},{"key":"bordertype","value":"none"}],"href":"","fromPaste":false,"pastePass":false}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"bulletedlist","content":[{"type":"listitem","content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"We extended Kafka's server-side security authentication. An account mechanism records application metadata, including personnel, business, and permission information; controlled topic creation records metadata such as compression type, partitions, and quota. Together these give the server strong control over client production and consumption.","attrs":{}}]}],"attrs":{}}],"attrs":{}},
{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/59/59bd94ec110ee01ef5b7cf0c175890ee.png","alt":null,"title":"","style":[{"key":"width","value":"100%"},{"key":"bordertype","value":"none"}],"href":"","fromPaste":false,"pastePass":false}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"4. Best practices distilled into expert services","attrs":{}}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Over years of operating the Kafka service we have accumulated many service-assurance best practices. Based on real usage scenarios, we have so far built the following expert services, and we will keep refining them.","attrs":{}}]},
{"type":"bulletedlist","content":[{"type":"listitem","content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Migration for unevenly distributed topics: leader counts differ across brokers; leaders are unevenly spread across the disks of a single broker; a single topic is unevenly spread across the disks of a broker. We need to detect these hotspots and recommend a migration plan to the user.","attrs":{}}]}],"attrs":{}}],"attrs":{}},
{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/8f/8f911176efeb3f2390082c05e87faafd.png","alt":null,"title":"","style":[{"key":"width","value":"100%"},{"key":"bordertype","value":"none"}],"href":"","fromPaste":false,"pastePass":false}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"bulletedlist","content":[{"type":"listitem","content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Expansion hints for topics with too few partitions: based on the traffic each partition carries, the platform proactively suggests expansion according to the business scenario and the underlying hardware resources. The thresholds used in Didi's practice are 3 MB/s per partition for TPS scenarios and 10 messages/s per partition for IOPS scenarios.","attrs":{}}]}],"attrs":{}}],"attrs":{}},
{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/b4/b4647576c2a0bc2f948aaa803a4d2c64.png","alt":null,"title":"","style":[{"key":"width","value":"100%"},{"key":"bordertype","value":"none"}],"href":"","fromPaste":false,"pastePass":false}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"bulletedlist","content":[{"type":"listitem","content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Retiring idle topic resources: for online topics that have had no traffic and no producer or consumer connections for a full month, the platform notifies the user to release the resources.","attrs":{}}]}],"attrs":{}}],"attrs":{}},
{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/35/3547d39b5e675df76c47dacc0e9e4900.png","alt":null,"title":"","style":[{"key":"width","value":"100%"},{"key":"bordertype","value":"none"}],"href":"","fromPaste":false,"pastePass":false}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"heading","attrs":{"align":null,"level":1},"content":[{"type":"text","text":"Logi-KafkaManager architecture","attrs":{}}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"From the start we built the platform with open source in mind, following four principles: minimal dependencies, a layered architecture, capabilities exposed as APIs, and 100% compatibility with earlier open-source versions. The overall architecture is as follows:","attrs":{}}]},
{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/34/34398b223e42ced06d25099bf4d490ff.png","alt":null,"title":"","style":[{"key":"width","value":"100%"},{"key":"bordertype","value":"none"}],"href":"","fromPaste":false,"pastePass":false}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"bulletedlist","content":[{"type":"listitem","content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Resource layer: apart from ZooKeeper, the Kafka engine and Logi-KafkaManager depend only on MySQL, so dependencies are minimal and deployment is easy.","attrs":{}}]}],"attrs":{}},{"type":"listitem","content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Engine layer: Didi's Kafka engine is currently at version 2.5. On top of it we developed features of our own, such as disk overload protection, separated I/O thread pools, and optimized resource allocation at topic creation, while remaining fully compatible with the open-source community's Kafka 0.10.x releases.","attrs":{}}]}],"attrs":{}},{"type":"listitem","content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Gateway layer: above the engine layer Didi designed the kafkaGateway layer, which provides security control, topic throttling, service discovery, and degradation capabilities.","attrs":{}}]}],"attrs":{}},{"type":"listitem","content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Service layer: building on kafkaGateway, Logi-KafkaManager offers a rich set of features, chiefly topic management, cluster monitoring, and cluster control.","attrs":{}}]}],"attrs":{}},{"type":"listitem","content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Platform layer: provides separate feature sets for regular users and operations users, moves as many high-frequency day-to-day operations onto the platform as possible to lower the cost of use, and exposes core capabilities as APIs so that they are easy to integrate with the user's ecosystem.","attrs":{}}]}],"attrs":{}}],"attrs":{}},
{"type":"heading","attrs":{"align":null,"level":1},"content":[{"type":"text","text":"Closing thoughts","attrs":{}}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Open-sourcing the project is only the first step of a long march, and the product still needs continued refinement. Do good work and do not worry about what lies ahead. Thank you to everyone who has put effort into this project, and especially to the current team: the past year was far from easy, and the technical dream of open source brought us closely together. We dedicate this article to Zhang Wensong (章文嵩), a trailblazer of open source.","attrs":{}}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Logi-KafkaManager is a small step toward the team's open-source dream in 2021 and an important part of the overall open-source plan for Didi's ","attrs":{}},{"type":"link","attrs":{"href":"https://mp.weixin.qq.com/s/-KQp-Qo3WKEOc9wIR2iFnw","title":null},"content":[{"type":"text","text":"Logi log service suite","attrs":{}}]},{"type":"text","text":". You are welcome to follow the Obsuite WeChat official account or join the Logi Didi user group on DingTalk, share feedback on our products, and recommend them to engineers around you who may need them.","attrs":{}}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/bb/bbb815b09efbfffad2c4f038b581975d.png","alt":null,"title":"","style":[{"key":"width","value":"25%"},{"key":"bordertype","value":"none"}],"href":"","fromPaste":false,"pastePass":false}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/32/32b183eeaa34b1ecba279ef8ed920f35.png","alt":null,"title":"","style":[{"key":"width","value":"25%"},{"key":"bordertype","value":"none"}],"href":"","fromPaste":false,"pastePass":false}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}}
]}
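The server-side timing metrics listed under feature highlight 2 (RequestQueueTimeMs through ResponseSendTimeMs) suggest a simple triage step: find the stage that accounts for the largest share of a request's time. The following is only a minimal sketch of that idea, not code from Logi-KafkaManager. It assumes the per-topic metrics have already been collected into a dictionary keyed by metric name; the function name `slowest_stage` and the sample values are hypothetical, and `LocalTimeMs` is assumed to be the metric behind "broker local processing time".

```python
from typing import Dict, Tuple

# Server-side stages of a Kafka request, in the order the article lists them.
STAGES = (
    "RequestQueueTimeMs",
    "LocalTimeMs",
    "RemoteTimeMs",
    "ThrottleTimeMs",
    "ResponseQueueTimeMs",
    "ResponseSendTimeMs",
)


def slowest_stage(metrics: Dict[str, float]) -> Tuple[str, float]:
    """Return the stage with the largest latency and its share of the summed stages.

    `metrics` maps metric names to milliseconds; missing stages count as 0.
    """
    stage_times = {name: float(metrics.get(name, 0.0)) for name in STAGES}
    total = sum(stage_times.values())
    if total <= 0:
        raise ValueError("no stage latency recorded")
    name = max(stage_times, key=lambda stage: stage_times[stage])
    return name, stage_times[name] / total


# Hypothetical sample for one topic's produce requests (milliseconds).
sample = {
    "RequestQueueTimeMs": 2,
    "LocalTimeMs": 5,
    "RemoteTimeMs": 40,
    "ThrottleTimeMs": 0,
    "ResponseQueueTimeMs": 1,
    "ResponseSendTimeMs": 2,
}
stage, share = slowest_stage(sample)
print(f"{stage} accounts for {share:.0%} of server-side time")
```

In this made-up sample most of the time is spent in RemoteTimeMs, which would point at the server side (waiting on other brokers) rather than at the client. If the sum of all stages is small compared with the latency the client observes, the client or the network is the more likely place to look.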