RocketMQ Streams: Integrating a Lightweight Real-Time Computing Engine into the Messaging System

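{"type":"doc","content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Authors | 袁小棟, 程君傑"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Reviewers | 杜恆, 歲月, 白璵, 不周"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" "}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"As mobile internet and cloud computing spread across industries, big-data computing has become commonplace, with Flink and Spark among the best-known frameworks. These frameworks use a centralized Master-Slave architecture, are heavy in dependencies and deployment, and carry significant per-job overhead, so they are costly to use. RocketMQ Streams focuses on being a lightweight computing engine"},{"type":"text","marks":[{"type":"color","attrs":{"color":"#171A1D","name":"user"}}],"text":" that has no dependencies beyond the message queue and is heavily optimized for filtering scenarios, improving performance by 3-5x and saving 50%-80% of resources."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"RocketMQ Streams suits pipelines of the form large data volume -> heavy filtering -> light window computation. Its core strengths are low resource usage and high performance, which matter most in resource-sensitive environments; it can be deployed on as little as 1 core and 1 GB of memory. Recommended use cases are security, risk control, edge computing, and stream computing on message queues."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"RocketMQ Streams is compatible with the SQL and UDF\/UDTF\/UDAF of Blink (Alibaba's internal version of Flink), so most Blink jobs can be migrated directly to RocketMQ Streams jobs. A version integrated with Flink is also planned: RocketMQ Streams jobs could then be published directly as Flink jobs, keeping the high performance and low resource usage of RocketMQ Streams while being operated and managed together with existing Flink jobs."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"This article introduces the RocketMQ Streams real-time computing platform in five parts:"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Part 1 gives a brief introduction to RocketMQ 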
Streams;"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Part 2 shows how to use it through the RocketMQ Streams SDKs;"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Part 3 covers the overall architecture of RocketMQ Streams and how it is implemented;"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Part 4 describes how RocketMQ Streams is used in cloud security scenarios;"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Part 5 outlines the roadmap for RocketMQ Streams."}]},{"type":"heading","attrs":{"align":null,"level":2},"content":[{"type":"text","text":"1. What Is RocketMQ Streams?"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"This section introduces RocketMQ Streams as a whole: the basics, the design approach, and its features."}]},{"type":"heading","attrs":{"align":null,"level":3},"content":[{"type":"text","text":"1.1 Introduction to RocketMQ Streams"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"1) It is a library (a lib package) that runs as soon as it starts and integrates directly with business applications;"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"2) It provides a SQL engine that is compatible with Blink SQL syntax and with Blink 
UDF\/UDTF\/UDAF;"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"3) It includes an ETL engine that performs ETL, filtering, and forwarding of data without writing code;"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"4) It is built on a data development SDK whose many practical components can be used directly, such as source, sink, script, filter, lease, scheduler, and configurable; these are not limited to streaming scenarios."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https:\/\/static001.geekbang.org\/infoq\/aa\/aa244c6ab90502ab0237c57e662295e7.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"heading","attrs":{"align":null,"level":3},"content":[{"type":"text","text":"1.2 Features of RocketMQ Streams"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Based on the approach above, RocketMQ Streams has the following features:"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"heading","attrs":{"align":null,"level":4},"content":[{"type":"text","text":"1.2.1 Lightweight"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"It can be deployed on 1 core and 1 GB of memory and has few dependencies. For testing, you can add the JAR and run it from a plain main method; in production it depends on at most a message queue and storage (storage is optional and mainly provides fault tolerance when shards are reassigned)."}]},{"type":"heading","attrs":{"align":null,"level":4},"content":[{"type":"text","text":"1.2.2 
High performance"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"It implements an optimizer for high-filtering workloads, including pre-filtering by fingerprint, automatic merging of rules from the same source, Hyperscan acceleration, and expression fingerprints. Compared with the unoptimized engine, performance improves by 3-5x and resource usage drops by more than 50%."}]},{"type":"heading","attrs":{"align":null,"level":4},"content":[{"type":"text","text":"1.2.3 Dimension-table JOIN (supports dimension tables with tens of millions of rows)"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Data is kept in a highly compressed in-memory store without the overhead of Java object headers and alignment, so the footprint is close to the raw data size. All operations are in memory to maximize performance, and for MySQL the table is loaded concurrently by multiple threads to speed up dimension-table loading."}]},{"type":"heading","attrs":{"align":null,"level":4},"content":[{"type":"text","text":"1.2.4 Highly extensible"}]},{"type":"bulletedlist","content":[{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Sources can be extended as needed; implemented so far: RocketMQ, File, Kafka;"}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Sinks can be extended as needed; implemented so far: RocketMQ, File, Kafka, MySQL, ES;"}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"UDF\/UDTF\/UDAF can be added following the Blink specification;"}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"A lighter UDF\/UDTF extension mechanism is also provided, so functions can be added without any dependencies."}]}]}]},{"type":"heading","attrs":{"align":null,"level":4},"content":[{"type":"text","text":"1.2.5 
Rich big-data capabilities"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"These include exactly-once computation, flexible windows, two-stream joins, aggregation, windowing, and a range of transformations and filters, covering typical big-data development scenarios, with support for elasticity and fault tolerance."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" "}]},{"type":"heading","attrs":{"align":null,"level":2},"content":[{"type":"text","text":"2. Using RocketMQ Streams"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"RocketMQ Streams provides two SDKs, a DSL SDK and a SQL SDK, and users can choose either as needed. The DSL SDK supports DSL semantics for real-time scenarios. The SQL SDK is compatible with the SQL syntax of Blink (Alibaba's internal version of Flink), and most Blink SQL can run on RocketMQ Streams."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"The following sections describe the two SDKs in detail."}]},{"type":"heading","attrs":{"align":null,"level":3},"content":[{"type":"text","text":"2.1 Requirements"}]},{"type":"bulletedlist","content":[{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"JDK 1.8 or later;"}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Maven 3.2 or later."}]}]}]},{"type":"heading","attrs":{"align":null,"level":3},"content":[{"type":"text","text":"2.2 DSL SDK"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Before developing a real-time job with the DSL SDK, prepare the following:"}]},{"type":"heading","attrs":{"align":null,"level":4},"content":[{"type":"text","text":"2.2.1 Dependencies"}]},{"type":"codeblock","attrs":{"lang":"xml"},"content":[{"type":"text","text":"<dependency>\n <groupId>org.apache.rocketmq<\/groupId>\n <artifactId>rocketmq-streams-clients<\/artifactId>\n 
<version>1.0.0-SNAPSHOT<\/version>\n<\/dependency>\n"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Once this is in place, you can start developing your own real-time program."}]},{"type":"heading","attrs":{"align":null,"level":4},"content":[{"type":"text","text":"2.2.2 Writing the code"}]},{"type":"codeblock","attrs":{"lang":"java"},"content":[{"type":"text","text":"\/\/ Create a source bound to a namespace and a pipeline (job) name.\nDataStreamSource source = StreamBuilder.dataStream(\"namespace\", \"pipeline\");\n\n\/\/ Read the file, append \"--\" to each message, print the result, and start the job.\nsource.fromFile(\"~\/admin\/data\/text.txt\", false)\n    .map(message -> message + \"--\")\n    .toPrint(1)\n    .start();\n"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"In this example:"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"1) The namespace isolates business domains; jobs belonging to the same business can use the same namespace. Jobs in the same namespace can be scheduled to run in one process and can share some configuration;"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"2) pipelineName can be thought of as the job name; it uniquely identifies a job;"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"3) DataStreamSource is used to create the source. When the program runs, it appends \"--\" to each original message and prints the result."}]},{"type":"heading","attrs":{"align":null,"level":4},"content":[{"type":"text","text":"2.2.3 Operators"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"RocketMQ Streams provides a rich set of operators, including:"}]},{"type":"bulletedlist","content":[{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Source operators: fromFile, fromRocketMQ, fromKafka, 
as well as the from operator for custom sources;"}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Sink operators: toFile, toRocketMQ, toKafka, toDB, toPrint, toES, as well as the to operator for custom sinks;"}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Action operators: Filter, Expression, Script, selectFields, Union, forEach, Split, Select, Join, Window, and others."}]}]}]},{"type":"heading","attrs":{"align":null,"level":4},"content":[{"type":"text","text":"2.2.4 Deployment and execution"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"After development with the DSL SDK is complete, build a JAR with the command below and run it, or run the job's main method directly."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"codeblock","attrs":{"lang":"java"},"content":[{"type":"text","text":"mvn -Prelease-all -DskipTests clean install -U\njava -jar jarName mainClass &\n"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"heading","attrs":{"align":null,"level":3},"content":[{"type":"text","text":"2.3 SQL SDK"}]},{"type":"heading","attrs":{"align":null,"level":4},"content":[{"type":"text","text":"2.3.1 Dependencies"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#595959","name":"user"}}],"text":"  "}]},{"type":"codeblock","attrs":{"lang":"xml"},"content":[{"type":"text","text":"<dependency>\n <groupId>com.alibaba<\/groupId>\n <artifactId>rsqldb-clients<\/artifactId>\n <version>1.0.0-SNAPSHOT<\/version>\n<\/dependency>\n"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"heading","attrs":{"align":null,"level":4},"content":[{"type":"text","text":"2.3.2 Writing the code"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" 
"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"First write the business logic in SQL; it can be saved to a file or used directly as text;"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"codeblock","attrs":{"lang":"text"},"content":[{"type":"text","text":"CREATE FUNCTION json_concat as 'xxx.xxx.JsonConcat';\n\nCREATE TABLE `table_name` (\n `scan_time` VARCHAR,\n `file_name` VARCHAR,\n `cmdline` VARCHAR\n) WITH (\n type='file',\n filePath='\/tmp\/file.txt',\n isJsonData='true',\n msgIsJsonArray='false'\n);\n\n\n-- Normalize the data\n\ncreate view data_filter as\nselect\n *\nfrom (\n select\n scan_time as logtime\n , lower(cmdline) as lower_cmdline\n , file_name as proc_name\n from\n table_name\n)x\nwhere\n (\n lower(proc_name) like '%.xxxxxx'\n or lower_cmdline like 'xxxxx%'\n or lower_cmdline like 'xxxxxxx%'\n or lower_cmdline like 'xxxx'\n or lower_cmdline like 'xxxxxx'\n )\n;\n\nCREATE TABLE `output` (\n `logtime` VARCHAR\n , `lower_cmdline` VARCHAR\n , `proc_name` VARCHAR\n) WITH (\n type = 'print'\n);\n\ninsert into output\nselect\n *\nfrom\n data_filter\n;\n"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"In this script:"}]},{"type":"bulletedlist","content":[{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"CREATE FUNCTION: registers external functions to support the business logic, including Flink functions and system functions;"}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"CREATE TABLE: creates a source or sink;"}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"CREATE 
VIEW: performs field transformation, splitting, and filtering;"}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"INSERT INTO: writes data to the sink;"}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Functions: built-in functions and UDFs."}]}]}]},{"type":"heading","attrs":{"align":null,"level":4},"content":[{"type":"text","text":"2.3.3 Extending SQL"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"RocketMQ Streams supports three ways of extending SQL; for implementation details see: "},{"type":"link","attrs":{"href":"https:\/\/github.com\/alibaba\/rsqldb","title":null,"type":null},"content":[{"type":"text","text":"https:\/\/github.com\/alibaba\/rsqldb"}]}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"1) Through Blink UDF\/UDTF\/UDAF;"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"2) Through RocketMQ Streams itself: implement a Java bean with a method named eval;"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"3) Through existing Java code: the function name in CREATE FUNCTION is the method name of the Java class."}]},{"type":"heading","attrs":{"align":null,"level":4},"content":[{"type":"text","text":"2.3.4 Running SQL"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"You can download the latest RocketMQ Streams code "},{"type":"link","attrs":{"href":"https:\/\/github.com\/apache\/rocketmq-streams","title":null,"type":null},"content":[{"type":"text","text":"here"}]},{"type":"text","text":" and build it."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"codeblock","attrs":{"lang":"java"},"content":[{"type":"text","text":"cd rsqldb\/\nmvn -Prelease-all -DskipTests clean install -U\ncp 
rsqldb-runner\/target\/rocketmq-streams-sql-{version}-distribution.tar.gz {deployment-directory}\n"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Extract the tar.gz package and enter the directory:"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"codeblock","attrs":{"lang":"java"},"content":[{"type":"text","text":"tar -xvf rocketmq-streams-{version}-distribution.tar.gz\ncd rocketmq-streams-{version}\n"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"The directory layout is as follows:"}]},{"type":"bulletedlist","content":[{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"bin: scripts, including the start and stop commands"}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"conf: configuration, including logging configuration and application configuration files"}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"jobs: SQL files, which can be organized in up to two directory levels"}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"ext: extension UDF\/UDTF\/UDAF\/Source\/Sink implementations"}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"lib: dependency JARs"}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"log: 
directory for log files"}]}]}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong"}],"text":"2.3.4.1 Running a SQL file"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"codeblock","attrs":{"lang":"text"},"content":[{"type":"text","text":"# Start a real-time job from the given SQL file path\nbin\/start-sql.sh sql_file_path\n"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong"}],"text":"2.3.4.2 Running multiple SQL files"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"To run a batch of SQL files, place them under the jobs directory, which may contain up to two levels of subdirectories, and then pass the subdirectory or SQL file to the start command to run the jobs."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong"}],"text":"2.3.4.3 Stopping jobs"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"codeblock","attrs":{"lang":"text"},"content":[{"type":"text","text":"# Without arguments, stop all running jobs\nbin\/stop.sh\n\n# With a job name, stop all running jobs with that name\nbin\/stop.sh sqlname\n"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong"}],"text":"2.3.4.4 Viewing logs"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"All runtime logs are currently written to 
log\/catalina.out."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"heading","attrs":{"align":null,"level":2},"content":[{"type":"text","text":"3. Architecture and Internals"}]},{"type":"heading","attrs":{"align":null,"level":3},"content":[{"type":"text","text":"3.1 Design approach of RocketMQ Streams"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"With the basics covered, this section looks at the design approach of RocketMQ Streams from two angles, design goals and strategy:"}]},{"type":"heading","attrs":{"align":null,"level":4},"content":[{"type":"text","text":"3.1.1 Design goals"}]},{"type":"bulletedlist","content":[{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Few dependencies and simple deployment: a single instance runs on 1 core and 1 GB of memory, and the deployment can be scaled out freely;"}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"A clear scenario advantage: focus on large data volume -> heavy filtering -> light window computation, with complete feature coverage and the required big-data capabilities: exactly-once semantics and flexible windows (tumbling, sliding, and session windows);"}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"A performance advantage: achieve a breakthrough in high-filtering performance while keeping resource usage low;"}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Compatibility with Blink SQL and UDF\/UDTF\/UDAF, so that non-engineers can get started more easily."}]}]}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"heading","attrs":{"align":null,"level":4},"content":[{"type":"text","text":"3.1.2 
Strategy (target scenario: large data volume > heavy filtering\/ETL > light window computation)"}]},{"type":"bulletedlist","content":[{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Use a shared-nothing distributed architecture that relies on the message queue for load balancing and fault tolerance. A single instance can run on its own, capacity is scaled by adding instances, and the degree of parallelism is determined by the number of shards;"}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Use the message queue's shards for shuffle and its load balancing for fault tolerance;"}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Use storage to back up state and achieve exactly-once semantics. Structured remote storage enables fast startup without waiting for local storage to be restored;"}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Invest heavily in the filtering optimizer, which improves filtering performance through pre-filtering by fingerprint, automatic merging of rules from the same source, Hyperscan acceleration, and expression fingerprints."}]}]}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https:\/\/static001.geekbang.org\/infoq\/e7\/e76d95a0030f0a2019289afdc75e3d0f.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"heading","attrs":{"align":null,"level":3},"content":[{"type":"text","text":"3.2 RocketMQ Streams 
Source implementation"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"1) A source must provide at-least-once consumption semantics. The system achieves this with checkpoint system messages: before committing an offset, the source sends a checkpoint message that tells all operators to flush their in-memory state."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"2) A source supports automatic load balancing and fault tolerance across shards:"}]},{"type":"bulletedlist","content":[{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"When a shard is removed, the source sends a shard-removal system message so that operators can clean up the state for that shard;"}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"When a new shard is assigned, the source sends a shard-added message so that operators can initialize state for it."}]}]}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"3) The source starts a consumer through its start method to fetch messages;"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"4) Each raw message is encoded, given a header, and wrapped as a Message that is delivered to downstream operators."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https:\/\/static001.geekbang.org\/infoq\/1c\/1cca76f0483c7f580841a527f0cd2407.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"heading","attrs":{"align":null,"level":3},"content":[{"type":"text","text":"3.3 RocketMQ Streams 
Sink implementation"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"1) A sink balances latency against throughput;"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"2) To implement a sink, extend the AbstractSink class and implement the batchInsert method. batchInsert writes one batch of data to storage; the subclass calls the storage API to do so and should use the store's batch interface where possible to improve throughput;"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"3) The usual write path is message -> cache -> flush -> storage. The system strictly ensures that each batch written to storage does not exceed batchSize; larger flushes are split into multiple batches;"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https:\/\/static001.geekbang.org\/infoq\/51\/51f673ce31b340b21dee51196efe9441.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"4) A sink has a cache (one per shard). By default data is written to the cache and then written to storage in batches to improve throughput;"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"5) Automatic flushing can be enabled: each shard then has a thread that periodically flushes cached data to storage to reduce latency. Implementation class: DataSourceAutoFlushTask;"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"6) The cache can also be flushed to storage by calling the flush method;"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"7) The sink cache has memory protection: when the number of cached messages exceeds batchSize, a flush is forced to release memory."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"heading","attrs":{"align":null,"level":3},"content":[{"type":"text","text":"3.4 RocketMQ Streams 
Exactly-Once implementation"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"1) When committing an offset, the source always sends a checkpoint system message; components that receive it persist their state. This guarantees that messages are consumed at least once;"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"2) Every message carries a header that contains its queueId and offset;"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"3) When a component persists data, it also stores the queueId and the largest offset it has processed; duplicate messages are discarded by comparing against this max offset;"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"4) Memory protection: a checkpoint period may include several flushes (triggered by message count), which keeps memory usage bounded."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https:\/\/static001.geekbang.org\/infoq\/27\/27debc75c6657bf3f0a341799a68a434.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"heading","attrs":{"align":null,"level":3},"content":[{"type":"text","text":"3.5 RocketMQ Streams 
Window"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong"}],"text":"Implementation:"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"1) Tumbling, sliding, and session windows are supported, with either event time or natural time (the time a message enters the operator);"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"2) Emit syntax is supported, so results can be updated at a fixed interval before or after the window fires. For example, with a 1-hour window you may want to see the latest result every minute before the window fires and, after it fires, keep data that arrives up to one day late while updating the result every 10 minutes;"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"3) Both a high-performance mode and a high-reliability mode are supported. The high-performance mode does not depend on remote storage, but window data may be lost when shards are reassigned;"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"4) Fast startup without waiting for local storage to be restored: after a failure or shard reassignment, data is restored asynchronously from remote storage while computation reads remote storage directly;"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"5) Scaling out and in relies on the message queue's load balancing: each queue is one group, and a group is consumed by only one machine at any given time;"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"6) Normal computation uses local storage and delivers performance comparable to Flink."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https:\/\/static001.geekbang.org\/infoq\/22\/224a8430a195476c08baf70a13b49b32.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"heading","attrs":{"align":null,"level":2},"content":[{"type":"text","text":"4. Best Practices for RocketMQ Streams in Security Scenarios"}]},{"type":"heading","attrs":{"align":null,"level":3},"content":[{"type":"text","text":"4.1 
Background"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Moving from the public cloud to private (dedicated) clouds raised a new problem. In a private cloud, SaaS services such as big-data platforms are optional deliverables, and even their minimum deployment footprint is large. Requiring them would raise customer costs considerably and make adoption difficult, so security capabilities could not be brought to private clouds quickly."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https:\/\/static001.geekbang.org\/infoq\/cd\/cd110af3ee715c3fe204f55115bf88ba.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" "}]},{"type":"heading","attrs":{"align":null,"level":3},"content":[{"type":"text","text":"4.2 Solution"}]},{"type":"heading","attrs":{"align":null,"level":4},"content":[{"type":"text","text":"RocketMQ 
Streams in cloud security: stream computing"}]},{"type":"bulletedlist","content":[{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Build a lightweight computing engine for security workloads. Security workloads filter out most of the data, so the engine is optimized for heavy filtering first and only then performs the heavier aggregation, window, and join operations. Because the filtering ratio is high, aggregation and joins can be implemented with a lighter-weight design;"}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Both the SQL and the engine can be hot-upgraded."}]}]}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https:\/\/static001.geekbang.org\/infoq\/95\/95079fc049905f6f608e698480d6e7a5.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"heading","attrs":{"align":null,"level":4},"content":[{"type":"text","marks":[{"type":"italic"}],"text":"Business results"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"italic"}],"text":"1) Rule coverage: the self-built engine covers 100% of the rules (regular expressions, joins, aggregation);"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"italic"}],"text":"2) Low resource usage: memory is 1\/24 and CPU is 1\/6 of what the public-cloud engine uses. Thanks to the filtering optimizer, resource usage does not grow linearly with the number of rules, so adding rules creates no resource pressure; highly compressed tables support tens of millions of threat-intelligence records;"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"italic"}],"text":"3) SQL release: with a client\/server deployment model, SQL and the engine are hot-released, so rules can go live quickly, which matters especially during network-defense (護網) exercises;"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"italic"}],"text":"4) Performance: core components were tuned individually to sustain high performance, with each instance (2 GB of memory, 4 cores, 41 rules) handling more than 5,000 QPS."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":nul
l},"content":[{"type":"text","text":" "}]},{"type":"heading","attrs":{"align":null,"level":2},"content":[{"type":"text","text":"五、RocketMQ Streams的未來規劃"}]},{"type":"heading","attrs":{"align":null,"level":3},"content":[{"type":"text","text":"5.1 打造RocketMQ一體化計算能力"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"1)和RocketMQ整合,去除DB依賴,融合RocketMQ KV;"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"2)和RocketMQ混部,支持本地計算,利用本地特點,打造高性能;"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"3)打造邊緣計算最佳實踐"}]},{"type":"heading","attrs":{"align":null,"level":3},"content":[{"type":"text","text":"5.2 Connector增強"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"1)支持pull消費方式,checkpoint異步刷新;"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"2)兼容blink\/flink connector。"}]},{"type":"heading","attrs":{"align":null,"level":3},"content":[{"type":"text","text":"5.3 ETL能力建設"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"1)增加文件,syslog的數據接入能力"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"2)兼容Grok解析,增加常用日誌的解析能力;"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"3)打造日誌ETL 的最佳實踐"}]},{"type":"heading","attrs":{"align":null,"level":3},"content":[{"type":"text","text":"5.4 
穩定性和易用性打造"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"1)Window多場景測試,提升穩定性,性能優化;"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"2)補充測試用例,文檔,應用場景。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"heading","attrs":{"align":null,"level":2},"content":[{"type":"text","text":"六、開源地址"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"RocketMQ-Streams: "},{"type":"link","attrs":{"href":"https:\/\/github.com\/apache\/rocketmq-streams","title":null,"type":null},"content":[{"type":"text","text":"https:\/\/github.com\/apache\/rocketmq-streams"}]},{"type":"text","text":" "}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"RocketMQ-Streams-SQL:"},{"type":"link","attrs":{"href":"https:\/\/github.com\/alibaba\/rsqldb","title":null,"type":null},"content":[{"type":"text","text":"https:\/\/github.com\/alibaba\/rsqldb"}]}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"以上是本次對RocketMQ stream的整體介紹,希望對大家有所幫助和啓發。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" "}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" "}]}]}