DataX KafkaWriter Plugin Documentation
While recently learning the DataX tool, I found that the official Alibaba distribution does not include a KafkaWriter plugin, so I wrote one myself.
The plugin draws mainly on the DataX plugin development guide and builds on top of it.
Source code: https://gitee.com/mjlfto/dataX/tree/master/kafkawriter
1 Quick Introduction
KafkaWriter writes data into a specified Kafka topic.
2 Features and Limitations
KafkaWriter currently supports writing text-formatted or JSON-formatted data into a single topic.
3 Function Description
3.1 Sample Configuration
{
    "job": {
        "setting": {
            "speed": {
                "channel": 1
            }
        },
        "content": [
            {
                "reader": {
                    "name": "oraclereader",
                    "parameter": {
                        "username": "zkcj",
                        "password": "zkcj2018",
                        "connection": [
                            {
                                "jdbcUrl": [
                                    "jdbc:oracle:thin:@10.1.20.169:1521:GYJG"
                                ],
                                "querySql": [
                                    "select * from VM_DRV_PREASIGN_A"
                                ]
                            }
                        ]
                    }
                },
                "writer": {
                    "name": "kafkawriter",
                    "parameter": {
                        "topic": "test-topic",
                        "bootstrapServers": "10.1.20.150:9092",
                        "fieldDelimiter": "\t",
                        "batchSize": 10,
                        "writeType": "json",
                        "noTopicCreate": true,
                        "topicNumPartition": 1,
                        "topicReplicationFactor": 1
                    }
                }
            }
        ]
    }
}
3.2 Parameter Description
- **bootstrapServers**
  - Description: Kafka broker addresses, in the format `host1:port,host2:port`. Example: `10.1.20.111:9092,10.1.20.121:9092`
  - Required: yes
  - Default: none
- **topic**
  - Description: Kafka topic name. Currently only a single topic can be written per job.
  - Required: yes
  - Default: none
- **ack**
  - Description: acknowledgement mechanism for sent messages. Default is 0.
    - acks=0: the producer does not wait for any response from Kafka.
    - acks=1: the leader writes the message to its local log but does not wait for acknowledgement from the other replicas in the cluster.
    - acks=all: the leader waits for all in-sync followers to replicate the message. This guarantees the message is not lost unless every machine in the Kafka cluster fails; it is the strongest durability guarantee.
  - Required: no
  - Default: 0
- **batchSize**
  - Description: when multiple messages are destined for the same partition, the producer batches them into fewer network requests, improving efficiency for both the client and the producer.
  - Required: no
  - Default: 16384
- **retries**
  - Description: when set to a value greater than 0, the client resends messages that fail to send.
  - Required: no
  - Default: 0
- **fieldDelimiter**
  - Description: field delimiter used when writeType is text.
  - Required: no
  - Default: , (comma)
- **keySerializer**
  - Description: key serializer class.
  - Required: no
  - Default: org.apache.kafka.common.serialization.StringSerializer
- **valueSerializer**
  - Description: value serializer class.
  - Required: no
  - Default: org.apache.kafka.common.serialization.StringSerializer
- **noTopicCreate**
  - Description: whether to create the topic when it does not exist. Default false.
  - Required: no
  - Default: false
- **topicNumPartition**
  - Description: number of partitions for the topic.
  - Required: no
  - Default: 1
- **topicReplicationFactor**
  - Description: replication factor for the topic.
  - Required: no
  - Default: 1
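To relate the producer-side parameters above to the underlying Kafka client, here is a minimal sketch of how they presumably map onto standard kafka-clients producer property keys. The `to_producer_props` function is hypothetical, written from the parameter descriptions in this document rather than taken from the plugin source:

```python
# Hypothetical mapping from KafkaWriter job parameters to standard
# kafka-clients producer property keys, inferred from the parameter
# descriptions above (not the plugin's actual code).
DEFAULT_SERIALIZER = "org.apache.kafka.common.serialization.StringSerializer"

def to_producer_props(params):
    return {
        "bootstrap.servers": params["bootstrapServers"],   # required
        "acks": str(params.get("ack", 0)),                 # "0", "1" or "all"
        "batch.size": params.get("batchSize", 16384),
        "retries": params.get("retries", 0),
        "key.serializer": params.get("keySerializer", DEFAULT_SERIALIZER),
        "value.serializer": params.get("valueSerializer", DEFAULT_SERIALIZER),
    }

props = to_producer_props({"bootstrapServers": "10.1.20.150:9092", "ack": "all"})
```

Topic-related parameters (topic, noTopicCreate, topicNumPartition, topicReplicationFactor) are handled by the writer itself rather than passed to the producer.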
- **writeType**
  - Description: format of the data written to Kafka. Options: text, json.
    - text: all field values are joined with fieldDelimiter to form the key; the value is the same string as the key.
    - json: the key is built the same way as in text mode; the value is the record in DataX's internal column format, as shown below. rawData holds the field value; if an object has no rawData field, that value is null.

    {
        "data": [
            { "byteSize": 13, "rawData": "xxxx", "type": "STRING" },
            { "byteSize": 1, "rawData": "1", "type": "STRING" },
            { "byteSize": 12, "rawData": "xxx", "type": "STRING" },
            { "byteSize": 1, "rawData": "A", "type": "STRING" },
            { "byteSize": 18, "rawData": "xxx", "type": "STRING" },
            { "byteSize": 3, "rawData": "xxx", "type": "STRING" },
            { "byteSize": 1, "rawData": "A", "type": "STRING" },
            { "byteSize": 1, "rawData": "0", "type": "DOUBLE" },
            { "byteSize": 8, "rawData": 1426740491000, "subType": "DATETIME", "type": "DATE" },
            { "byteSize": 8, "rawData": 1426780800000, "subType": "DATETIME", "type": "DATE" },
            { "byteSize": 1, "rawData": "E", "type": "STRING" },
            { "byteSize": 7, "rawData": "5201009", "type": "STRING" },
            { "byteSize": 6, "rawData": "520101", "type": "DOUBLE" },
            { "byteSize": 0, "type": "STRING" },
            { "byteSize": 3, "rawData": "xxx", "type": "STRING" },
            { "byteSize": 12, "rawData": "520181000400", "type": "STRING" },
            { "byteSize": 0, "type": "STRING" },
            { "byteSize": 1, "rawData": "0", "type": "DOUBLE" },
            { "byteSize": 0, "subType": "DATETIME", "type": "DATE" },
            { "byteSize": 1, "rawData": "0", "type": "DOUBLE" },
            { "byteSize": 0, "type": "STRING" },
            { "byteSize": 0, "type": "STRING" },
            { "byteSize": 78, "rawData": "xxx", "type": "STRING" },
            { "byteSize": 1, "rawData": "0", "type": "STRING" },
            { "byteSize": 8, "rawData": 1426694400000, "subType": "DATETIME", "type": "DATE" },
            { "byteSize": 0, "type": "STRING" },
            { "byteSize": 0, "subType": "DATETIME", "type": "DATE" },
            { "byteSize": 0, "type": "STRING" },
            { "byteSize": 0, "type": "STRING" },
            { "byteSize": 0, "type": "STRING" },
            { "byteSize": 0, "type": "DOUBLE" },
            { "byteSize": 1, "rawData": "0", "type": "STRING" },
            { "byteSize": 0, "type": "STRING" },
            { "byteSize": 12, "rawData": "520181000400", "type": "STRING" },
            { "byteSize": 1, "rawData": "1", "type": "DOUBLE" },
            { "byteSize": 1, "rawData": "0", "type": "DOUBLE" },
            { "byteSize": 8, "rawData": 1426740491000, "subType": "DATETIME", "type": "DATE" },
            { "byteSize": 2, "rawData": "xxx", "type": "STRING" },
            { "byteSize": 0, "type": "STRING" },
            { "byteSize": 28, "rawData": "YxIC7zeM6xG+eBdzxV4oRDxHses=", "type": "STRING" }
        ],
        "size": 40
    }

  - Required: no
  - Default: text
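As an illustration of the two formats, here is a minimal Python sketch (not the plugin's actual Java code; the record and column shapes are simplified assumptions based on the description above) that builds the key and value the way the two modes imply:

```python
import json

def build_payload(columns, field_delimiter="\t", write_type="text"):
    """Build the Kafka key/value pair for one record.

    `columns` is a list of dicts shaped like DataX's internal column
    format: {"rawData": ..., "byteSize": ..., "type": ...}.
    A column without "rawData" represents a null value.
    """
    # Both modes join all field values with fieldDelimiter to form the key.
    key = field_delimiter.join(
        "" if "rawData" not in c else str(c["rawData"]) for c in columns
    )
    if write_type == "text":
        # text mode: the value is the same string as the key.
        value = key
    else:
        # json mode: the value is the internal column format as JSON.
        value = json.dumps({"data": columns, "size": len(columns)})
    return key, value

cols = [
    {"byteSize": 3, "rawData": "abc", "type": "STRING"},
    {"byteSize": 1, "rawData": "1", "type": "DOUBLE"},
    {"byteSize": 0, "type": "STRING"},  # null field: no rawData
]
key, value = build_payload(cols, write_type="text")
print(key == value)  # text mode: key and value are identical
```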
3.3 Type Conversion
KafkaWriter handles data using DataX's internal column types. The table below lists how they map to Hive data types; please check your types accordingly:

| DataX Internal Type | Hive Data Type |
| --- | --- |
| Long | TINYINT, SMALLINT, INT, BIGINT |
| Double | FLOAT, DOUBLE |
| String | STRING, VARCHAR, CHAR |
| Boolean | BOOLEAN |
| Date | DATE, TIMESTAMP |
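Note that in the json writeType sample payload above, DATE columns carry their rawData as epoch milliseconds (e.g. 1426740491000). A consumer can convert such a value back to a timestamp, for example:

```python
from datetime import datetime, timezone

# DATE columns in the json writeType payload carry epoch milliseconds
# in rawData (see the sample payload in section 3.2).
raw_millis = 1426740491000
dt = datetime.fromtimestamp(raw_millis / 1000, tz=timezone.utc)
print(dt.isoformat())
```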
4 Configuration Steps
5 Constraints and Limitations
Omitted.
6 FAQ
Omitted.