DataX KafkaWriter Plugin Documentation

While recently learning the DataX tool, I found that the official Alibaba distribution does not ship a kafkawriter plugin, so I wrote one myself. The plugin mainly draws on the DataX plugin development guide and builds on top of it.

Source code: https://gitee.com/mjlfto/dataX/tree/master/kafkawriter

1 Quick Introduction

KafkaWriter writes data into a specified Kafka topic.

2 Features and Limitations

KafkaWriter currently supports writing text- or JSON-formatted data into a single topic.

3 Function Description

3.1 Sample Configuration

{  
   "job":{  
      "setting":{  
         "speed":{  
            "channel":1
         }
      },
      "content":[  
         {  
            "reader":{  
               "name":"oraclereader",
               "parameter":{  
                  "username":"zkcj",
                  "password":"zkcj2018",
                  "connection":[  
                     {  
                        "jdbcUrl":[  
                           "jdbc:oracle:thin:@10.1.20.169:1521:GYJG"
                        ],
                        "querySql":[  
                           "select * from VM_DRV_PREASIGN_A"
                        ]
                     }
                  ]
               }
            },
            "writer":{  
               "name":"kafkawriter",
               "parameter":{  
                  "topic":"test-topic",
                  "bootstrapServers":"10.1.20.150:9092",
                  "fieldDelimiter":"\t",
                  "batchSize":10,
                  "writeType":"json",
                  "notTopicCreate":true,
                  "topicNumPartition":1,
                  "topicReplicationFactor":1
               }
            }
         }
      ]
   }
}
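A job file like the one above is submitted with the standard DataX launcher. The install path and job filename below are assumptions, not part of this plugin:

```shell
# Assumed layout: DataX unpacked under /opt/datax, job saved as oracle2kafka.json
python /opt/datax/bin/datax.py /opt/datax/job/oracle2kafka.json
```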

3.2 Parameter Description

  • bootstrapServers

    • Description: Kafka broker addresses, in the format host1:port,host2:port. Example: 10.1.20.111:9092,10.1.20.121:9092

    • Required: yes

    • Default: none

  • topic

    • Description: Kafka topic name; currently only a single topic can be written per job.

    • Required: yes

    • Default: none

  • ack

    • Description: acknowledgement mechanism for sent messages; the default is 0.

      acks=0: the producer does not wait for any acknowledgement from Kafka.
      acks=1: the leader writes the record to its local log but does not wait for acknowledgement from the other replicas.
      acks=all: the leader waits until all replicas have acknowledged the record. The record is then not lost unless every machine in the cluster fails; this is the strongest durability guarantee.

    • Required: no

    • Default: 0

  • batchSize

    • Description: when multiple records are destined for the same partition, the producer batches them into fewer network requests, which improves efficiency on both the client and the server.

    • Required: no

    • Default: 16384

  • retries

    • Description: if set to a value greater than 0, the client resends any record whose send fails.

    • Required: no

    • Default: 0

  • fieldDelimiter

    • Description: field delimiter used when writeType is text.

    • Required: no

    • Default: , (comma)

  • keySerializer

    • Description: key serializer class.

    • Required: no

    • Default: org.apache.kafka.common.serialization.StringSerializer

  • valueSerializer

    • Description: value serializer class.

    • Required: no

    • Default: org.apache.kafka.common.serialization.StringSerializer

  • noTopicCreate

    • Description: whether to create the topic if it does not exist; the default is false.

    • Required: no

    • Default: false

  • topicNumPartition

    • Description: number of partitions to use when creating the topic.

    • Required: no

    • Default: 1

  • topicReplicationFactor

    • Description: replication factor to use when creating the topic.

    • Required: no

    • Default: 1

  • writeType

    • Description: format of the data written to Kafka; the options are text and json.

      text: all field values are joined with fieldDelimiter, and the result is used as both the key and the value of the Kafka record.
      json: the key is built the same way as in text mode (field values joined with fieldDelimiter); the value uses DataX's internal column format, as shown below.
      rawData holds the field value; if a column object has no rawData field, the value is null.

        {  
           "data":[  
              {  
                 "byteSize":13,
                 "rawData":"xxxx",
                 "type":"STRING"
              },
              {  
                 "byteSize":1,
                 "rawData":"1",
                 "type":"STRING"
              },
              {  
                 "byteSize":12,
                 "rawData":"xxx",
                 "type":"STRING"
              },
              {  
                 "byteSize":1,
                 "rawData":"A",
                 "type":"STRING"
              },
              {  
                 "byteSize":18,
                 "rawData":"xxx",
                 "type":"STRING"
              },
              {  
                 "byteSize":3,
                 "rawData":"xxx",
                 "type":"STRING"
              },
              {  
                 "byteSize":1,
                 "rawData":"A",
                 "type":"STRING"
              },
              {  
                 "byteSize":1,
                 "rawData":"0",
                 "type":"DOUBLE"
              },
              {  
                 "byteSize":8,
                 "rawData":1426740491000,
                 "subType":"DATETIME",
                 "type":"DATE"
              },
              {  
                 "byteSize":8,
                 "rawData":1426780800000,
                 "subType":"DATETIME",
                 "type":"DATE"
              },
              {  
                 "byteSize":1,
                 "rawData":"E",
                 "type":"STRING"
              },
              {  
                 "byteSize":7,
                 "rawData":"5201009",
                 "type":"STRING"
              },
              {  
                 "byteSize":6,
                 "rawData":"520101",
                 "type":"DOUBLE"
              },
              {  
                 "byteSize":0,
                 "type":"STRING"
              },
              {  
                 "byteSize":3,
                 "rawData":"xxx",
                 "type":"STRING"
              },
              {  
                 "byteSize":12,
                 "rawData":"520181000400",
                 "type":"STRING"
              },
              {  
                 "byteSize":0,
                 "type":"STRING"
              },
              {  
                 "byteSize":1,
                 "rawData":"0",
                 "type":"DOUBLE"
              },
              {  
                 "byteSize":0,
                 "subType":"DATETIME",
                 "type":"DATE"
              },
              {  
                 "byteSize":1,
                 "rawData":"0",
                 "type":"DOUBLE"
              },
              {  
                 "byteSize":0,
                 "type":"STRING"
              },
              {  
                 "byteSize":0,
                 "type":"STRING"
              },
              {  
                 "byteSize":78,
                 "rawData":"xxx",
                 "type":"STRING"
              },
              {  
                 "byteSize":1,
                 "rawData":"0",
                 "type":"STRING"
              },
              {  
                 "byteSize":8,
                 "rawData":1426694400000,
                 "subType":"DATETIME",
                 "type":"DATE"
              },
              {  
                 "byteSize":0,
                 "type":"STRING"
              },
              {  
                 "byteSize":0,
                 "subType":"DATETIME",
                 "type":"DATE"
              },
              {  
                 "byteSize":0,
                 "type":"STRING"
              },
              {  
                 "byteSize":0,
                 "type":"STRING"
              },
              {  
                 "byteSize":0,
                 "type":"STRING"
              },
              {  
                 "byteSize":0,
                 "type":"DOUBLE"
              },
              {  
                 "byteSize":1,
                 "rawData":"0",
                 "type":"STRING"
              },
              {  
                 "byteSize":0,
                 "type":"STRING"
              },
              {  
                 "byteSize":12,
                 "rawData":"520181000400",
                 "type":"STRING"
              },
              {  
                 "byteSize":1,
                 "rawData":"1",
                 "type":"DOUBLE"
              },
              {  
                 "byteSize":1,
                 "rawData":"0",
                 "type":"DOUBLE"
              },
              {  
                 "byteSize":8,
                 "rawData":1426740491000,
                 "subType":"DATETIME",
                 "type":"DATE"
              },
              {  
                 "byteSize":2,
                 "rawData":"xxx",
                 "type":"STRING"
              },
              {  
                 "byteSize":0,
                 "type":"STRING"
              },
              {  
                 "byteSize":28,
                 "rawData":"YxIC7zeM6xG+eBdzxV4oRDxHses=",
                 "type":"STRING"
              }
           ],
           "size":40
        }
      
    • Required: no

    • Default: text
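The two writeType modes described above can be sketched as follows. This is plain Python for illustration, not the plugin's actual Java implementation; the column dicts follow the json sample shown earlier:

```python
import json

def build_record(columns, field_delimiter="\t", write_type="text"):
    """Build the (key, value) pair sent to Kafka for one DataX record.

    columns: list of column dicts in DataX's internal format; a missing
    "rawData" key means the field value is null.
    """
    # In both modes the key is all field values joined by fieldDelimiter.
    key = field_delimiter.join(str(c.get("rawData", "")) for c in columns)
    if write_type == "text":
        value = key                     # text mode: value equals the key
    else:                               # json mode: DataX internal column format
        value = json.dumps({"data": columns, "size": len(columns)})
    return key, value

cols = [{"byteSize": 4, "rawData": "xxxx", "type": "STRING"},
        {"byteSize": 1, "rawData": "1", "type": "STRING"}]
key, value = build_record(cols, field_delimiter=",", write_type="text")
```

Here "size" is the number of columns in the record, matching the json sample above.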

3.3 Type Conversion

The type field that appears in the json output is DataX's internal column type. The table below lists how DataX internal types map to source (Hive-style) data types; please check your column types against it:

DataX internal type    Hive data type
Long                   TINYINT, SMALLINT, INT, BIGINT
Double                 FLOAT, DOUBLE
String                 STRING, VARCHAR, CHAR
Boolean                BOOLEAN
Date                   DATE, TIMESTAMP
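The mapping and the column format from the json sample can be combined into a small sketch. The helper below is hypothetical, for illustration only (e.g. byteSize is approximated from the string form, whereas DataX records the actual byte size):

```python
# Source (Hive-style) type -> DataX internal type, per the table above.
DATAX_TYPE = {
    "TINYINT": "LONG", "SMALLINT": "LONG", "INT": "LONG", "BIGINT": "LONG",
    "FLOAT": "DOUBLE", "DOUBLE": "DOUBLE",
    "STRING": "STRING", "VARCHAR": "STRING", "CHAR": "STRING",
    "BOOLEAN": "BOOLEAN",
    "DATE": "DATE", "TIMESTAMP": "DATE",
}

def to_column(value, source_type):
    """Wrap a raw source value in the DataX internal column format."""
    col = {"type": DATAX_TYPE[source_type.upper()],
           "byteSize": 0 if value is None else len(str(value))}
    if value is not None:       # an absent rawData field means the value is null
        col["rawData"] = value
    return col
```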

4 Configuration Steps

5 Constraints and Limitations

6 FAQ
