1: Download and install
wget http://datax-opensource.oss-cn-hangzhou.aliyuncs.com/datax.tar.gz
tar -zxvf datax.tar.gz -C /usr/local/
2: Documentation
Official plugin development docs: https://github.com/alibaba/DataX/blob/master/dataxPluginDev.md
3: Simple examples
(1) stream -> stream
[root@hadoop01 home]# cd /usr/local/datax/
[root@hadoop01 datax]# vi ./job/first.json
The contents are as follows:
{
    "job": {
        "content": [
            {
                "reader": {
                    "name": "streamreader",
                    "parameter": {
                        "sliceRecordCount": 10,
                        "column": [
                            {
                                "type": "long",
                                "value": "10"
                            },
                            {
                                "type": "string",
                                "value": "hello,你好,世界-DataX"
                            }
                        ]
                    }
                },
                "writer": {
                    "name": "streamwriter",
                    "parameter": {
                        "encoding": "UTF-8",
                        "print": true
                    }
                }
            }
        ],
        "setting": {
            "speed": {
                "channel": 5
            }
        }
    }
}
Run the job:
[root@hadoop01 datax]# python ./bin/datax.py ./job/first.json
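If I read streamreader's behavior correctly, sliceRecordCount is the number of records *per channel*, so with channel set to 5 this job should print 5 × 10 = 50 rows in total. A quick sketch of that arithmetic (reading the values out of the job JSON above):

```python
import json

# The relevant parts of ./job/first.json, pasted inline.
job_text = """
{"job": {"setting": {"speed": {"channel": 5}},
         "content": [{"reader": {"name": "streamreader",
                                 "parameter": {"sliceRecordCount": 10}}}]}}
"""

def expected_records(job_json: str) -> int:
    # Assumption: streamreader emits sliceRecordCount records per channel,
    # so total output = channel * sliceRecordCount.
    job = json.loads(job_json)["job"]
    channels = job["setting"]["speed"]["channel"]
    per_channel = job["content"][0]["reader"]["parameter"]["sliceRecordCount"]
    return channels * per_channel

print(expected_records(job_text))  # 50
```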
(2) mysql -> hdfs
[root@hadoop01 datax]# vi ./job/mysql2hdfs.json
The contents are as follows:
{
    "job": {
        "content": [
            {
                "reader": {
                    "name": "mysqlreader",
                    "parameter": {
                        "column": [
                            "id",
                            "name"
                        ],
                        "connection": [
                            {
                                "jdbcUrl": ["jdbc:mysql://hadoop01:3306/test"],
                                "table": ["stu"]
                            }
                        ],
                        "username": "root",
                        "password": "root"
                    }
                },
                "writer": {
                    "name": "hdfswriter",
                    "parameter": {
                        "defaultFS": "hdfs://hadoop01:9000",
                        "fileType": "orc",
                        "path": "/datax/mysql2hdfs/orcfull",
                        "fileName": "m2h01",
                        "column": [
                            {
                                "name": "col1",
                                "type": "INT"
                            },
                            {
                                "name": "col2",
                                "type": "STRING"
                            }
                        ],
                        "writeMode": "append",
                        "fieldDelimiter": "\t",
                        "compress": "NONE"
                    }
                }
            }
        ],
        "setting": {
            "speed": {
                "channel": "1"
            }
        }
    }
}
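Malformed job JSON is an easy way to trip DataX up, so before submitting a job I find it handy to sanity-check the file. A small convenience helper (my own script, not part of DataX):

```python
import json

def check_job(path):
    """Minimal structural check for a DataX job file: valid JSON with at
    least one content entry naming both a reader and a writer. Returns the
    (reader, writer) plugin-name pairs found."""
    with open(path, encoding="utf-8") as f:
        job = json.load(f)  # raises ValueError on malformed JSON
    content = job["job"]["content"]
    pairs = []
    for entry in content:
        assert entry["reader"]["name"], "reader name missing"
        assert entry["writer"]["name"], "writer name missing"
        pairs.append((entry["reader"]["name"], entry["writer"]["name"]))
    return pairs
```

For the job above, `check_job("./job/mysql2hdfs.json")` should return `[("mysqlreader", "hdfswriter")]`.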
Note:
The output directory must be created before running the job:
[root@hadoop01 datax]# hdfs dfs -mkdir -p /datax/mysql2hdfs/orcfull
Run the job:
[root@hadoop01 datax]# python ./bin/datax.py ./job/mysql2hdfs.json
This errored out:
ERROR RetryUtil - Exception when calling callable, about to attempt retry #1. Planned wait for this retry: [1000]ms, actual wait: [1000]ms. Exception msg: [DataX cannot connect to the database. Possible causes: 1) the configured ip/port/database/jdbc is wrong, so the connection fails. 2) the configured username/password is wrong, so authentication fails. Please confirm the database connection details with the DBA.]
Replacing the MySQL driver jar bundled inside DataX with a suitable 8.x version fixed this.
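That DataX message lumps network problems (wrong ip/port) together with auth problems (wrong username/password). Before swapping drivers, a plain TCP probe can tell the two apart; a stdlib-only sketch (hostname and port taken from the job above):

```python
import socket

def tcp_reachable(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds, else False.
    If this fails, the problem is network/address level, not credentials
    and not the JDBC driver."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

# tcp_reachable("hadoop01", 3306)
```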
Because my HDFS is HA, I also tried configuring:
"hadoopConfig": {
    "dfs.nameservices": "testDfs",
    "dfs.ha.namenodes.testDfs": "namenode1,namenode2",
    "dfs.namenode.rpc-address.testDfs.namenode1": "hadoop01:9000",
    "dfs.namenode.rpc-address.testDfs.namenode2": "hadoop02:9000",
    "dfs.client.failover.proxy.provider.testDfs": "org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider"
}
But that throws: java.io.IOException: Couldn't create proxy provider class org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider
Suggested fix: use WinRAR to pack the three files hdfs-site.xml, core-site.xml, and hive-site.xml into datax/plugin/reader/hdfsreader/hdfsreader-0.0.1-SNAPSHOT.jar.
I haven't tried this yet, though.
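Since a jar is just a zip archive, the same patch can be scripted instead of done by hand in WinRAR, e.g. with Python's zipfile (equally untested against DataX on my side; the config-file paths in the usage comment are assumptions for a typical cluster):

```python
import os
import zipfile

def add_configs_to_jar(jar_path, config_files):
    """Append Hadoop config files to a plugin jar (a jar is just a zip).
    Same idea as the WinRAR approach above; back up the jar first.
    Skips any file name already present in the archive."""
    with zipfile.ZipFile(jar_path, "a") as jar:
        existing = set(jar.namelist())
        for cfg in config_files:
            name = os.path.basename(cfg)
            if name not in existing:
                jar.write(cfg, arcname=name)

# add_configs_to_jar(
#     "plugin/reader/hdfsreader/hdfsreader-0.0.1-SNAPSHOT.jar",
#     ["/etc/hadoop/conf/hdfs-site.xml",
#      "/etc/hadoop/conf/core-site.xml",
#      "/etc/hadoop/conf/hive-site.xml"],
# )
```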