Introduction to Sqoop
An open-source tool
RDBMS---------------------------sqoop---------------------------->HDFS
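Before Sqoop:
RDBMS -----> Hadoop
MR: DBInputFormat ------------> TextOutputFormat
Hadoop -----> RDBMS
MR: TextInputFormat ------------> DBOutputFormat
Problems with the MR approach
- Writing MapReduce jobs is tedious
- Low efficiency (one MR job serves only one business line)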
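To address these problems, a generic framework can be extracted from the MR code, with the following supplied as configuration:
- Driver
- username
- password
- url
- DB/table/sql
- hdfs path
- number of mappers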
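After integrating with the framework
A new business line only needs to pass its parameters to the MR job:
- submit the job with hadoop jar
- pass the parameters dynamically for each business line
Later, Spring Boot microservices could be used to build a big data platform on top of this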
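The parameter-driven submission described above can be sketched as a small shell helper. This is only an illustration: the jar name db-transfer.jar, the main class com.example.DbImportDriver, the option names, and the sample connection values are all hypothetical, since the notes do not name the actual framework. The helper prints the command rather than running it.

```shell
# Sketch: build the `hadoop jar` command line for one business line.
# JAR, MAIN_CLASS and the --option names are hypothetical placeholders
# for the generic MR framework; they are not part of Hadoop or Sqoop.
JAR="db-transfer.jar"
MAIN_CLASS="com.example.DbImportDriver"

build_submit_cmd() {
  # $1 driver, $2 JDBC url, $3 username, $4 password,
  # $5 table, $6 HDFS path, $7 number of mappers
  echo "hadoop jar $JAR $MAIN_CLASS" \
    "--driver $1 --url $2 --username $3 --password $4" \
    "--table $5 --hdfs-path $6 --mappers $7"
}

# A new business line only changes the arguments, not the MR code.
build_submit_cmd com.mysql.jdbc.Driver jdbc:mysql://dbhost:3306/sales \
  etl_user secret orders /data/sales/orders 4
```

Printing the command first makes it easy to review what would be submitted for each business line before handing it to the cluster.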
Official Sqoop introduction
Apache Sqoop™ is a tool designed for efficiently transferring bulk data between Apache Hadoop and structured datastores such as relational databases.
Sqoop successfully graduated from the Incubator in March of 2012 and is now a Top-Level Apache project.
Latest stable release is 1.4.7. Latest cut of Sqoop2 is 1.99.7. Note that 1.99.7 is not compatible with 1.4.7 and not feature complete, it is not intended for production deployment.
Sqoop: SQL-to-Hadoop
RDBMS <---------sqoop-----------> Hadoop(HDFS/Hive)
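Under the hood: a transfer is a plain read/write operation, so map tasks alone are enough and no reduce phase is needed
Sqoop has two versions: 1.x and 2.x (1.99.x)
Sqoop 1 architecture diagram
Only map tasks are used; reduce is not used
Sqoop 2 (1.99.x) architecture diagram
Reduce is also used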
/opt/cloudera/parcels/CDH-6.3.1-1.cdh6.3.1.p0.1470567/bin/../lib/sqoop/../accumulo does not exist
/opt/cloudera/parcels/CDH-5.12.1-1.cdh5.12.1.p0.3/lib/sqoop/
After extracting, place it in the lib folder under the Sqoop home directory
Sqoop 1 tutorial
Basic operations
List databases
sqoop list-databases --connect jdbc:mysql://10.103.66.88:3306 --username name --password password
List tables
sqoop list-tables --connect jdbc:mysql://10.103.66.88:3306/information_schema --username
Import a table into HDFS
sqoop import \
--connect jdbc:mysql://10.103.66.88:3306/lenovosbom \
--username xingwj1 \
--password xingwj1 \
--table ec
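The import fails because the MySQL table has no primary key, so Sqoop cannot decide how to split the work across mappers
Either use --split-by to name a column to split on
or use -m 1 to import sequentially with a single mapper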
sqoop import \
--connect jdbc:mysql://10.103.66.88:3306/lenovosbom \
--username xingwj1 \
--password xingwj1 \
--table ec \
-m 1
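For comparison, the --split-by alternative might look like the sketch below. The column name id and the mapper count 4 are assumptions made for illustration; the notes do not say which columns table ec has, so substitute a real, evenly distributed column. The script only assembles and prints the command so it can be checked before running.

```shell
# Hypothetical split column and mapper count; replace with real values for table ec.
SPLIT_COLUMN="id"
NUM_MAPPERS=4

# Assemble the import command; backslash-newline inside double quotes
# joins the pieces into a single line.
import_cmd="sqoop import \
--connect jdbc:mysql://10.103.66.88:3306/lenovosbom \
--username xingwj1 \
--password xingwj1 \
--table ec \
--split-by $SPLIT_COLUMN \
-m $NUM_MAPPERS"

# Print the command for review; run it with: eval "$import_cmd"
echo "$import_cmd"
```

With --split-by, the rows are divided among the mappers by ranges of the chosen column, so a column with skewed values would give uneven work per mapper.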