【Sqoop】Sqoop Import and Export

This article gives a brief walkthrough of the following Sqoop operations:

Import:

    RDBMS to HDFS;
    RDBMS to Hive;
    RDBMS to HBase;

Export:

    Hive to RDBMS;
    HBase to RDBMS;

MySQL is used as the example RDBMS throughout.

Environment Preparation

Create the database and table in MySQL and insert some test data:

create database userdb;
use userdb;

create table emp(
id int(11) default null,
name varchar(100) default null,
deg varchar(100) default null,
salary int(11) default null,
dept varchar(10) default null,
create_time timestamp not null default current_timestamp,
update_time timestamp not null default current_timestamp on update current_timestamp,
is_delete bigint(20) default '1'
)
engine = innodb default charset=latin1;

insert into emp(id,name,deg,salary,dept) values (1201,'gopal','manager',50000,'TP'),
(1202,'manisha','Proof reader',50000,'TP'),
(1203,'khalil','php dev',30000,'AC'),
(1204,'prasanth','php dev',30000,'AC'),
(1205,'kranthi','admin',20000,'TP');
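
To confirm the test data is in place, a quick check from the mysql client:

select id, name, deg, salary, dept from emp;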

I. Import

1. RDBMS to HDFS

--target-dir: the target HDFS path

--delete-target-dir: delete the target path first if it already exists

--fields-terminated-by: the field delimiter for the output files

1.1 Full-table import

Goal: import the emp table into /sqoop/import/emp on HDFS.

Run the Sqoop import:

cd /hadoop/install/sqoop-1.4.6-cdh5.14.2/

bin/sqoop import \
--connect jdbc:mysql://node03:3306/userdb \
--username root \
--password 123456 \
--table emp \
--delete-target-dir \
--target-dir /sqoop/import/emp \
--fields-terminated-by ',' \
-m 1

View the imported data:

hdfs dfs -cat /sqoop/import/emp/part*

1201,gopal,manager,50000,TP,2020-02-21 00:15:24.0,2020-02-21 00:15:24.0,1
1202,manisha,Proof reader,50000,TP,2020-02-21 00:15:24.0,2020-02-21 00:15:24.0,1
1203,khalil,php dev,30000,AC,2020-02-21 00:15:24.0,2020-02-21 00:15:24.0,1
1204,prasanth,php dev,30000,AC,2020-02-21 00:15:24.0,2020-02-21 00:15:24.0,1
1205,kranthi,admin,20000,TP,2020-02-21 00:15:24.0,2020-02-21 00:15:24.0,1
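
A side note: passing --password on the command line exposes the password in the shell history and process list. Sqoop also supports -P (interactive prompt) and --password-file. A minimal sketch using a password file on HDFS (the path /user/root/mysql.pwd is an assumption):

# the file must contain only the password, with no trailing newline
echo -n "123456" > mysql.pwd
hdfs dfs -put mysql.pwd /user/root/mysql.pwd
hdfs dfs -chmod 400 /user/root/mysql.pwd

bin/sqoop import \
--connect jdbc:mysql://node03:3306/userdb \
--username root \
--password-file /user/root/mysql.pwd \
--table emp \
--delete-target-dir \
--target-dir /sqoop/import/emp \
--fields-terminated-by ',' \
-m 1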

1.2 Import with a where filter

Run the Sqoop import:

cd /hadoop/install/sqoop-1.4.6-cdh5.14.2/

bin/sqoop import \
--connect jdbc:mysql://node03:3306/userdb \
--username root \
--password 123456 \
--table emp \
--where "id=1202" \
--delete-target-dir \
--target-dir /sqoop/import/emp \
--fields-terminated-by ',' \
-m 1

View the imported data:

hdfs dfs -cat /sqoop/import/emp/part*

1202,manisha,Proof reader,50000,TP,2020-02-21 00:15:24.0,2020-02-21 00:15:24.0,1

1.3 Import with a free-form query

Note the following when importing with --query:

(1) The --table parameter must not be used.

(2) The query must contain a WHERE clause.

(3) The WHERE clause must include the literal $CONDITIONS token; Sqoop substitutes it with the split condition for each mapper when the import runs in parallel (see the sketch at the end of this subsection).

Run the Sqoop import:

cd /hadoop/install/sqoop-1.4.6-cdh5.14.2/

bin/sqoop import \
--connect jdbc:mysql://node03:3306/userdb \
--username root \
--password 123456 \
--query 'select * from emp where 1=1 and $CONDITIONS' \
--delete-target-dir \
--target-dir /sqoop/import/emp \
--fields-terminated-by ',' \
-m 1

View the imported data:

hdfs dfs -cat /sqoop/import/emp/part*

1201,gopal,manager,50000,TP,2020-02-21 00:15:24.0,2020-02-21 00:15:24.0,1
1202,manisha,Proof reader,50000,TP,2020-02-21 00:15:24.0,2020-02-21 00:15:24.0,1
1203,khalil,php dev,30000,AC,2020-02-21 00:15:24.0,2020-02-21 00:15:24.0,1
1204,prasanth,php dev,30000,AC,2020-02-21 00:15:24.0,2020-02-21 00:15:24.0,1
1205,kranthi,admin,20000,TP,2020-02-21 00:15:24.0,2020-02-21 00:15:24.0,1
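
With -m 1, $CONDITIONS has no practical effect; with multiple mappers, Sqoop replaces it with a per-mapper range on the --split-by column, which is mandatory in that case. A minimal sketch of the same import run in parallel (the mapper count of 2 is illustrative):

bin/sqoop import \
--connect jdbc:mysql://node03:3306/userdb \
--username root \
--password 123456 \
--query 'select * from emp where 1=1 and $CONDITIONS' \
--split-by id \
--delete-target-dir \
--target-dir /sqoop/import/emp \
--fields-terminated-by ',' \
-m 2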

2. RDBMS to Hive

Copy the Hive exec jar into Sqoop's lib directory:

cp /hadoop/install/hive-1.1.0-cdh5.14.2/lib/hive-exec-1.1.0-cdh5.14.2.jar /hadoop/install/sqoop-1.4.6-cdh5.14.2/lib/

Create the Hive table:

If the Hive table does not exist, Sqoop creates one during the import with a schema matching the MySQL table. You can create it yourself if you prefer; this article lets Sqoop create it.
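
For reference, a manual DDL that would match (a sketch; the type mapping from MySQL is an assumption, with the timestamp columns kept as string for the comma-delimited text layout):

create table test.emp_hive(
id int,
name string,
deg string,
salary int,
dept string,
create_time string,
update_time string,
is_delete bigint
)
row format delimited fields terminated by ',';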

Run the Sqoop import:

cd /hadoop/install/sqoop-1.4.6-cdh5.14.2/

bin/sqoop import \
--connect jdbc:mysql://node03:3306/userdb \
--username root \
--password 123456 \
--table emp \
--hive-import \
--hive-table test.emp_hive \
--fields-terminated-by ',' \
-m 1

View the imported data in Hive:

select id, name, salary from test.emp_hive;

+-------+-----------+---------+--+
|  id   |   name    | salary  |
+-------+-----------+---------+--+
| 1201  | gopal     | 50000   |
| 1202  | manisha   | 50000   |
| 1203  | khalil    | 30000   |
| 1204  | prasanth  | 30000   |
| 1205  | kranthi   | 20000   |
+-------+-----------+---------+--+
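
One caveat: repeating a --hive-import appends to the Hive table, so reruns produce duplicate rows. Adding --hive-overwrite replaces the table contents instead; a sketch of the rerun-safe variant:

bin/sqoop import \
--connect jdbc:mysql://node03:3306/userdb \
--username root \
--password 123456 \
--table emp \
--hive-import \
--hive-overwrite \
--hive-table test.emp_hive \
--fields-terminated-by ',' \
-m 1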

3. RDBMS to HBase

Edit sqoop-env.sh:

# add the HBase environment variable

#set the path to where bin/hbase is available
export HBASE_HOME=/hadoop/install/hbase-1.2.0-cdh5.14.2

Run the Sqoop import:

cd /hadoop/install/sqoop-1.4.6-cdh5.14.2/

bin/sqoop import \
--connect jdbc:mysql://node03:3306/userdb \
--username root \
--password 123456 \
--table emp \
--columns "id,name,deg,salary,dept,create_time,update_time,is_delete" \
--column-family "info" \
--hbase-create-table \
--hbase-row-key "id" \
--hbase-table "emp_hbase" \
--split-by id \
-m 1

View the imported data in the HBase shell:

scan 'emp_hbase'
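
To spot-check a single row instead of scanning the whole table (rowkey taken from the test data):

get 'emp_hbase', '1201'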

II. Export

1. Hive to RDBMS

The HDFS file /sqoop/import/emp/part-m-00000 contains the following data:

1203,khalil,php dev,30000,AC,2020-02-21 00:15:24.0,2020-02-21 00:15:24.0,1
1204,prasanth,php dev,30000,AC,2020-02-21 00:15:24.0,2020-02-21 00:15:24.0,1
1205,kranthi,admin,20000,TP,2020-02-21 00:15:24.0,2020-02-21 00:15:24.0,1

Create the target table in MySQL:

create table emp_exp(
id int(11) default null,
name varchar(100) default null,
deg varchar(100) default null,
salary int(11) default null,
dept varchar(10) default null,
create_time timestamp not null default current_timestamp,
update_time timestamp not null default current_timestamp on update current_timestamp,
is_delete bigint(20) default '1'
)
engine = innodb default charset=latin1;

Run the Sqoop export:

cd /hadoop/install/sqoop-1.4.6-cdh5.14.2/

bin/sqoop export \
--connect jdbc:mysql://node03:3306/userdb \
--username root \
--password 123456 \
--table emp_exp \
--export-dir /sqoop/import/emp/part-m-00000 \
--input-fields-terminated-by ","

Verify the exported data in MySQL:

select id,name,salary from emp_exp;
+------+----------+--------+
| id   | name     | salary |
+------+----------+--------+
| 1205 | kranthi  |  20000 |
| 1201 | gopal    |  50000 |
| 1202 | manisha  |  50000 |
| 1203 | khalil   |  30000 |
| 1204 | prasanth |  30000 |
+------+----------+--------+
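
A note on reruns: emp_exp has no key, so running the same export twice inserts duplicate rows. If the target table had a primary key, Sqoop could update existing rows instead (a sketch; assumes id were declared a primary key on emp_exp):

bin/sqoop export \
--connect jdbc:mysql://node03:3306/userdb \
--username root \
--password 123456 \
--table emp_exp \
--update-key id \
--update-mode allowinsert \
--export-dir /sqoop/import/emp/part-m-00000 \
--input-fields-terminated-by ","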

2. HBase to RDBMS

Sqoop cannot read HFiles directly, so the export goes through Hive:

HBase -> Hive external table -> Hive internal table -> RDBMS

Create a Hive external table mapped onto the HBase table:

create external table test.hbase2mysql
(id int,
 name string,
 deg string,
 salary int,
 dept string,
 create_time string,
 update_time string,
 is_delete int
 )
stored by 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
with serdeproperties ("hbase.columns.mapping" = ":key,info:name,info:deg,info:salary,info:dept,info:create_time,info:update_time,info:is_delete")
tblproperties ("hbase.table.name" = "emp_hbase",
"hbase.mapred.output.outputtable" = "hbase2mysql");

Create a Hive internal (managed) table and populate it from the external table:

create table test.hive2mysql as select * from test.hbase2mysql;

Run the Sqoop export. The managed table is stored as plain text with Hive's defaults, '\001' (Ctrl-A) as the field delimiter and '\N' for NULLs, hence the --input-* options:

cd /hadoop/install/sqoop-1.4.6-cdh5.14.2/

bin/sqoop export \
--connect jdbc:mysql://node03:3306/userdb \
--username root \
--password 123456 \
--table emp_exp \
--export-dir /user/hive/warehouse/test.db/hive2mysql \
--input-fields-terminated-by '\001' \
--input-null-string '\\N' \
--input-null-non-string '\\N'

Verify the exported data in MySQL:

select id,name,salary from emp_exp;
+------+----------+--------+
| id   | name     | salary |
+------+----------+--------+
| 1205 | kranthi  |  20000 |
| 1201 | gopal    |  50000 |
| 1202 | manisha  |  50000 |
| 1203 | khalil   |  30000 |
| 1204 | prasanth |  30000 |
+------+----------+--------+

Summary:

Sqoop offers far more parameters than this article can cover.
In practice, run sqoop help import or sqoop help export to list the available options.
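
For example:

cd /hadoop/install/sqoop-1.4.6-cdh5.14.2/
bin/sqoop help import
bin/sqoop help export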
