Installing Sqoop and Basic Usage

I. Overview

1. Sqoop is short for "SQL to Hadoop".

2. It bridges traditional relational databases and Hadoop: it imports data from a relational database into Hadoop systems (such as HDFS, HBase, and Hive), and extracts data from Hadoop systems and exports it back into a relational database.

3. Data is transferred in batches using MapReduce.

4. Sqoop has two versions: Sqoop 1 (1.4.x) and Sqoop 2 (1.99.x). Since the project does not recommend Sqoop 2 for production use, Sqoop 1 is used here.

5. Under the hood, Sqoop 1 is implemented with MapReduce, and its jobs are map-only (no reduce phase).

6. Sqoop jobs are submitted to YARN for execution.

7. Before running the examples in this post, start Hadoop (including YARN) and MySQL; a sketch of the startup commands follows this list.
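
A minimal sketch of the startup commands, assuming the Hadoop 2.7.3 sbin scripts are on the PATH and MySQL runs as a systemd service (the service name mysqld is an assumption; it varies by distribution):

start-dfs.sh            # start NameNode and DataNodes
start-yarn.sh           # start ResourceManager and NodeManagers
systemctl start mysqld  # start MySQL; service name is an assumption
jps                     # confirm the Hadoop daemons are up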

II. Installing Sqoop 1

1. Download location

http://mirror.bit.edu.cn/apache/sqoop/1.4.7/

2. Copy the downloaded archive to a directory on the Linux host, extract it, and rename the extracted directory

[root@master sqoop]# pwd
/opt/softWare/sqoop
[root@master sqoop]# ls
sqoop-1.4.7  sqoop-1.4.7.bin__hadoop-2.6.0.tar.gz
tar -zxvf sqoop-1.4.7.bin__hadoop-2.6.0.tar.gz

mv sqoop-1.4.7.bin__hadoop-2.6.0 sqoop-1.4.7

3. Configure environment variables

vim /etc/profile
 #sqoop
 export SQOOP_HOME=/opt/softWare/sqoop/sqoop-1.4.7
 export PATH=$PATH:$SQOOP_HOME/bin
source /etc/profile

4. Edit Sqoop's configuration file

cd conf/
cp sqoop-env-template.sh sqoop-env.sh
vim sqoop-env.sh
 #Set the path for where zookeper config dir is
 #export ZOOCFGDIR=
 export HADOOP_COMMON_HOME=/opt/softWare/hadoop/hadoop-2.7.3
 export HADOOP_MAPRED_HOME=/opt/softWare/hadoop/hadoop-2.7.3
 export HIVE_HOME=/opt/softWare/hive/apache-hive-2.1.1-bin
 export HBASE_HOME=/opt/softWare/hbase/hbase-1.2.6

Append the export lines above to the end of the configuration file.

5. Upload the MySQL JDBC driver into Sqoop's lib directory

mysql-connector-java-5.1.47.jar
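
For example (a sketch; the source path is an assumption, adjust it to wherever the jar was downloaded):

cp ~/mysql-connector-java-5.1.47.jar /opt/softWare/sqoop/sqoop-1.4.7/lib/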

6. View Sqoop's commands

[root@master bin]# ./sqoop help
Warning: /opt/softWare/sqoop/sqoop-1.4.7/bin/../../hcatalog does not exist! HCatalog jobs will fail.
Please set $HCAT_HOME to the root of your HCatalog installation.
Warning: /opt/softWare/sqoop/sqoop-1.4.7/bin/../../accumulo does not exist! Accumulo imports will fail.
Please set $ACCUMULO_HOME to the root of your Accumulo installation.
Warning: /opt/softWare/sqoop/sqoop-1.4.7/bin/../../zookeeper does not exist! Accumulo imports will fail.
Please set $ZOOKEEPER_HOME to the root of your Zookeeper installation.
20/06/29 10:43:28 INFO sqoop.Sqoop: Running Sqoop version: 1.4.7
usage: sqoop COMMAND [ARGS]

Available commands:
  codegen            Generate code to interact with database records
  create-hive-table  Import a table definition into Hive
  eval               Evaluate a SQL statement and display the results
  export             Export an HDFS directory to a database table
  help               List available commands
  import             Import a table from a database to HDFS
  import-all-tables  Import tables from a database to HDFS
  import-mainframe   Import datasets from a mainframe server to HDFS
  job                Work with saved jobs
  list-databases     List available databases on a server
  list-tables        List available tables in a database
  merge              Merge results of incremental imports
  metastore          Run a standalone Sqoop metastore
  version            Display version information

See 'sqoop help COMMAND' for information on a specific command.
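
For instance, to see the options of a single tool:

./sqoop help import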

7. Use Sqoop to list the databases in MySQL

[root@master bin]# ./sqoop list-databases --connect jdbc:mysql://192.168.230.21:3306/test?characterEncoding=UTF-8 --username root --password '123456'
Warning: /opt/softWare/sqoop/sqoop-1.4.7/bin/../../hcatalog does not exist! HCatalog jobs will fail.
Please set $HCAT_HOME to the root of your HCatalog installation.
Warning: /opt/softWare/sqoop/sqoop-1.4.7/bin/../../accumulo does not exist! Accumulo imports will fail.
Please set $ACCUMULO_HOME to the root of your Accumulo installation.
Warning: /opt/softWare/sqoop/sqoop-1.4.7/bin/../../zookeeper does not exist! Accumulo imports will fail.
Please set $ZOOKEEPER_HOME to the root of your Zookeeper installation.
20/06/29 10:44:06 INFO sqoop.Sqoop: Running Sqoop version: 1.4.7
20/06/29 10:44:06 WARN tool.BaseSqoopTool: Setting your password on the command-line is insecure. Consider using -P instead.
20/06/29 10:44:06 INFO manager.MySQLManager: Preparing to use a MySQL streaming resultset.
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/opt/softWare/hadoop/hadoop-2.7.3/share/hadoop/common/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/opt/softWare/hbase/hbase-1.2.6/lib/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
information_schema
hive
ke
mysql
performance_schema
test
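
Similarly, list-tables shows the tables inside one database. A sketch against the same server (the output depends on your schema); -P prompts for the password interactively, which the warning in the log above recommends over putting it on the command line:

./sqoop list-tables --connect "jdbc:mysql://192.168.230.21:3306/test?characterEncoding=UTF-8" --username root -P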

8. The data in the stu table in MySQL is as follows

mysql> select * from stu;
+----------+------+----------+
| name     | age  | address  |
+----------+------+----------+
| zhangsan |   20 | henan    |
| lisi     |   20 | hebei    |
| wangwu   |   20 | beijing  |
| liuqi    |   20 | shandong |
| xuwei    |   20 | fujian   |
+----------+------+----------+
5 rows in set (0.00 sec)
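
For reference, a minimal sketch of the SQL that could produce this table (the column types are assumptions, since the original DDL is not shown; the table sits in the mysql database, per the JDBC URL in step 9):

mysql -uroot -p123456 -e "
CREATE TABLE stu (name VARCHAR(32), age INT, address VARCHAR(64));
INSERT INTO stu VALUES
  ('zhangsan',20,'henan'),('lisi',20,'hebei'),('wangwu',20,'beijing'),
  ('liuqi',20,'shandong'),('xuwei',20,'fujian');" mysql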

9. Import the stu table from MySQL into HDFS

[root@master bin]# ./sqoop import -m 1 --connect jdbc:mysql://192.168.230.21:3306/mysql --username root --password 123456 --table stu --target-dir /user/sqoop/datatest
Warning: /opt/softWare/sqoop/sqoop-1.4.7/bin/../../hcatalog does not exist! HCatalog jobs will fail.
Please set $HCAT_HOME to the root of your HCatalog installation.
Warning: /opt/softWare/sqoop/sqoop-1.4.7/bin/../../accumulo does not exist! Accumulo imports will fail.
Please set $ACCUMULO_HOME to the root of your Accumulo installation.
Warning: /opt/softWare/sqoop/sqoop-1.4.7/bin/../../zookeeper does not exist! Accumulo imports will fail.
Please set $ZOOKEEPER_HOME to the root of your Zookeeper installation.
20/06/29 09:52:29 INFO sqoop.Sqoop: Running Sqoop version: 1.4.7
20/06/29 09:52:29 WARN tool.BaseSqoopTool: Setting your password on the command-line is insecure. Consider using -P instead.
20/06/29 09:52:29 INFO manager.MySQLManager: Preparing to use a MySQL streaming resultset.
20/06/29 09:52:29 INFO tool.CodeGenTool: Beginning code generation
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/opt/softWare/hadoop/hadoop-2.7.3/share/hadoop/common/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/opt/softWare/hbase/hbase-1.2.6/lib/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
20/06/29 09:52:29 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM `stu` AS t LIMIT 1
20/06/29 09:52:29 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM `stu` AS t LIMIT 1
20/06/29 09:52:29 INFO orm.CompilationManager: HADOOP_MAPRED_HOME is /opt/softWare/hadoop/hadoop-2.7.3
Note: /tmp/sqoop-root/compile/de2079aebddc781b7b85a54da470b803/stu.java uses or overrides a deprecated API.
Note: Recompile with -Xlint:deprecation for details.
20/06/29 09:52:31 INFO orm.CompilationManager: Writing jar file: /tmp/sqoop-root/compile/de2079aebddc781b7b85a54da470b803/stu.jar
20/06/29 09:52:31 WARN manager.MySQLManager: It looks like you are importing from mysql.
20/06/29 09:52:31 WARN manager.MySQLManager: This transfer can be faster! Use the --direct
20/06/29 09:52:31 WARN manager.MySQLManager: option to exercise a MySQL-specific fast path.
20/06/29 09:52:31 INFO manager.MySQLManager: Setting zero DATETIME behavior to convertToNull (mysql)
20/06/29 09:52:31 INFO mapreduce.ImportJobBase: Beginning import of stu
20/06/29 09:52:31 INFO Configuration.deprecation: mapred.jar is deprecated. Instead, use mapreduce.job.jar
20/06/29 09:52:32 INFO Configuration.deprecation: mapred.map.tasks is deprecated. Instead, use mapreduce.job.maps
20/06/29 09:52:32 INFO client.RMProxy: Connecting to ResourceManager at master/192.168.230.21:8032
20/06/29 09:52:46 INFO db.DBInputFormat: Using read commited transaction isolation
20/06/29 09:52:46 INFO mapreduce.JobSubmitter: number of splits:1
20/06/29 09:52:47 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1593394152340_0001
20/06/29 09:52:48 INFO impl.YarnClientImpl: Submitted application application_1593394152340_0001
20/06/29 09:52:48 INFO mapreduce.Job: The url to track the job: http://master:8088/proxy/application_1593394152340_0001/
20/06/29 09:52:48 INFO mapreduce.Job: Running job: job_1593394152340_0001
20/06/29 09:53:00 INFO mapreduce.Job: Job job_1593394152340_0001 running in uber mode : false
20/06/29 09:53:00 INFO mapreduce.Job:  map 0% reduce 0%
20/06/29 09:53:11 INFO mapreduce.Job:  map 100% reduce 0%
20/06/29 09:53:11 INFO mapreduce.Job: Job job_1593394152340_0001 completed successfully
20/06/29 09:53:12 INFO mapreduce.Job: Counters: 30
	File System Counters
		FILE: Number of bytes read=0
		FILE: Number of bytes written=138019
		FILE: Number of read operations=0
		FILE: Number of large read operations=0
		FILE: Number of write operations=0
		HDFS: Number of bytes read=87
		HDFS: Number of bytes written=84
		HDFS: Number of read operations=4
		HDFS: Number of large read operations=0
		HDFS: Number of write operations=2
	Job Counters 
		Launched map tasks=1
		Other local map tasks=1
		Total time spent by all maps in occupied slots (ms)=7657
		Total time spent by all reduces in occupied slots (ms)=0
		Total time spent by all map tasks (ms)=7657
		Total vcore-milliseconds taken by all map tasks=7657
		Total megabyte-milliseconds taken by all map tasks=7840768
	Map-Reduce Framework
		Map input records=5
		Map output records=5
		Input split bytes=87
		Spilled Records=0
		Failed Shuffles=0
		Merged Map outputs=0
		GC time elapsed (ms)=77
		CPU time spent (ms)=990
		Physical memory (bytes) snapshot=114147328
		Virtual memory (bytes) snapshot=2082172928
		Total committed heap usage (bytes)=42663936
	File Input Format Counters 
		Bytes Read=0
	File Output Format Counters 
		Bytes Written=84
20/06/29 09:53:12 INFO mapreduce.ImportJobBase: Transferred 84 bytes in 39.4567 seconds (2.1289 bytes/sec)
20/06/29 09:53:12 INFO mapreduce.ImportJobBase: Retrieved 5 records.

Note: -m (--num-mappers) specifies how many map tasks run in parallel; a parallel-import sketch follows.
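
With more than one mapper, Sqoop needs a column to split the table across map tasks. If the table has no primary key (as appears to be the case for stu, given that the import above used -m 1), --split-by must be supplied. A hedged sketch; note that with this sample data every age is 20, so all rows would land in one split, and a column with well-spread numeric values is a better choice:

./sqoop import -m 4 --split-by age --connect jdbc:mysql://192.168.230.21:3306/mysql --username root -P --table stu --target-dir /user/sqoop/datatest_m4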

10. Verify the result

[root@master ~]# hdfs dfs -ls /user/sqoop/datatest
Found 2 items
-rw-r--r--   1 root supergroup          0 2020-06-29 09:53 /user/sqoop/datatest/_SUCCESS
-rw-r--r--   1 root supergroup         84 2020-06-29 09:53 /user/sqoop/datatest/part-m-00000
[root@master ~]# hdfs dfs -cat /user/sqoop/datatest/part-m-00000
zhangsan,20,henan
lisi,20,hebei
wangwu,20,beijing
liuqi,20,shandong
xuwei,20,fujian
[root@master ~]#
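
The reverse direction works the same way: sqoop export reads files from an HDFS directory and inserts the rows into an existing MySQL table. A minimal sketch, assuming a hypothetical empty table stu_copy with the same three columns already exists in the mysql database:

./sqoop export -m 1 --connect jdbc:mysql://192.168.230.21:3306/mysql --username root -P --table stu_copy --export-dir /user/sqoop/datatest --input-fields-terminated-by ','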
