Hadoop Data Transfer Tool Sqoop (Part 3): Importing Data into Hive with Sqoop

1. Installing Hive

1.1 Download and Extract

Download apache-hive-0.13.1-bin.tar.gz:

$ tar zxvf apache-hive-0.13.1-bin.tar.gz

1.2 Configure Environment Variables

Add the following to /etc/profile:

export HIVE_HOME=/usr/local/app/hadoop/hive-0.13.1-bin
export PATH=$HIVE_HOME/bin:$PATH
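
To apply the change to the current shell without logging out and back in (a minimal sketch, assuming a Bourne-compatible shell):

$ source /etc/profile
$ echo $HIVE_HOME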

1.3 Create the Hive Warehouse Directories

$ hadoop fs -mkdir /tmp
$ hadoop fs -mkdir /user/hive/warehouse
$ hadoop fs -chmod g+w /tmp
$ hadoop fs -chmod g+w /user/hive/warehouse

1.4 Start the Command Line

Enter the command line with the hive command; working with it is similar to the MySQL command line:
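
A minimal first-session sketch (SHOW TABLES and quit are standard Hive CLI commands; the table list will be empty on a fresh install):

$ hive
hive> SHOW TABLES;
hive> quit;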


2. Installing Sqoop

See "Hadoop Data Transfer Tool Sqoop (Part 1): Introduction".


3. Importing Data into Hive with Sqoop

3.1 Import into HDFS

To test whether Sqoop is configured correctly, we'll import one table from a MySQL database. First upload mysql-connector-java-5.1.23.jar into Sqoop's lib directory, then run the following command:

$ sqoop import --connect jdbc:mysql://ip/database --table tb1 --username user -P

Warning: /usr/lib/hbase does not exist! HBase imports will fail.
Please set $HBASE_HOME to the root of your HBase installation.
Enter password:

13/06/07 16:51:46 INFO manager.MySQLManager: Preparing to use a MySQL streaming resultset.
13/06/07 16:51:46 INFO tool.CodeGenTool: Beginning code generation
13/06/07 16:51:48 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM `tb1` AS t LIMIT 1
13/06/07 16:51:48 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM `tb1` AS t LIMIT 1
13/06/07 16:51:48 INFO orm.CompilationManager: HADOOP_MAPRED_HOME is /home/admin/hadoop-0.20.2
13/06/07 16:51:48 INFO orm.CompilationManager: Found hadoop core jar at: /home/admin/hadoop-0.20.2/hadoop-0.20.2-core.jar
Note: /tmp/sqoop-root/compile/44c4b6c5ac57de04b487eb90633ac33e/tb1.java uses or overrides a deprecated API.
Note: Recompile with -Xlint:deprecation for details.
13/06/07 16:51:54 INFO orm.CompilationManager: Writing jar file: /tmp/sqoop-root/compile/44c4b6c5ac57de04b487eb90633ac33e/tb1.jar
13/06/07 16:51:54 WARN manager.MySQLManager: It looks like you are importing from mysql.
13/06/07 16:51:54 WARN manager.MySQLManager: This transfer can be faster! Use the --direct
13/06/07 16:51:54 WARN manager.MySQLManager: option to exercise a MySQL-specific fast path.
13/06/07 16:51:54 INFO manager.MySQLManager: Setting zero DATETIME behavior to convertToNull (mysql)
13/06/07 16:51:54 INFO mapreduce.ImportJobBase: Beginning import of tb1
13/06/07 16:51:57 INFO db.DataDrivenDBInputFormat: BoundingValsQuery: SELECT MIN(`id`), MAX(`id`) FROM `tb1`
13/06/07 16:51:59 INFO mapred.JobClient: Running job: job_201306071651_0001
13/06/07 16:52:00 INFO mapred.JobClient:  map 0% reduce 0%
13/06/07 16:52:38 INFO mapred.JobClient:  map 50% reduce 0%
13/06/07 16:52:44 INFO mapred.JobClient:  map 100% reduce 0%
13/06/07 16:52:46 INFO mapred.JobClient: Job complete: job_201306071651_0001
13/06/07 16:52:46 INFO mapred.JobClient: Counters: 5
13/06/07 16:52:46 INFO mapred.JobClient:   Job Counters
13/06/07 16:52:46 INFO mapred.JobClient:     Launched map tasks=2
13/06/07 16:52:46 INFO mapred.JobClient:   FileSystemCounters
13/06/07 16:52:46 INFO mapred.JobClient:     HDFS_BYTES_WRITTEN=212
13/06/07 16:52:46 INFO mapred.JobClient:   Map-Reduce Framework
13/06/07 16:52:46 INFO mapred.JobClient:     Map input records=2
13/06/07 16:52:46 INFO mapred.JobClient:     Spilled Records=0
13/06/07 16:52:46 INFO mapred.JobClient:     Map output records=2
13/06/07 16:52:46 INFO mapreduce.ImportJobBase: Transferred 212 bytes in 51.383 seconds (4.1259 bytes/sec)
13/06/07 16:52:46 INFO mapreduce.ImportJobBase: Retrieved 2 records.
By default, the data files are imported into a directory named after the table under the current user's HDFS home directory:
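
You can list them to verify (a sketch; /user/root/tb1 matches the root user seen in the log above):

$ hadoop fs -ls /user/root/tb1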


By default, Sqoop launches four map tasks in parallel to speed up the import; you can force a single map task with -m 1, so that only one data file is generated in HDFS. Since tb1 currently holds just two rows, two files were produced in total. Let's take a look at their contents:
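
To inspect the generated files, and to redo the import with a single map task (both are standard Hadoop/Sqoop invocations, shown here as a sketch):

$ hadoop fs -cat /user/root/tb1/part-m-*
$ sqoop import --connect jdbc:mysql://ip/database --table tb1 --username user -P -m 1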

3.2 Create the Hive Table

First, create the tb1 table from the hive command line. Note that Hive supports only a limited set of data types, and be sure to set the table's field delimiter to a comma; otherwise Hive uses Ctrl+A as the default delimiter.

CREATE TABLE tb1 (
  id int,
  ......
) row format delimited fields terminated by ',';
Alternatively, the following command lets Sqoop create the Hive table automatically from the MySQL table structure:

$ sqoop create-hive-table --connect jdbc:mysql://ip/database --table tb1 --hive-table tb1 --username user -P
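
Either way, you can check the resulting schema from the hive command line (standard HiveQL):

hive> DESCRIBE tb1;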

3.3 Load into Hive

Now load the files from HDFS into Hive. Note that when Hive loads data from HDFS, it moves the files from /user/root/tb1 into the warehouse directory /user/hive/warehouse/tb1:

LOAD DATA INPATH '/user/root/tb1/part-m-*' OVERWRITE INTO TABLE tb1;
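
A quick check that the rows arrived (standard HiveQL):

hive> SELECT * FROM tb1;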

3.4 One Powerful Command

The three steps above (exporting from MySQL to HDFS, creating the Hive table, and loading the data into Hive) can be accomplished with a single Sqoop command:

$ sqoop import --connect jdbc:mysql://ip/database --table tb1 --username user -P --hive-import
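
Sqoop 1.x also provides --create-hive-table (the job fails if the Hive table already exists; otherwise Sqoop creates it) and --hive-overwrite (replace any existing data in the table). A sketch combining --create-hive-table with a single map task, to be verified against your Sqoop version's documentation:

$ sqoop import --connect jdbc:mysql://ip/database --table tb1 --username user -P --hive-import --create-hive-table -m 1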


References:

https://cwiki.apache.org/confluence/display/Hive/GettingStarted

http://sqoop.apache.org/docs/1.99.1/Installation.html



Reposted from: http://blog.csdn.net/dc_726/article/details/9069871

