Sqoop 1.4.7: Importing and Exporting Data between HDFS and MySQL

 


Runtime environment:
CentOS 7.6
Hadoop 2.7.7
Hive 1.2.2
Sqoop 1.4.7
MySQL 5.7.28


Note: Sqoop 1.4.6 was built against an early HBase release and CentOS 6, so it may have compatibility issues with this environment.

Installation steps:

Prerequisite: Hadoop's HDFS and YARN services must be running.


Download the official MySQL sample database (the world database):
https://downloads.mysql.com/docs/world.sql.zip

Load it into the MySQL database:
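A minimal sketch of loading the sample data (assuming the archive was downloaded to the current directory):

$ unzip world.sql.zip
$ mysql -u root -p < world.sql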


-- Check the Sqoop version:
$ sqoop-version 
Warning: /opt/bigdata/sqoop/../hcatalog does not exist! HCatalog jobs will fail.
Please set $HCAT_HOME to the root of your HCatalog installation.
Warning: /opt/bigdata/sqoop/../accumulo does not exist! Accumulo imports will fail.
Please set $ACCUMULO_HOME to the root of your Accumulo installation.
19/11/07 03:28:58 INFO sqoop.Sqoop: Running Sqoop version: 1.4.7
Sqoop 1.4.7
git commit id 2328971411f57f0cb683dfb79d19d4d19d185dd8
Compiled by maugli on Thu Dec 21 15:59:58 STD 2017


-- Test Sqoop's connection to the database: list the databases on the MySQL server
[hadoop@hadoop102 lib]$ sqoop list-databases --connect jdbc:mysql://192.168.8.102:3306/ --username root --password oracle
Warning: /opt/bigdata/sqoop/../hcatalog does not exist! HCatalog jobs will fail.
Please set $HCAT_HOME to the root of your HCatalog installation.
Warning: /opt/bigdata/sqoop/../accumulo does not exist! Accumulo imports will fail.
Please set $ACCUMULO_HOME to the root of your Accumulo installation.
19/11/07 03:17:15 INFO sqoop.Sqoop: Running Sqoop version: 1.4.7
19/11/07 03:17:15 WARN tool.BaseSqoopTool: Setting your password on the command-line is insecure. Consider using -P instead.
19/11/07 03:17:15 INFO manager.MySQLManager: Preparing to use a MySQL streaming resultset.
Thu Nov 07 03:17:15 CST 2019 WARN: Establishing SSL connection without server's identity verification is not recommended. According to MySQL 5.5.45+, 5.6.26+ and 5.7.6+ requirements SSL connection must be established by default if explicit option isn't set. For compliance with existing applications not using SSL the verifyServerCertificate property is set to 'false'. You need either to explicitly disable SSL by setting useSSL=false, or set useSSL=true and provide truststore for server certificate verification.
information_schema
employees
metastore
mysql
performance_schema
sys
world
world_x


-- List the tables in the current database:

[hadoop@hadoop102 lib]$ sqoop list-tables --connect jdbc:mysql://192.168.8.102:3306/world?useSSL=false --username root --password oracle 
        
Warning: /opt/bigdata/sqoop/../hcatalog does not exist! HCatalog jobs will fail.
Please set $HCAT_HOME to the root of your HCatalog installation.
Warning: /opt/bigdata/sqoop/../accumulo does not exist! Accumulo imports will fail.
Please set $ACCUMULO_HOME to the root of your Accumulo installation.
19/11/07 03:25:39 INFO sqoop.Sqoop: Running Sqoop version: 1.4.7
19/11/07 03:25:39 WARN tool.BaseSqoopTool: Setting your password on the command-line is insecure. Consider using -P instead.
19/11/07 03:25:39 INFO manager.MySQLManager: Preparing to use a MySQL streaming resultset.
city
country
countrylanguage


Importing data from MySQL into Hive:
Error message:
19/11/07 03:58:51 INFO hive.HiveImport: Loading uploaded data into Hive
19/11/07 03:58:51 ERROR hive.HiveConfig: Could not load org.apache.hadoop.hive.conf.HiveConf. Make sure HIVE_CONF_DIR is set correctly.
19/11/07 03:58:51 ERROR tool.ImportAllTablesTool: Encountered IOException running import job: java.io.IOException: java.lang.ClassNotFoundException: org.apache.hadoop.hive.conf.HiveConf


-- Full-database import: first copy the required Hive and MySQL JDBC jars into Sqoop's lib directory:
$ cp /opt/bigdata/hive/lib/hive-exec-1.2.2.jar  /opt/bigdata/sqoop/lib/   
$  cp /opt/bigdata/hive/lib/mysql-connector-java-5.1.48-bin.jar  /opt/bigdata/sqoop/lib/  


Error message:
19/11/07 04:02:34 ERROR tool.ImportAllTablesTool: Encountered IOException running import job: org.apache.hadoop.mapred.FileAlreadyExistsException: Output directory hdfs://hadoop102:9000/user/hadoop/city already exists

Solution: remove the existing output directory.
$ hadoop dfs -rm -r city
Preferred form (hadoop dfs is deprecated):
$ hdfs dfs -rm -r city
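Alternatively, adding --delete-target-dir to the sqoop import command (as in the examples below) removes the target directory automatically before each run.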


19/11/07 04:16:23 INFO hive.HiveImport: FAILED: SemanticException [Error 10072]: Database does not exist: world
19/11/07 04:16:24 ERROR tool.ImportAllTablesTool: Encountered IOException running import job: java.io.IOException: Hive exited with status 88

Solution: copy hive-site.xml into Sqoop's conf directory so the Hive import can locate the metastore and see the world database:
$ cp /opt/bigdata/hive/conf/hive-site.xml /opt/bigdata/sqoop/conf/

-- Full-database import: (still unresolved)


Worked examples:
-- Import data from MySQL into HDFS:
sqoop import \
--connect jdbc:mysql://192.168.8.102:3306/world?useSSL=false \
--username root \
--password oracle \
--table city \
--target-dir /cityall \
--delete-target-dir \
--fields-terminated-by "\t" \
--num-mappers 4 \
--split-by id
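
To spot-check the result, list the target directory and preview one of the part files (a sketch; part-file names depend on the number of mappers):

$ hdfs dfs -ls /cityall
$ hdfs dfs -cat /cityall/part-m-00000 | head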


sqoop import \
--connect jdbc:mysql://192.168.8.102:3306/world?useSSL=false \
--username root \
--password oracle \
--table city \
--columns id,name,countrycode \
--target-dir /citypart \
--delete-target-dir \
--fields-terminated-by "\t" \
--num-mappers 1 \
--split-by id


sqoop import \
--connect jdbc:mysql://192.168.8.102:3306/world?useSSL=false \
--username root \
--password oracle \
--table city \
--where 'id >= 10 and id <= 20' \
--target-dir /city_10_20 \
--delete-target-dir \
--fields-terminated-by "\t" \
--num-mappers 1 \
--split-by id

sqoop import \
--connect jdbc:mysql://192.168.8.102:3306/world?useSSL=false \
--username root \
--password oracle \
--query "select id,name,population from city where \$CONDITIONS and id <= 2000 and countrycode='CHN' order by population desc " \
--target-dir /city_china \
--delete-target-dir \
--fields-terminated-by "\t" \
--num-mappers 2 \
--split-by id
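
Note on the free-form query: Sqoop requires the literal $CONDITIONS token in the WHERE clause so that each mapper can substitute its own split range, and --split-by must name the splitting column whenever more than one mapper is used.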

--- Importing data from MySQL into Hive:
1. Full-database import:
sqoop import-all-tables \
--connect jdbc:mysql://192.168.8.102:3306/world?useSSL=false \
--username root \
--password oracle \
--hive-import \
--hive-database world \
--hive-overwrite \
--num-mappers 2 \
--create-hive-table 
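
Note (per the Sqoop documentation): when --create-hive-table is set, the job fails if a target Hive table already exists.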

The full-database import still produced various errors; importing a single table into Hive instead:
 
 sqoop import \
--connect jdbc:mysql://192.168.8.102:3306/world?useSSL=false \
--username root \
--password oracle \
--table city \
--where "id <25" \
--hive-import \
--hive-overwrite \
--hive-table hive_city 


Log in with Beeline and query the imported table:
beeline> !connect jdbc:hive2://192.168.8.102:10000 hadoop hadoop
Connecting to jdbc:hive2://192.168.8.102:10000
Connected to: Apache Hive (version 1.2.2)
Driver: Hive JDBC (version 1.2.2)
Transaction isolation: TRANSACTION_REPEATABLE_READ
0: jdbc:hive2://192.168.8.102:10000> show databases;


0: jdbc:hive2://192.168.8.102:10000> select * from default.hive_city;
+---------------+-------------------+------------------------+---------------------+-----------------------+--+
| hive_city.id  |  hive_city.name   | hive_city.countrycode  | hive_city.district  | hive_city.population  |
+---------------+-------------------+------------------------+---------------------+-----------------------+--+
| 1             | Kabul             | AFG                    | Kabol               | 1780000               |
| 2             | Qandahar          | AFG                    | Qandahar            | 237500                |
| 3             | Herat             | AFG                    | Herat               | 186800                |
| 4             | Mazar-e-Sharif    | AFG                    | Balkh               | 127800                |
| 5             | Amsterdam         | NLD                    | Noord-Holland       | 731200                |
| 6             | Rotterdam         | NLD                    | Zuid-Holland        | 593321                |
| 7             | Haag              | NLD                    | Zuid-Holland        | 440900                |
| 8             | Utrecht           | NLD                    | Utrecht             | 234323                |
| 9             | Eindhoven         | NLD                    | Noord-Brabant       | 201843                |
| 10            | Tilburg           | NLD                    | Noord-Brabant       | 193238                |
| 11            | Groningen         | NLD                    | Groningen           | 172701                |
| 12            | Breda             | NLD                    | Noord-Brabant       | 160398                |
| 13            | Apeldoorn         | NLD                    | Gelderland          | 153491                |
| 14            | Nijmegen          | NLD                    | Gelderland          | 152463                |
| 15            | Enschede          | NLD                    | Overijssel          | 149544                |
| 16            | Haarlem           | NLD                    | Noord-Holland       | 148772                |
| 17            | Almere            | NLD                    | Flevoland           | 142465                |
| 18            | Arnhem            | NLD                    | Gelderland          | 138020                |
| 19            | Zaanstad          | NLD                    | Noord-Holland       | 135621                |
| 20            | ´s-Hertogenbosch  | NLD                    | Noord-Brabant       | 129170                |
| 21            | Amersfoort        | NLD                    | Utrecht             | 126270                |
| 22            | Maastricht        | NLD                    | Limburg             | 122087                |
| 23            | Dordrecht         | NLD                    | Zuid-Holland        | 119811                |
| 24            | Leiden            | NLD                    | Zuid-Holland        | 117196                |
+---------------+-------------------+------------------------+---------------------+-----------------------+--+
24 rows selected (0.138 seconds)


Exporting data from HDFS back into MySQL:
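
Sqoop export requires the target table to already exist in MySQL. A minimal sketch (the myworld database and its city table are assumed not to exist yet; the schema is cloned from world.city):

$ mysql -u root -p -e "CREATE DATABASE IF NOT EXISTS myworld; CREATE TABLE myworld.city LIKE world.city;"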



sqoop export \
--connect jdbc:mysql://192.168.8.102:3306/myworld?useSSL=false \
--username root \
--password oracle \
--table city \
--export-dir /allcity \
--num-mappers 1 \
--input-fields-terminated-by "\t"
 
Verification:
 $ mysql -poracle -S /home/hadoop/mysql.sock -e "select count(1) from myworld.city;"
mysql: [Warning] Using a password on the command line interface can be insecure.
+----------+
| count(1) |
+----------+
|     4079 |
+----------+

 
