1. Install MySQL
yum install wget
wget http://repo.mysql.com/mysql-community-release-el7-5.noarch.rpm
rpm -ivh mysql-community-release-el7-5.noarch.rpm
yum install mysql-server
Start MySQL:
service mysqld start
Enable start on boot:
systemctl enable mysqld.service
Set the root password:
#/usr/bin/mysql_secure_installation
[...]
Enter current password for root (enter for none):
OK, successfully used password, moving on...
[...]
Set root password? [Y/n] y
New password:
Re-enter new password:
Remove anonymous users? [Y/n] Y
[...]
Disallow root login remotely? [Y/n] N
[...]
Remove test database and access to it [Y/n] Y
[...]
Reload privilege tables now? [Y/n] Y
All done!
2. Install and configure Hive
Download Hive and mysql-connector-java-5.*.*-bin.jar from the official sites.
Upload the archive and extract it:
tar -zxvf apache-hive-2.3.6-bin.tar.gz -C ../app/
Set the Hive environment variables:
vi ~/.bashrc
# Hive environment (# marks a comment)
export HIVE_HOME=/home/hadoop/app/apache-hive-2.3.6-bin
export PATH=$HIVE_HOME/bin:$HIVE_HOME/conf:$PATH
Activate the environment variables:
source ~/.bashrc
Edit the configuration file:
cd ../app/
Create a hive-site.xml file under hive/conf/:
vi hive-site.xml
<configuration>
  <property>
    <name>javax.jdo.option.ConnectionURL</name>
    <value>jdbc:mysql://localhost:3306/hive?createDatabaseIfNotExist=true</value>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionDriverName</name>
    <value>com.mysql.jdbc.Driver</value>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionUserName</name>
    <value>root</value>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionPassword</name>
    <value>root</value>
  </property>
  <property>
    <name>hive.metastore.schema.verification</name>
    <value>false</value>
    <description>
      Enforce metastore schema version consistency.
    </description>
  </property>
</configuration>
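The hive-site.xml above is easy to get subtly wrong (a typo in a property name will only surface later as a cryptic metastore error). As a sanity check, the same configuration can be generated and re-parsed with a short script; this is just an illustrative sketch, using the example values from this guide (root/root, localhost:3306), not part of the Hive toolchain:

```python
# Sketch: generate a minimal hive-site.xml from a dict of properties,
# mirroring the configuration above. The values (user, password, URL)
# are the example values from this guide -- adjust for your environment.
import xml.etree.ElementTree as ET

def build_hive_site(props):
    """Return a <configuration> element with one <property> per entry."""
    root = ET.Element("configuration")
    for name, value in props.items():
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = value
    return root

props = {
    "javax.jdo.option.ConnectionURL":
        "jdbc:mysql://localhost:3306/hive?createDatabaseIfNotExist=true",
    "javax.jdo.option.ConnectionDriverName": "com.mysql.jdbc.Driver",
    "javax.jdo.option.ConnectionUserName": "root",
    "javax.jdo.option.ConnectionPassword": "root",
    "hive.metastore.schema.verification": "false",
}

xml_text = ET.tostring(build_hive_site(props), encoding="unicode")

# Parse the result back as a quick check before copying it to hive/conf/.
parsed = ET.fromstring(xml_text)
names = [p.find("name").text for p in parsed.findall("property")]
assert "javax.jdo.option.ConnectionURL" in names
```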
Copy mysql-connector-java-5.1.15-bin.jar into the lib/ directory under the Hive installation (here /home/hadoop/app/apache-hive-2.3.6-bin/lib/).
3. Initialize the metastore schema
[hadoop@sparkServer apache-hive-2.3.6-bin]$ bin/schematool -initSchema -dbType mysql
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/home/hadoop/app/apache-hive-2.3.6-bin/lib/log4j-slf4j-impl-2.6.2.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/home/hadoop/app/hadoop-2.7.7/share/hadoop/common/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.apache.logging.slf4j.Log4jLoggerFactory]
Metastore connection URL: jdbc:mysql://localhost:3306/hive?createDatabaseIfNotExist=true
Metastore Connection Driver : com.mysql.jdbc.Driver
Metastore connection User: root
Starting metastore schema initialization to 2.3.0
Initialization script hive-schema-2.3.0.mysql.sql
Initialization script completed
schemaTool completed
4. Test
(1) Create a database:
create database db_hive_test;
(2) Create a test table:
use db_hive_test;
create table student(id int, name string) row format delimited fields terminated by '\t';
(3) Back in Linux, create a student.txt file and write the data (id and name separated by a Tab):
1001 zhangsan
1002 lisi
(4) Load the data in Hive:
load data local inpath '/home/hadoop/student.txt' into table db_hive_test.student;
(5) Check the result:
select * from db_hive_test.student;
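A common pitfall in step (3): the table was declared with fields terminated by '\t', so if the columns in student.txt are separated by spaces instead of a real Tab, every column loads as NULL. A tiny script can write the file with guaranteed Tab separators and verify it; this is an illustrative sketch (the local path is hypothetical, the guide uses /home/hadoop/student.txt):

```python
# Sketch: write student.txt with real Tab separators, then verify that
# each line splits into exactly the two fields Hive expects.
rows = [(1001, "zhangsan"), (1002, "lisi")]

path = "student.txt"  # in this guide the file lives at /home/hadoop/student.txt
with open(path, "w", encoding="utf-8") as f:
    for sid, name in rows:
        f.write(f"{sid}\t{name}\n")

# Read the file back and confirm the Tab-delimited layout.
with open(path, encoding="utf-8") as f:
    parsed = [line.rstrip("\n").split("\t") for line in f]
assert parsed == [["1001", "zhangsan"], ["1002", "lisi"]]
```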
5. Connect Spark to the Hive metastore (MySQL)
1) Copy Hive's hive-site.xml into Spark's conf directory.
2) Edit hive-site.xml in Spark and add the following:
<configuration>
  <property>
    <name>hive.metastore.uris</name>
    <value>thrift://localhost:9083</value>
  </property>
</configuration>
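If Spark cannot reach the metastore, a malformed hive.metastore.uris value is a frequent cause. The expected shape is thrift://host:port; a minimal sketch of a pre-flight check (the helper name and the check itself are illustrative, not part of Spark or Hive):

```python
# Sketch: validate a hive.metastore.uris value before starting Spark.
# The URI checked below is the example from the config above; the check
# itself is generic (thrift scheme, host and port must both be present).
from urllib.parse import urlparse

def check_metastore_uri(uri):
    """Return (host, port) if the URI looks like thrift://host:port."""
    parsed = urlparse(uri)
    if parsed.scheme != "thrift":
        raise ValueError(f"expected thrift:// scheme, got {parsed.scheme!r}")
    if not parsed.hostname or parsed.port is None:
        raise ValueError("metastore URI must include both host and port")
    return parsed.hostname, parsed.port

host, port = check_metastore_uri("thrift://localhost:9083")
assert (host, port) == ("localhost", 9083)
```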
3) Start the metastore service in a separate terminal:
[root@head42 conf]$ hive --service metastore
4) Start pyspark:
[root@head42 conf]$ pyspark
5) Test:
>>> from pyspark.sql import HiveContext
>>> sqlContext = HiveContext(sc)
>>> read_hive_score = sqlContext.sql("select * from db_hive_test.student")
>>> read_hive_score.show()
+----+--------+
| id| name|
+----+--------+
|1001|zhangsan|
|1002| lisi|
+----+--------+
That's it!