Installing Hive on Spark


Hive is a data warehouse built on Hadoop: HDFS provides its storage, and MapReduce acts as the execution engine for Hive SQL. Because MapReduce spills intermediate results to disk between stages, it loses to engines such as Spark in both speed and flexibility, which is why Hive on MapReduce is often unsatisfactorily slow. This article walks through Hive on Spark: replacing Hive's execution engine with Spark for a substantial speed-up.

I. Environment preparation

CentOS 6.5
Hadoop 2.6 cluster (HDFS and YARN required)
Hive 2.0.0
Spark 1.5 source code
Maven 3.5 (install separately)
JDK 1.8 (install separately)
Scala 2.10 (install separately)


II. Build Spark with Maven. Download the Spark 1.5 source from the official site and run the following in the source root directory:

./make-distribution.sh --name "hadoop2-without-hive" --tgz "-Pyarn,hadoop-provided,hadoop-2.6,parquet-provided"

This produces spark-1.5.0-bin-hadoop2-without-hive.tgz.

III. Install the Hadoop 2.6 cluster
1. Set up passwordless SSH login and set the hostnames
ssh-keygen -t rsa
ssh-copy-id -i /home/hadoop/.ssh/id_rsa.pub hadoop
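The single ssh-copy-id above only copies the key to one target; on a multi-node cluster the public key has to reach every node. A minimal sketch, assuming the hadoop user and the hostnames weisc, h202 and h203 that appear later in this guide:

for host in weisc h202 h203; do
  ssh-copy-id -i /home/hadoop/.ssh/id_rsa.pub hadoop@$host
done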

2. Extract the Hadoop archive
3. Configure the environment variables
export JAVA_HOME=/usr/local/jdk1.8.0_121/
export JAVA_BIN=$JAVA_HOME/bin
export PATH=$PATH:$JAVA_HOME/bin
export CLASSPATH=.:$JAVA_HOME/lib/dt.jar:$JAVA_HOME/lib/tools.jar

HADOOP_HOME=/home/hadoop/apps/hadoop-2.6.0-cdh5.5.2
HADOOP_CONF_DIR=$HADOOP_HOME/etc/hadoop
PATH=$HADOOP_HOME/bin:$PATH
export HADOOP_HOME HADOOP_CONF_DIR PATH

source  .bash_profile 
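A quick sanity check that the variables were picked up (standard commands, nothing specific to this setup):

java -version
hadoop version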

4. Edit core-site.xml
 vi core-site.xml 

<configuration>
<property>
   <name>fs.defaultFS</name>
   <value>hdfs://weisc:9000</value>
   <description>NameNode URI.</description>
 </property>

 <property>
   <name>io.file.buffer.size</name>
   <value>131072</value>
   <description>Size of read/write buffer used in SequenceFiles.</description>
 </property>
</configuration>

5. Edit hdfs-site.xml
[hadoop@h201 hadoop-2.6.0]$ mkdir -p dfs/name
[hadoop@h201 hadoop-2.6.0]$ mkdir -p dfs/data
[hadoop@h201 hadoop-2.6.0]$ mkdir -p dfs/namesecondary

[hadoop@h201 hadoop]$ vi hdfs-site.xml

 <property>
   <name>dfs.namenode.secondary.http-address</name>
   <value>weisc:50090</value>
   <description>The secondary namenode http server address and port.</description>
 </property>

 <property>
   <name>dfs.namenode.name.dir</name>
   <value>file:///home/hadoop/apps/hadoop-2.6.0-cdh5.5.2/dfs/name</value>
   <description>Path on the local filesystem where the NameNode stores the namespace and transaction logs persistently.</description>
 </property>

 <property>
   <name>dfs.datanode.data.dir</name>
   <value>file:///home/hadoop/apps/hadoop-2.6.0-cdh5.5.2/dfs/data</value>
   <description>Comma separated list of paths on the local filesystem of a DataNode where it should store its blocks.</description>
 </property>

 <property>
   <name>dfs.namenode.checkpoint.dir</name>
   <value>file:///home/hadoop/apps/hadoop-2.6.0-cdh5.5.2/dfs/namesecondary</value>
   <description>Determines where on the local filesystem the DFS secondary name node should store the temporary images to merge. If this is a comma-delimited list of directories then the image is replicated in all of the directories for redundancy.</description>
 </property>

<property>
    <name>dfs.replication</name>
    <value>1</value>
</property>


6. Edit mapred-site.xml

[hadoop@h201 hadoop]$ cp mapred-site.xml.template mapred-site.xml

<property>
   <name>mapreduce.framework.name</name>
<value>yarn</value>
<description>The runtime framework for executing MapReduce jobs. Can be one of local, classic or yarn.</description>
  </property>

  <property>
   <name>mapreduce.jobhistory.address</name>
    <value>weisc:10020</value>
    <description>MapReduce JobHistory Server IPC host:port</description>
  </property>

  <property>
   <name>mapreduce.jobhistory.webapp.address</name>
    <value>weisc:19888</value>
    <description>MapReduce JobHistory Server Web UI host:port</description>
  </property>

*****
The property mapreduce.framework.name sets the runtime framework used to execute MapReduce jobs; the default is local, and it must be changed to yarn here.
*****
7. Edit yarn-site.xml
[hadoop@h201 hadoop]$ vi yarn-site.xml

<property>
   <name>yarn.resourcemanager.hostname</name>
  <value>weisc</value>
  <description>The hostname of the RM.</description>
</property>

 <property>
   <name>yarn.nodemanager.aux-services</name>
   <value>mapreduce_shuffle</value>
   <description>Shuffle service that needs to be set for Map Reduce applications.</description>
 </property>

8. Edit hadoop-env.sh
[hadoop@h201 hadoop]$ vi hadoop-env.sh 
export JAVA_HOME=/usr/local/jdk1.8.0_121/
9. Edit the slaves file
[hadoop@h201 hadoop]$ vi slaves 
h202
h203
10. Format the NameNode
bin/hadoop namenode -format
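After formatting, start HDFS and YARN with the standard Hadoop 2.x scripts and confirm the daemons are up with jps (paths are relative to the HADOOP_HOME configured above):

sbin/start-dfs.sh
sbin/start-yarn.sh
jps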


IV. Install mysql-server
yum -y install mysql-server
mysql
create user 'hive' identified by 'hive';
Remote login for the hive user must be enabled:
grant all privileges on *.* to hive@'%' identified by 'hive' with grant option;
flush privileges;
create database hive;
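If the mysqld service was not started after installation, start it first (CentOS 6 uses the service command), then check that the hive account created above can log in and see the new database:

service mysqld start
mysql -uhive -phive -e "show databases;"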

V. Install Hive 2.0

1. Edit conf/hive-site.xml

<property>
   <name>javax.jdo.option.ConnectionURL</name>
   <value>jdbc:mysql://localhost:3306/hive?createDatabaseIfNotExist=true</value>
   <description>JDBC connect string for a JDBC metastore</description>
</property>
<property>
  <name>javax.jdo.option.ConnectionDriverName</name>
  <value>com.mysql.jdbc.Driver</value>
  <description>Driver class name for a JDBC metastore</description>
</property>
<property>
  <name>javax.jdo.option.ConnectionUserName</name>
  <value>hive</value>
  <description>username to use against metastore database</description>
</property>
<property>
  <name>javax.jdo.option.ConnectionPassword</name>
  <value>hive</value>
  <description>password to use against metastore database</description>
</property>
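Hive reaches the metastore through the MySQL JDBC driver, which does not ship with Hive, so the connector jar must be placed in Hive's lib directory before the schema is initialized. The exact jar name depends on the version you download; the path below is only an example:

cp mysql-connector-java-5.1.*-bin.jar $HIVE_HOME/lib/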

<property>
    <name>hive.exec.scratchdir</name>
    <value>/tmp/hive</value>
    <description>HDFS root scratch dir for Hive jobs which gets created with write all (733) permission. For each connecting user, an HDFS scratch dir: ${hive.exec.scratchdir}/&lt;username&gt; is created, with ${hive.scratch.dir.permission}.</description>
  </property>

  <property>
    <name>hive.exec.local.scratchdir</name>
    <value>/tmp/hive/local</value>
    <description>Local scratch space for Hive jobs</description>
  </property>

  <property>
    <name>hive.downloaded.resources.dir</name>
    <value>/tmp/hive/resources</value>
    <description>Temporary local directory for added resources in the remote file system.</description>
  </property>
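The scratch directories above live on HDFS (hive.exec.scratchdir) and on the local filesystem; creating them up front with the permissions mentioned in the descriptions avoids permission surprises on the first run. A small sketch, assuming HDFS is already running:

hdfs dfs -mkdir -p /tmp/hive
hdfs dfs -chmod 733 /tmp/hive
mkdir -p /tmp/hive/local /tmp/hive/resources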
2. Initialize the metastore database:
bin/schematool -initSchema -dbType mysql



VI. Install Scala 2.10.1

tar -zxvf scala-2.10.1.tgz
vi .bash_profile   # update the environment variables
export SCALA_HOME=/apps/scala-2.10.1
PATH=$HADOOP_HOME/bin:$PATH:$SCALA_HOME/bin

Check the installation with: scala -version


VII. Install Spark. Extract the spark-1.5.0-bin-hadoop2-without-hive.tgz built in Part II (for example to /apps/spark, matching the spark.home value below), then verify the installation:
bin/run-example org.apache.spark.examples.SparkPi
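Hive also needs the Spark libraries on its classpath. For Hive versions before 2.2.0 with a Spark 1.x build, the approach documented by the Hive on Spark guide is to link the Spark assembly jar into Hive's lib directory; the exact jar name depends on your build, so treat the path below as an example:

ln -s /apps/spark/lib/spark-assembly-*.jar $HIVE_HOME/lib/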
 
 
Edit hive-site.xml and add the Hive on Spark settings:
<property>
<name>hive.execution.engine</name>
<value>spark</value>
</property>
<property>
<name>hive.enable.spark.execution.engine</name>
<value>true</value>
</property>
<property>
<name>spark.home</name>
<value>/apps/spark</value>
</property>
<!--sparkcontext -->
<property>
<name>spark.master</name>
<value>yarn-cluster</value>
</property>
<property>
<name>spark.serializer</name>
<value>org.apache.spark.serializer.KryoSerializer</value>
</property>
<property>
<name>spark.executor.memory</name>
<value>2g</value>
</property>
<property>
<name>spark.driver.memory</name>
<value>1g</value>
</property>
<property>
<name>spark.executor.cores</name>
<value>2</value>
</property>
<property>
<name>spark.executor.instances</name>
<value>4</value>
</property>
<property>
<name>spark.app.name</name>
<value>myInceptor</value>
</property>
    <!-- transaction-related settings -->
<property>
<name>hive.support.concurrency</name>
<value>true</value>
</property>
<property>
<name>hive.enforce.bucketing</name>
<value>true</value>
</property>
<property>
<name>hive.exec.dynamic.partition.mode</name>
<value>nonstrict</value>
</property>
<property>
<name>hive.txn.manager</name>
<value>org.apache.hadoop.hive.ql.lockmgr.DbTxnManager</value>
</property>
<property>
<name>hive.compactor.initiator.on</name>
<value>true</value>
</property>
<property>
<name>hive.compactor.worker.threads</name>
<value>1</value>
</property>
<property>
<name>spark.executor.extraJavaOptions</name>
<value>-XX:+PrintGCDetails -Dkey=value -Dnumbers="one two three"
</value>
</property>
<property>
<name>hive.server2.enable.doAs</name>
<value>false</value>
</property>
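With spark.master set to yarn-cluster, each Hive query submits a Spark application to YARN, and by default the Spark assembly jar is uploaded from the local spark.home on every submission. An optional tuning step is to upload the assembly to HDFS once and point spark.yarn.jar at it; the HDFS path below is only an example and the jar name must match your build:

hdfs dfs -mkdir -p /spark-jars
hdfs dfs -put /apps/spark/lib/spark-assembly-*.jar /spark-jars/

Then set spark.yarn.jar, either as another property in hive-site.xml or in the Hive session, to the uploaded jar, e.g. hdfs://weisc:9000/spark-jars/spark-assembly-1.5.0-hadoop2.6.0.jar (substitute the actual jar name).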

VIII. Run Hive
select count(*) from test; 
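A minimal end-to-end check, assuming no table named test exists yet; the set command confirms which engine is active, and the count should appear as a Spark application in the YARN web UI:

hive> set hive.execution.engine;
hive> create table test(id int);
hive> insert into table test values (1),(2),(3);
hive> select count(*) from test;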


