Installing Tez 0.8.5 on CDH
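- 1.1 Prerequisites
1) Install the JDK
2) Install Maven
Download the package: apache-maven-3.5.4-bin.tar.gz
- Extract (tar -C needs an existing target directory):
mkdir -p /usr/local/software/maven
tar -zxvf apache-maven-3.5.4-bin.tar.gz -C /usr/local/software/maven
- Configure: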
[joy@hadoop002 dev_env]$ sudo vim /etc/profile
export MAVEN_HOME=/usr/local/software/maven/apache-maven-3.5.4
export PATH=${MAVEN_HOME}/bin:$PATH
[joy@hadoop002 maven]$ source /etc/profile
[joy@hadoop002 maven]$ mvn -v
Apache Maven 3.5.4 (1edded0938998edf8bf061f1ceb3cfdeccf443fe; 2018-06-18T02:33:14+08:00)
Maven home: /opt/dev_env/maven/maven
Java version: 1.8.0_91, vendor: Oracle Corporation, runtime: /usr/local/jdk1.8.0_91/jre
Default locale: en_US, platform encoding: UTF-8
OS name: "linux", version: "3.10.0-862.el7.x86_64", arch: "amd64", family: "unix"
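# Note: the sample output above comes from a machine where Maven lives under /opt/dev_env/maven, which is why "Maven home" differs from MAVEN_HOME; adjust the paths below to your own installation.
cd /opt/dev_env/maven
mkdir mavenRepository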
[joy@hadoop002 maven]$ vim $MAVEN_HOME/conf/settings.xml
# Add the local repository path
<localRepository>/opt/dev_env/maven/mavenRepository/</localRepository>
# Search for <mirrors> and add the Aliyun mirror inside <mirrors> ... </mirrors>
<!-- Aliyun Maven mirror -->
<mirror>
<id>nexus-aliyun</id>
<mirrorOf>*,!cloudera</mirrorOf>
<name>Nexus aliyun</name>
<url>http://maven.aliyun.com/nexus/content/groups/public</url>
</mirror>
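In `<mirrorOf>*,!cloudera</mirrorOf>`, `*` routes every repository through the Aliyun mirror and `!cloudera` exempts the repository whose id is `cloudera`, so the CDH artifacts added to pom.xml in 1.3 are still fetched from Cloudera. The sketch below is a hypothetical grep-based check (`check_maven_settings` is not part of Maven) that both settings.xml edits are present. It does not parse XML, so it cannot catch a malformed tag such as an unclosed `</mirror`.

```bash
# Hypothetical helper: grep-based sanity check of the two settings.xml edits.
# It only looks for the expected elements; it does not validate the XML.
check_maven_settings() {
  local settings="$1"
  [ -r "$settings" ] || { echo "cannot read $settings" >&2; return 2; }
  # A real <localRepository>, not the placeholder inside the default file's comment
  if ! grep '<localRepository>.*</localRepository>' "$settings" \
      | grep -qv '/path/to/local/repo'; then
    echo 'missing <localRepository>' >&2
    return 1
  fi
  # The Aliyun mirror, excluding the repository whose id is "cloudera"
  if ! grep -q '<id>nexus-aliyun</id>' "$settings" \
      || ! grep -q '<mirrorOf>\*,!cloudera</mirrorOf>' "$settings"; then
    echo 'missing nexus-aliyun mirror with <mirrorOf>*,!cloudera</mirrorOf>' >&2
    return 1
  fi
  return 0
}

# Usage: check_maven_settings "$MAVEN_HOME/conf/settings.xml" && echo OK
```

For an authoritative check, `mvn help:effective-settings` prints the settings Maven actually loaded.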
3) Install protobuf 2.5.0
protobuf 2.5.0 (it must be exactly this version, as stated on the official site): https://github.com/protocolbuffers/protobuf/releases/download/v2.5.0/protobuf-2.5.0.tar.gz
Extract:
tar -zxvf protobuf-2.5.0.tar.gz -C ./
- Install OS dependencies:
yum -y install gcc gcc-c++ libstdc++-devel make build
- Build
[joy@hadoop002 protobuf]$ ln -s protobuf-2.5.0/ protobuf
[joy@hadoop002 protobuf]$ cd protobuf-2.5.0
[joy@hadoop002 protobuf-2.5.0]$ ./configure
# ./configure --prefix=/usr/local/protobuf   # recommended alternative
# (with the default prefix, make install puts the protoc executable in /usr/local/bin)
[joy@hadoop002 protobuf-2.5.0]$ make
[joy@hadoop002 protobuf-2.5.0]$ sudo make install
# Note: verify the installation; with the default configure prefix, protoc is installed to /usr/local/bin/
[joy@hadoop002 protobuf-2.5.0]$ /usr/local/bin/protoc --version
# /usr/local/protobuf/bin/protoc --version   (if configured with --prefix=/usr/local/protobuf)
libprotoc 2.5.0
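If the OS also ships a protoc (for example in /usr/bin), a bare `protoc` may resolve to that one instead of /usr/local/bin/protoc. A minimal sketch that checks a `protoc --version` string for exactly 2.5.0; `protoc_version_ok` is a hypothetical helper name.

```bash
# Hypothetical helper: succeed only if the given `protoc --version` output reports 2.5.0.
protoc_version_ok() {
  # Expects text like "libprotoc 2.5.0"; the last whitespace-separated field is the version
  local version
  version=$(printf '%s\n' "$1" | awk '{print $NF}')
  [ "$version" = "2.5.0" ]
}

# Usage: protoc_version_ok "$(protoc --version)" || echo "wrong protoc on PATH" >&2
```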
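- Configure environment variables
# protobuf
# Append the following two lines to the end of /etc/profile (use the commented pair instead if you configured with --prefix=/usr/local/protobuf):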
export PATH=$PATH:/usr/local/bin/
export PKG_CONFIG_PATH=/usr/local/lib/pkgconfig/
# export PATH=$PATH:/usr/local/protobuf/bin/
# export PKG_CONFIG_PATH=/usr/local/protobuf/lib/pkgconfig/
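Note that `export PATH=$PATH:/usr/local/bin/` puts /usr/local/bin at the end of PATH, so any other protoc earlier on PATH still wins. Also, re-running a setup script that blindly appends to /etc/profile duplicates the exports. A sketch of an idempotent append, assuming a hypothetical helper `append_once` run with write access to the file:

```bash
# Hypothetical helper: append a line to a file only if that exact line is not there yet,
# so re-running the setup does not duplicate exports in /etc/profile.
append_once() {
  local file="$1" line="$2"
  # -x: whole-line match, -F: fixed string (no regex), -q: quiet
  grep -qxF -- "$line" "$file" 2>/dev/null || printf '%s\n' "$line" >> "$file"
}

# Usage (from a root shell):
#   append_once /etc/profile 'export PATH=$PATH:/usr/local/bin/'
#   append_once /etc/profile 'export PKG_CONFIG_PATH=/usr/local/lib/pkgconfig/'
```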
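- 1.2 Download and extract Tez
(1) Download the source release from: http://tez.apache.org/releases/
- 1.3 Modify the source
These edits are easier in a GUI editor (for example on Windows 10), because the files are large and the places that need changing are hard to find.
(1) Modify pom.xml
Change 1: set hadoop.version to the version used by our CDH cluster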
<hadoop.version>2.6.0-cdh5.16.2</hadoop.version>
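Change 1 can also be made without an editor. A sketch assuming GNU sed (`-i` edits in place) and a single `<hadoop.version>` element in the root pom.xml; `set_hadoop_version` is a hypothetical helper:

```bash
# Hypothetical helper: rewrite the <hadoop.version> property in pom.xml in place (GNU sed).
set_hadoop_version() {
  local pom="$1" version="$2"
  grep -q '<hadoop.version>' "$pom" || { echo "no <hadoop.version> in $pom" >&2; return 1; }
  # Replace whatever sits between the tags; '|' is the sed delimiter, so '/' needs no escaping
  sed -i "s|<hadoop.version>[^<]*</hadoop.version>|<hadoop.version>${version}</hadoop.version>|" "$pom"
}

# Usage: set_hadoop_version apache-tez-0.8.5-src/pom.xml 2.6.0-cdh5.16.2
```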
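Changes 2 and 3: add the Cloudera Maven repository to <repositories> and to <pluginRepositories> (required because the cluster runs a CDH build of Hadoop)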
<repositories>
<repository>
<id>${distMgmtSnapshotsId}</id>
<name>${distMgmtSnapshotsName}</name>
<url>${distMgmtSnapshotsUrl}</url>
</repository>
<repository>
<id>cloudera</id>
<url>https://repository.cloudera.com/artifactory/cloudera-repos/</url>
<name>Cloudera Repositories</name>
<snapshots>
<enabled>false</enabled>
</snapshots>
</repository>
</repositories>
<pluginRepositories>
<pluginRepository>
<id>maven2-repository.atlassian</id>
<name>Atlassian Maven Repository</name>
<url>https://maven.atlassian.com/repository/public</url>
<layout>default</layout>
</pluginRepository>
<pluginRepository>
<id>${distMgmtSnapshotsId}</id>
<name>${distMgmtSnapshotsName}</name>
<url>${distMgmtSnapshotsUrl}</url>
<layout>default</layout>
</pluginRepository>
<pluginRepository>
<id>cloudera</id>
<name>Cloudera Repositories</name>
<url>https://repository.cloudera.com/artifactory/cloudera-repos/</url>
</pluginRepository>
</pluginRepositories>
Change 4: comment out the tez-ext-service-tests, tez-ui and tez-ui2 modules
# Note: this differs slightly between versions
tez 0.9.2: comment out tez-ext-service-tests and tez-ui
tez 0.8.5: comment out tez-ext-service-tests, tez-ui and tez-ui2
<modules>
<module>hadoop-shim</module>
<module>tez-api</module>
<module>tez-common</module>
<module>tez-runtime-library</module>
<module>tez-runtime-internals</module>
<module>tez-mapreduce</module>
<module>tez-examples</module>
<module>tez-tests</module>
<module>tez-dag</module>
<!-- <module>tez-ext-service-tests</module> -->
<!-- <module>tez-ui</module> -->
<!-- <module>tez-ui2</module> -->
<module>tez-plugins</module>
<module>tez-tools</module>
<module>hadoop-shim-impls</module>
<module>tez-dist</module>
<module>docs</module>
</modules>
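The module edits can be scripted too. A sketch assuming GNU sed and one `<module>` per line, as in the listing above; `comment_out_modules` is a hypothetical helper that leaves already-commented lines alone, so running it twice is harmless:

```bash
# Hypothetical helper: comment out <module>NAME</module> lines in pom.xml (GNU sed).
# Lines already starting with "<!--" do not match the anchored pattern and stay untouched.
comment_out_modules() {
  local pom="$1"; shift
  local open='<!--' close='-->' m
  for m in "$@"; do
    sed -i "s|^\([[:space:]]*\)<module>${m}</module>[[:space:]]*$|\1${open} <module>${m}</module> ${close}|" "$pom"
  done
}

# Usage for 0.8.5: comment_out_modules pom.xml tez-ext-service-tests tez-ui tez-ui2
```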
(2) Modify a Java class
# Go to the source directory:
cd apache-tez-0.8.5-src
vim tez-mapreduce/src/main/java/org/apache/tez/mapreduce/hadoop/mapreduce/JobContextImpl.java
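In JobContextImpl.java, add the following method at the end of the class body, just before the final closing brace. The @Override indicates that it implements a method declared by CDH's JobContext interface; without it, the build against 2.6.0-cdh5.16.2 fails.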
/**
* Get the boolean value for the property that specifies which classpath
* takes precedence when tasks are launched. True - user's classes takes
* precedence. False - system's classes takes precedence.
* @return true if user's classes should take precedence
*/
@Override
public boolean userClassesTakesPrecedence() {
return getJobConf().getBoolean(MRJobConfig.MAPREDUCE_JOB_USER_CLASSPATH_FIRST, false);
}
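The method has to go inside the class, so appending it after the class's closing brace would not compile. A sketch assuming GNU sed, that the class's closing brace is the last line consisting only of `}`, and that the method has been saved to a separate snippet file (the path in the usage line is hypothetical); `insert_before_last_brace` is a hypothetical helper that skips files already containing the marker text:

```bash
# Hypothetical helper: insert SNIPPET_FILE just before the last line that is only "}"
# (the class's closing brace), unless MARKER already occurs in the Java file.
insert_before_last_brace() {
  local java="$1" snippet="$2" marker="$3"
  if grep -q -- "$marker" "$java"; then
    echo "already patched: $java" >&2
    return 0
  fi
  local n
  n=$(grep -n '^}[[:space:]]*$' "$java" | tail -n 1 | cut -d: -f1)
  if [ -z "$n" ] || [ "$n" -lt 2 ]; then
    echo "no closing brace found in $java" >&2
    return 1
  fi
  # GNU sed: "Nr FILE" appends FILE after line N, i.e. right before the closing brace
  sed -i "$((n - 1))r ${snippet}" "$java"
}

# Usage:
#   insert_before_last_brace \
#     tez-mapreduce/src/main/java/org/apache/tez/mapreduce/hadoop/mapreduce/JobContextImpl.java \
#     /tmp/userClassesTakesPrecedence.java userClassesTakesPrecedence
```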
- 1.4 Build
cd /opt/dev_env/apache-tez-0.8.5-src
mvn clean package -DskipTests=true -Dmaven.javadoc.skip=true
- 1.5 Get the built Tez package: tez-0.8.5.tar.gz
[joy@hadoop002 apache-tez-0.8.5-src]$ cd tez-dist/target/
[joy@hadoop002 target]$ ll
total 57616
drwxrwxr-x. 2 joy joy 6 Jun 12 19:34 archive-tmp
drwxrwxr-x. 2 joy joy 28 Jun 12 19:34 maven-archiver
drwxrwxr-x. 3 joy joy 4096 Jun 12 19:34 tez-0.8.5
drwxrwxr-x. 3 joy joy 4096 Jun 12 19:34 tez-0.8.5-minimal
-rw-rw-r--. 1 joy joy 12623396 Jun 12 19:34 tez-0.8.5-minimal.tar.gz
-rw-rw-r--. 1 joy joy 46359820 Jun 12 19:34 tez-0.8.5.tar.gz
-rw-rw-r--. 1 joy joy 2869 Jun 12 19:34 tez-dist-0.8.5-tests.jar
- (1) Upload the tez-0.8.5.tar.gz archive to /user/tez on HDFS
[joy@hadoop002 target]$ hdfs dfs -mkdir /user/tez
[joy@hadoop002 target]$ hdfs dfs -chmod -R 775 /user/tez/
[joy@hadoop002 target]$ hdfs dfs -put tez-0.8.5.tar.gz /user/tez/
[joy@hadoop002 target]$ hdfs dfs -ls /user/tez
Found 1 items
-rw-r--r-- 3 joy supergroup 46359820 2020-06-15 15:50 /user/tez/tez-0.8.5.tar.gz
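A re-runnable variant of the upload steps, as a sketch: `hdfs dfs -mkdir -p` does not fail when /user/tez already exists, and `hdfs dfs -put -f` overwrites a tarball left over from an earlier build. `upload_tez_tarball` is a hypothetical helper name.

```bash
# Hypothetical helper: idempotent version of the HDFS upload above.
upload_tez_tarball() {
  local tarball="$1" dest="${2:-/user/tez}"
  [ -f "$tarball" ] || { echo "not found: $tarball" >&2; return 1; }
  # -p: no error if the directory exists; -f: overwrite an existing file
  hdfs dfs -mkdir -p "$dest" &&
    hdfs dfs -chmod -R 775 "$dest" &&
    hdfs dfs -put -f "$tarball" "$dest/" &&
    hdfs dfs -ls "$dest"
}

# Usage: upload_tez_tarball tez-dist/target/tez-0.8.5.tar.gz
```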
(2) Create the tez directory
Create a tez directory under /opt/cloudera/parcels/CDH/lib:
sudo mkdir /opt/cloudera/parcels/CDH/lib/tez
Inside it, create a conf directory:
sudo mkdir /opt/cloudera/parcels/CDH/lib/tez/conf
- Copy the jars in tez-0.8.5-minimal (from the build output in 1.5) and the jars under its lib directory into /opt/cloudera/parcels/CDH/lib/tez:
### Note: copy the contents of tez-0.8.5-minimal, not tez-0.8.5
# Go to apache-tez-0.8.5-src/tez-dist/target/tez-0.8.5-minimal
[joy@hadoop002 target]$ cd /opt/dev_env/apache-tez-0.8.5-src/tez-dist/target/tez-0.8.5-minimal
# Copy *.jar to /opt/cloudera/parcels/CDH/lib/tez
[joy@hadoop002 tez-0.8.5-minimal]$ sudo cp ./*.jar /opt/cloudera/parcels/CDH/lib/tez
# Copy lib to /opt/cloudera/parcels/CDH/lib/tez
[joy@hadoop002 tez-0.8.5-minimal]$ sudo cp -r ./lib /opt/cloudera/parcels/CDH/lib/tez
[joy@hadoop002 tez-0.8.5-minimal]$ cd /opt/cloudera/parcels/CDH/lib/tez
[joy@hadoop002 tez]$ ll
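The copy steps above, collected into one hypothetical helper (`install_tez_client`) as a sketch. It refuses to run unless the source directory has a lib subdirectory, a rough guard against pointing it at the wrong build output; run it from a root shell when the destination is under /opt/cloudera.

```bash
# Hypothetical helper: lay out the Tez client directory from the *-minimal build output.
install_tez_client() {
  local src="$1" dest="$2"
  [ -d "$src/lib" ] || { echo "expected $src/lib (is this the -minimal dir?)" >&2; return 1; }
  # conf/ will hold tez-site.xml; top-level jars and lib/ are copied as-is
  mkdir -p "$dest/conf" &&
    cp "$src"/*.jar "$dest"/ &&
    cp -r "$src/lib" "$dest"/
}

# Usage (root shell):
#   install_tez_client /opt/dev_env/apache-tez-0.8.5-src/tez-dist/target/tez-0.8.5-minimal \
#     /opt/cloudera/parcels/CDH/lib/tez
```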
Put tez-site.xml in the conf directory:
sudo vim conf/tez-site.xml
### With the following content:
<configuration>
<property>
<name>tez.lib.uris</name>
<!-- Points to the Tez tarball uploaded to HDFS -->
<value>${fs.defaultFS}/user/tez/tez-0.8.5.tar.gz</value>
</property>
<property>
<name>tez.use.cluster.hadoop-libs</name>
<value>false</value>
<description>Whether to use the cluster's own Hadoop libraries. If true, the minimal Tez package can be used; if false, the full tez-0.8.5.tar.gz package is required.</description>
</property>
<property>
<name>hive.tez.container.size</name>
<value>1024</value>
<description>Set hive.tez.container.size to be the same as or a small multiple(1 or 2 times that) of YARN container size yarn.scheduler.minimum-allocation-mb but NEVER more than yarn.scheduler.maximum-allocation-mb</description>
</property>
<!--
<property>
<name>tez.container.max.java.heap.fraction</name>
<description>Added because my machine was short on memory</description>
<value>0.2</value>
</property>
<property>
<name>tez.am.launch.cluster-default.env</name>
<value>LD_LIBRARY_PATH={{HADOOP_COMMON_HOME}}/lib/native</value>
<description>Tez tasks load native libraries; without this setting they fail with an error</description>
</property>
<property>
<name>mapreduce.admin.user.env</name>
<value>LD_LIBRARY_PATH={{HADOOP_COMMON_HOME}}/lib/native</value>
<description>Tez tasks load native libraries; without this setting they fail with an error</description>
</property>
-->
</configuration>
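The hive.tez.container.size description states a rule: the same as, or 1 or 2 times, yarn.scheduler.minimum-allocation-mb, and never more than yarn.scheduler.maximum-allocation-mb. A sketch of that rule as a check, with all values in MB taken by hand from your YARN configuration; `container_size_ok` is a hypothetical helper, and it rejects a zero minimum instead of dividing by it.

```bash
# Hypothetical helper: check hive.tez.container.size (MB) against the YARN limits
# from the description above: 1x or 2x the minimum, never above the maximum.
container_size_ok() {
  local size="$1" min="$2" max="$3"
  # Guard against a zero/negative minimum before using it as a divisor
  if [ "$min" -le 0 ] || [ "$size" -le 0 ]; then
    return 1
  fi
  [ "$size" -le "$max" ] || return 1
  [ $((size % min)) -eq 0 ] || return 1
  local factor=$((size / min))
  [ "$factor" -eq 1 ] || [ "$factor" -eq 2 ]
}

# Usage: container_size_ok 1024 1024 8192 && echo "1024 MB is fine"
```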
Copy kryo-2.22.jar from /opt/cloudera/parcels/CDH/jars into the lib directory under tez:
[joy@hadoop002 tez]$ sudo cp /opt/cloudera/parcels/CDH/jars/kryo-2.22.jar /opt/cloudera/parcels/CDH/lib/tez/lib/
This prevents the following exception:
java.lang.ClassNotFoundException: com.esotericsoftware.kryo.Serializer
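com.esotericsoftware.kryo.Serializer is packaged in the kryo jar, so this exception most likely means no kryo jar is on the Tez classpath. A sketch of a presence check; `has_jar` is a hypothetical helper, and the glob is matched by the shell.

```bash
# Hypothetical helper: succeed if at least one file in DIR matches the glob PATTERN.
has_jar() {
  local dir="$1" pattern="$2" f
  # $pattern is left unquoted on purpose so the shell expands it;
  # with no match the literal pattern remains and -e fails
  for f in "$dir"/$pattern; do
    [ -e "$f" ] && return 0
  done
  return 1
}

# Usage: has_jar /opt/cloudera/parcels/CDH/lib/tez/lib 'kryo-*.jar' || echo "kryo missing" >&2
```

To confirm the class itself is inside the jar: `unzip -l kryo-2.22.jar | grep com/esotericsoftware/kryo/Serializer.class`.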
/opt/cloudera/parcels/CDH/lib/tez/lib contains slf4j jars that produce a lot of extra log output; on the client you can drop slf4j-api-1.7.10.jar and slf4j-log4j12-1.7.10.jar to reduce it.
# Renaming them is enough
[joy@hadoop002 lib]$ cd /opt/cloudera/parcels/CDH/lib/tez/lib
sudo mv slf4j-api-1.7.10.jar slf4j-api-1.7.10.jar.bak
sudo mv slf4j-log4j12-1.7.10.jar slf4j-log4j12-1.7.10.jar.bak
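Renaming works because a `dir/*` entry in a Java classpath, as in the HADOOP_CLASSPATH used later, only picks up files ending in .jar or .JAR, so a `.jar.bak` file is ignored. A sketch of the same renaming as a hypothetical helper (`disable_jars`) that skips jars that are absent or already renamed:

```bash
# Hypothetical helper: "remove" jars from a classpath directory by renaming them to *.bak.
disable_jars() {
  local dir="$1"; shift
  local jar
  for jar in "$@"; do
    # Only rename jars that are actually present
    if [ -f "$dir/$jar" ]; then
      mv -- "$dir/$jar" "$dir/$jar.bak"
    fi
  done
}

# Usage (root shell): disable_jars /opt/cloudera/parcels/CDH/lib/tez/lib \
#   slf4j-api-1.7.10.jar slf4j-log4j12-1.7.10.jar
```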
This completes the basic client configuration.
Copy the tez directory to the same location on the other hosts in the cluster:
scp -r /opt/cloudera/parcels/CDH/lib/tez joy@hadoop003:/opt/cloudera/parcels/CDH/lib/
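With more than one extra host, the scp step can be looped. A sketch assuming passwordless SSH and write access to /opt/cloudera/parcels/CDH/lib on each host; `distribute_tez`, the `TEZ_CLIENT_DIR` override and the host hadoop004 in the usage line are all hypothetical.

```bash
# Hypothetical helper: copy the Tez client directory to several hosts.
# Returns non-zero if any copy failed, but still tries every host.
distribute_tez() {
  local src="${TEZ_CLIENT_DIR:-/opt/cloudera/parcels/CDH/lib/tez}"
  local parent host failed=0
  parent=$(dirname "$src")
  for host in "$@"; do
    if ! scp -r "$src" "${host}:${parent}/"; then
      echo "copy to $host failed" >&2
      failed=1
    fi
  done
  return "$failed"
}

# Usage: distribute_tez joy@hadoop003 joy@hadoop004
```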
Configure the Hive environment variables
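- In Cloudera Manager, open the Hive client configuration
- Searching for "client" brings up both of the following settings
- HiveServer2 Environment Advanced Configuration Snippet (Safety Valve)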
HADOOP_CLASSPATH=/opt/cloudera/parcels/CDH/lib/tez/conf:/opt/cloudera/parcels/CDH/lib/tez/*:/opt/cloudera/parcels/CDH/lib/tez/lib/*
- Gateway Client Environment Advanced Configuration Snippet (Safety Valve) for hive-env.sh (set the same HADOOP_CLASSPATH value here)
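The HADOOP_CLASSPATH value has three entries: the conf directory (so tez-site.xml is found), the Tez jars, and the jars under lib. The `*` entries are Java classpath wildcards, not shell globs. A sketch that derives the value from a Tez client directory; `tez_hadoop_classpath` is a hypothetical helper.

```bash
# Hypothetical helper: build the HADOOP_CLASSPATH value for a given Tez client dir.
# The value is double-quoted so the shell never expands the "*" wildcards.
tez_hadoop_classpath() {
  local tez_home="${1%/}"   # drop a trailing slash, if any
  printf '%s\n' "${tez_home}/conf:${tez_home}/*:${tez_home}/lib/*"
}

# Usage: tez_hadoop_classpath /opt/cloudera/parcels/CDH/lib/tez
```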
Then restart Hive so that the environment variables take effect.
Redeploy the client configuration; the Tez installation is complete.
Testing:
- 1. Hive CLI
2. beeline
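A minimal smoke test, as a sketch: run one aggregate query with hive.execution.engine=tez through both clients. `tez_smoke_test`, the table name and the JDBC URL in the usage line are placeholders (10000 is the HiveServer2 default port); `--hiveconf` and `-e` are standard options of both the Hive CLI and Beeline.

```bash
# Hypothetical smoke test: the same count(*) query via the Hive CLI and via Beeline, on Tez.
tez_smoke_test() {
  local table="$1" jdbc_url="$2"
  local sql="select count(*) from ${table};"
  hive --hiveconf hive.execution.engine=tez -e "$sql" || return 1
  beeline -u "$jdbc_url" --hiveconf hive.execution.engine=tez -e "$sql" || return 1
}

# Usage: tez_smoke_test default.some_table jdbc:hive2://hadoop002:10000
```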
- Errors encountered during installation
- java.lang.ArithmeticException: / by zero
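Fix:
# Set it in tez-site.xml (already included in the configuration above; choose a value that fits your cluster), or per session:
set hive.tez.container.size=1024;
Error output before the fix: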
Status: Running (Executing on YARN cluster with App id application_1592204457838_0003)
--------------------------------------------------------------------------------
VERTICES STATUS TOTAL COMPLETED RUNNING PENDING FAILED KILLED
--------------------------------------------------------------------------------
Map 1 FAILED -1 0 0 -1 0 0
Reducer 2 KILLED 1 0 0 1 0 0
--------------------------------------------------------------------------------
VERTICES: 00/02 [>>--------------------------] 0% ELAPSED TIME: 0.42 s
--------------------------------------------------------------------------------
Status: Failed
Vertex failed, vertexName=Map 1, vertexId=vertex_1592204457838_0003_1_00, diagnostics=[Vertex vertex_1592204457838_0003_1_00 [Map 1]
killed/failed due to:ROOT_INPUT_INIT_FAILURE, Vertex Input: ce_user initializer failed, vertex=vertex_1592204457838_0003_1_00 [Map 1],
java.lang.ArithmeticException: / by zero
at org.apache.hadoop.hive.ql.exec.tez.HiveSplitGenerator.initialize(HiveSplitGenerator.java:123)
at org.apache.tez.dag.app.dag.RootInputInitializerManager$InputInitializerCallable$1.run(RootInputInitializerManager.java:278)
at org.apache.tez.dag.app.dag.RootInputInitializerManager$InputInitializerCallable$1.run(RootInputInitializerManager.java:269)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1924)
at org.apache.tez.dag.app.dag.RootInputInitializerManager$InputInitializerCallable.call(RootInputInitializerManager.java:269)
at org.apache.tez.dag.app.dag.RootInputInitializerManager$InputInitializerCallable.call(RootInputInitializerManager.java:253)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
]
Vertex killed, vertexName=Reducer 2, vertexId=vertex_1592204457838_0003_1_01, diagnostics=[Vertex received Kill in INITED state., Vertex vertex_1592204457838_0003_1_01 [Reducer 2] killed/failed due to:OTHER_VERTEX_FAILURE]
DAG did not succeed due to VERTEX_FAILURE. failedVertices:1 killedVertices:1
FAILED: Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec.tez.TezTask
- For other Hive on Tez bugs, see https://www.jianshu.com/p/32faae7230d5