Issue 1: Before installing Spark on YARN, first confirm that YARN can schedule jobs normally. Test case:
hadoop jar $HADOOP_HOME/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.2.jar wordcount /input /output
First create a local input directory and add a file with words for wordcount, e.g.:
touch 1.txt
echo "a a bb b b cc c" >> 1.txt
Upload the local input directory to the root of HDFS as /input:
hadoop fs -put input /input
/output must not be created beforehand; if it already exists the job fails with an error.
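As a local sanity check before running the job, the word counts that wordcount should emit for the sample file can be reproduced with standard shell tools (no cluster needed); this assumes the exact sample line created above.

```shell
# Build the same sample file and count words locally,
# mimicking what the MapReduce wordcount job should produce.
printf 'a a bb b b cc c\n' > 1.txt
tr ' ' '\n' < 1.txt | sort | uniq -c | awk '{print $2, $1}'
# Emits the pairs: a 2, b 2, bb 1, c 1, cc 1
```

After the job succeeds, `hadoop fs -cat /output/part-r-00000` should show the same word/count pairs.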
Issue 2: When configuring the history server, confirm the hosts/IPs of YARN's history log server and the Spark history server; getting them wrong affects running jobs on YARN.
spark-defaults.conf
spark.eventLog.enabled true
spark.eventLog.dir hdfs://master:9000/spark-job-log
spark.yarn.historyServer.address slave1:18080
spark.history.ui.port 18080
spark-env.sh
export SPARK_HISTORY_OPTS="-Dspark.history.ui.port=18080 -Dspark.history.retainedApplications=30 -Dspark.history.fs.logDirectory=hdfs://master:9000/spark-job-log"
YARN_CONF_DIR=$HADOOP_HOME/etc/hadoop
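The event-log directory must exist in HDFS before jobs write to it, and the Spark history server must be started on the host named in spark.yarn.historyServer.address (slave1 in the config above). A sketch of those two steps, assuming Spark is installed under $SPARK_HOME:

```shell
# On any node with HDFS access: create the event-log directory
# referenced by spark.eventLog.dir / spark.history.fs.logDirectory.
hadoop fs -mkdir -p /spark-job-log

# On slave1: start the Spark history server (web UI on port 18080,
# per spark.history.ui.port above).
$SPARK_HOME/sbin/start-history-server.sh
```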
Test case: SparkPi
# client
bin/spark-submit \
--class org.apache.spark.examples.SparkPi \
--master yarn \
--deploy-mode client \
./examples/jars/spark-examples_2.11-2.1.1.jar 100
# cluster
bin/spark-submit \
--class org.apache.spark.examples.SparkPi \
--master yarn \
--deploy-mode cluster \
./examples/jars/spark-examples_2.11-2.1.1.jar 100
If both run correctly, everything is fine. In cluster mode the result is in the driver's stdout: from the YARN application's log page, the link jumps to the Spark history UI on slave1:18080; click Executors, then open the driver's stdout.
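Alternatively, for cluster mode the driver's stdout can be pulled from the command line with `yarn logs` once the application finishes (this requires YARN log aggregation to be enabled); `<appId>` below is a placeholder for the application id printed by spark-submit or listed by `yarn application -list`.

```shell
# Fetch the aggregated logs of the finished application and
# look for the line the SparkPi driver prints.
yarn logs -applicationId <appId> | grep "Pi is roughly"
```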
If the page that the stdout link jumps to does not load, add the following to yarn-site.xml:
<property>
  <name>yarn.log.server.url</name>
  <value>http://master:19888/jobhistory/logs</value>
</property>
Note: in my setup the YARN history log service runs on master.
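Changes to yarn-site.xml only take effect after restarting YARN, and the jump to master:19888 only works if the MapReduce job history server is actually running there. A sketch, assuming the Hadoop 2.7.x daemon scripts and that these are run from master:

```shell
# Restart YARN so the new yarn.log.server.url takes effect.
$HADOOP_HOME/sbin/stop-yarn.sh
$HADOOP_HOME/sbin/start-yarn.sh

# On master: start the MapReduce job history server (web UI on 19888).
$HADOOP_HOME/sbin/mr-jobhistory-daemon.sh start historyserver
```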