1. Spark Install (Local Mode)
1.1 Download spark-2.4.4-bin-hadoop2.7.tgz
1.2 Extract the archive: tar -zxvf spark-2.4.4-bin-hadoop2.7.tgz
1.3 Move it into place: mv /download/spark-2.4.4-bin-hadoop2.7 /soft
1.4 Create a version-independent symlink (run inside /soft): ln -s spark-2.4.4-bin-hadoop2.7 spark
1.5 Environment variables
[/etc/profile]
export SPARK_HOME=/soft/spark
export PATH=$PATH:$SPARK_HOME/bin:$SPARK_HOME/sbin
1.6 Reload the profile
$>source /etc/profile
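A quick sanity check that the new variables took effect:
$>echo $SPARK_HOME      # should print /soft/spark
$>which spark-shell     # should print /soft/spark/bin/spark-shell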
1.7 Verify Spark (spark-shell lives under bin/)
$>cd /soft/spark
$>./bin/spark-shell
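Inside the shell, a one-line job confirms the install works; sc is the SparkContext that spark-shell creates automatically:
scala> sc.parallelize(1 to 100).sum   // expect res0: Double = 5050.0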
1.8 Web UI
http://<ip>:4040/ (the application UI; available while the spark-shell session is running)
2. Spark Standalone Install
2.1 Host s101 is the master; s102, s103, and s104 are the workers (slaves).
2.2 Install Spark on every host following steps 1.1-1.6 above.
2.3 In /soft/spark/conf, create the following symlinks so Spark picks up the Hadoop client configs:
ln -s /soft/hadoop/etc/hadoop/core-site.xml core-site.xml
ln -s /soft/hadoop/etc/hadoop/hdfs-site.xml hdfs-site.xml
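To confirm the links resolve:
$>ls -l /soft/spark/conf/core-site.xml /soft/spark/conf/hdfs-site.xml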
2.4 In /soft/spark/conf, edit slaves (one worker hostname per line; copy slaves.template if the file does not exist):
s102
s103
s104
2.5 In /soft/spark/conf, edit spark-env.sh (copy spark-env.sh.template if the file does not exist):
export JAVA_HOME=/soft/jdk
export SPARK_MASTER_IP=s101
export SPARK_MASTER_PORT=7077
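Worker resources can optionally be capped in the same file; the values below are illustrative assumptions, not requirements:
export SPARK_WORKER_CORES=2    # cores each worker offers (assumed value)
export SPARK_WORKER_MEMORY=2g  # memory each worker offers (assumed value)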
2.6 Distribute slaves and spark-env.sh to the other hosts; a sketch follows below.
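A minimal distribution sketch, assuming passwordless ssh from s101 and the same /soft layout on every host:
$>for h in s102 s103 s104; do
>   scp /soft/spark/conf/slaves /soft/spark/conf/spark-env.sh $h:/soft/spark/conf/
> done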
2.7 Start the cluster (HDFS first, then Spark):
$>/soft/hadoop/sbin/start-dfs.sh
$>/soft/spark/sbin/start-all.sh
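Running jps on each host shows which daemons came up (HDFS daemons also appear, depending on your Hadoop layout):
$>jps             # on s101: expect Master
$>ssh s102 jps    # on a worker: expect Worker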
2.8 Check the web UI
http://s101:8080
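With the cluster up, a common smoke test is to submit the bundled SparkPi example; the jar path below matches the spark-2.4.4-bin-hadoop2.7 distribution:
$>spark-submit --master spark://s101:7077 \
    --class org.apache.spark.examples.SparkPi \
    $SPARK_HOME/examples/jars/spark-examples_2.11-2.4.4.jar 100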
3. Spark HA (ZooKeeper) Install
3.1 s101 is the active master; s102, s103, and s104 are workers; s105 is the standby master.
3.2 s101, s102, and s103 run the ZooKeeper ensemble.
3.3 Starting from 2.5 above, delete SPARK_MASTER_IP from spark-env.sh (with ZooKeeper recovery, masters are discovered through the ensemble rather than a fixed address) and add the following:
export SPARK_DAEMON_JAVA_OPTS="-Dspark.deploy.recoveryMode=ZOOKEEPER -Dspark.deploy.zookeeper.url=s101,s102,s103 -Dspark.deploy.zookeeper.dir=/spark"
3.4 Distribute the updated spark-env.sh to every host, and copy slaves to s105 as well.
3.5 On s101, run ./start-all.sh from /soft/spark/sbin.
3.6 On s105, run ./start-master.sh from /soft/spark/sbin to bring up the standby master.
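The election state can be inspected in ZooKeeper; a minimal check, run from any host with the ZooKeeper client on its PATH:
$>zkCli.sh -server s101:2181
[zk: s101:2181(CONNECTED) 0] ls /spark
The znodes under /spark are created by the masters once they register with the ensemble.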
3.7 Open the web UIs on both masters, e.g. http://s101:8080 and http://s105:8080; one should report status ALIVE and the other STANDBY.
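A quick failover test, assuming both UIs are reachable: stop the active master and watch the standby take over (the election can take several seconds):
# on s101
$>/soft/spark/sbin/stop-master.sh
# then reload http://s105:8080 -- its status should switch from STANDBY to ALIVE;
# applications that were already running stay attached to their workers during the switch.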