Submitting your first Spark application to a cluster with spark-submit & running Spark programs through the spark-shell interface

Original

2019-03-13 18:13

1 Submitting your first Spark application to the cluster

Syntax:

./bin/spark-submit \
  --class <main-class> \
  --master <master-url> \
  --deploy-mode <deploy-mode> \
  --conf <key>=<value> \
  ... # other options
  <application-jar> \
  [application-arguments]

A real-world example:

./spark-submit --class org.apache.spark.examples.SparkPi \
  --master spark://hadoop1:7077 \
  --total-executor-cores 5 \
  --executor-cores 1 \
  --executor-memory 200mb \
  /opt/cloudera/parcels/CDH/lib/spark/examples/lib/spark-examples-1.6.0-cdh5.8.0-hadoop2.6.0-cdh5.8.0.jar 100

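Parameters:
--class: the main class of the application, i.e. the entry point of your job code
--master: the master URL to submit to; this can be a Spark standalone master (spark://host:port) or yarn
--total-executor-cores: total number of cores across all executors (Spark standalone and Mesos only)
--executor-cores: number of cores per executor
--executor-memory: amount of memory per executor
xxx.jar: the jar that is actually being submitted
100: the argument passed to the main class (for SparkPi, the number of slices the computation is split into)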
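To see how the three sizing flags fit together, here is a small sketch of the arithmetic for the example command. It assumes a standalone cluster, where the number of executors is bounded by the total cores divided by the cores per executor; the variable names are only illustrative, and the real allocation also depends on what each worker has free.

```shell
# Values taken from the spark-submit example above.
total_executor_cores=5   # --total-executor-cores
executor_cores=1         # --executor-cores
executor_memory_mb=200   # --executor-memory 200mb

# Upper bound on the number of executors (integer division).
max_executors=$(( total_executor_cores / executor_cores ))

# Executor memory requested in total if every executor is granted.
total_memory_mb=$(( max_executors * executor_memory_mb ))

echo "max executors: $max_executors"
echo "total executor memory: ${total_memory_mb} MB"
```

With these values the job can get at most 5 executors of 1 core each, asking for 1000 MB of executor memory across the cluster.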

Note: the jar must be somewhere Spark can see it, either on the Spark node machines or in an HDFS directory. Otherwise the job fails with ClassNotFoundException.

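A friend ran into exactly this problem when submitting a job from Java code (the screenshots of the code and of the error message from the original post are not reproduced here).

The cause: he was submitting a jar that existed only on his development machine to the Spark machines. Spark could not fetch that jar at all, hence the error. Putting the jar on HDFS solved the problem.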
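A minimal sketch of that fix: upload the jar to HDFS, then submit with its hdfs:// URL. The jar name, HDFS directory, and main class (my-app.jar, /user/admin/jars, com.example.MyApp) are hypothetical placeholders, and cluster deploy mode is an assumption here; the host names and ports are the ones used elsewhere in this post. By default the script only prints the commands it would run.

```shell
# Hypothetical names: adjust the jar, HDFS directory, and main class for your job.
LOCAL_JAR="my-app.jar"
HDFS_DIR="/user/admin/jars"
HDFS_JAR="hdfs://hadoop1:8020${HDFS_DIR}/$(basename "$LOCAL_JAR")"

# Print each command by default; set DRY_RUN=0 to actually execute it.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# 1. Copy the jar to a location that every node can read.
run hdfs dfs -mkdir -p "$HDFS_DIR"
run hdfs dfs -put -f "$LOCAL_JAR" "$HDFS_DIR/"

# 2. Submit using the HDFS URL instead of a path on the development machine.
#    Cluster deploy mode (assumed here) lets the cluster fetch the jar itself.
run spark-submit \
  --class com.example.MyApp \
  --master spark://hadoop1:7077 \
  --deploy-mode cluster \
  "$HDFS_JAR"
```

Running it with DRY_RUN=0 requires the hdfs and spark-submit commands on the PATH and write access to the HDFS directory.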

The official documentation explains these parameters very clearly.

For more details, see the official documentation: Launching Applications with spark-submit

2 Running Spark programs through the spark-shell interface: connecting to YARN

[root@hadoop1 spark]# pwd
/opt/cloudera/parcels/CDH/lib/spark
[root@hadoop1 spark]# spark2-shell --master yarn
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
Spark context Web UI available at http://192.168.80.131:4040
Spark context available as 'sc' (master = yarn, app id = application_1548574542102_0002).
Spark session available as 'spark'.
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /___/ .__/\_,_/_/ /_/\_\   version 2.2.0.cloudera4
      /_/
         
Using Scala version 2.11.8 (Java HotSpot(TM) 64-Bit Server VM, Java 1.8.0_144)
Type in expressions to have them evaluated.
Type :help for more information.

scala>

Run a word count:

scala> sc.textFile("hdfs://hadoop1:8020/user/admin/spark-test").flatMap(_.split(" ")).map((_,1)).reduceByKey(_+_).sortBy(_._2,false).collect   
res0: Array[(String, Int)] = Array((hello,9), (zhouq,3), (wocao,2), (memeda,2), (hxt,1), (heyxyw,1))
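As a rough local analogue of the pipeline above (this is not Spark, only a way to check the logic by hand), the same counting can be sketched with standard shell tools. The sample text is made up for illustration.

```shell
# Made-up sample input; the real job reads its text from HDFS.
sample='hello zhouq
hello hxt
hello zhouq'

# tr          ~ flatMap(_.split(" "))        : one word per line
# sort | uniq ~ map((_,1)).reduceByKey(_+_)  : count each distinct word
# sort -rn    ~ sortBy(_._2, false)          : highest count first
counts=$(printf '%s\n' "$sample" \
  | tr -s ' ' '\n' \
  | sort \
  | uniq -c \
  | sort -rn \
  | awk '{print $2 "," $1}')

echo "$counts"
```

Each output line is a word,count pair, which mirrors the (word, count) tuples returned by collect in the shell session.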

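Check the Spark web UI: the session shows up on the cluster as an application with the app id application_1548574542102_0002, the same id printed when the shell started.

Note: the UI shown here belongs to a cluster set up with CDH5.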
