Scala 2.11.12 downloads: https://www.scala-lang.org/download/
Scala 2.11.12 (Linux): scala-2.11.12.tgz
Scala 2.11.12 (Windows): scala-2.11.12.zip
Create a new Maven project in IDEA.
When the Maven project is created successfully, the console shows:
[INFO] BUILD SUCCESS
For a reference pom.xml, see:
https://blog.csdn.net/qq262593421/article/details/105769886
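As a rough sketch (the full pom.xml is in the linked post; the versions below simply match the Spark 2.4.0 / Scala 2.11.12 setup used in this walkthrough and may need adjusting), the project needs at least the Spark core and Spark SQL dependencies:

```xml
<!-- Sketch only: versions chosen to match Spark 2.4.0 / Scala 2.11 as used in this post -->
<dependencies>
    <dependency>
        <groupId>org.apache.spark</groupId>
        <artifactId>spark-core_2.11</artifactId>
        <version>2.4.0</version>
    </dependency>
    <dependency>
        <groupId>org.apache.spark</groupId>
        <artifactId>spark-sql_2.11</artifactId>
        <version>2.4.0</version>
    </dependency>
</dependencies>
```

The `_2.11` suffix in the artifact IDs must match the Scala version, since Spark artifacts are published per Scala binary version.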
Create the Scala class and its companion object:
package com.xtd.spark

import org.apache.spark.sql.SQLContext
import org.apache.spark.{SparkConf, SparkContext}

class Example {
  def sparkSQL(path: String): Unit = {
    // e.g. D:/Hadoop/Spark/spark-2.4.0-bin-without-hadoop/examples/src/main/resources/employees.json
    val sparkConf = new SparkConf()
    // Note: hardcoding the master here overrides --master on spark-submit;
    // remove setMaster when submitting to a cluster
    sparkConf.setAppName("SparkExample").setMaster("local[2]")
    val context = new SparkContext(sparkConf)
    val sqlContext = new SQLContext(context)
    // Read the JSON file into a DataFrame, then print its schema and rows
    val people = sqlContext.read.format("json").load(path)
    people.printSchema()
    people.show()
    context.stop()
  }
}

object Example {
  def main(args: Array[String]): Unit = {
    val path = args(0)
    val example = new Example
    example.sparkSQL(path)
    println("path: " + path)
  }
}
Click the object name in the top-right corner, edit the run configuration, and add the program argument (prefix local files with file:///):
file:///D:/Hadoop/Spark/spark-2.4.0-bin-without-hadoop/examples/src/main/resources/employees.json
The employees.json file can be found under the examples directory in the root of the Spark installation:
{"name":"Michael"}
{"name":"Andy", "age":30}
{"name":"Justin", "age":19}
Running it should print the file's schema and contents, confirming success.
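With Spark's default schema inference on the sample employees.json above, printSchema() and show() should produce output roughly like the following (exact formatting may vary slightly by Spark version; fields are inferred alphabetically, and Michael's missing age becomes null):

```
root
 |-- age: long (nullable = true)
 |-- name: string (nullable = true)

+----+-------+
| age|   name|
+----+-------+
|null|Michael|
|  30|   Andy|
|  19| Justin|
+----+-------+
```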
How do we package this and run it on a cluster?
Right-click the project, choose Open in Terminal to get a console, then run the Maven build command (note the flag is -DskipTests, plural):
mvn clean package -DskipTests
Next, upload the jar to the Linux server and submit it to the cluster with spark-submit.
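For example (the server hostname is a placeholder, and the jar name is whatever Maven produced under target/ from the artifactId and version in your pom.xml):

```shell
# Maven writes the artifact to target/; the exact jar name depends on your pom.xml
ls target/*.jar
# Hypothetical host; copy the jar to the path you will pass to spark-submit
scp target/spark2-1.0.jar spark@your-server:/home/spark/jar/
```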
Client mode:
spark-submit \
--class com.xtd.spark.Example \
--deploy-mode client \
/home/spark/jar/spark2-1.0.jar \
file:///home/spark/examples/employees.json
Spark on YARN:
spark-submit \
--class com.xtd.spark.ExampleHDFS \
--master yarn \
--deploy-mode cluster \
--driver-memory 2g \
--executor-cores 1 \
--executor-memory 1g \
/home/spark/jar/spark-1.0.jar \
/user/spark/examples/resources/employees.json
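Since the YARN example passes a path without a file:// prefix, Spark reads it from HDFS, so the file has to be uploaded there first. A possible way to do that (assuming the directory layout used in the command above):

```shell
# Create the HDFS directory and upload the sample file
hdfs dfs -mkdir -p /user/spark/examples/resources
hdfs dfs -put employees.json /user/spark/examples/resources/
```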
Notes
/home/spark/jar/spark-1.0.jar is the path of the jar on the Linux machine; use whatever path you uploaded it to.
file:///home/spark/examples/employees.json is the program argument; the file:// prefix means employees.json lives on the local Linux filesystem (the YARN example instead passes an HDFS path).
Run spark-submit --help to see the full list of options.
Run result