1. Prepare the required development tools and environment:
- Install JDK 1.8 (reference: https://blog.csdn.net/smile_from_2015/article/details/80056297)
- Scala 2.11.8
Download: https://www.scala-lang.org/download/2.11.8.html (the file I downloaded is scala-2.11.8.tgz)
## Scala: Spark is written in Scala, so building it locally requires Scala
## Unpack the archive
sudo tar zxvf scala-2.11.8.tgz -C /usr/lib
## Move the directory
sudo mv /usr/lib/scala-2.11.8 /usr/lib/scala
## Configure the environment variables
sudo vim /etc/profile
## Append the following at the end of the file
export SCALA_HOME=/usr/lib/scala
export PATH=${SCALA_HOME}/bin:$PATH
## Apply the changes in the current shell
source /etc/profile
## Verify the installation
zmx@ubuntu:~/nju$ scala -version
Scala code runner version 2.11.8 -- Copyright 2002-2016, LAMP/EPFL
- Maven 3.3.9
Download: https://mirrors.tuna.tsinghua.edu.cn/apache/maven/maven-3/3.3.9/binaries/ (the file I downloaded is apache-maven-3.3.9-bin.tar.gz)
## Unpack the archive
sudo tar zxvf apache-maven-3.3.9-bin.tar.gz
## Move the directory
sudo mv apache-maven-3.3.9 maven
sudo mv maven/ /usr/lib/maven
## Add the environment variables
gedit ~/.bashrc
## Append the following
export MAVEN_HOME=/usr/lib/maven
export CLASSPATH=$CLASSPATH:$MAVEN_HOME/lib
export PATH=$PATH:$MAVEN_HOME/bin
## Apply the changes
source ~/.bashrc
## Check the installation
zmx@ubuntu:~/nju$ mvn -v
Apache Maven 3.3.9 (bb52d8502b132ec0a5a3f4c09453c07478323dc5; 2015-11-10T08:41:47-08:00)
Maven home: /usr/lib/maven
Java version: 1.8.0_201, vendor: Oracle Corporation
Java home: /usr/lib/jdk/jdk1.8.0_201/jre
Default locale: en_US, platform encoding: UTF-8
OS name: "linux", version: "4.15.0-46-generic", arch: "amd64", family: "unix"
## Edit settings.xml to speed up jar downloads by adding a mirror
sudo gedit /usr/lib/maven/conf/settings.xml
<!-- Aliyun central repository mirror; place this inside the <mirrors> element -->
<mirror>
  <id>alimaven</id>
  <name>aliyun maven</name>
  <url>http://maven.aliyun.com/nexus/content/groups/public/</url>
  <mirrorOf>central</mirrorOf>
</mirror>
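For context, the mirror block belongs inside the file's <mirrors> section; a stripped-down settings.xml (everything except the mirror itself is just the standard file skeleton) looks like:

```xml
<settings xmlns="http://maven.apache.org/SETTINGS/1.0.0">
  <mirrors>
    <!-- Aliyun central repository mirror -->
    <mirror>
      <id>alimaven</id>
      <name>aliyun maven</name>
      <url>http://maven.aliyun.com/nexus/content/groups/public/</url>
      <mirrorOf>central</mirrorOf>
    </mirror>
  </mirrors>
</settings>
```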
- Install IntelliJ IDEA (with the Scala plugin)
Download the package from the official site: http://www.jetbrains.com/education/download/#section=idea-Scala
The file name is ideaIE-2018.3.1.tar.gz
## Unpack the archive
tar xzvf ideaIE-2018.3.1.tar.gz
## Launch
cd ideaIE-2018.3.1/bin
./idea.sh
Install the Scala plugin.
In IDEA's Maven settings, point Maven at the locally installed copy and change the User settings file accordingly.
- (If you build with Maven, you do not need sbt. In theory this setup requires no sbt; I installed it anyway and ran into no problems.) Install sbt 0.13.x, the Scala build tool; see the guide on installing sbt on Ubuntu 16.04. Make sure you download a 0.13.x release, not some other version.
2. Download the Spark 2.1.0 source
Download the Spark source from https://archive.apache.org/dist/spark/spark-2.1.0/
## Unpack the archive
tar zxvf spark-2.1.0.tgz
3. Build the Spark project
./build/mvn -DskipTests clean package
An error occurred. I did not record the details at the time; they resemble
https://stackoverflow.com/questions/28004552/problems-while-compiling-spark-with-maven
Re-run Maven with -e to print error details. Note that -e must be passed together with the build goals (./build/mvn -e -DskipTests clean package); running ./build/mvn -e on its own produces the NoGoalSpecifiedException shown below.
[ERROR] No goals have been specified for this build. You must specify a valid lifecycle phase or a goal in the format <plugin-prefix>:<goal> or <plugin-group-id>:<plugin-artifact-id>[:<plugin-version>]:<goal>. Available lifecycle phases are: validate, initialize, generate-sources, process-sources, generate-resources, process-resources, compile, process-classes, generate-test-sources, process-test-sources, generate-test-resources, process-test-resources, test-compile, process-test-classes, test, prepare-package, package, pre-integration-test, integration-test, post-integration-test, verify, install, deploy, pre-clean, clean, post-clean, pre-site, site, post-site, site-deploy. -> [Help 1]
org.apache.maven.lifecycle.NoGoalSpecifiedException: No goals have been specified for this build. You must specify a valid lifecycle phase or a goal in the format <plugin-prefix>:<goal> or <plugin-group-id>:<plugin-artifact-id>[:<plugin-version>]:<goal>. Available lifecycle phases are: validate, initialize, generate-sources, process-sources, generate-resources, process-resources, compile, process-classes, generate-test-sources, process-test-sources, generate-test-resources, process-test-resources, test-compile, process-test-classes, test, prepare-package, package, pre-integration-test, integration-test, post-integration-test, verify, install, deploy, pre-clean, clean, post-clean, pre-site, site, post-site, site-deploy.
    at org.apache.maven.lifecycle.internal.LifecycleStarter.execute (LifecycleStarter.java:97)
    at org.apache.maven.DefaultMaven.doExecute (DefaultMaven.java:305)
    at org.apache.maven.DefaultMaven.doExecute (DefaultMaven.java:192)
    at org.apache.maven.DefaultMaven.execute (DefaultMaven.java:105)
    at org.apache.maven.cli.MavenCli.execute (MavenCli.java:954)
    at org.apache.maven.cli.MavenCli.doMain (MavenCli.java:288)
    at org.apache.maven.cli.MavenCli.main (MavenCli.java:192)
    at sun.reflect.NativeMethodAccessorImpl.invoke0 (Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke (NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke (DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke (Method.java:498)
    at org.codehaus.plexus.classworlds.launcher.Launcher.launchEnhanced (Launcher.java:289)
    at org.codehaus.plexus.classworlds.launcher.Launcher.launch (Launcher.java:229)
    at org.codehaus.plexus.classworlds.launcher.Launcher.mainWithExitCode (Launcher.java:415)
    at org.codehaus.plexus.classworlds.launcher.Launcher.main (Launcher.java:356)
[ERROR]
[ERROR] Re-run Maven using the -X switch to enable full debug logging.
[ERROR]
[ERROR] For more information about the errors and possible solutions, please read the following articles:
[ERROR] [Help 1] http://cwiki.apache.org/confluence/display/MAVEN/NoGoalSpecifiedException
Cause: https://stackoverflow.com/questions/28004552/problems-while-compiling-spark-with-maven
When first installing the JDK I had installed OpenJDK 8 rather than Oracle JDK 1.8. Remove OpenJDK with sudo apt-get remove openjdk*, then install JDK 1.8 (see the guide on setting up a JDK 1.8 environment on Ubuntu 16.04).
The problem persisted; see https://github.com/davidB/scala-maven-plugin/issues/185
and https://blog.csdn.net/qq_21355765/article/details/81743815
In short: after the error appears, simply re-run the build command; it succeeded on roughly the second or third attempt. Build time depends on your network. A successful build looks like the screenshot below:
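Since the failures here were transient (network-related), the re-runs can also be scripted; this is just a convenience wrapper around the same build command used above:

```shell
# Retry the Spark build a few times; transient download failures were the
# culprit here, so a later attempt often succeeds where the first one fails
for attempt in 1 2 3; do
  if ./build/mvn -DskipTests clean package; then
    echo "build succeeded on attempt $attempt"
    break
  fi
  echo "build attempt $attempt failed; retrying"
done
```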
Test the build once it finishes:
zmx@ubuntu:~/nju/spark-2.1.0$ ./bin/spark-shell
Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
19/03/11 23:26:07 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
19/03/11 23:26:07 WARN Utils: Your hostname, ubuntu resolves to a loopback address: 127.0.1.1; using 192.168.127.163 instead (on interface ens33)
19/03/11 23:26:07 WARN Utils: Set SPARK_LOCAL_IP if you need to bind to another address
Spark context Web UI available at http://192.168.127.163:4040
Spark context available as 'sc' (master = local[*], app id = local-1552371968105).
Spark session available as 'spark'.
Welcome to
____ __
/ __/__ ___ _____/ /__
_\ \/ _ \/ _ `/ __/ '_/
/___/ .__/\_,_/_/ /_/\_\ version 2.1.0
/_/
Using Scala version 2.11.8 (Java HotSpot(TM) 64-Bit Server VM, Java 1.8.0_201)
Type in expressions to have them evaluated.
Type :help for more information.
scala>
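Beyond starting the shell, a one-line computation confirms the build can actually execute jobs. Piping into spark-shell this way runs it non-interactively; the expected sum of 1..100 is 5050:

```shell
# Non-interactive smoke test: sum 1..100 on the local SparkContext
echo 'println(sc.parallelize(1 to 100).reduce(_ + _))' | ./bin/spark-shell 2>/dev/null
# the output should contain 5050
```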
4. Import the built Spark source into IDEA
Click Next through the wizard.
A possible problem: insufficient user permissions, with all files read-only. Go to the Spark root directory and run sudo chmod -R 777 ./ to change the permissions (a narrower fix is sudo chown -R $USER ./).
5. After the import succeeds, run and debug one of the examples under the examples directory, taking LogQuery as the example.
- Configure the run parameters. VM options: -Dspark.master=local, which runs the Spark code in local mode.
The run result is shown in the screenshot below: it fails because some of the generated sources of the Flume dependency cannot be found.
Solution (reference: the guide 搭建Spark源碼研讀和代碼調試的開發環境):
File -> Project Structure -> Modules -> spark-streaming-flume-sink_2.11 -> Sources
Add the target directory and its sink subdirectory to Sources (compare the blue Source Folders on the right of the screenshot below).
- Add the jars the run depends on.
Run again. This takes a while because LogQuery has to compile successfully first, but it still fails with: SLF4J: Failed to load class "org.slf4j.impl.StaticLoggerBinder"
The cause is that IDEA's Maven configuration was never changed. Point IDEA at the locally installed Maven and change the User settings file as described in step 1, then run LogQuery again. Reference: SLF4J: Failed to load class "org.slf4j.impl.StaticLoggerBinder". in a Maven Project [duplicate]
- Errors java.lang.NoClassDefFoundError: scala/collection/immutable/List and java.lang.ClassNotFoundException: scala.collection.immutable.List appear.
The cause: a Spark app is normally run via spark-submit, which ships your jar to an installed Spark environment that already contains all of Spark's dependencies; the IDE environment lacks those dependencies.
Fix: File -> Project Structure -> Modules -> spark-examples_2.11 -> Dependencies
Note that:
1. The jars under jars/*.jar are downloaded while building Spark; if the directory is empty, or you changed the source and want fresh jars, rebuild Spark.
2. In the screenshot above, almost all dependency jars are scoped provided, meaning they are expected to be supplied at runtime, because a Spark App is normally run via spark-submit.
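The provided-scope point can also be seen from the command line: outside the IDE, the same example runs against the built Spark, which supplies those jars. The class name comes from the examples module; the explicit jar path below is an assumption and may differ in your tree:

```shell
# Run the LogQuery example against the freshly built Spark; run-example
# wraps spark-submit and locates the examples jar automatically
cd spark-2.1.0
./bin/run-example LogQuery

# Equivalent explicit spark-submit invocation (jar path is an assumption;
# check examples/target/ in your own build)
./bin/spark-submit --master local \
  --class org.apache.spark.examples.LogQuery \
  examples/target/scala-2.11/jars/spark-examples_2.11-2.1.0.jar
```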
- Run LogQuery again and check the output.
- Step through the source code in the debugger.