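Setting up a development environment on Ubuntu 16.04 for reading and debugging the Spark source code

1. Prepare the required development tools and environment: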

  • scala 2.11.8

    Download: https://www.scala-lang.org/download/2.11.8.html  The file I downloaded is scala-2.11.8.tgz

## scala: Spark is written in Scala, so building it locally requires Scala
## Extract
sudo tar zxvf scala-2.11.8.tgz -C /usr/lib
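## Rename the directory
sudo mv /usr/lib/scala-2.11.8 /usr/lib/scala
## Configure the environment variables
sudo vim /etc/profile
## Append these lines at the end of the file
export SCALA_HOME=/usr/lib/scala
export PATH=${SCALA_HOME}/bin:$PATH
## Apply the change to the current shell
source /etc/profile
## Verify the installation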
zmx@ubuntu:~/nju$ scala -version
Scala code runner version 2.11.8 -- Copyright 2002-2016, LAMP/EPFL
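
Editing a profile file by hand makes it easy to add the same export line twice when the steps are repeated. The following is a minimal sketch of an idempotent alternative, not part of the original setup: the append_once function and the scratch file demo_profile are hypothetical names introduced here for illustration, and the demo writes to a temporary file rather than to /etc/profile.

```shell
#!/usr/bin/env bash
# append_once FILE LINE: add LINE to FILE only if that exact line is absent.
# (append_once is an illustrative helper, not an existing tool.)
append_once() {
  local file="$1" line="$2"
  # -q quiet, -x whole-line match, -F fixed string (no regex interpretation)
  grep -qxF -- "$line" "$file" 2>/dev/null || printf '%s\n' "$line" >> "$file"
}

# Demo on a scratch file so nothing on the system is modified.
demo_profile="$(mktemp)"
append_once "$demo_profile" 'export SCALA_HOME=/usr/lib/scala'
append_once "$demo_profile" 'export PATH=${SCALA_HOME}/bin:$PATH'
# Repeating a line that is already present changes nothing.
append_once "$demo_profile" 'export SCALA_HOME=/usr/lib/scala'
cat "$demo_profile"
```

To apply the same idea to a real file, pass a file you own (for example ~/.bashrc) instead of the scratch file; writing to /etc/profile would additionally need root privileges.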

 

  • maven 3.3.9

    Download: https://mirrors.tuna.tsinghua.edu.cn/apache/maven/maven-3/3.3.9/binaries/  The file I downloaded is apache-maven-3.3.9-bin.tar.gz

## Extract
sudo tar zxvf apache-maven-3.3.9-bin.tar.gz
## Move the directory
sudo mv apache-maven-3.3.9 maven
sudo mv maven/ /usr/lib/maven
## Add the environment variables
gedit ~/.bashrc
## Add the following lines
export MAVEN_HOME=/usr/lib/maven
export CLASSPATH=$CLASSPATH:$MAVEN_HOME/lib
export PATH=$PATH:$MAVEN_HOME/bin
## Apply the new paths
source ~/.bashrc
## Check the installation
zmx@ubuntu:~/nju$ mvn -v
Apache Maven 3.3.9 (bb52d8502b132ec0a5a3f4c09453c07478323dc5; 2015-11-10T08:41:47-08:00)
Maven home: /usr/lib/maven
Java version: 1.8.0_201, vendor: Oracle Corporation
Java home: /usr/lib/jdk/jdk1.8.0_201/jre
Default locale: en_US, platform encoding: UTF-8
OS name: "linux", version: "4.15.0-46-generic", arch: "amd64", family: "unix"
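## Edit settings.xml to speed up jar downloads: add the following inside the <mirrors> element
sudo gedit /usr/lib/maven/conf/settings.xml

<!-- Aliyun mirror of the Maven Central repository -->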
     <mirror>
      <id>alimaven</id>
      <name>aliyun maven</name>
      <url>http://maven.aliyun.com/nexus/content/groups/public/</url>
      <mirrorOf>central</mirrorOf>
    </mirror>
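
The mirror entry only takes effect when it sits inside the <mirrors> element of a settings file. As a rough sketch of what a complete minimal file could look like, the script below writes one to a scratch directory. Using a per-user file is an assumption that differs from the article, which edits /usr/lib/maven/conf/settings.xml; Maven also reads ~/.m2/settings.xml, so the same content could be placed there. The variable settings_file is a name introduced for illustration.

```shell
#!/usr/bin/env bash
# Scratch location for the demo. For real use this would be
# "$HOME/.m2/settings.xml" (assumption: a per-user file is preferred over
# editing the global /usr/lib/maven/conf/settings.xml).
settings_file="$(mktemp -d)/settings.xml"

# Quoted heredoc delimiter: the XML is written literally, with no shell expansion.
cat > "$settings_file" <<'EOF'
<settings xmlns="http://maven.apache.org/SETTINGS/1.0.0">
  <mirrors>
    <!-- Same mirror entry as in the article, wrapped in <mirrors> -->
    <mirror>
      <id>alimaven</id>
      <name>aliyun maven</name>
      <url>http://maven.aliyun.com/nexus/content/groups/public/</url>
      <mirrorOf>central</mirrorOf>
    </mirror>
  </mirrors>
</settings>
EOF

# Quick sanity check that the mirror targets the central repository.
grep '<mirrorOf>' "$settings_file"
```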
  • Install IntelliJ IDEA (with the Scala plugin)

  Download the package from the official site: http://www.jetbrains.com/education/download/#section=idea-Scala 
  The file name is ideaIE-2018.3.1.tar.gz

## Extract
tar xzvf ideaIE-2018.3.1.tar.gz
## Launch
cd ideaIE-2018.3.1/bin
./idea.sh

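   Install the Scala plugin

    Point IDEA at the locally installed Maven and change the User settings file to match

  • Install sbt 0.13.x, the Scala build tool. This is optional: if you build with Maven you do not need sbt. In theory this setup does not require it; I installed it anyway and ran into no problems. See the linked guide "Installing sbt on Ubuntu 16.04". Take care to download a 0.13.x release and not a different version.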

 

2. Download the Spark 2.1.0 source

  Download the Spark source from: https://archive.apache.org/dist/spark/spark-2.1.0/

## Extract
tar zxvf spark-2.1.0.tgz

 

3. Build the Spark project (run the command from the root of the extracted spark-2.1.0 directory)

./build/mvn -DskipTests clean package

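  The build failed. I did not record the error when it happened; for details of this kind of failure see

     https://stackoverflow.com/questions/28004552/problems-while-compiling-spark-with-maven

  Run ./build/mvn -e to see the error details. Note that -e on its own passes no goal, so Maven stops with the NoGoalSpecifiedException shown below; to get the stack trace of the actual build failure, keep the goals on the command line: ./build/mvn -e -DskipTests clean package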

[ERROR] No goals have been specified for this build. You must specify a valid lifecycle phase or a goal in the format <plugin-prefix>:<goal> or <plugin-group-id>:<plugin-artifact-id>[:<plugin-version>]:<goal>. Available lifecycle phases are: validate, initialize, generate-sources, process-sources, generate-resources, process-resources, compile, process-classes, generate-test-sources, process-test-sources, generate-test-resources, process-test-resources, test-compile, process-test-classes, test, prepare-package, package, pre-integration-test, integration-test, post-integration-test, verify, install, deploy, pre-clean, clean, post-clean, pre-site, site, post-site, site-deploy. -> [Help 1]
org.apache.maven.lifecycle.NoGoalSpecifiedException: No goals have been specified for this build. You must specify a valid lifecycle phase or a goal in the format <plugin-prefix>:<goal> or <plugin-group-id>:<plugin-artifact-id>[:<plugin-version>]:<goal>. Available lifecycle phases are: validate, initialize, generate-sources, process-sources, generate-resources, process-resources, compile, process-classes, generate-test-sources, process-test-sources, generate-test-resources, process-test-resources, test-compile, process-test-classes, test, prepare-package, package, pre-integration-test, integration-test, post-integration-test, verify, install, deploy, pre-clean, clean, post-clean, pre-site, site, post-site, site-deploy.
    at org.apache.maven.lifecycle.internal.LifecycleStarter.execute (LifecycleStarter.java:97)
    at org.apache.maven.DefaultMaven.doExecute (DefaultMaven.java:305)
    at org.apache.maven.DefaultMaven.doExecute (DefaultMaven.java:192)
    at org.apache.maven.DefaultMaven.execute (DefaultMaven.java:105)
    at org.apache.maven.cli.MavenCli.execute (MavenCli.java:954)
    at org.apache.maven.cli.MavenCli.doMain (MavenCli.java:288)
    at org.apache.maven.cli.MavenCli.main (MavenCli.java:192)
    at sun.reflect.NativeMethodAccessorImpl.invoke0 (Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke (NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke (DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke (Method.java:498)
    at org.codehaus.plexus.classworlds.launcher.Launcher.launchEnhanced (Launcher.java:289)
    at org.codehaus.plexus.classworlds.launcher.Launcher.launch (Launcher.java:229)
    at org.codehaus.plexus.classworlds.launcher.Launcher.mainWithExitCode (Launcher.java:415)
    at org.codehaus.plexus.classworlds.launcher.Launcher.main (Launcher.java:356)
[ERROR] 
[ERROR] Re-run Maven using the -X switch to enable full debug logging.
[ERROR] 
[ERROR] For more information about the errors and possible solutions, please read the following articles:
[ERROR] [Help 1] http://cwiki.apache.org/confluence/display/MAVEN/NoGoalSpecifiedException

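 Cause: https://stackoverflow.com/questions/28004552/problems-while-compiling-spark-with-maven

 The first time I installed a JDK, I installed openjdk8 rather than Oracle JDK 1.8. I removed openjdk8 with sudo apt-get remove 'openjdk*' (the quotes keep the shell from expanding the pattern) and then installed JDK 1.8 following the linked guide "Setting up a JDK 1.8 runtime environment on Ubuntu 16.04".

The problem persisted. References:

    https://github.com/davidB/scala-maven-plugin/issues/185
    https://blog.csdn.net/qq_21355765/article/details/81743815

In the end, what worked was simply rerunning the build command after each failure: the build succeeded on roughly the second or third attempt. Build time depends on network speed. A successful build looks like the figure below:

[Figure: output of the successful build]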
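
The rerun-until-it-passes approach described above could be scripted roughly as follows. This is a minimal sketch, not something from the original setup: the retry function, the RETRY_DELAY variable, and the attempt count in the usage comment are illustrative assumptions. The Maven command in the comment is not executed by the script.

```shell
#!/usr/bin/env bash
# retry MAX CMD [ARGS...]: run CMD until it succeeds or MAX attempts are used.
# (retry and RETRY_DELAY are hypothetical names introduced for this sketch.)
retry() {
  local max="$1"; shift
  local attempt=1
  until "$@"; do
    if [ "$attempt" -ge "$max" ]; then
      echo "giving up after $attempt attempts" >&2
      return 1
    fi
    attempt=$((attempt + 1))
    echo "retrying (attempt $attempt of $max)" >&2
    # Pause between attempts; defaults to 5 seconds.
    sleep "${RETRY_DELAY:-5}"
  done
}

# Usage from the Spark source root (shown as a comment, not run here):
#   retry 3 ./build/mvn -DskipTests clean package
```

A bounded attempt count is used so that a genuine compile error does not loop forever; if the build still fails after the last attempt, the Maven output of that run is the place to look.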

After the build finishes, test it:

zmx@ubuntu:~/nju/spark-2.1.0$ ./bin/spark-shell 
Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
19/03/11 23:26:07 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
19/03/11 23:26:07 WARN Utils: Your hostname, ubuntu resolves to a loopback address: 127.0.1.1; using 192.168.127.163 instead (on interface ens33)
19/03/11 23:26:07 WARN Utils: Set SPARK_LOCAL_IP if you need to bind to another address
Spark context Web UI available at http://192.168.127.163:4040
Spark context available as 'sc' (master = local[*], app id = local-1552371968105).
Spark session available as 'spark'.
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /___/ .__/\_,_/_/ /_/\_\   version 2.1.0
      /_/
         
Using Scala version 2.11.8 (Java HotSpot(TM) 64-Bit Server VM, Java 1.8.0_201)
Type in expressions to have them evaluated.
Type :help for more information.

scala> 

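4. Import the built Spark source into IDEA

Click Next on every page of the import wizard.

Possible problem: insufficient user permissions, with every file read-only. Go to the Spark root directory and run sudo chmod -R 777 ./ to change the permissions.

5. Once the import succeeds, run and debug one of the examples under the examples directory, taking LogQuery as the example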

                                               

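  • Configure the run parameters: set VM options to -Dspark.master=local, which runs the Spark code in local mode

The result is shown in the figure below: the run fails because part of the source that the Flume dependency needs cannot be found

Solution (reference: "Setting up a development environment for reading and debugging the Spark source"):

File -> Project Structure -> Modules -> spark-streaming-flume-sink_2.11 -> Sources

Add the target directory and its sink subdirectory to Sources (compare with the Source Folders shown in blue on the right of the figure below)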

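  • Add the jars needed at runtime

    Run again. This run takes longer because LogQuery has to be compiled first, and it still fails with: SLF4J: Failed to load class "org.slf4j.impl.StaticLoggerBinder"

The cause was that IDEA's Maven configuration had not been changed. Point IDEA at the locally installed Maven and change the User settings file as described in step 1, then rerun LogQuery. Reference: SLF4J: Failed to load class "org.slf4j.impl.StaticLoggerBinder". in a Maven Project [duplicate]

  • Errors java.lang.NoClassDefFoundError: scala/collection/immutable/List and java.lang.ClassNotFoundException: scala.collection.immutable.List

The cause: a Spark app is normally launched with the spark-submit command, which runs your jar inside an installed Spark environment that already contains all of Spark's dependencies. The IDE environment does not have those dependencies.

   Fix: File -> Project Structure -> Modules -> spark-examples_2.11 -> Dep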

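Notes:

    1. The jars/*.jar files are downloaded when Spark is built. If the directory is empty, or you have changed the source and want updated jars, rebuild Spark.

    2. As the figure above shows, almost all of the dependency jars have scope provided, meaning the runtime environment is expected to supply them, because Spark apps are assumed to be run through spark-submit by default.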
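
Note 1 can be checked from the command line before adding the jars in IDEA. The following is a minimal sketch under a stated assumption: it looks for the jars under assembly/target/scala-2.11/jars relative to the source root, which is where a Maven build of Spark 2.x with Scala 2.11 usually places them, but the path should be verified on your machine. count_jars and SPARK_SRC are hypothetical names introduced here.

```shell
#!/usr/bin/env bash
# count_jars DIR: print the number of *.jar files directly inside DIR
# (prints 0 when DIR does not exist).
count_jars() {
  local dir="$1"
  [ -d "$dir" ] || { echo 0; return; }
  find "$dir" -maxdepth 1 -type f -name '*.jar' | wc -l | tr -d ' '
}

# Assumed location of the built jars; SPARK_SRC defaults to the current directory.
jars_dir="${SPARK_SRC:-.}/assembly/target/scala-2.11/jars"

n="$(count_jars "$jars_dir")"
if [ "$n" -eq 0 ]; then
  echo "no jars in $jars_dir; rebuild Spark first"
else
  echo "$n jars found in $jars_dir"
fi
```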

  • Run LogQuery again and check the output

  • Step through the source code in the debugger

 
