Contents
1. In IDEA, create a new Project --> Maven --> Next
2. GroupId is usually the company's common name; ArtifactId is the project name --> Next
3. Click Finish
4. Directory structure
5. Unzip apache-maven-3.3.9-bin.zip
6. Open settings.xml under conf and change the local repository path
7. In IDEA: File --> Settings --> search for "maven" --> point to the unzipped directory and the modified settings file --> OK --> choose Enable Auto-Import in the bottom-right popup
8. Searching for and adding Maven dependencies
9. Configuring the Maven environment variables
10. Open cmd --> mvn -v
11. Configuring Scala
12. Configuring the pom file
13. Verifying that the environment works
1. In IDEA, create a new Project --> Maven --> Next
2. GroupId is usually the company's common name; ArtifactId is the project name --> Next
3. Click Finish
4. Directory structure
.idea holds the IDE's metadata for this working directory; if you copy the project to another machine, delete this directory and reopen the project so it is regenerated
src is the directory where you write your code
pom.xml declares the project's dependencies, i.e. the jar packages it uses
5. Unzip apache-maven-3.3.9-bin.zip
6. Open settings.xml under conf and change the local repository path
Copy the <localRepository> line (around line 53), set it to your local repository path, and save; I used <localRepository>D:\maven\repository</localRepository>.
7. In IDEA, open File --> Settings --> search for "maven" --> point Maven home directory to the unzipped folder and User settings file to the modified settings.xml --> OK --> choose Enable Auto-Import in the bottom-right popup
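In context, the edited fragment of conf/settings.xml looks roughly like this (the path is an example; use a directory of your own):

```xml
<settings xmlns="http://maven.apache.org/SETTINGS/1.0.0">
    <!-- Example path; Maven caches downloaded jars here instead of the default ~/.m2/repository -->
    <localRepository>D:\maven\repository</localRepository>
</settings>
```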
8. Searching for and adding Maven dependencies
9. Configure the Maven environment variables
10. Open cmd --> mvn -v
11. Configure Scala
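Dependency coordinates (groupId, artifactId, version) can be looked up on a repository search site such as mvnrepository.com, then pasted into the <dependencies> section of pom.xml. For example (the artifact below is only an illustration):

```xml
<!-- Example dependency copied from a repository search result -->
<dependency>
    <groupId>org.scala-lang</groupId>
    <artifactId>scala-library</artifactId>
    <version>2.11.8</version>
</dependency>
```

With Enable Auto-Import turned on, IDEA downloads the jar automatically once the pom is saved.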
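A sketch of the environment variables, assuming Maven was unzipped to D:\apache-maven-3.3.9 (set them permanently in the Windows system dialog, or temporarily in cmd as below; the path is an assumption, adjust it to your own):

```bat
:: Assumed unzip location; adjust to where you extracted Maven
set MAVEN_HOME=D:\apache-maven-3.3.9
set PATH=%MAVEN_HOME%\bin;%PATH%

:: If the variables are correct, this prints the Maven and Java versions
mvn -v
```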
Create a scala folder under main (and one under test) --> File --> Project Structure
--> Modules --> select the scala directory under main --> click Sources
--> Modules --> select the scala directory under test --> click Tests
--> Libraries --> click + --> Scala SDK --> OK
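A minimal sketch to check that the Scala SDK is wired up: create HelloScala.scala under src/main/scala (the object name is arbitrary) and run it; it should print the Scala version of the SDK you selected.

```scala
object HelloScala {
  def main(args: Array[String]): Unit = {
    // Prints the version string of the configured Scala SDK
    println("Scala is working: " + scala.util.Properties.versionString)
  }
}
```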
12. Configure the pom file
Append the following configuration inside the <project> element:
<properties>
    <spark.version>2.2.0</spark.version>
    <!-- Scala binary version, used as the artifact-name suffix below -->
    <scala.version>2.11</scala.version>
    <hadoop.version>2.7.3</hadoop.version>
</properties>

<dependencies>
    <dependency>
        <groupId>org.apache.spark</groupId>
        <artifactId>spark-core_${scala.version}</artifactId>
        <version>${spark.version}</version>
    </dependency>
    <dependency>
        <groupId>org.apache.spark</groupId>
        <artifactId>spark-sql_${scala.version}</artifactId>
        <version>${spark.version}</version>
    </dependency>
    <dependency>
        <groupId>org.apache.spark</groupId>
        <artifactId>spark-hive_${scala.version}</artifactId>
        <version>${spark.version}</version>
    </dependency>
    <dependency>
        <groupId>org.apache.spark</groupId>
        <artifactId>spark-streaming_${scala.version}</artifactId>
        <version>${spark.version}</version>
    </dependency>
    <dependency>
        <groupId>org.apache.hadoop</groupId>
        <artifactId>hadoop-client</artifactId>
        <version>${hadoop.version}</version>
    </dependency>
    <dependency>
        <groupId>org.apache.spark</groupId>
        <artifactId>spark-mllib_${scala.version}</artifactId>
        <version>${spark.version}</version>
    </dependency>
    <dependency>
        <groupId>mysql</groupId>
        <artifactId>mysql-connector-java</artifactId>
        <version>5.1.39</version>
    </dependency>
    <dependency>
        <groupId>junit</groupId>
        <artifactId>junit</artifactId>
        <version>4.12</version>
    </dependency>
</dependencies>

<build>
    <sourceDirectory>src/main/scala</sourceDirectory>
    <testSourceDirectory>src/test/scala</testSourceDirectory>
</build>
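Note: with only the source directories set, IDEA compiles the Scala sources through the Scala SDK, but Maven itself will not compile them from the command line. If you also want mvn compile / mvn package to work, a Scala compiler plugin is typically added to the <build> section; a sketch (the plugin version shown is an example, pick one compatible with your Maven and Scala):

```xml
<!-- Inside <build>: scala-maven-plugin compiles Scala sources during the Maven build -->
<plugins>
    <plugin>
        <groupId>net.alchim31.maven</groupId>
        <artifactId>scala-maven-plugin</artifactId>
        <version>3.2.2</version>
        <executions>
            <execution>
                <goals>
                    <goal>compile</goal>
                    <goal>testCompile</goal>
                </goals>
            </execution>
        </executions>
    </plugin>
</plugins>
```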
13. Verify that the environment works
import org.apache.log4j.{Level, Logger}
import org.apache.spark.SparkConf
import org.apache.spark.rdd.RDD
import org.apache.spark.streaming.{Seconds, StreamingContext}

import scala.collection.mutable

object RDDQueueStream {
  def main(args: Array[String]): Unit = {
    // Point Spark at a local Hadoop installation (needed for winutils.exe on Windows)
    System.setProperty("hadoop.home.dir", "D:\\temp\\hadoop-2.4.1\\hadoop-2.4.1")

    // Reduce log noise so the streaming output is readable
    Logger.getLogger("org.apache.spark").setLevel(Level.ERROR)
    Logger.getLogger("org.eclipse.jetty.server").setLevel(Level.OFF)

    // Local mode with two threads; batches are processed every second
    val conf = new SparkConf().setAppName("MyNetworkWordCount").setMaster("local[2]")
    val ssc = new StreamingContext(conf, Seconds(1))

    // Build a queue of RDDs to feed into the stream
    val rddQueue = new mutable.Queue[RDD[Int]]()
    for (i <- 1 to 3) {
      rddQueue += ssc.sparkContext.makeRDD(i to 10)
      Thread.sleep(2000)
    }

    // Each batch consumes one RDD from the queue; map every element x to (x, 2x)
    val inputDStream = ssc.queueStream(rddQueue)
    val result = inputDStream.map(x => (x, x * 2))
    result.print()

    ssc.start()
    ssc.awaitTermination()
  }
}