Spark: WordCount in Scala and Java

http://www.cnblogs.com/byrhuangqiang/p/4017725.html

To write Scala in IDEA, I installed and configured the IntelliJ IDEA development environment today. IDEA is excellent; once you learn it, it is very comfortable to work with. For how to set up Scala with IDEA, see the references at the end of this post.

This post implements WordCount in both Scala and Java. The Java version, JavaWordCount, is the example that ships with Spark ($SPARK_HOME/examples/src/main/java/org/apache/spark/examples/JavaWordCount.java).

1. Environment

  • OS: Red Hat Enterprise Linux Server release 6.4 (Santiago)
  • Hadoop: Hadoop 2.4.1
  • JDK: 1.7.0_60
  • Spark: 1.1.0
  • Scala: 2.11.2
  • IDE: IntelliJ IDEA 13.1.3

Note: on the Windows client you need to install IDEA, Scala, and the JDK, and install the Scala plugin for IDEA.

2. Word count in Scala

package com.hq

/**
 * User: hadoop
 * Date: 2014/10/10 0010
 * Time: 18:59
 */
import org.apache.spark.SparkConf
import org.apache.spark.SparkContext
import org.apache.spark.SparkContext._

/**
 * Count word occurrences
 */
object WordCount {
  def main(args: Array[String]) {
    if (args.length < 1) {
      System.err.println("Usage: <file>")
      System.exit(1)
    }

    // Master URL and application name are supplied by spark-submit
    val conf = new SparkConf()
    val sc = new SparkContext(conf)
    val line = sc.textFile(args(0))

    // Split each line on spaces, map every word to (word, 1), sum the counts, and print them
    line.flatMap(_.split(" ")).map((_, 1)).reduceByKey(_+_).collect().foreach(println)

    sc.stop()
  }
}
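For quick testing inside IDEA, without submitting to a cluster, the master URL and application name can be set directly on the SparkConf. The following is only a sketch, not part of the original example: the WordCountLocal object name, the local[2] master, and the data/text input path are assumptions.

import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.SparkContext._

object WordCountLocal {
  def main(args: Array[String]) {
    // Hard-code a local master and app name so the job can run inside IDEA;
    // the cluster version above leaves these to spark-submit.
    val conf = new SparkConf().setAppName("WordCountLocal").setMaster("local[2]")
    val sc = new SparkContext(conf)

    // "data/text" is a hypothetical local input path
    sc.textFile("data/text")
      .flatMap(_.split(" "))
      .map((_, 1))
      .reduceByKey(_ + _)
      .collect()
      .foreach(println)

    sc.stop()
  }
}

When the jar is later submitted to the standalone cluster as in section 4.3, the hard-coded master and app name should be removed so that spark-submit controls them.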

3. Word count in Java

package com.hq;

/**
 * User: hadoop
 * Date: 2014/10/10 0010
 * Time: 19:26
 */

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaPairRDD;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.api.java.function.FlatMapFunction;
import org.apache.spark.api.java.function.Function2;
import org.apache.spark.api.java.function.PairFunction;
import scala.Tuple2;

import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

public final class JavaWordCount {
  private static final Pattern SPACE = Pattern.compile(" ");

  public static void main(String[] args) throws Exception {

    if (args.length < 1) {
      System.err.println("Usage: JavaWordCount <file>");
      System.exit(1);
    }

    SparkConf sparkConf = new SparkConf().setAppName("JavaWordCount");
    JavaSparkContext ctx = new JavaSparkContext(sparkConf);
    JavaRDD<String> lines = ctx.textFile(args[0], 1);

    // Split each line on spaces
    JavaRDD<String> words = lines.flatMap(new FlatMapFunction<String, String>() {
      @Override
      public Iterable<String> call(String s) {
        return Arrays.asList(SPACE.split(s));
      }
    });

    // Map every word to a (word, 1) pair
    JavaPairRDD<String, Integer> ones = words.mapToPair(new PairFunction<String, String, Integer>() {
      @Override
      public Tuple2<String, Integer> call(String s) {
        return new Tuple2<String, Integer>(s, 1);
      }
    });

    // Sum the counts per word
    JavaPairRDD<String, Integer> counts = ones.reduceByKey(new Function2<Integer, Integer, Integer>() {
      @Override
      public Integer call(Integer i1, Integer i2) {
        return i1 + i2;
      }
    });

    List<Tuple2<String, Integer>> output = counts.collect();
    for (Tuple2<?, ?> tuple : output) {
      System.out.println(tuple._1() + ": " + tuple._2());
    }
    ctx.stop();
  }
}

4. Packaging and running with IDEA

4.1 IDEA project structure

Create a Scala project in IDEA and add the Spark API jar (spark-assembly-1.1.0-hadoop2.4.0.jar, found under $SPARK_HOME/lib/) to the project as a library.
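If you would rather manage the dependency with a build tool than import the assembly jar by hand, a minimal sbt build definition can pull in Spark from Maven Central. The sketch below is an assumption, not the setup used in this post; note that the prebuilt Spark 1.1.0 binaries target Scala 2.10, so the Scala version here differs from the 2.11.2 listed above.

// build.sbt -- hypothetical alternative to importing spark-assembly by hand
name := "ScalaTest1848"

version := "1.0"

scalaVersion := "2.10.4"

// "provided" because the Spark runtime is already present on the cluster
libraryDependencies += "org.apache.spark" %% "spark-core" % "1.1.0" % "provided"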

 

4.2 Building the jar

File ---> Project Structure 

 

After configuring the artifact, choose Build -> Build Artifacts... from the menu and use the Build action to create the jar. When the build finishes, the status bar shows "Compilation completed successfully...", and the jar can be found in the configured output path, as shown below.

ScalaTest1848.jar is the jar produced by the build; it contains three classes: HelloWord, WordCount, and JavaWordCount.

This jar can be used to run either the Java or the Scala word count program on a Spark cluster.

4.3 Running word count on a standalone Spark cluster

Upload the jar to the server and place it at /home/ebupt/test/WordCount.jar.

Upload a text file to HDFS as the input for the word count: hdfs://eb170:8020/user/ebupt/text
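(If the file is not on HDFS yet, it can be uploaded with something like hadoop fs -put text hdfs://eb170:8020/user/ebupt/text; the local file name here is only an assumption.)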

Its contents are as follows:

import org apache spark api java JavaPairRDD
import org apache spark api java JavaRDD
import org apache spark api java JavaSparkContext
import org apache spark api java function FlatMapFunction
import org apache spark api java function Function
import org apache spark api java function Function2
import org apache spark api java function PairFunction
import scala Tuple2

Submit the job with the spark-submit command; for detailed usage, run spark-submit --help:

[ebupt@eb174 bin]$ spark-submit --help
Spark assembly has been built with Hive, including Datanucleus jars on classpath
Usage: spark-submit [options] <app jar | python file> [app options]
Options:
  --master MASTER_URL         spark://host:port, mesos://host:port, yarn, or local.
  --deploy-mode DEPLOY_MODE   Whether to launch the driver program locally ("client") or
                              on one of the worker machines inside the cluster ("cluster")
                              (Default: client).
  --class CLASS_NAME          Your application's main class (for Java / Scala apps).
  --name NAME                 A name of your application.
  --jars JARS                 Comma-separated list of local jars to include on the driver
                              and executor classpaths.
  --py-files PY_FILES         Comma-separated list of .zip, .egg, or .py files to place
                              on the PYTHONPATH for Python apps.
  --files FILES               Comma-separated list of files to be placed in the working
                              directory of each executor.

  --conf PROP=VALUE           Arbitrary Spark configuration property.
  --properties-file FILE      Path to a file from which to load extra properties. If not
                              specified, this will look for conf/spark-defaults.conf.

  --driver-memory MEM         Memory for driver (e.g. 1000M, 2G) (Default: 512M).
  --driver-java-options       Extra Java options to pass to the driver.
  --driver-library-path       Extra library path entries to pass to the driver.
  --driver-class-path         Extra class path entries to pass to the driver. Note that
                              jars added with --jars are automatically included in the
                              classpath.

  --executor-memory MEM       Memory per executor (e.g. 1000M, 2G) (Default: 1G).

  --help, -h                  Show this help message and exit
  --verbose, -v               Print additional debug output

 Spark standalone with cluster deploy mode only:
  --driver-cores NUM          Cores for driver (Default: 1).
  --supervise                 If given, restarts the driver on failure.

 Spark standalone and Mesos only:
  --total-executor-cores NUM  Total cores for all executors.

 YARN-only:
  --executor-cores NUM        Number of cores per executor (Default: 1).
  --queue QUEUE_NAME          The YARN queue to submit to (Default: "default").
  --num-executors NUM         Number of executors to launch (Default: 2).
  --archives ARCHIVES         Comma separated list of archives to be extracted into the
                              working directory of each executor.

① Submit the Scala implementation of word count:

[ebupt@eb174 test]$ spark-submit --master spark://eb174:7077 --name WordCountByscala --class com.hq.WordCount --executor-memory 1G --total-executor-cores 2 ~/test/WordCount.jar hdfs://eb170:8020/user/ebupt/text 

② Submit the Java implementation of word count:

[ebupt@eb174 test]$ spark-submit --master spark://eb174:7077 --name JavaWordCountByHQ --class com.hq.JavaWordCount --executor-memory 1G --total-executor-cores 2 ~/test/WordCount.jar hdfs://eb170:8020/user/ebupt/text

③ The two runs produce similar output, so only one is shown:

Spark assembly has been built with Hive, including Datanucleus jars on classpath
Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
14/10/10 19:24:51 INFO SecurityManager: Changing view acls to: ebupt,
14/10/10 19:24:51 INFO SecurityManager: Changing modify acls to: ebupt,
14/10/10 19:24:51 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(ebupt, ); users with modify permissions: Set(ebupt, )
14/10/10 19:24:52 INFO Slf4jLogger: Slf4jLogger started
14/10/10 19:24:52 INFO Remoting: Starting remoting
14/10/10 19:24:52 INFO Remoting: Remoting started; listening on addresses :[akka.tcp://sparkDriver@eb174:56344]
14/10/10 19:24:52 INFO Remoting: Remoting now listens on addresses: [akka.tcp://sparkDriver@eb174:56344]
14/10/10 19:24:52 INFO Utils: Successfully started service 'sparkDriver' on port 56344.
14/10/10 19:24:52 INFO SparkEnv: Registering MapOutputTracker
14/10/10 19:24:52 INFO SparkEnv: Registering BlockManagerMaster
14/10/10 19:24:52 INFO DiskBlockManager: Created local directory at /tmp/spark-local-20141010192452-3398
14/10/10 19:24:52 INFO Utils: Successfully started service 'Connection manager for block manager' on port 41110.
14/10/10 19:24:52 INFO ConnectionManager: Bound socket to port 41110 with id = ConnectionManagerId(eb174,41110)
14/10/10 19:24:52 INFO MemoryStore: MemoryStore started with capacity 265.4 MB
14/10/10 19:24:52 INFO BlockManagerMaster: Trying to register BlockManager
14/10/10 19:24:52 INFO BlockManagerMasterActor: Registering block manager eb174:41110 with 265.4 MB RAM
14/10/10 19:24:52 INFO BlockManagerMaster: Registered BlockManager
14/10/10 19:24:52 INFO HttpFileServer: HTTP File server directory is /tmp/spark-8051667e-bfdb-4ecd-8111-52992b16bb13
14/10/10 19:24:52 INFO HttpServer: Starting HTTP Server
14/10/10 19:24:52 INFO Utils: Successfully started service 'HTTP file server' on port 48233.
14/10/10 19:24:53 INFO Utils: Successfully started service 'SparkUI' on port 4040.
14/10/10 19:24:53 INFO SparkUI: Started SparkUI at http://eb174:4040
14/10/10 19:24:53 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
14/10/10 19:24:53 INFO SparkContext: Added JAR file:/home/ebupt/test/WordCountByscala.jar at http://10.1.69.174:48233/jars/WordCountByscala.jar with timestamp 1412940293532
14/10/10 19:24:53 INFO AppClient$ClientActor: Connecting to master spark://eb174:7077...
14/10/10 19:24:53 INFO SparkDeploySchedulerBackend: SchedulerBackend is ready for scheduling beginning after reached minRegisteredResourcesRatio: 0.0
14/10/10 19:24:53 INFO MemoryStore: ensureFreeSpace(163705) called with curMem=0, maxMem=278302556
14/10/10 19:24:53 INFO MemoryStore: Block broadcast_0 stored as values in memory (estimated size 159.9 KB, free 265.3 MB)
14/10/10 19:24:53 INFO SparkDeploySchedulerBackend: Connected to Spark cluster with app ID app-20141010192453-0009
14/10/10 19:24:53 INFO AppClient$ClientActor: Executor added: app-20141010192453-0009/0 on worker-20141008204132-eb176-49618 (eb176:49618) with 1 cores
14/10/10 19:24:53 INFO SparkDeploySchedulerBackend: Granted executor ID app-20141010192453-0009/0 on hostPort eb176:49618 with 1 cores, 1024.0 MB RAM
14/10/10 19:24:53 INFO AppClient$ClientActor: Executor added: app-20141010192453-0009/1 on worker-20141008204132-eb175-56337 (eb175:56337) with 1 cores
14/10/10 19:24:53 INFO SparkDeploySchedulerBackend: Granted executor ID app-20141010192453-0009/1 on hostPort eb175:56337 with 1 cores, 1024.0 MB RAM
14/10/10 19:24:53 INFO AppClient$ClientActor: Executor updated: app-20141010192453-0009/0 is now RUNNING
14/10/10 19:24:53 INFO AppClient$ClientActor: Executor updated: app-20141010192453-0009/1 is now RUNNING
14/10/10 19:24:53 INFO MemoryStore: ensureFreeSpace(12633) called with curMem=163705, maxMem=278302556
14/10/10 19:24:53 INFO MemoryStore: Block broadcast_0_piece0 stored as bytes in memory (estimated size 12.3 KB, free 265.2 MB)
14/10/10 19:24:53 INFO BlockManagerInfo: Added broadcast_0_piece0 in memory on eb174:41110 (size: 12.3 KB, free: 265.4 MB)
14/10/10 19:24:53 INFO BlockManagerMaster: Updated info of block broadcast_0_piece0
14/10/10 19:24:54 INFO FileInputFormat: Total input paths to process : 1
14/10/10 19:24:54 INFO SparkContext: Starting job: collect at WordCount.scala:26
14/10/10 19:24:54 INFO DAGScheduler: Registering RDD 3 (map at WordCount.scala:26)
14/10/10 19:24:54 INFO DAGScheduler: Got job 0 (collect at WordCount.scala:26) with 2 output partitions (allowLocal=false)
14/10/10 19:24:54 INFO DAGScheduler: Final stage: Stage 0(collect at WordCount.scala:26)
14/10/10 19:24:54 INFO DAGScheduler: Parents of final stage: List(Stage 1)
14/10/10 19:24:54 INFO DAGScheduler: Missing parents: List(Stage 1)
14/10/10 19:24:54 INFO DAGScheduler: Submitting Stage 1 (MappedRDD[3] at map at WordCount.scala:26), which has no missing parents
14/10/10 19:24:54 INFO MemoryStore: ensureFreeSpace(3400) called with curMem=176338, maxMem=278302556
14/10/10 19:24:54 INFO MemoryStore: Block broadcast_1 stored as values in memory (estimated size 3.3 KB, free 265.2 MB)
14/10/10 19:24:54 INFO MemoryStore: ensureFreeSpace(2082) called with curMem=179738, maxMem=278302556
14/10/10 19:24:54 INFO MemoryStore: Block broadcast_1_piece0 stored as bytes in memory (estimated size 2.0 KB, free 265.2 MB)
14/10/10 19:24:54 INFO BlockManagerInfo: Added broadcast_1_piece0 in memory on eb174:41110 (size: 2.0 KB, free: 265.4 MB)
14/10/10 19:24:54 INFO BlockManagerMaster: Updated info of block broadcast_1_piece0
14/10/10 19:24:54 INFO DAGScheduler: Submitting 2 missing tasks from Stage 1 (MappedRDD[3] at map at WordCount.scala:26)
14/10/10 19:24:54 INFO TaskSchedulerImpl: Adding task set 1.0 with 2 tasks
14/10/10 19:24:56 INFO SparkDeploySchedulerBackend: Registered executor: Actor[akka.tcp://sparkExecutor@eb176:35482/user/Executor#1456950111] with ID 0
14/10/10 19:24:56 INFO TaskSetManager: Starting task 0.0 in stage 1.0 (TID 0, eb176, ANY, 1238 bytes)
14/10/10 19:24:56 INFO SparkDeploySchedulerBackend: Registered executor: Actor[akka.tcp://sparkExecutor@eb175:35502/user/Executor#-1231100997] with ID 1
14/10/10 19:24:56 INFO TaskSetManager: Starting task 1.0 in stage 1.0 (TID 1, eb175, ANY, 1238 bytes)
14/10/10 19:24:56 INFO BlockManagerMasterActor: Registering block manager eb176:33296 with 530.3 MB RAM
14/10/10 19:24:56 INFO BlockManagerMasterActor: Registering block manager eb175:32903 with 530.3 MB RAM
14/10/10 19:24:57 INFO ConnectionManager: Accepted connection from [eb176/10.1.69.176:39218]
14/10/10 19:24:57 INFO ConnectionManager: Accepted connection from [eb175/10.1.69.175:55227]
14/10/10 19:24:57 INFO SendingConnection: Initiating connection to [eb176/10.1.69.176:33296]
14/10/10 19:24:57 INFO SendingConnection: Initiating connection to [eb175/10.1.69.175:32903]
14/10/10 19:24:57 INFO SendingConnection: Connected to [eb175/10.1.69.175:32903], 1 messages pending
14/10/10 19:24:57 INFO SendingConnection: Connected to [eb176/10.1.69.176:33296], 1 messages pending
14/10/10 19:24:57 INFO BlockManagerInfo: Added broadcast_1_piece0 in memory on eb175:32903 (size: 2.0 KB, free: 530.3 MB)
14/10/10 19:24:57 INFO BlockManagerInfo: Added broadcast_1_piece0 in memory on eb176:33296 (size: 2.0 KB, free: 530.3 MB)
14/10/10 19:24:57 INFO BlockManagerInfo: Added broadcast_0_piece0 in memory on eb176:33296 (size: 12.3 KB, free: 530.3 MB)
14/10/10 19:24:57 INFO BlockManagerInfo: Added broadcast_0_piece0 in memory on eb175:32903 (size: 12.3 KB, free: 530.3 MB)
14/10/10 19:24:58 INFO TaskSetManager: Finished task 1.0 in stage 1.0 (TID 1) in 1697 ms on eb175 (1/2)
14/10/10 19:24:58 INFO TaskSetManager: Finished task 0.0 in stage 1.0 (TID 0) in 1715 ms on eb176 (2/2)
14/10/10 19:24:58 INFO TaskSchedulerImpl: Removed TaskSet 1.0, whose tasks have all completed, from pool
14/10/10 19:24:58 INFO DAGScheduler: Stage 1 (map at WordCount.scala:26) finished in 3.593 s
14/10/10 19:24:58 INFO DAGScheduler: looking for newly runnable stages
14/10/10 19:24:58 INFO DAGScheduler: running: Set()
14/10/10 19:24:58 INFO DAGScheduler: waiting: Set(Stage 0)
14/10/10 19:24:58 INFO DAGScheduler: failed: Set()
14/10/10 19:24:58 INFO DAGScheduler: Missing parents for Stage 0: List()
14/10/10 19:24:58 INFO DAGScheduler: Submitting Stage 0 (ShuffledRDD[4] at reduceByKey at WordCount.scala:26), which is now runnable
14/10/10 19:24:58 INFO MemoryStore: ensureFreeSpace(2096) called with curMem=181820, maxMem=278302556
14/10/10 19:24:58 INFO MemoryStore: Block broadcast_2 stored as values in memory (estimated size 2.0 KB, free 265.2 MB)
14/10/10 19:24:58 INFO MemoryStore: ensureFreeSpace(1338) called with curMem=183916, maxMem=278302556
14/10/10 19:24:58 INFO MemoryStore: Block broadcast_2_piece0 stored as bytes in memory (estimated size 1338.0 B, free 265.2 MB)
14/10/10 19:24:58 INFO BlockManagerInfo: Added broadcast_2_piece0 in memory on eb174:41110 (size: 1338.0 B, free: 265.4 MB)
14/10/10 19:24:58 INFO BlockManagerMaster: Updated info of block broadcast_2_piece0
14/10/10 19:24:58 INFO DAGScheduler: Submitting 2 missing tasks from Stage 0 (ShuffledRDD[4] at reduceByKey at WordCount.scala:26)
14/10/10 19:24:58 INFO TaskSchedulerImpl: Adding task set 0.0 with 2 tasks
14/10/10 19:24:58 INFO TaskSetManager: Starting task 0.0 in stage 0.0 (TID 2, eb175, PROCESS_LOCAL, 1008 bytes)
14/10/10 19:24:58 INFO TaskSetManager: Starting task 1.0 in stage 0.0 (TID 3, eb176, PROCESS_LOCAL, 1008 bytes)
14/10/10 19:24:58 INFO BlockManagerInfo: Added broadcast_2_piece0 in memory on eb175:32903 (size: 1338.0 B, free: 530.3 MB)
14/10/10 19:24:58 INFO BlockManagerInfo: Added broadcast_2_piece0 in memory on eb176:33296 (size: 1338.0 B, free: 530.3 MB)
14/10/10 19:24:58 INFO MapOutputTrackerMasterActor: Asked to send map output locations for shuffle 0 to sparkExecutor@eb175:59119
14/10/10 19:24:58 INFO MapOutputTrackerMaster: Size of output statuses for shuffle 0 is 144 bytes
14/10/10 19:24:58 INFO MapOutputTrackerMasterActor: Asked to send map output locations for shuffle 0 to sparkExecutor@eb176:39028
14/10/10 19:24:58 INFO TaskSetManager: Finished task 1.0 in stage 0.0 (TID 3) in 109 ms on eb176 (1/2)
14/10/10 19:24:58 INFO TaskSetManager: Finished task 0.0 in stage 0.0 (TID 2) in 120 ms on eb175 (2/2)
14/10/10 19:24:58 INFO DAGScheduler: Stage 0 (collect at WordCount.scala:26) finished in 0.123 s
14/10/10 19:24:58 INFO TaskSchedulerImpl: Removed TaskSet 0.0, whose tasks have all completed, from pool
14/10/10 19:24:58 INFO SparkContext: Job finished: collect at WordCount.scala:26, took 3.815637915 s
(scala,1)
(Function2,1)
(JavaSparkContext,1)
(JavaRDD,1)
(Tuple2,1)
(,1)
(org,7)
(apache,7)
(JavaPairRDD,1)
(java,7)
(function,4)
(api,7)
(Function,1)
(PairFunction,1)
(spark,7)
(FlatMapFunction,1)
(import,8)
14/10/10 19:24:58 INFO SparkUI: Stopped Spark web UI at http://eb174:4040
14/10/10 19:24:58 INFO DAGScheduler: Stopping DAGScheduler
14/10/10 19:24:58 INFO SparkDeploySchedulerBackend: Shutting down all executors
14/10/10 19:24:58 INFO SparkDeploySchedulerBackend: Asking each executor to shut down
14/10/10 19:24:58 INFO ConnectionManager: Removing SendingConnection to ConnectionManagerId(eb176,33296)
14/10/10 19:24:58 INFO ConnectionManager: Removing ReceivingConnection to ConnectionManagerId(eb176,33296)
14/10/10 19:24:58 ERROR ConnectionManager: Corresponding SendingConnection to ConnectionManagerId(eb176,33296) not found
14/10/10 19:24:58 INFO ConnectionManager: Removing ReceivingConnection to ConnectionManagerId(eb175,32903)
14/10/10 19:24:58 INFO ConnectionManager: Removing SendingConnection to ConnectionManagerId(eb175,32903)
14/10/10 19:24:58 INFO ConnectionManager: Removing SendingConnection to ConnectionManagerId(eb175,32903)
14/10/10 19:24:58 INFO ConnectionManager: Key not valid ? sun.nio.ch.SelectionKeyImpl@5e92c11b
14/10/10 19:24:58 INFO ConnectionManager: key already cancelled ? sun.nio.ch.SelectionKeyImpl@5e92c11b
java.nio.channels.CancelledKeyException
at org.apache.spark.network.ConnectionManager.run(ConnectionManager.scala:310)
at org.apache.spark.network.ConnectionManager$$anon$4.run(ConnectionManager.scala:139)
14/10/10 19:24:59 INFO MapOutputTrackerMasterActor: MapOutputTrackerActor stopped!
14/10/10 19:24:59 INFO ConnectionManager: Selector thread was interrupted!
14/10/10 19:24:59 INFO ConnectionManager: Removing ReceivingConnection to ConnectionManagerId(eb176,33296)
14/10/10 19:24:59 ERROR ConnectionManager: Corresponding SendingConnection to ConnectionManagerId(eb176,33296) not found
14/10/10 19:24:59 INFO ConnectionManager: Removing SendingConnection to ConnectionManagerId(eb176,33296)
14/10/10 19:24:59 WARN ConnectionManager: All connections not cleaned up
14/10/10 19:24:59 INFO ConnectionManager: ConnectionManager stopped
14/10/10 19:24:59 INFO MemoryStore: MemoryStore cleared
14/10/10 19:24:59 INFO BlockManager: BlockManager stopped
14/10/10 19:24:59 INFO BlockManagerMaster: BlockManagerMaster stopped
14/10/10 19:24:59 INFO SparkContext: Successfully stopped SparkContext
14/10/10 19:24:59 INFO RemoteActorRefProvider$RemotingTerminator: Shutting down remote daemon.
14/10/10 19:24:59 INFO RemoteActorRefProvider$RemotingTerminator: Remote daemon shut down; proceeding with flushing remote transports.
14/10/10 19:24:59 INFO Remoting: Remoting shut down
14/10/10 19:24:59 INFO RemoteActorRefProvider$RemotingTerminator: Remoting shut down.

 

5. References

On using IDEA: Scala from scratch: writing hello world with IntelliJ IDEA

WordCount in Scala: Developing a Spark wordcount program and submitting it to a cluster

WordCount in Java: Writing a Spark program in Java, a simple example and how to run it; Running a Wordcount program with Spark on YARN

Spark Programming Guide
