log4j:WARN No appenders could be found for logger (org.apache.hadoop.metrics2.lib.MutableMetricsFactory).
log4j:WARN Please initialize the log4j system properly.
Using Spark's repl log4j profile: org/apache/spark/log4j-defaults-repl.properties
Type in expressions to have them evaluated.
Type :help for more information.
Spark context available as sc.
16/01/14 06:52:37 WARN Connection: BoneCP specified but not present in CLASSPATH (or one of dependencies)
16/01/14 06:52:38 WARN Connection: BoneCP specified but not present in CLASSPATH (or one of dependencies)
16/01/14 06:52:47 WARN ObjectStore: Version information not found in metastore. hive.metastore.schema.verification is not enabled so recording the schema version 1.2.0
16/01/14 06:52:48 WARN ObjectStore: Failed to get database default, returning NoSuchObjectException
16/01/14 06:52:53 WARN Connection: BoneCP specified but not present in CLASSPATH (or one of dependencies)
16/01/14 06:52:54 WARN Connection: BoneCP specified but not present in CLASSPATH (or one of dependencies)
SQL context available as sqlContext.
scala> val infile = sc.textFile("hdfs://sparkmaster:9000/input")   <-- input file
16/01/19 22:10:35 INFO MemoryStore: Block broadcast_1 stored as values in memory (estimated size 228.3 KB, free 334.9 KB)
16/01/19 22:10:35 INFO MemoryStore: Block broadcast_1_piece0 stored as bytes in memory (estimated size 19.6 KB, free 354.5 KB)
16/01/19 22:10:35 INFO BlockManagerInfo: Added broadcast_1_piece0 in memory on 192.168.10.80:50237 (size: 19.6 KB, free: 511.5 MB)
16/01/19 22:10:35 INFO SparkContext: Created broadcast 1 from textFile at <console>:27
infile: org.apache.spark.rdd.RDD[String] = MapPartitionsRDD[5] at textFile at <console>:27
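`sc.textFile` yields an RDD with one `String` per line of the HDFS file. A minimal local analogue, no cluster needed (the temp file and its two sample lines here are hypothetical stand-ins for the real HDFS input):

```scala
import java.nio.file.Files

// Local stand-in for sc.textFile: read a file as one String per line.
// The file path and contents are illustrative only; the real input lives on HDFS.
val tmp = Files.createTempFile("input", ".txt")
Files.write(tmp, "hello spark\nhello world".getBytes("UTF-8"))
val infile: List[String] = scala.io.Source.fromFile(tmp.toFile).getLines().toList
println(infile)  // List(hello spark, hello world)
```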
scala> val words = infile.flatMap(line => line.split(" "))
words: org.apache.spark.rdd.RDD[String] = MapPartitionsRDD[6] at flatMap at <console>:29
scala> val counts = words.map(word => (word, 1)).reduceByKey{case (x,y) => x + y}
16/01/19 22:10:46 INFO FileInputFormat: Total input paths to process : 1
counts: org.apache.spark.rdd.RDD[(String, Int)] = ShuffledRDD[8] at reduceByKey at <console>:31
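The three transformations above can be sketched on plain Scala collections, which makes the per-step types visible without a cluster. This is not the Spark API: `groupBy` plus `sum` stands in for `reduceByKey` (which performs the shuffle), and the sample lines are hypothetical:

```scala
// Plain-collections sketch of the RDD pipeline above.
val lines  = Seq("hello spark", "hello world")
val words  = lines.flatMap(line => line.split(" "))       // flatMap: split each line into words
val counts = words.map(word => (word, 1))
  .groupBy { case (w, _) => w }                           // stands in for the shuffle
  .map { case (w, pairs) => (w, pairs.map(_._2).sum) }    // reduceByKey{case (x,y) => x + y}
println(counts)
```

Unlike this eager version, the Spark chain is lazy: note in the transcript that no job starts until the `saveAsTextFile` action below.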
scala> counts.saveAsTextFile("hdfs://sparkmaster:9000/out01") <-- output directory
16/01/19 22:11:32 INFO deprecation: mapred.tip.id is deprecated. Instead, use mapreduce.task.id
16/01/19 22:11:32 INFO deprecation: mapred.task.id is deprecated. Instead, use mapreduce.task.attempt.id
16/01/19 22:11:32 INFO deprecation: mapred.task.is.map is deprecated. Instead, use mapreduce.task.ismap
16/01/19 22:11:32 INFO deprecation: mapred.task.partition is deprecated. Instead, use mapreduce.task.partition
16/01/19 22:11:32 INFO deprecation: mapred.job.id is deprecated. Instead, use mapreduce.job.id
16/01/19 22:11:33 INFO SparkContext: Starting job: saveAsTextFile at <console>:34
16/01/19 22:11:33 INFO DAGScheduler: Registering RDD 7 (map at <console>:31)
16/01/19 22:11:33 INFO DAGScheduler: Got job 0 (saveAsTextFile at <console>:34) with 2 output partitions
16/01/19 22:11:33 INFO DAGScheduler: Final stage: ResultStage 1 (saveAsTextFile at <console>:34)
16/01/19 22:11:33 INFO DAGScheduler: Parents of final stage: List(ShuffleMapStage 0)
16/01/19 22:11:33 INFO DAGScheduler: Missing parents: List(ShuffleMapStage 0)
16/01/19 22:11:33 INFO DAGScheduler: Submitting ShuffleMapStage 0 (MapPartitionsRDD[7] at map at <console>:31), which has no missing parents
16/01/19 22:11:33 INFO MemoryStore: Block broadcast_2 stored as values in memory (estimated size 4.1 KB, free 358.7 KB)
16/01/19 22:11:33 INFO MemoryStore: Block broadcast_2_piece0 stored as bytes in memory (estimated size 2.3 KB, free 360.9 KB)
16/01/19 22:11:33 INFO BlockManagerInfo: Added broadcast_2_piece0 in memory on 192.168.10.80:50237 (size: 2.3 KB, free: 511.5 MB)
16/01/19 22:11:33 INFO SparkContext: Created broadcast 2 from broadcast at DAGScheduler.scala:1006
16/01/19 22:11:33 INFO DAGScheduler: Submitting 2 missing tasks from ShuffleMapStage 0 (MapPartitionsRDD[7] at map at <console>:31)
16/01/19 22:11:34 INFO TaskSchedulerImpl: Adding task set 0.0 with 2 tasks
16/01/19 22:11:34 INFO TaskSetManager: Starting task 0.0 in stage 0.0 (TID 0, sparkmaster, partition 0,NODE_LOCAL, 2120 bytes)
16/01/19 22:11:34 INFO TaskSetManager: Starting task 1.0 in stage 0.0 (TID 1, sparkmaster, partition 1,NODE_LOCAL, 2120 bytes)
16/01/19 22:11:48 INFO BlockManagerInfo: Added broadcast_2_piece0 in memory on sparkmaster:34871 (size: 2.3 KB, free: 511.5 MB)
16/01/19 22:12:05 INFO BlockManagerInfo: Added broadcast_1_piece0 in memory on sparkmaster:34871 (size: 19.6 KB, free: 511.5 MB)
16/01/19 22:12:18 INFO TaskSetManager: Finished task 1.0 in stage 0.0 (TID 1) in 43960 ms on sparkmaster (1/2)
16/01/19 22:12:18 INFO TaskSetManager: Finished task 0.0 in stage 0.0 (TID 0) in 44063 ms on sparkmaster (2/2)
16/01/19 22:12:18 INFO TaskSchedulerImpl: Removed TaskSet 0.0, whose tasks have all completed, from pool
16/01/19 22:12:18 INFO DAGScheduler: ShuffleMapStage 0 (map at <console>:31) finished in 44.138 s
16/01/19 22:12:18 INFO DAGScheduler: looking for newly runnable stages
16/01/19 22:12:18 INFO DAGScheduler: running: Set()
16/01/19 22:12:18 INFO DAGScheduler: waiting: Set(ResultStage 1)
16/01/19 22:12:18 INFO DAGScheduler: failed: Set()
16/01/19 22:12:18 INFO DAGScheduler: Submitting ResultStage 1 (MapPartitionsRDD[9] at saveAsTextFile at <console>:34), which has no missing parents
16/01/19 22:12:18 INFO MemoryStore: Block broadcast_3 stored as values in memory (estimated size 64.7 KB, free 425.6 KB)
16/01/19 22:12:18 INFO MemoryStore: Block broadcast_3_piece0 stored as bytes in memory (estimated size 22.5 KB, free 448.1 KB)
16/01/19 22:12:18 INFO BlockManagerInfo: Added broadcast_3_piece0 in memory on 192.168.10.80:50237 (size: 22.5 KB, free: 511.4 MB)
16/01/19 22:12:18 INFO SparkContext: Created broadcast 3 from broadcast at DAGScheduler.scala:1006
16/01/19 22:12:18 INFO DAGScheduler: Submitting 2 missing tasks from ResultStage 1 (MapPartitionsRDD[9] at saveAsTextFile at <console>:34)
16/01/19 22:12:18 INFO TaskSchedulerImpl: Adding task set 1.0 with 2 tasks
16/01/19 22:12:18 INFO TaskSetManager: Starting task 0.0 in stage 1.0 (TID 2, sparkmaster, partition 0,NODE_LOCAL, 1894 bytes)
16/01/19 22:12:18 INFO TaskSetManager: Starting task 1.0 in stage 1.0 (TID 3, sparkmaster, partition 1,NODE_LOCAL, 1894 bytes)
16/01/19 22:12:18 INFO BlockManagerInfo: Added broadcast_3_piece0 in memory on sparkmaster:34871 (size: 22.5 KB, free: 511.5 MB)
16/01/19 22:12:20 INFO MapOutputTrackerMasterEndpoint: Asked to send map output locations for shuffle 0 to sparkmaster:45740
16/01/19 22:12:20 INFO MapOutputTrackerMaster: Size of output statuses for shuffle 0 is 146 bytes
16/01/19 22:12:47 INFO TaskSetManager: Finished task 0.0 in stage 1.0 (TID 2) in 28853 ms on sparkmaster (1/2)
16/01/19 22:12:47 INFO DAGScheduler: ResultStage 1 (saveAsTextFile at <console>:34) finished in 29.028 s
16/01/19 22:12:47 INFO TaskSetManager: Finished task 1.0 in stage 1.0 (TID 3) in 29006 ms on sparkmaster (2/2)
16/01/19 22:12:47 INFO TaskSchedulerImpl: Removed TaskSet 1.0, whose tasks have all completed, from pool
16/01/19 22:12:47 INFO DAGScheduler: Job 0 finished: saveAsTextFile at <console>:34, took 74.176049 s
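`saveAsTextFile` writes one part-file per partition (two here, matching the "2 output partitions" in the log), each containing one `(word,count)` tuple per line. A local sketch of that on-disk format, assuming hypothetical sample counts:

```scala
import java.nio.file.Files

// Mimic saveAsTextFile's layout: one part-file per partition,
// one "(key,value)" line per pair. Sample counts are hypothetical.
val counts = Seq(("hello", 2), ("spark", 1))
val outDir = Files.createTempDirectory("out01")
val part0  = outDir.resolve("part-00000")
Files.write(part0, counts.map { case (w, c) => s"($w,$c)" }.mkString("\n").getBytes("UTF-8"))
val written = new String(Files.readAllBytes(part0), "UTF-8")
println(written)
```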
scala>