Uploading a file to HDFS
# Create a spark directory on HDFS:
[fengling@hadoop129 spark-2.4.4-bin-hadoop2.7]$ hdfs dfs -mkdir spark
# Upload the Spark RELEASE file to the spark directory on HDFS
[fengling@hadoop129 spark-2.4.4-bin-hadoop2.7]$ hdfs dfs -put RELEASE spark/
Running the Spark shell
[fengling@hadoop129 spark-2.4.4-bin-hadoop2.7]$ bin/spark-shell --master spark://hadoop129:7077
The --master parameter is optional: with it, the shell connects to the standalone master at spark://hadoop129:7077; without it, spark-shell runs in local mode. The Spark shell has now started successfully.
Counting the occurrences of each word in the RELEASE file
scala> val textFile = sc.textFile("hdfs://hadoop129:9000/user/fengling/spark/RELEASE")
scala> val counts = textFile.flatMap(line => line.split(" ")).map(word => (word, 1)).reduceByKey(_ + _)
scala> counts.saveAsTextFile("hdfs://hadoop129:9000/user/fengling/spark/WordCount_201909261300")
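The transformation chain above (flatMap → map → reduceByKey) can be sketched in plain Scala on a local collection, with no Spark cluster needed; here groupBy plus a sum stands in for reduceByKey, and the object name and sample lines are purely illustrative:

```scala
// A minimal local sketch of the word-count pipeline above.
// flatMap splits lines into words, map pairs each word with 1,
// and groupBy + sum plays the role of Spark's reduceByKey.
object WordCountSketch {
  def wordCount(lines: Seq[String]): Map[String, Int] =
    lines
      .flatMap(_.split(" "))              // one element per word
      .map(word => (word, 1))             // (word, 1) pairs
      .groupBy(_._1)                      // group pairs by word
      .map { case (w, pairs) => (w, pairs.map(_._2).sum) } // sum the 1s

  def main(args: Array[String]): Unit = {
    // Illustrative sample lines, loosely resembling a RELEASE file
    val sample = Seq("Spark 2.4.4 built for Hadoop 2.7.3", "Build flags:")
    wordCount(sample).foreach(println)
  }
}
```

In Spark, reduceByKey additionally combines values on each partition before shuffling, which is why it is preferred over a groupBy-then-reduce on large data sets.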
You can open the Spark context web UI address printed in the spark shell terminal, http://hadoop129:4040, to monitor the job's progress.
Viewing the job results
[fengling@hadoop129 spark-2.4.4-bin-hadoop2.7]$ hdfs dfs -cat /user/fengling/spark/WordCount_201909261300/part*
(-Psparkr,1)
(2.4.4,1)
(Build,1)
(built,1)
(-Pflume,1)
(-Phive-thriftserver,1)
(-Pmesos,1)
(2.7.3,1)
(-Phadoop-2.7,1)
(-B,1)
(Spark,1)
(-Pkubernetes,1)
(-Pyarn,1)
(-DzincPort=3036,1)
(flags:,1)
(for,1)
(-Phive,1)
(-Pkafka-0-8,1)
(Hadoop,1)