My Big Data Journey: Spark Shell Word Count

Contents

Upload the file to HDFS

Run the Spark shell

Count the words in the RELEASE file

View the job results



Upload the file to HDFS

# Create a spark directory on HDFS (a relative path like this resolves under /user/fengling):
[fengling@hadoop129 spark-2.4.4-bin-hadoop2.7]$ hdfs dfs -mkdir spark
# Upload Spark's RELEASE file to the HDFS spark directory
[fengling@hadoop129 spark-2.4.4-bin-hadoop2.7]$ hdfs dfs -put RELEASE spark/

Run the Spark shell

[fengling@hadoop129 spark-2.4.4-bin-hadoop2.7]$ bin/spark-shell --master spark://hadoop129:7077

The --master flag can be given or omitted; without it, spark-shell typically falls back to local[*] mode. Either way, the shell starts up successfully.
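To confirm which master the shell actually connected to, you can query the SparkContext from the prompt; a quick check, not part of the original session:

scala> // Shows the master URL in use, e.g. spark://hadoop129:7077 or local[*]
scala> sc.master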

Count the words in the RELEASE file

scala> val textFile = sc.textFile("hdfs://hadoop129:9000/user/fengling/spark/RELEASE")
scala> val counts = textFile.flatMap(line => line.split(" ")).map(word => (word, 1)).reduceByKey(_ + _)
scala> counts.saveAsTextFile("hdfs://hadoop129:9000/user/fengling/spark/WordCount_201909261300")
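Before writing anything to HDFS, it can be useful to preview a few pairs directly in the shell; a minimal sketch, not part of the original session:

scala> // Bring the ten most frequent words back to the driver for a quick look
scala> counts.sortBy(_._2, ascending = false).take(10).foreach(println)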

You can open the Spark context Web UI address that spark-shell prints at startup, http://hadoop129:4040, to watch the job run.
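If the startup banner has already scrolled away, the same address can be recovered from the running shell; sc.uiWebUrl returns an Option[String] in Spark 2.x (an illustrative check, not from the original post):

scala> // Prints the Web UI address of the current SparkContext, if the UI is enabled
scala> sc.uiWebUrl.foreach(println)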

View the job results

[fengling@hadoop129 spark-2.4.4-bin-hadoop2.7]$ hdfs dfs -cat /user/fengling/spark/WordCount_201909261300/part*
(-Psparkr,1)
(2.4.4,1)
(Build,1)
(built,1)
(-Pflume,1)
(-Phive-thriftserver,1)
(-Pmesos,1)
(2.7.3,1)
(-Phadoop-2.7,1)
(-B,1)
(Spark,1)
(-Pkubernetes,1)
(-Pyarn,1)
(-DzincPort=3036,1)
(flags:,1)
(for,1)
(-Phive,1)
(-Pkafka-0-8,1)
(Hadoop,1)
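The saved part files can also be read back from inside the Spark shell rather than with hdfs dfs; a small sketch using the same paths as above:

scala> // Load the job output back as an RDD of strings and show a few lines
scala> sc.textFile("hdfs://hadoop129:9000/user/fengling/spark/WordCount_201909261300/part*").take(5).foreach(println)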
