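To run WordCount on Hadoop, you must give the WordCount program an input path and an output path. The input path is the text file whose word frequencies we want to count, here named 20417.txt; the output path is where the word-frequency results will be stored. The figure below shows how to configure these arguments: WordCount.java -> right-click -> Run As -> Run Configurations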
The paths above are paths in HDFS; for the HDFS paths themselves, see the figure below:
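The two paths entered in the run configuration reach the program as ordinary command-line arguments. As a minimal sketch of how a driver could check them before submitting a job, the plain-Java helper below validates that exactly two arguments were given. It assumes the driver takes the input path first and the output path second, as the stock WordCount example does; the class name `WordCountArgs` and the paths in the usage note are hypothetical and do not come from this walkthrough.

```java
import java.net.URI;
import java.net.URISyntaxException;

/** Hypothetical helper: holds the two program arguments of a WordCount driver. */
class WordCountArgs {
    final URI input;   // text file to count, e.g. a path ending in 20417.txt
    final URI output;  // directory where the results are written

    WordCountArgs(URI input, URI output) {
        this.input = input;
        this.output = output;
    }

    /** Parses {inputPath, outputPath}; rejects any other argument count. */
    static WordCountArgs parse(String[] args) {
        if (args.length != 2) {
            throw new IllegalArgumentException(
                    "Usage: WordCount <input path> <output path>");
        }
        try {
            return new WordCountArgs(new URI(args[0]), new URI(args[1]));
        } catch (URISyntaxException e) {
            throw new IllegalArgumentException("Not a valid path: " + e.getInput(), e);
        }
    }

    public static void main(String[] args) {
        WordCountArgs parsed = parse(args);
        System.out.println("input  = " + parsed.input);
        System.out.println("output = " + parsed.output);
    }
}
```

With placeholder arguments such as `hdfs://localhost:9000/user/hadoop/in/20417.txt hdfs://localhost:9000/user/hadoop/out` (made-up values; use the paths shown in your own HDFS view), the sketch simply echoes the two paths back. Passing one argument or three fails fast with the usage message.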
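After entering the input and output paths in the first figure, click Apply, but do not click Run yet: Run here means running on the local machine, whereas we want to run on the Hadoop cluster. Instead, do the following: WordCount.java -> right-click -> Run As -> Run on Hadoop.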
While the job runs, the console prints messages such as the following:
- 11/10/09 14:07:50 WARN conf.Configuration: DEPRECATED: hadoop-site.xml found in the classpath. Usage of hadoop-site.xml is deprecated. Instead use core-site.xml, mapred-site.xml and hdfs-site.xml to override properties of core-default.xml, mapred-default.xml and hdfs-default.xml respectively
- 11/10/09 14:07:50 INFO input.FileInputFormat: Total input paths to process : 1
- 11/10/09 14:07:50 INFO mapred.JobClient: Running job: job_201110091333_0001
- 11/10/09 14:07:51 INFO mapred.JobClient: map 0% reduce 0%
- 11/10/09 14:07:59 INFO mapred.JobClient: map 100% reduce 0%
- 11/10/09 14:08:12 INFO mapred.JobClient: map 100% reduce 100%
- 11/10/09 14:08:14 INFO mapred.JobClient: Job complete: job_201110091333_0001
- 11/10/09 14:08:14 INFO mapred.JobClient: Counters: 17
- 11/10/09 14:08:14 INFO mapred.JobClient: Job Counters
- 11/10/09 14:08:14 INFO mapred.JobClient: Launched reduce tasks=1
- 11/10/09 14:08:14 INFO mapred.JobClient: Launched map tasks=1
- 11/10/09 14:08:14 INFO mapred.JobClient: Data-local map tasks=1
- 11/10/09 14:08:14 INFO mapred.JobClient: FileSystemCounters
- 11/10/09 14:08:14 INFO mapred.JobClient: FILE_BYTES_READ=143076
- 11/10/09 14:08:14 INFO mapred.JobClient: HDFS_BYTES_READ=674762
- 11/10/09 14:08:14 INFO mapred.JobClient: FILE_BYTES_WRITTEN=286184
- 11/10/09 14:08:14 INFO mapred.JobClient: HDFS_BYTES_WRITTEN=205265
- 11/10/09 14:08:14 INFO mapred.JobClient: Map-Reduce Framework
- 11/10/09 14:08:14 INFO mapred.JobClient: Reduce input groups=0
- 11/10/09 14:08:14 INFO mapred.JobClient: Combine output records=10015
- 11/10/09 14:08:14 INFO mapred.JobClient: Map input records=12761
- 11/10/09 14:08:14 INFO mapred.JobClient: Reduce shuffle bytes=0
- 11/10/09 14:08:14 INFO mapred.JobClient: Reduce output records=0
- 11/10/09 14:08:14 INFO mapred.JobClient: Spilled Records=20030
- 11/10/09 14:08:14 INFO mapred.JobClient: Map output bytes=1082004
- 11/10/09 14:08:14 INFO mapred.JobClient: Combine input records=112607
- 11/10/09 14:08:14 INFO mapred.JobClient: Map output records=112607
- 11/10/09 14:08:14 INFO mapred.JobClient: Reduce input records=10015
- 11/10/09 14:08:14 INFO input.FileInputFormat: Total input paths to process : 1
- 11/10/09 14:08:14 INFO mapred.JobClient: Running job: job_201110091333_0002
- 11/10/09 14:08:15 INFO mapred.JobClient: map 0% reduce 0%
- 11/10/09 14:08:24 INFO mapred.JobClient: map 100% reduce 0%
- 11/10/09 14:08:36 INFO mapred.JobClient: map 100% reduce 100%
- 11/10/09 14:08:38 INFO mapred.JobClient: Job complete: job_201110091333_0002
- 11/10/09 14:08:38 INFO mapred.JobClient: Counters: 17
- 11/10/09 14:08:38 INFO mapred.JobClient: Job Counters
- 11/10/09 14:08:38 INFO mapred.JobClient: Launched reduce tasks=1
- 11/10/09 14:08:38 INFO mapred.JobClient: Launched map tasks=1
- 11/10/09 14:08:38 INFO mapred.JobClient: Data-local map tasks=1
- 11/10/09 14:08:38 INFO mapred.JobClient: FileSystemCounters
- 11/10/09 14:08:38 INFO mapred.JobClient: FILE_BYTES_READ=143076
- 11/10/09 14:08:38 INFO mapred.JobClient: HDFS_BYTES_READ=205265
- 11/10/09 14:08:38 INFO mapred.JobClient: FILE_BYTES_WRITTEN=286184
- 11/10/09 14:08:38 INFO mapred.JobClient: HDFS_BYTES_WRITTEN=104533
- 11/10/09 14:08:38 INFO mapred.JobClient: Map-Reduce Framework
- 11/10/09 14:08:38 INFO mapred.JobClient: Reduce input groups=0
- 11/10/09 14:08:38 INFO mapred.JobClient: Combine output records=0
- 11/10/09 14:08:38 INFO mapred.JobClient: Map input records=10015
- 11/10/09 14:08:38 INFO mapred.JobClient: Reduce shuffle bytes=0
- 11/10/09 14:08:38 INFO mapred.JobClient: Reduce output records=0
- 11/10/09 14:08:38 INFO mapred.JobClient: Spilled Records=20030
- 11/10/09 14:08:38 INFO mapred.JobClient: Map output bytes=123040
- 11/10/09 14:08:38 INFO mapred.JobClient: Combine input records=0
- 11/10/09 14:08:38 INFO mapred.JobClient: Map output records=10015
- 11/10/09 14:08:38 INFO mapred.JobClient: Reduce input records=10015
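To help read the first job's counters, here is a small plain-Java sketch of what a word count computes. It uses no Hadoop API. It assumes the mapper splits each line on whitespace and emits one (word, 1) pair per token, as the stock WordCount example does; the WordCount.java used here is not shown, so treat that as an assumption. Under that assumption, `Map input records` corresponds to the lines read, `Map output records` to the tokens emitted, and, with a single map task as in this run, `Combine output records` would typically correspond to the distinct words left after local aggregation. The class name `CounterSketch` and the sample lines are made up for illustration.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.StringTokenizer;
import java.util.TreeMap;

/** Hypothetical in-memory stand-in for a word count job, tracking three counters. */
class CounterSketch {
    long mapInputRecords;   // lines read by the "mapper"
    long mapOutputRecords;  // (word, 1) pairs emitted, one per token
    final Map<String, Integer> counts = new TreeMap<>(); // aggregated result

    void run(List<String> lines) {
        for (String line : lines) {
            mapInputRecords++;
            // Assumption: whitespace tokenization.
            StringTokenizer tokens = new StringTokenizer(line);
            while (tokens.hasMoreTokens()) {
                mapOutputRecords++;
                // Summing per word plays the role of combine/reduce.
                counts.merge(tokens.nextToken(), 1, Integer::sum);
            }
        }
    }

    /** Distinct words, i.e. records left after aggregation. */
    long distinctWords() {
        return counts.size();
    }

    public static void main(String[] args) {
        CounterSketch sketch = new CounterSketch();
        sketch.run(Arrays.asList("the quick brown fox", "the lazy dog", "the end"));
        System.out.println("lines read     = " + sketch.mapInputRecords);
        System.out.println("tokens emitted = " + sketch.mapOutputRecords);
        System.out.println("distinct words = " + sketch.distinctWords());
        System.out.println("count of 'the' = " + sketch.counts.get("the"));
    }
}
```

For the three sample lines, the sketch reads 3 lines, emits 9 tokens, and ends with 7 distinct words, with "the" counted 3 times. The log has the same shape at a larger scale: 12761 input records, 112607 map output records, and 10015 records after combining.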