MapReduce WordCount: counting words

A first pass at WordCount

Running the WordCount job from the IDEA client

1: Create a mapper class that extends Mapper (choose the Hadoop type)

public class wordcountMapper extends Mapper<LongWritable, Text, Text, IntWritable> {
    // LongWritable: the mapper's input key, the offset of each line of input
    // Text: the input value, i.e. all the words on one line
    // Text: the output key, a single word
    // IntWritable: the output value, a count of 1 for each word

}

1.2 Override the map method (Ctrl+O, select map)

public class wordcountMapper extends Mapper<LongWritable, Text, Text, IntWritable> {

    @Override
    protected void map(LongWritable key, Text value, Context context) throws IOException, InterruptedException {
        // key: the offset of this line within the file
        // value: one line of words
        // context: used to emit output

    }
}

1.3 Fill in the mapper logic

import java.io.IOException;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class wordcountMapper extends Mapper<LongWritable, Text, Text, IntWritable> {

    @Override
    protected void map(LongWritable key, Text value, Context context) throws IOException, InterruptedException {
        // 1. Get the line of input
        String line = value.toString();

        // 2. Split on spaces
        String[] words = line.split(" ");

        for (String word : words) {
            // Consecutive spaces produce empty tokens; skip them
            if (word.equals("")) {
                continue;
            }
            // 3. Emit each word with a count of 1
            context.write(new Text(word), new IntWritable(1));
        }
    }
}
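Allocating a new Text and IntWritable for every word works, but map runs once per input line, so a common Hadoop idiom is to reuse a single pair of output objects. A minimal sketch of that variant (same logic, just hoisted allocations; the class name wordcountMapperReuse is only for illustration):

import java.io.IOException;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class wordcountMapperReuse extends Mapper<LongWritable, Text, Text, IntWritable> {
    // Reused across calls to map(); Hadoop serializes values at write time,
    // so mutating these objects between writes is safe.
    private final Text outKey = new Text();
    private final IntWritable one = new IntWritable(1);

    @Override
    protected void map(LongWritable key, Text value, Context context) throws IOException, InterruptedException {
        for (String word : value.toString().split(" ")) {
            if (word.isEmpty()) {
                continue;
            }
            outKey.set(word);
            context.write(outKey, one);
        }
    }
}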

2: Create a reducer class that extends Reducer (choose the Hadoop type)

public class wordcountReduce extends Reducer<Text, IntWritable, Text, IntWritable> {
    // Text: a single word received from the mapper output
    // IntWritable: each occurrence of that word, counted as 1
    // Text: a single word emitted by the reducer
    // IntWritable: the total count the reducer emits for that word

}

2.2 Override the reduce method (Ctrl+O, select reduce)

public class wordcountReduce extends Reducer<Text, IntWritable, Text, IntWritable> {
    @Override
    protected void reduce(Text key, Iterable<IntWritable> values, Context context) throws IOException, InterruptedException {
        // key: a single word
        // values: the list of counts for that word, e.g. {1,1,1,1,1,1,1}
        // context: used to emit output
    }
}

2.3 Fill in the reducer logic

import java.io.IOException;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

public class wordcountReduce extends Reducer<Text, IntWritable, Text, IntWritable> {
    @Override
    protected void reduce(Text key, Iterable<IntWritable> values, Context context) throws IOException, InterruptedException {
        // 1. Sum all the counts for this word
        int count = 0;
        for (IntWritable value : values) {
            count += value.get();
        }

        // 2. Emit the word with its total
        context.write(key, new IntWritable(count));
    }
}
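One Hadoop quirk worth knowing here: the framework reuses a single IntWritable instance while iterating over values, so you must copy the primitive out with get() inside the loop (as above) rather than holding on to the IntWritable objects themselves. A sketch of the pitfall, for illustration only (the class name brokenReduce is hypothetical):

import java.util.ArrayList;
import java.util.List;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

public class brokenReduce extends Reducer<Text, IntWritable, Text, IntWritable> {
    @Override
    protected void reduce(Text key, Iterable<IntWritable> values, Context context) {
        // Anti-pattern: Hadoop reuses ONE IntWritable object for the whole
        // iteration, so every element of this list aliases the same instance
        // and reads back as the last value seen.
        List<IntWritable> saved = new ArrayList<>();
        for (IntWritable value : values) {
            saved.add(value); // broken: no copy of the value is made
        }
        // The reducer above avoids this by calling value.get() inside the loop.
    }
}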

3: Create the driver (main) class

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class wordcountDriver {
    public static void main(String[] args) throws Exception {
        // 1. Get the job object
        Configuration conf = new Configuration();
        Job job = Job.getInstance(conf);

        // 2. Set the jar location
        job.setJarByClass(wordcountDriver.class);

        // 3. Set the mapper and reducer classes
        job.setMapperClass(wordcountMapper.class);
        job.setReducerClass(wordcountReduce.class);

        // 4. Set the mapper output types
        job.setMapOutputKeyClass(Text.class);
        job.setMapOutputValueClass(IntWritable.class);

        // 5. Set the final output types
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);

        // 6. Set the input and output paths
        // Location of the data to process
        FileInputFormat.setInputPaths(job, "hdfs://192.168.100.100:8020/hello/mapreduce/test.txt");
        // Location to save the results (must not already exist, or the job fails)
        FileOutputFormat.setOutputPath(job, new Path("hdfs://192.168.100.100:8020/hello/mapreduce/wordcountout/"));

        // 7. Submit the job and wait for completion
        boolean result = job.waitForCompletion(true);
        System.exit(result ? 0 : 1);

    }
}
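Because summing counts is associative and commutative, this same reducer class can also serve as a combiner, pre-aggregating counts on the map side before the shuffle. If you want to try that, one extra line in the driver (after step 3 above) is enough:

        // Optional: run the reducer as a map-side combiner to shrink shuffle traffic
        job.setCombinerClass(wordcountReduce.class);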

3.1 Check test.txt (the unprocessed input)

3.2 Run the driver and inspect the run result

3.3 View and download the result in the web UI (ip:50070): download the reduce output and open it to verify the counts

Running WordCount on a Linux cluster

1: Change the input and output paths in the driver

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class wordcountDriver {
    public static void main(String[] args) throws Exception {
        // 1. Get the job object
        Configuration conf = new Configuration();
        Job job = Job.getInstance(conf);

        // 2. Set the jar location
        job.setJarByClass(wordcountDriver.class);

        // 3. Set the mapper and reducer classes
        job.setMapperClass(wordcountMapper.class);
        job.setReducerClass(wordcountReduce.class);

        // 4. Set the mapper output types
        job.setMapOutputKeyClass(Text.class);
        job.setMapOutputValueClass(IntWritable.class);

        // 5. Set the final output types
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);

        // 6. Set the input and output paths
        // Previously hard-coded:
        //FileInputFormat.setInputPaths(job, "hdfs://192.168.100.100:8020/hello/mapreduce/test.txt");
        //FileOutputFormat.setOutputPath(job, new Path("hdfs://192.168.100.100:8020/hello/mapreduce/wordcountout/"));
        // Now taken from the command line instead
        FileInputFormat.setInputPaths(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));

        // 7. Submit the job and wait for completion
        boolean result = job.waitForCompletion(true);
        System.exit(result ? 0 : 1);

    }
}
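Since the paths now come from the command line, the driver will throw an ArrayIndexOutOfBoundsException if fewer than two arguments are passed. A small guard at the top of main makes the failure mode clearer; a sketch, in the style of Hadoop's own example drivers:

        if (args.length < 2) {
            System.err.println("Usage: wordcountDriver <input path> <output path>");
            System.exit(2);
        }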

2: Build the jar in IDEA (the jar-building steps are skipped here)

3: Copy the jar to Linux (there are any number of ways to transfer it)

4: Upload the unprocessed text file to HDFS (for example with hdfs dfs -put test.txt /input/, using paths of your choosing)

5: Submit the job with hadoop jar *******.jar <input path>/text.txt <output path>; the first path becomes args[0] (the input) and the second becomes args[1] (the output)


6: Check the run result

7: Download the processed file from the web UI


8: View the processed file
