Big Data with Hadoop: MapReduce (3) - CombineTextInputFormat

3.1.5 CombineTextInputFormat hands-on example
Example: counting word occurrences
  1. Preparation
    Create an input folder under the HDFS root directory, then place four small files of 1.5 MB, 35 MB, 5.5 MB, and 6.5 MB in it as the input data.
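The test files can be generated locally and then uploaded; a minimal sketch (file names and the sample line are illustrative, not from the original post):

```shell
# Generate four text files of the sizes used in this example
# (1.5 MB, 35 MB, 5.5 MB, 6.5 MB) by repeating a sample line.
yes "hello hadoop mapreduce" | head -c $((1536*1024))    > word1.txt
yes "hello hadoop mapreduce" | head -c $((35*1024*1024)) > word2.txt
yes "hello hadoop mapreduce" | head -c $((5632*1024))    > word3.txt
yes "hello hadoop mapreduce" | head -c $((6656*1024))    > word4.txt

# On a running cluster, create the input directory and upload:
#   hdfs dfs -mkdir /input
#   hdfs dfs -put word*.txt /input
```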
  2. Code
  • Mapper class
import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

/**
 * @Author zhangyong
 * @Date 2020/3/4 16:35
 * @Version 1.0
 */
public class WordCountMapper extends Mapper<LongWritable, Text, Text, IntWritable> {
    private Text mapOutputKey = new Text();
    private IntWritable mapOutputValue = new IntWritable();

    @Override
    protected void map(LongWritable key, Text value, Context context) throws IOException, InterruptedException {
        String lineValue = value.toString();                 // one line of input (key is the byte offset, value the line contents)
        StringTokenizer st = new StringTokenizer(lineValue); // split the line on whitespace
        while (st.hasMoreTokens()) {                         // while tokens remain, there are more words
            String word = st.nextToken();                    // the substring up to the next delimiter (one word)
            mapOutputKey.set(word);
            mapOutputValue.set(1);
            context.write(mapOutputKey, mapOutputValue);
        }
    }
}
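The tokenizing loop in the mapper can be exercised outside Hadoop; a small sketch (class and method names are illustrative) that applies the same StringTokenizer logic to one line and tallies the words:

```java
import java.util.HashMap;
import java.util.Map;
import java.util.StringTokenizer;

public class TokenizeDemo {
    // Mirrors the mapper's loop: split a line on whitespace and count each token.
    static Map<String, Integer> countTokens(String line) {
        Map<String, Integer> counts = new HashMap<>();
        StringTokenizer st = new StringTokenizer(line);
        while (st.hasMoreTokens()) {
            counts.merge(st.nextToken(), 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        // In the real job the mapper emits (word, 1) pairs and the reducer sums them;
        // here the two steps are collapsed into one local map for illustration.
        System.out.println(countTokens("hello hadoop hello mapreduce"));
    }
}
```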
  • Reducer class:
import java.io.IOException;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

/**
 * @Author zhangyong
 * @Date 2020/3/4 16:35
 * @Version 1.0
 */
public class WordCountReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
    private IntWritable outputValue = new IntWritable();

    @Override
    protected void reduce(Text key, Iterable<IntWritable> values, Context context) throws IOException, InterruptedException {
        int sum = 0;    // sum the counts for this word
        for (IntWritable value : values) {
            sum += value.get();
        }
        outputValue.set(sum);
        context.write(key, outputValue);
    }
}
  • Driver class
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.CombineTextInputFormat;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

/**
 * @Author zhangyong
 * @Date 2020/3/4 16:35
 * @Version 1.0
 */
public class WordCountDriver {
    public static void main(String[] args) throws Exception {
        // A core-site.xml file must be provided under resources
        args = new String[]{
                "/input/",
                "/output/"
        };

        Configuration cfg = new Configuration();   // get the configuration

        Job job = Job.getInstance(cfg, WordCountDriver.class.getSimpleName());
        job.setJarByClass(WordCountDriver.class);

        // If no InputFormat is set, the default is TextInputFormat
        job.setInputFormatClass(CombineTextInputFormat.class);
        // Set the maximum virtual-storage split size to 20 MB
        CombineTextInputFormat.setMaxInputSplitSize(job, 20 * 1024 * 1024);

        // Set the Mapper class and its output key/value types
        job.setMapperClass(WordCountMapper.class);
        job.setMapOutputKeyClass(Text.class);
        job.setMapOutputValueClass(IntWritable.class);

        // Set the Reducer class and its output key/value types
        job.setReducerClass(WordCountReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);

        // Set the input and output paths
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));

        // Submit the job to YARN and wait for completion
        boolean isSuccess = job.waitForCompletion(true);
        System.exit(isSuccess ? 0 : 1);
    }
}
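With setMaxInputSplitSize at 20 MB, the four input files (1.5 MB, 35 MB, 5.5 MB, 6.5 MB) end up in three splits rather than four. The sketch below models the simplified two-phase rule commonly used to explain CombineTextInputFormat: a "virtual storage" phase that keeps a file whole if it is at most the max, halves it if it is at most twice the max, and otherwise carves off max-sized blocks; then a combining phase where a block at or above the max forms its own split and a smaller block is merged with the one after it. This is an illustrative model only (class and method names are made up here); the framework's real planner, CombineFileInputFormat, also accounts for node and rack locality, so actual splits can differ:

```java
import java.util.ArrayList;
import java.util.List;

public class CombineSplitSketch {
    // Phase 1 ("virtual storage"): a file <= max stays whole; a file <= 2*max
    // is halved; a larger file repeatedly sheds a max-sized block first.
    static List<Double> virtualBlocks(double[] fileSizes, double max) {
        List<Double> blocks = new ArrayList<>();
        for (double s : fileSizes) {
            while (s > 2 * max) { blocks.add(max); s -= max; }
            if (s > max) { blocks.add(s / 2); blocks.add(s / 2); }
            else blocks.add(s);
        }
        return blocks;
    }

    // Phase 2: a block >= max forms its own split; a smaller block is
    // merged with the block that follows it.
    static List<Double> splits(List<Double> blocks, double max) {
        List<Double> result = new ArrayList<>();
        for (int i = 0; i < blocks.size(); ) {
            double b = blocks.get(i);
            if (b >= max || i + 1 == blocks.size()) { result.add(b); i++; }
            else { result.add(b + blocks.get(i + 1)); i += 2; }
        }
        return result;
    }

    public static void main(String[] args) {
        double max = 20;                       // MB, matches setMaxInputSplitSize above
        double[] files = {1.5, 35, 5.5, 6.5};  // MB, the four input files
        List<Double> splitSizes = splits(virtualBlocks(files, max), max);
        System.out.println(splitSizes);        // three splits: [19.0, 23.0, 6.5]
    }
}
```

Under this model the 35 MB file is halved into two 17.5 MB blocks, giving blocks 1.5, 17.5, 17.5, 5.5, 6.5, which combine into splits of 19 MB, 23 MB, and 6.5 MB; with the default TextInputFormat the same four files would have produced four map tasks.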
  3. Run result