MapReduce Counters

Original

liuzx32

2020-02-22 11:56

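(1) Counters are mainly used to collect system information and run-time statistics about a job, which helps show whether the job succeeded, failed, and so on;

(2) Compared with logs, counters are easier to analyze.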

Built-in counters:

(1) Hadoop's built-in counters mainly record how a job executes;

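(2) The built-in counters include:

- MapReduce framework counters (Map-Reduce Framework)

- File system counters (FileSystemCounters)

- Job counters (Job Counters)

- File input format counters (File Input Format Counters)

- File output format counters (File Output Format Counters)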

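(3) Counters are maintained by the task they belong to, which periodically reports them to the tasktracker; the tasktracker in turn passes them on to the jobtracker;

(4) The final job-level counters are actually maintained by the jobtracker, so counters can be aggregated globally without being passed around the entire network;

(5) Counter values are complete and reliable only after the job has finished successfully.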
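The aggregation described in points (3) and (4) can be pictured with a small plain-Java model. This is only a sketch under stated assumptions: it does not use Hadoop, and the class name `CounterAggregationSketch`, the `merge` helper and the sample values are all hypothetical. Each map stands for the counters reported by one task, and the merged map stands for the job-level totals.

```java
import java.util.HashMap;
import java.util.Map;

// Plain-Java model (not Hadoop code): each task keeps local counts,
// and the job-level view is the per-counter sum over all tasks.
class CounterAggregationSketch {

    // Adds every counter from one task report into the running job totals.
    static void merge(Map<String, Long> jobTotals, Map<String, Long> taskReport) {
        for (Map.Entry<String, Long> e : taskReport.entrySet()) {
            jobTotals.merge(e.getKey(), e.getValue(), Long::sum);
        }
    }

    public static void main(String[] args) {
        // Hypothetical reports from two map tasks.
        Map<String, Long> task1 = new HashMap<>();
        task1.put("MAP_INPUT_RECORDS", 3L);
        Map<String, Long> task2 = new HashMap<>();
        task2.put("MAP_INPUT_RECORDS", 1L);

        Map<String, Long> jobTotals = new HashMap<>();
        merge(jobTotals, task1);
        merge(jobTotals, task2);
        System.out.println(jobTotals.get("MAP_INPUT_RECORDS")); // prints 4
    }
}
```

Because a task that is still running has only reported part of its counts, the totals in such a model are final only once every task has reported, which matches point (5).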

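Custom Java counters:

(1) MapReduce lets users define their own counters;

(2) A counter behaves like a global variable: its value is aggregated across all tasks of the job;

(3) Counters are organized into groups and can be defined either with a Java enum or with strings.

Methods: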


Counter getCounter(Enum<?> counterName) ;  

Counter getCounter(String groupName,String counterName) ;  

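(4) The string form (dynamic counters) is more flexible than the enum form, because counters can be added to a group dynamically at run time;

(5) The old API uses Reporter, while the new API uses context.getCounter(groupName, counterName) to obtain the counter and update it;

(6) Incrementing a counter

Method: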


void increment(long incr) ;  

Increases the counter by the given amount;

For details, see the source of the org.apache.hadoop.mapreduce.Counter class.
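To make the difference between the enum form and the string form concrete, here is a minimal plain-Java model of the two `getCounter` lookups and of `increment`. It is a sketch, not Hadoop code: `CounterLookupSketch`, its nested `Counter` class and the `LineQuality` enum are hypothetical names, and using the enum's class name as the group is an assumption made for this model.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Plain-Java model (not Hadoop code) of the two lookup styles.
class CounterLookupSketch {

    // Hypothetical enum: the enum type plays the role of the group.
    enum LineQuality { BELOW_2, ABOVE_2 }

    // Minimal stand-in for a counter: a long value changed through increment().
    static final class Counter {
        private long value;
        void increment(long incr) { value += incr; }
        long getValue() { return value; }
    }

    private final Map<String, Counter> counters = new LinkedHashMap<>();

    // Enum style: group and name are fixed at compile time.
    // Assumption of this model: the enum's class name serves as the group.
    Counter getCounter(Enum<?> counterName) {
        return getCounter(counterName.getDeclaringClass().getName(), counterName.name());
    }

    // String style: any group/name pair is created on first use.
    Counter getCounter(String groupName, String counterName) {
        return counters.computeIfAbsent(groupName + "/" + counterName, k -> new Counter());
    }

    public static void main(String[] args) {
        CounterLookupSketch ctx = new CounterLookupSketch();
        ctx.getCounter(LineQuality.BELOW_2).increment(1);
        ctx.getCounter("ErrorCounter", "above_2").increment(2);
        ctx.getCounter("ErrorCounter", "above_2").increment(3);
        System.out.println(ctx.getCounter("ErrorCounter", "above_2").getValue()); // prints 5
    }
}
```

The string form needs no declaration ahead of time, which is why it suits counters whose names are only known while the job runs; the enum form trades that flexibility for names the compiler can check.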

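Using counters:

(1) The web UI (port 50030 is the jobtracker UI, where a job's counters are listed; port 50070 is the NameNode UI);

(2) The command line: hadoop job -counter <job-id> <group-name> <counter-name>;

(3) The Hadoop API

Call job.getCounters() to obtain a Counters object, then call counters.findCounter() to get the individual counter;

The final counter values can be read only after the job has completed.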

Custom counter experiment: count the lines that contain more than 2 words and the lines that contain fewer than 2 words.

Input data file counter.txt:

hello world
hello
hello world 111
hello world 111 222

Create a new project TestCounter with the package com.counter.

Source code of MyMapper.java:


package com.counter;

import java.io.IOException;

import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class MyMapper extends Mapper<LongWritable, Text, LongWritable, Text> {

    @Override
    protected void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        // Split the line on whitespace to get its words.
        String[] val = value.toString().split("\\s+");
        if (val.length < 2) {
            // Fewer than 2 words on this line.
            context.getCounter("ErrorCounter", "below_2").increment(1);
        } else if (val.length > 2) {
            // More than 2 words on this line.
            context.getCounter("ErrorCounter", "above_2").increment(1);
        }
        // Pass the record through unchanged.
        context.write(key, value);
    }
}
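The mapper's branching can be checked against counter.txt without a cluster. The sketch below is plain Java, not Hadoop code; `LineClassifierSketch` and `classify` are hypothetical names, and the input is assumed to be exactly the four lines listed above with no trailing blank line.

```java
import java.util.Arrays;
import java.util.List;

// Plain-Java check (not Hadoop code) of the mapper's branching on counter.txt.
class LineClassifierSketch {

    // Returns the counter name the mapper would increment,
    // or null when the line has exactly 2 words.
    static String classify(String line) {
        String[] val = line.split("\\s+");
        if (val.length < 2) return "below_2";
        if (val.length > 2) return "above_2";
        return null;
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList(
                "hello world", "hello", "hello world 111", "hello world 111 222");
        long below = 0;
        long above = 0;
        for (String line : lines) {
            String name = classify(line);
            if ("below_2".equals(name)) below++;
            else if ("above_2".equals(name)) above++;
        }
        System.out.println("below_2=" + below + " above_2=" + above);
        // prints below_2=1 above_2=2
    }
}
```

Under that assumption, the ErrorCounter group should report below_2 = 1 and above_2 = 2 once the job finishes. An empty line would also count toward below_2, because splitting an empty string yields a single empty element.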

Source code of TestCounter.java:


package com.counter;

import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.util.GenericOptionsParser;

public class TestCounter {

    public static void main(String[] args)
            throws IOException, ClassNotFoundException, InterruptedException {
        Configuration conf = new Configuration();
        // Strip generic Hadoop options; what remains should be <in> and <out>.
        String[] otherArgs = new GenericOptionsParser(conf, args).getRemainingArgs();
        if (otherArgs.length != 2) {
            System.err.println("Usage: TestCounter <in> <out>");
            System.exit(2);
        }
        // This constructor is deprecated in newer Hadoop releases,
        // which provide Job.getInstance(conf, name) instead.
        Job job = new Job(conf, "test counter");
        job.setJarByClass(TestCounter.class);
        job.setMapperClass(MyMapper.class);
        // Map-only job: the counters are updated in the mapper, so no reducers are needed.
        job.setNumReduceTasks(0);
        FileInputFormat.addInputPath(job, new Path(otherArgs[0]));
        FileOutputFormat.setOutputPath(job, new Path(otherArgs[1]));
        // Block until the job finishes; counters are printed with the job summary.
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}