MapReduce Features

- Counters (values are definitive only once the job has completed successfully)

  • Task Counters
  • Filesystem Counters
  • Job Counters (maintained by the application master, so they do not need to be sent across the network; they hold job-level statistics such as the number of tasks launched)
  • FileInputFormat Counters
  • FileOutputFormat Counters
  • User-defined counters
  1. By enum
context.getCounter(Temperature.MALFORMED).increment(1);
  2. By group name and counter name (dynamic counters)
public Counter getCounter(String groupName, String counterName)
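
To make the relationship between the two styles concrete, here is a minimal Hadoop-free sketch. SimpleCounters is a hypothetical stand-in, not the Hadoop API. It only illustrates that an enum counter is a (group, name) pair in which the group is the enum's class name and the counter is the constant's name, while a dynamic counter supplies both strings directly.

```java
import java.util.Collections;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical stand-in for counter bookkeeping; this is NOT the Hadoop API.
class SimpleCounters {
    // group name -> (counter name -> value)
    private final Map<String, Map<String, Long>> groups = new TreeMap<>();

    // Dynamic counter: group and counter names are plain strings.
    void increment(String group, String name, long delta) {
        groups.computeIfAbsent(group, g -> new TreeMap<>())
              .merge(name, delta, Long::sum);
    }

    // Enum counter: the enum's class name is the group, the constant's name is the counter.
    void increment(Enum<?> key, long delta) {
        increment(key.getDeclaringClass().getName(), key.name(), delta);
    }

    long get(String group, String name) {
        return groups.getOrDefault(group, Collections.<String, Long>emptyMap())
                     .getOrDefault(name, 0L);
    }

    long get(Enum<?> key) {
        return get(key.getDeclaringClass().getName(), key.name());
    }
}

// Example enum, mirroring the Temperature.MALFORMED counter used above.
enum Temperature { MISSING, MALFORMED }
```

Enum counters are checked at compile time; string-named counters are handy when the counter names are only known at run time (for example, one counter per data-quality code).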


- Sorting

  • Partial sort (due to multiple map tasks and multiple reduce tasks)
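  • Total sort (sample the keys to build a partition file, then let TotalOrderPartitioner give each reducer a contiguous key range)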
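// Use the partitioner that reads the sampled split points
job.setPartitionerClass(TotalOrderPartitioner.class);
// Sample keys: frequency 0.1, at most 10000 samples, at most 10 splits sampled
InputSampler.Sampler<IntWritable, Text> sampler =
    new InputSampler.RandomSampler<IntWritable, Text>(0.1, 10000, 10);
InputSampler.writePartitionFile(job, sampler);
// Add the partition file to the distributed cache
Configuration conf = job.getConfiguration();
String partitionFile = TotalOrderPartitioner.getPartitionFile(conf);
URI partitionUri = new URI(partitionFile);
job.addCacheFile(partitionUri);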
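
What the sampled partition file buys you can be shown without Hadoop. The sketch below is an illustration under assumptions, not Hadoop's implementation: RangePartitionSketch and its method names are hypothetical, keys are plain ints, and the split points are assumed to be distinct. It picks evenly spaced split points from a sample and routes each key to the range it falls in, so partition i only holds keys smaller than those in partition i + 1.

```java
import java.util.Arrays;

// Hypothetical, Hadoop-free sketch of range partitioning for a total sort.
class RangePartitionSketch {
    // Pick numPartitions - 1 evenly spaced split points from a sample of keys.
    static int[] splitPoints(int[] sample, int numPartitions) {
        int[] sorted = sample.clone();
        Arrays.sort(sorted);
        int[] splits = new int[numPartitions - 1];
        for (int i = 1; i < numPartitions; i++) {
            splits[i - 1] = sorted[i * sorted.length / numPartitions];
        }
        return splits;
    }

    // Keys below splits[0] go to partition 0, keys in [splits[0], splits[1]) to 1, and so on.
    static int partition(int key, int[] splits) {
        int pos = Arrays.binarySearch(splits, key);
        // Found at index i: the key equals splits[i] and belongs to partition i + 1.
        // Not found: binarySearch returns -(insertionPoint) - 1, so recover the insertion point.
        return pos >= 0 ? pos + 1 : -pos - 1;
    }
}
```

Because each reducer sorts its own range, concatenating the reducer outputs in partition order gives a globally sorted result. The quality of the sample only affects how evenly the work is spread, not correctness.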

  • Secondary sort (also order the values within each key group)
  1. Make the key a composite of the natural key and the natural value.
  2. The sort comparator should order by the composite key (i.e., the natural key and natural value).
  3. The partitioner and grouping comparator for the composite key should consider only the natural key for partitioning and grouping.
job.setPartitionerClass(FirstPartitioner.class);
job.setSortComparatorClass(KeyComparator.class);
job.setGroupingComparatorClass(GroupComparator.class);
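
The three steps can be simulated in plain Java. This is a sketch under assumptions, not the classes named above: SecondarySortSketch and YearTemp are hypothetical, the natural key is a year, and the natural value is a temperature. Sorting by (year ascending, temperature descending) while grouping only by year means the first key of each group carries that year's maximum temperature.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical, Hadoop-free sketch of the secondary sort pattern.
class SecondarySortSketch {
    // Step 1: composite key = natural key (year) + natural value (temperature).
    static final class YearTemp {
        final int year;
        final int temp;
        YearTemp(int year, int temp) { this.year = year; this.temp = temp; }
    }

    // Step 2: sort comparator orders by the whole composite key
    // (year ascending, then temperature descending).
    static final Comparator<YearTemp> SORT = (a, b) -> {
        int byYear = Integer.compare(a.year, b.year);
        return byYear != 0 ? byYear : Integer.compare(b.temp, a.temp);
    };

    // Step 3a: partitioner looks at the natural key only.
    static int partition(YearTemp key, int numPartitions) {
        return Math.abs(key.year * 127) % numPartitions;
    }

    // Step 3b: grouping looks at the natural key only, so after sorting,
    // the first key seen for each year holds the maximum temperature.
    static Map<Integer, Integer> maxTempPerYear(List<YearTemp> keys) {
        List<YearTemp> sorted = new ArrayList<>(keys);
        sorted.sort(SORT);
        Map<Integer, Integer> result = new LinkedHashMap<>();
        for (YearTemp k : sorted) {
            result.putIfAbsent(k.year, k.temp); // first key per year wins
        }
        return result;
    }
}
```

If the partitioner used the full composite key instead, records for the same year could land on different reducers and the grouping step would no longer see them together.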


- Join 

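  • Map-side join (strict requirement: every source must be partitioned and sorted the same way, so that all records for a key sit in the same partition of each source)
  • Reduce-side join, which is more general
  1. Multiple inputs -> a separate mapper for each source, which tags its records with the source
  2. Secondary sort -> for each key, records from one source reach the reducer before those from the other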
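
A reduce-side join can be sketched in plain Java as follows. This is a simulation under assumptions, not Hadoop code: ReduceSideJoinSketch and Tagged are hypothetical names, the two sources are a station table and a list of readings, tag 0 marks stations and tag 1 marks readings, and the result is an inner join on the station ID.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, Hadoop-free simulation of a reduce-side join.
class ReduceSideJoinSketch {
    // One tagged map output: join key, source tag (0 = station, 1 = reading), payload.
    static final class Tagged {
        final String key;
        final int tag;
        final String value;
        Tagged(String key, int tag, String value) {
            this.key = key; this.tag = tag; this.value = value;
        }
    }

    // Each input row is {joinKey, payload}.
    static List<String> join(List<String[]> stations, List<String[]> readings) {
        // "Map" phase: one mapper per source tags its records.
        List<Tagged> shuffled = new ArrayList<>();
        for (String[] s : stations) shuffled.add(new Tagged(s[0], 0, s[1]));
        for (String[] r : readings) shuffled.add(new Tagged(r[0], 1, r[1]));

        // "Shuffle" with secondary sort: order by key, then by tag,
        // so the station record comes first within each key group.
        shuffled.sort((a, b) -> {
            int byKey = a.key.compareTo(b.key);
            return byKey != 0 ? byKey : Integer.compare(a.tag, b.tag);
        });

        // "Reduce" phase: remember the station name, then emit one row per reading.
        List<String> out = new ArrayList<>();
        String currentKey = null;
        String stationName = null;
        for (Tagged t : shuffled) {
            if (!t.key.equals(currentKey)) { // a new key group starts
                currentKey = t.key;
                stationName = null;
            }
            if (t.tag == 0) {
                stationName = t.value;
            } else if (stationName != null) { // readings without a station are dropped
                out.add(t.key + "\t" + stationName + "\t" + t.value);
            }
        }
        return out;
    }
}
```

Thanks to the tag ordering, the reducer never has to buffer the readings for a key: it sees the single station record first and can stream the rest.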


- Side data distribution

  • Small data in the job configuration -> it needs to be small because the job configuration is always read by the client, the application master, and the task JVM, and each time the configuration is read, all of its entries are read into memory.

  • -files, -archives, -libjars -> larger side data goes through the distributed cache; the files are copied to each node once per job
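
A file shipped with -files shows up in the task's working directory, so a task can read it with ordinary file I/O. The sketch below is a Hadoop-free illustration under assumptions: StationLookup is a hypothetical helper, and the side file is assumed to hold one tab-separated "id, name" pair per line. The point is to load the table once per task (for instance from a mapper's setup method) rather than once per record.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.HashMap;
import java.util.Map;

// Hypothetical lookup table built from a side file; not part of Hadoop.
class StationLookup {
    private final Map<String, String> names = new HashMap<>();

    // Load the whole side file into memory; call this once per task.
    void load(Path sideFile) throws IOException {
        for (String line : Files.readAllLines(sideFile, StandardCharsets.UTF_8)) {
            String[] parts = line.split("\t", 2);
            if (parts.length == 2) { // skip lines that are not "id<TAB>name"
                names.put(parts[0], parts[1]);
            }
        }
    }

    // Fall back to the raw ID when the station is unknown.
    String nameOf(String stationId) {
        return names.getOrDefault(stationId, stationId);
    }
}
```

This only works when the side file fits comfortably in the task's memory; for larger tables a join is the better fit.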