WordCounter: Getting Started with Hadoop (泰克 Complete Hadoop Video Series)

1. Goal

    Build the Hadoop WordCount example in Eclipse, connect it to the Hadoop environment running on a virtual machine, and produce the word-count statistics.

2. Process

2.1 Software versions

    jdk 1.8.0_31

    hadoop 2.7.3

2.2 Eclipse plugin installation (see https://www.cnblogs.com/zimo-jing/p/8579065.html)

2.3 Write WordCountMapper.java in the project

package com.lizp.test.mapper;

import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat;

public class WordCountMapper {

    // Mapper: emit (word, 1) for every token in the input line
    public static class WordCountMap extends Mapper<Object, Text, Text, IntWritable> {

        // Reused across map() calls instead of allocating per record (see 3.1.6)
        private final IntWritable one = new IntWritable(1);
        private Text word = new Text();

        @Override
        protected void map(Object key, Text value, Context context)
                throws IOException, InterruptedException {
            String line = value.toString();
            StringTokenizer token = new StringTokenizer(line);
            while (token.hasMoreTokens()) {
                word.set(token.nextToken());
                context.write(word, one);
            }
        }
    }

    // Reducer: sum up the counts for each word
    public static class WordCountReduce extends Reducer<Text, IntWritable, Text, IntWritable> {

        private IntWritable result = new IntWritable();

        @Override
        protected void reduce(Text key, Iterable<IntWritable> values, Context context)
                throws IOException, InterruptedException {
            int sum = 0;
            for (IntWritable val : values) {
                sum += val.get();
            }
            result.set(sum);
            context.write(key, result);
        }
    }

    public static void main(String[] args) throws IOException, ClassNotFoundException, InterruptedException {
        Configuration conf = new Configuration();
        // Local scratch directory for MapReduce when running from Windows
        conf.set("mapreduce.cluster.local.dir", "E:\\hadoop-data\\tmp");

        Job job = Job.getInstance(conf, "lizp-worldcounter");
        job.setJarByClass(WordCountMapper.class);
        job.setMapperClass(WordCountMap.class);
        // The reducer doubles as a combiner: summation is associative and commutative
        job.setCombinerClass(WordCountReduce.class);
        job.setReducerClass(WordCountReduce.class);

        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);

        job.setInputFormatClass(TextInputFormat.class);
        job.setOutputFormatClass(TextOutputFormat.class);

        // args[0] = HDFS input path, args[1] = HDFS output path (must not already exist)
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));

        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}

2.4 Upload test data and configure the run arguments

   2.4.1 Upload a.txt to hdfs://172.16.77.186:9000/input/a.txt, with the following contents:

hello hadoop hello ketty hello cat hadoop
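Besides the hadoop fs -put command, the file can also be uploaded programmatically. A minimal sketch using the HDFS FileSystem API, with the NameNode address taken from this article; the local source path E:\data\a.txt is an assumption for illustration:

import java.net.URI;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class UploadTestData {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // NameNode address from this article's environment
        FileSystem fs = FileSystem.get(URI.create("hdfs://172.16.77.186:9000"), conf);
        // Local source path is assumed for illustration
        fs.copyFromLocalFile(new Path("E:\\data\\a.txt"), new Path("/input/a.txt"));
        fs.close();
    }
}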

  2.4.2 Set the run arguments in the Eclipse run configuration

Program arguments: hdfs://172.16.77.186:9000/input/a.txt hdfs://172.16.77.186:9000/output5

VM arguments: -Djava.library.path=$HADOOP_HOME/lib/native/Linux-amd64-64/

2.5 Run results (contents of the part-r-00000 file under /output5)

cat	1
hadoop	2
hello	3
ketty	1

3. Summary

3.1 Problems encountered

3.1.1 On a single-node Windows install, Hadoop's CPU and memory parameters must be configured, otherwise MapReduce hangs at 0% progress (an illustrative config sketch follows the link below)

https://blog.csdn.net/dai451954706/article/details/50464036
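For reference, the memory settings involved are typically of this shape. The property names are standard Hadoop/YARN ones; the values below are illustrative placeholders, not the exact ones from the linked post:

<!-- mapred-site.xml: per-task memory (illustrative values) -->
<property>
    <name>mapreduce.map.memory.mb</name>
    <value>512</value>
</property>
<property>
    <name>mapreduce.reduce.memory.mb</name>
    <value>512</value>
</property>

<!-- yarn-site.xml: NodeManager must have enough memory to grant containers -->
<property>
    <name>yarn.nodemanager.resource.memory-mb</name>
    <value>2048</value>
</property>
<property>
    <name>yarn.scheduler.minimum-allocation-mb</name>
    <value>512</value>
</property>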

3.1.2 Eclipse plugin installation

https://www.cnblogs.com/supiaopiao/p/7240308.html

3.1.3 Modify the NativeIO source to skip the Windows permission check (a sketch follows the link below)

https://blog.csdn.net/congcong68/article/details/42043093
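The workaround described in the link is commonly done by copying Hadoop's org.apache.hadoop.io.nativeio.NativeIO.java into the project under the same package path (so it shadows the class in hadoop-common) and short-circuiting the permission check. Roughly, the change is this excerpt inside the static inner class Windows:

// Excerpt of the copied org/apache/hadoop/io/nativeio/NativeIO.java,
// inside the static inner class Windows:
public static boolean access(String path, AccessRight desiredAccess)
        throws IOException {
    // Originally delegates to the native access0() check, which fails
    // on Windows without winutils; returning true skips the check.
    return true;
}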

3.1.4 VM argument configuration

-Djava.library.path=$HADOOP_HOME/lib/native/Linux-amd64-64/

https://zhidao.baidu.com/question/1382439112860211060.html

3.1.5 Watch the read permissions on the temporary staging directory Hadoop uses when running on Windows, e.g.

E:\tmp\hadoop-T\mapred\staging\hadoop40875801

3.1.6 Note how private final IntWritable one = new IntWritable(1); is initialized in the code (see the sketch below)
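What this note is getting at: the framework calls map() once per input record, so output Writables should be allocated once per task and mutated with set(), not re-created on every write. A minimal contrast, excerpted from the mapper above:

// Wasteful: allocates a fresh object for every token, which adds GC
// pressure on large inputs:
//     context.write(word, new IntWritable(1));

// Reused: one field per mapper instance; this is safe because
// context.write() serializes the value's bytes immediately, so
// mutating the object for the next record does not corrupt output.
private final IntWritable one = new IntWritable(1);   // created once
// ... inside map():
context.write(word, one);                             // reused per record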

 
