A first WordCount exercise
Running WordCount from the IDEA client
1: Create the mapper class, extending Mapper (pick the Hadoop one when IDEA offers several)
public class wordcountMapper extends Mapper<LongWritable, Text, Text, IntWritable> {
}
1.2 Override the map method (Ctrl+O, choose map)
public class wordcountMapper extends Mapper<LongWritable, Text, Text, IntWritable> {
    @Override
    protected void map(LongWritable key, Text value, Context context) throws IOException, InterruptedException {
    }
}
1.3 Implement the map logic
public class wordcountMapper extends Mapper<LongWritable, Text, Text, IntWritable> {
    @Override
    protected void map(LongWritable key, Text value, Context context) throws IOException, InterruptedException {
        // Each call receives one line of the input file; split it into words on single spaces.
        String line = value.toString();
        String[] words = line.split(" ");
        for (String word : words) {
            // split(" ") produces empty strings for consecutive spaces; skip them.
            if (word.equals("")) {
                continue;
            }
            // Emit (word, 1) for every occurrence.
            context.write(new Text(word), new IntWritable(1));
        }
    }
}
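The tokenization inside map can be tried out on its own, without any Hadoop classes. The sketch below (sample line invented) mirrors the split-and-skip logic and prints what context.write would emit for each word:

```java
public class MapSketch {
    public static void main(String[] args) {
        // Same logic as wordcountMapper.map: split on single spaces
        // and skip the empty strings produced by consecutive spaces.
        String line = "hello world  hello";
        for (String word : line.split(" ")) {
            if (word.equals("")) {
                continue; // the double space yields an empty token
            }
            // Stands in for context.write(new Text(word), new IntWritable(1)).
            System.out.println(word + "\t1");
        }
    }
}
```

Note that without the empty-string check, the double space would make the job count an empty "word", which is exactly the bug the continue guards against.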
2: Create the reducer class, extending Reducer (pick the Hadoop one)
public class wordcountReduce extends Reducer<Text, IntWritable, Text, IntWritable> {
}
2.2 Override the reduce method (Ctrl+O, choose reduce)
public class wordcountReduce extends Reducer<Text, IntWritable, Text, IntWritable> {
    @Override
    protected void reduce(Text key, Iterable<IntWritable> values, Context context) throws IOException, InterruptedException {
    }
}
2.3 Implement the reduce logic
public class wordcountReduce extends Reducer<Text, IntWritable, Text, IntWritable> {
    @Override
    protected void reduce(Text key, Iterable<IntWritable> values, Context context) throws IOException, InterruptedException {
        // After the shuffle, values holds every 1 emitted for this word; sum them.
        int count = 0;
        for (IntWritable value : values) {
            count += value.get();
        }
        // Emit (word, total count).
        context.write(key, new IntWritable(count));
    }
}
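The reduce side can be sketched the same way: for one key, the shuffle phase delivers all the 1s that the mappers emitted for it, and reduce just sums them. A plain-Java stand-in (the key "hello" and its three 1s are invented sample data):

```java
import java.util.Arrays;
import java.util.List;

public class ReduceSketch {
    public static void main(String[] args) {
        // Stands in for the Iterable<IntWritable> that reduce receives for "hello".
        List<Integer> values = Arrays.asList(1, 1, 1);
        int count = 0;
        for (int value : values) {
            count += value; // same accumulation as value.get() in the real reducer
        }
        // Stands in for context.write(key, new IntWritable(count)).
        System.out.println("hello\t" + count);
    }
}
```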
3: Create the main class (the driver)
public class wordcountDriver {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        Job job = Job.getInstance(conf);
        // The jar containing this driver, plus the mapper and reducer classes.
        job.setJarByClass(wordcountDriver.class);
        job.setMapperClass(wordcountMapper.class);
        job.setReducerClass(wordcountReduce.class);
        // Key/value types of the map output and of the final job output.
        job.setMapOutputKeyClass(Text.class);
        job.setMapOutputValueClass(IntWritable.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        // Input file and output directory on HDFS; the output directory must not already exist.
        FileInputFormat.setInputPaths(job, "hdfs://192.168.100.100:8020/hello/mapreduce/test.txt");
        FileOutputFormat.setOutputPath(job, new Path("hdfs://192.168.100.100:8020/hello/mapreduce/wordcountout/"));
        boolean result = job.waitForCompletion(true);
        System.exit(result ? 0 : 1);
    }
}
3.1 Check test.txt (before processing)
Run the driver
Run result
View and download the result in the web UI (ip:50070)
Open the downloaded file to check it
Running WordCount on the Linux cluster
1: Change the driver so the input and output paths come from the command line
public class wordcountDriver {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        Job job = Job.getInstance(conf);
        job.setJarByClass(wordcountDriver.class);
        job.setMapperClass(wordcountMapper.class);
        job.setReducerClass(wordcountReduce.class);
        job.setMapOutputKeyClass(Text.class);
        job.setMapOutputValueClass(IntWritable.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        // Take the input and output paths from the command-line arguments.
        FileInputFormat.setInputPaths(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        boolean result = job.waitForCompletion(true);
        System.exit(result ? 0 : 1);
    }
}
2: Build the jar in IDEA (the jar-building steps are skipped here)
3: Copy the jar to Linux (there are many ways to transfer it)
4: Upload the unprocessed text file to HDFS
5: Submit the job with hadoop jar *******.jar /path-to-input/text.txt /path-to-output
6: Run result
7: Download the processed file from the web UI
8: Check the processed file
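Putting the two phases together, the whole job can be simulated in plain Java to preview what the processed output file (part-r-00000) will look like. A TreeMap is used because MapReduce sorts keys before the reduce phase, so the real output file is also in key order (sample input lines invented):

```java
import java.util.TreeMap;

public class WordCountSketch {
    public static void main(String[] args) {
        String[] lines = {"hello world", "hello hadoop"};
        // Map + shuffle: group the 1s per word; TreeMap keeps keys sorted,
        // matching the sorted order of a part-r-00000 file.
        TreeMap<String, Integer> counts = new TreeMap<>();
        for (String line : lines) {
            for (String word : line.split(" ")) {
                if (word.equals("")) continue;
                counts.merge(word, 1, Integer::sum); // reduce: sum the 1s per key
            }
        }
        // Hadoop's default TextOutputFormat writes "key<TAB>value" per line.
        counts.forEach((word, count) -> System.out.println(word + "\t" + count));
    }
}
```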