1. Install the JDK on the system and add the following to /etc/profile (adjust the paths to match your actual installation location):
export JAVA_HOME=/usr/local/java/jdk1.8.0_201
export CLASSPATH=.:$JAVA_HOME/lib:$JAVA_HOME/jre/lib
export PATH=$PATH:$JAVA_HOME/bin
2. Reload the configuration (source /etc/profile) and check the result:
[root@localhost ~]# java -version
java version "1.8.0_201"
Java(TM) SE Runtime Environment (build 1.8.0_201-b09)
Java HotSpot(TM) 64-Bit Server VM (build 25.201-b09, mixed mode)
3. Download the binary package hadoop-2.10.0.tar.gz from the official website (https://hadoop.apache.org/releases.html).
4. On CentOS, extract it to /usr/local/hadoop.
5. Add the following to /etc/profile:
export HADOOP_HOME=/usr/local/hadoop
export PATH=$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin
6. Reload the configuration and check the result:
[root@localhost ~]# hadoop version
Hadoop 2.10.0
Subversion ssh://git.corp.linkedin.com:29418/hadoop/hadoop.git -r e2f1f118e465e787d8567dfa6e2f3b72a0eb9194
Compiled by jhung on 2019-10-22T19:10Z
Compiled with protoc 2.5.0
From source with checksum 7b2d8877c5ce8c9a2cca5c7e81aa4026
This command was run using /usr/local/hadoop/share/hadoop/common/hadoop-common-2.10.0.jar
7. Test Hadoop with the official example (a grep job that counts matches of the regex 'dfs[a-z.]+' in the copied configuration files):
[root@localhost hadoop]# mkdir input
[root@localhost hadoop]# cp etc/hadoop/*.xml input/
[root@localhost hadoop]# hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.10.0.jar grep input/ output 'dfs[a-z.]+'
[root@localhost hadoop]# cd output/
[root@localhost output]# ls
part-r-00000  _SUCCESS
[root@localhost output]# cat *
1	dfsadmin
Note: to run this test again, delete the output directory first (e.g. rm -rf output); Hadoop refuses to write into an output path that already exists.
8. Write your own test demo.
Test scenario: a file (samples.txt) stores a set of 5-digit numbers, one per line; the first three digits are the key and the last two are the value. The task is to find the maximum value for each key. The expected result for this sample is worked out right after it:
11110
11130
22260
22228
22288
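Working the sample by hand: key 111 appears with values 10 and 30, and key 222 with values 60, 28, and 88, so the job should emit (tab-separated, the default TextOutputFormat layout):
111	30
222	88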
(1) Create a Maven project and add the required dependencies. The versions should match the Hadoop release installed above (2.10.0 here, not 3.x); hadoop-client normally pulls in hadoop-common and hadoop-hdfs transitively, so those entries are kept only for clarity.
<dependency>
<groupId>org.apache.hadoop</groupId>
<artifactId>hadoop-common</artifactId>
<version>2.10.0</version>
</dependency>
<dependency>
<groupId>org.apache.hadoop</groupId>
<artifactId>hadoop-hdfs</artifactId>
<version>2.10.0</version>
</dependency>
<dependency>
<groupId>org.apache.hadoop</groupId>
<artifactId>hadoop-client</artifactId>
<version>2.10.0</version>
</dependency>
<dependency>
<groupId>jdk.tools</groupId>
<artifactId>jdk.tools</artifactId>
<version>1.8</version>
<scope>system</scope>
<systemPath>${env.JAVA_HOME}/lib/tools.jar</systemPath>
</dependency>
(2) Define the map function:
import java.io.IOException;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;
public class TestMapper extends Mapper<LongWritable, Text, Text, IntWritable> {
    @Override
    protected void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        // Each input line holds a 5-digit number: the first three digits
        // are the key, the last two are the value.
        String line = value.toString().trim();
        if (line.length() < 5) {
            return; // skip blank or malformed lines instead of crashing the task
        }
        String testKey = line.substring(0, 3);
        int testValue = Integer.parseInt(line.substring(3, 5));
        context.write(new Text(testKey), new IntWritable(testValue));
    }
}
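An optional refinement, shown here only as a sketch: map() runs once per input line, so allocating a fresh Text and IntWritable on every call creates avoidable garbage. The usual object-reuse idiom looks like this (same imports as above; it is safe because context.write() serializes the contents immediately):
public class TestMapper extends Mapper<LongWritable, Text, Text, IntWritable> {
    // Reused across map() calls; Hadoop copies the contents on write().
    private final Text outKey = new Text();
    private final IntWritable outValue = new IntWritable();

    @Override
    protected void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        String line = value.toString().trim();
        if (line.length() < 5) {
            return; // skip blank or malformed lines
        }
        outKey.set(line.substring(0, 3));
        outValue.set(Integer.parseInt(line.substring(3, 5)));
        context.write(outKey, outValue);
    }
}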
(3) Define the reduce function:
import java.io.IOException;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;
public class TestReduce extends Reducer<Text, IntWritable, Text, IntWritable> {
    @Override
    protected void reduce(Text key, Iterable<IntWritable> values, Context context)
            throws IOException, InterruptedException {
        // All values for a given key arrive together; keep the largest one.
        int maxValue = Integer.MIN_VALUE;
        for (IntWritable value : values) {
            maxValue = Math.max(maxValue, value.get());
        }
        context.write(key, new IntWritable(maxValue));
    }
}
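Since taking a maximum is associative and commutative, this same reducer class can also serve as a combiner, pre-aggregating map output on each node before it is shuffled across the network. This is not part of the original setup, but if you want it, one line in the driver defined in the next step is enough:
job.setCombinerClass(TestReduce.class);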
(4) Define the main function:
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
public class TestHadoop {
    public static void main(String[] args) throws Exception {
        // Check the argument count before touching args, otherwise a missing
        // argument throws ArrayIndexOutOfBoundsException before the check runs.
        if (args.length != 2) {
            System.err.println("Usage: TestHadoop <input path> <output path>");
            System.exit(-1);
        }
        System.out.println(args[0] + "-----" + args[1]);
        Job job = Job.getInstance();
        // Tell Hadoop which jar to ship to the cluster when launched via "hadoop jar".
        job.setJarByClass(TestHadoop.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        job.setMapperClass(TestMapper.class);
        job.setReducerClass(TestReduce.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        // Exit with the job's status instead of just printing it.
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
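Tying back to the note in step 7: the job fails if the output directory already exists. One optional convenience, sketched here rather than taken from the original code, is to delete a stale output path from the driver before submitting. It needs an extra import (org.apache.hadoop.fs.FileSystem) and would replace the setOutputPath line above; use it with care, since it silently discards previous results:
// Remove a stale output directory so reruns don't fail.
FileSystem fs = FileSystem.get(job.getConfiguration());
Path outputPath = new Path(args[1]);
if (fs.exists(outputPath)) {
    fs.delete(outputPath, true); // true = delete recursively
}
FileOutputFormat.setOutputPath(job, outputPath);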
(5) Configure Maven packaging (this plugin section goes under <build><plugins> in pom.xml):
<plugin>
<groupId>org.apache.maven.plugins</groupId>
<artifactId>maven-shade-plugin</artifactId>
<version>3.2.1</version>
<executions>
<execution>
<phase>package</phase>
<goals>
<goal>shade</goal>
</goals>
<configuration>
<transformers>
<transformer implementation="org.apache.maven.plugins.shade.resource.ManifestResourceTransformer">
<mainClass>cn.study.TestHadoop</mainClass>
</transformer>
</transformers>
</configuration>
</execution>
</executions>
</plugin>
(6) Package the project (for example via an Eclipse run configuration that runs the Maven package goal) and rename the resulting jar to "TestHadoop.jar".
(7) Put the jar under /usr/local/hadoop, and samples.txt under /usr/local/hadoop/input.
(8) Run the job and check the result (detailed job output omitted):
[root@localhost hadoop]# hadoop jar TestHadoop.jar input/samples.txt output
[root@localhost hadoop]# cat output/*
(9) For questions about the details in this post, send me a private message or contact me on WeChat (yl4757).
(10) The source code for this test can be downloaded at https://download.csdn.net/download/yancie_/11982110.