Installing Hadoop (standalone mode) and testing it (official example and a custom example)

1. Install the JDK on the system and add the following to /etc/profile (adjust the paths to match your actual installation directory):

export JAVA_HOME=/usr/local/java/jdk1.8.0_201
export CLASSPATH=.:$JAVA_HOME/lib:$JAVA_HOME/jre/lib
export PATH=$PATH:$JAVA_HOME/bin

2. Reload the configuration (source /etc/profile) and verify the result:

[root@localhost ~]# java -version
java version "1.8.0_201"
Java(TM) SE Runtime Environment (build 1.8.0_201-b09)
Java HotSpot(TM) 64-Bit Server VM (build 25.201-b09, mixed mode)

3. Download the binary package hadoop-2.10.0.tar.gz from the official release page (https://hadoop.apache.org/releases.html).

4. On CentOS, extract it to /usr/local/hadoop.

5. Add the following to /etc/profile:

export HADOOP_HOME=/usr/local/hadoop
export PATH=$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin

6. Reload the configuration and verify the result:

[root@localhost ~]# hadoop version
Hadoop 2.10.0
Subversion ssh://git.corp.linkedin.com:29418/hadoop/hadoop.git -r e2f1f118e465e787d8567dfa6e2f3b72a0eb9194
Compiled by jhung on 2019-10-22T19:10Z
Compiled with protoc 2.5.0
From source with checksum 7b2d8877c5ce8c9a2cca5c7e81aa4026
This command was run using /usr/local/hadoop/share/hadoop/common/hadoop-common-2.10.0.jar

7. Test Hadoop with the official example jar (commands run from /usr/local/hadoop):

[root@localhost hadoop]# mkdir input
[root@localhost hadoop]# cp etc/hadoop/*.xml input/
[root@localhost hadoop]# hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.10.0.jar grep input/ output 'dfs[a-z.]+'
[root@localhost hadoop]# cd output/
[root@localhost output]# ls
part-r-00000  _SUCCESS
[root@localhost output]# cat *
1       dfsadmin

Note: if you want to run this test again, delete the output directory first; MapReduce refuses to write into an output directory that already exists.

8. Write a custom test demo:

Test scenario: a file (samples.txt) stores a set of 5-digit numbers; the first three digits of each line are the key and the last two are the value. The task is to find, for each key, the maximum value (a small plain-Java sanity check of this rule follows the sample data below).

11110
11130
22260
22228
22288
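
Before writing any MapReduce code, the expected result can be checked with a minimal plain-Java sketch (the class name MaxPerKeyCheck is mine, purely for illustration); it applies the same rule to the sample lines above and prints the maximum value per key:

import java.util.Map;
import java.util.TreeMap;

public class MaxPerKeyCheck {
    public static void main(String[] args) {
        // The sample lines from samples.txt above.
        String[] lines = {"11110", "11130", "22260", "22228", "22288"};
        Map<String, Integer> maxPerKey = new TreeMap<>();
        for (String line : lines) {
            String key = line.substring(0, 3);               // first three digits
            int value = Integer.parseInt(line.substring(3));  // last two digits
            maxPerKey.merge(key, value, Math::max);           // keep the maximum per key
        }
        // Prints "111    30" and "222    88".
        maxPerKey.forEach((k, v) -> System.out.println(k + "\t" + v));
    }
}

Running it prints 111 30 and 222 88, which is exactly what the MapReduce job in step (8) should produce.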

(1) Create a Maven project and add the required dependencies. The classes below live in the cn.study package (matching the mainClass configured in step (5)), and the Hadoop dependency versions are kept in line with the installed Hadoop 2.10.0:

<dependency>
    <groupId>org.apache.hadoop</groupId>
    <artifactId>hadoop-common</artifactId>
    <version>2.10.0</version>
</dependency>

<dependency>
    <groupId>org.apache.hadoop</groupId>
    <artifactId>hadoop-hdfs</artifactId>
    <version>2.10.0</version>
</dependency>

<dependency>
    <groupId>org.apache.hadoop</groupId>
    <artifactId>hadoop-client</artifactId>
    <version>2.10.0</version>
</dependency>

<dependency>
    <groupId>jdk.tools</groupId>
    <artifactId>jdk.tools</artifactId>
    <version>1.8</version>
    <scope>system</scope>
    <systemPath>${env.JAVA_HOME}/lib/tools.jar</systemPath>
</dependency>

(2) Define the map function:

package cn.study;

import java.io.IOException;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class TestMapper extends Mapper<LongWritable, Text, Text, IntWritable> {

	@Override
	protected void map(LongWritable key, Text value, Mapper<LongWritable, Text, Text, IntWritable>.Context context)
			throws IOException, InterruptedException {
		// Each input line is a 5-digit number: the first three digits are the
		// key, the last two are the value.
		String line = value.toString();
		String testKey = line.substring(0, 3);
		int testValue = Integer.parseInt(line.substring(3, 5));
		context.write(new Text(testKey), new IntWritable(testValue));
	}
}

(3) Define the reduce function:

package cn.study;

import java.io.IOException;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

public class TestReduce extends Reducer<Text, IntWritable, Text, IntWritable> {
	@Override
	protected void reduce(Text key, Iterable<IntWritable> values,
			Reducer<Text, IntWritable, Text, IntWritable>.Context context) throws IOException, InterruptedException {
		// Emit the maximum value observed for this key.
		int maxValue = Integer.MIN_VALUE;
		for (IntWritable value : values) {
			maxValue = Math.max(maxValue, value.get());
		}
		context.write(key, new IntWritable(maxValue));
	}
}

(4) Define the main (driver) function:

package cn.study;

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class TestHadoop {
	public static void main(String[] args) throws Exception {
		// Check the arguments before using them; otherwise a missing argument
		// throws ArrayIndexOutOfBoundsException before the check is reached.
		if (args.length != 2) {
			System.out.println("Usage: TestHadoop <input path> <output path>");
			System.exit(-1);
		}
		System.out.println(args[0] + "-----" + args[1]);
		Job job = Job.getInstance();
		// Tell Hadoop which jar contains the job classes.
		job.setJarByClass(TestHadoop.class);
		FileInputFormat.addInputPath(job, new Path(args[0]));
		FileOutputFormat.setOutputPath(job, new Path(args[1]));
		job.setMapperClass(TestMapper.class);
		job.setReducerClass(TestReduce.class);
		job.setOutputKeyClass(Text.class);
		job.setOutputValueClass(IntWritable.class);
		System.exit(job.waitForCompletion(true) ? 0 : 1);
	}
}
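
One optional refinement, which is my own suggestion and not part of the original steps: since taking a maximum is commutative and associative, the same TestReduce class can also be registered as a combiner, so each map task pre-aggregates its output before the shuffle. The sketch below shows the single extra line that would go into the driver above, right after setReducerClass:

		// Optional (not in the original driver): reuse the reducer as a combiner.
		// This is safe because "max" is commutative and associative, and
		// TestReduce's input/output types match the mapper's output types
		// (Text, IntWritable).
		job.setCombinerClass(TestReduce.class);

For a five-line sample file this makes no measurable difference, but on larger inputs it reduces the amount of data shuffled to the reducers.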

(5) Configure Maven packaging with the maven-shade-plugin:

<plugin>
    <groupId>org.apache.maven.plugins</groupId>
    <artifactId>maven-shade-plugin</artifactId>
    <version>3.2.1</version>
    <executions>
        <execution>
            <phase>package</phase>
            <goals>
                <goal>shade</goal>
            </goals>
            <configuration>
                <transformers>
                    <transformer implementation="org.apache.maven.plugins.shade.resource.ManifestResourceTransformer">
                        <mainClass>cn.study.TestHadoop</mainClass>
                    </transformer>
                </transformers>
            </configuration>
        </execution>
    </executions>
</plugin>

(6) Package the project using an Eclipse run configuration (running the Maven package build), and rename the resulting jar to "TestHadoop.jar".

(7) Place the jar file under /usr/local/hadoop and samples.txt under /usr/local/hadoop/input.

(8) Run the job and check the result (the detailed console output is omitted):

[root@localhost hadoop]# hadoop jar TestHadoop.jar input/samples.txt output
......

[root@localhost hadoop]# cat output/*
111     30
222     88

(9) For questions about details in this post, contact me via private message or WeChat (yl4757).

(10) The source code for this test can be downloaded from https://download.csdn.net/download/yancie_/11982110.
