1. Install the JDK on the system and add the following to /etc/profile (adjust the paths to match your actual installation location):
export JAVA_HOME=/usr/local/java/jdk1.8.0_201
export CLASSPATH=.:$JAVA_HOME/lib:$JAVA_HOME/jre/lib
export PATH=$PATH:$JAVA_HOME/bin
2. Reload the configuration (source /etc/profile) and check the result:
[root@localhost ~]# java -version
java version "1.8.0_201"
Java(TM) SE Runtime Environment (build 1.8.0_201-b09)
Java HotSpot(TM) 64-Bit Server VM (build 25.201-b09, mixed mode)
3. Download the binary package hadoop-2.10.0.tar.gz from the official website (https://hadoop.apache.org/releases.html).
4. On CentOS, extract it to /usr/local/hadoop.
5. Add the following to /etc/profile:
export HADOOP_HOME=/usr/local/hadoop
export PATH=$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin
6. Reload the configuration and check the result:
[root@localhost ~]# hadoop version
Hadoop 2.10.0
Subversion ssh://git.corp.linkedin.com:29418/hadoop/hadoop.git -r e2f1f118e465e787d8567dfa6e2f3b72a0eb9194
Compiled by jhung on 2019-10-22T19:10Z
Compiled with protoc 2.5.0
From source with checksum 7b2d8877c5ce8c9a2cca5c7e81aa4026
This command was run using /usr/local/hadoop/share/hadoop/common/hadoop-common-2.10.0.jar
7. Test Hadoop with the official example (a grep job that counts matches of the regex 'dfs[a-z.]+' in the copied configuration files):
[root@localhost hadoop]# mkdir input
[root@localhost hadoop]# cp etc/hadoop/*.xml input/
[root@localhost hadoop]# hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.10.0.jar grep input/ output 'dfs[a-z.]+'
[root@localhost hadoop]# cd output/
[root@localhost output]# ls
part-r-00000  _SUCCESS
[root@localhost output]# cat *
1	dfsadmin
Note: to run this test again, delete the output directory first (e.g. rm -rf output); Hadoop refuses to write into an output path that already exists.
8. Write your own test demo.
Test scenario: a file (samples.txt) stores a set of 5-digit numbers, one per line; the first three digits are the key and the last two are the value. The task is to find the maximum value for each key. The expected result for this sample is worked out right after it:
11110
11130
22260
22228
22288
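Working the sample by hand: key 111 appears with values 10 and 30, and key 222 with values 60, 28, and 88, so the job should emit (tab-separated, the default TextOutputFormat layout):
111	30
222	88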
(1) Create a Maven project and add the required dependencies. The versions should match the Hadoop release installed above (2.10.0 here, not 3.x); hadoop-client normally pulls in hadoop-common and hadoop-hdfs transitively, so those entries are kept only for clarity.
<dependency>
<groupId>org.apache.hadoop</groupId>
<artifactId>hadoop-common</artifactId>
<version>2.10.0</version>
</dependency>
<dependency>
<groupId>org.apache.hadoop</groupId>
<artifactId>hadoop-hdfs</artifactId>
<version>2.10.0</version>
</dependency>
<dependency>
<groupId>org.apache.hadoop</groupId>
<artifactId>hadoop-client</artifactId>
<version>2.10.0</version>
</dependency>
<dependency>
<groupId>jdk.tools</groupId>
<artifactId>jdk.tools</artifactId>
<version>1.8</version>
<scope>system</scope>
<systemPath>${env.JAVA_HOME}/lib/tools.jar</systemPath>
</dependency>
(2) Define the map function:
import java.io.IOException;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;
public class TestMapper extends Mapper<LongWritable, Text, Text, IntWritable> {
    @Override
    protected void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        // Each input line holds a 5-digit number: the first three digits
        // are the key, the last two are the value.
        String line = value.toString().trim();
        if (line.length() < 5) {
            return; // skip blank or malformed lines instead of crashing the task
        }
        String testKey = line.substring(0, 3);
        int testValue = Integer.parseInt(line.substring(3, 5));
        context.write(new Text(testKey), new IntWritable(testValue));
    }
}
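An optional refinement, shown here only as a sketch: map() runs once per input line, so allocating a fresh Text and IntWritable on every call creates avoidable garbage. The usual object-reuse idiom looks like this (same imports as above; it is safe because context.write() serializes the contents immediately):
public class TestMapper extends Mapper<LongWritable, Text, Text, IntWritable> {
    // Reused across map() calls; Hadoop copies the contents on write().
    private final Text outKey = new Text();
    private final IntWritable outValue = new IntWritable();

    @Override
    protected void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        String line = value.toString().trim();
        if (line.length() < 5) {
            return; // skip blank or malformed lines
        }
        outKey.set(line.substring(0, 3));
        outValue.set(Integer.parseInt(line.substring(3, 5)));
        context.write(outKey, outValue);
    }
}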
(3) Define the reduce function:
import java.io.IOException;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;
public class TestReduce extends Reducer<Text, IntWritable, Text, IntWritable> {
    @Override
    protected void reduce(Text key, Iterable<IntWritable> values, Context context)
            throws IOException, InterruptedException {
        // All values for a given key arrive together; keep the largest one.
        int maxValue = Integer.MIN_VALUE;
        for (IntWritable value : values) {
            maxValue = Math.max(maxValue, value.get());
        }
        context.write(key, new IntWritable(maxValue));
    }
}
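Since taking a maximum is associative and commutative, this same reducer class can also serve as a combiner, pre-aggregating map output on each node before it is shuffled across the network. This is not part of the original setup, but if you want it, one line in the driver defined in the next step is enough:
job.setCombinerClass(TestReduce.class);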
(4) Define the main function:
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
public class TestHadoop {
    public static void main(String[] args) throws Exception {
        // Check the argument count before touching args, otherwise a missing
        // argument throws ArrayIndexOutOfBoundsException before the check runs.
        if (args.length != 2) {
            System.err.println("Usage: TestHadoop <input path> <output path>");
            System.exit(-1);
        }
        System.out.println(args[0] + "-----" + args[1]);
        Job job = Job.getInstance();
        // Tell Hadoop which jar to ship to the cluster when launched via "hadoop jar".
        job.setJarByClass(TestHadoop.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        job.setMapperClass(TestMapper.class);
        job.setReducerClass(TestReduce.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        // Exit with the job's status instead of just printing it.
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
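Tying back to the note in step 7: the job fails if the output directory already exists. One optional convenience, sketched here rather than taken from the original code, is to delete a stale output path from the driver before submitting. It needs an extra import (org.apache.hadoop.fs.FileSystem) and would replace the setOutputPath line above; use it with care, since it silently discards previous results:
// Remove a stale output directory so reruns don't fail.
FileSystem fs = FileSystem.get(job.getConfiguration());
Path outputPath = new Path(args[1]);
if (fs.exists(outputPath)) {
    fs.delete(outputPath, true); // true = delete recursively
}
FileOutputFormat.setOutputPath(job, outputPath);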
(5) Configure Maven packaging (this plugin section goes under <build><plugins> in pom.xml):
<plugin>
<groupId>org.apache.maven.plugins</groupId>
<artifactId>maven-shade-plugin</artifactId>
<version>3.2.1</version>
<executions>
<execution>
<phase>package</phase>
<goals>
<goal>shade</goal>
</goals>
<configuration>
<transformers>
<transformer implementation="org.apache.maven.plugins.shade.resource.ManifestResourceTransformer">
<mainClass>cn.study.TestHadoop</mainClass>
</transformer>
</transformers>
</configuration>
</execution>
</executions>
</plugin>
(6) Package the project (for example via an Eclipse run configuration that runs the Maven package goal) and rename the resulting jar to "TestHadoop.jar".
(7) Put the jar under /usr/local/hadoop, and samples.txt under /usr/local/hadoop/input.
(8) Run the job and check the result (detailed job output omitted):
[root@localhost hadoop]# hadoop jar TestHadoop.jar input/samples.txt output
[root@localhost hadoop]# cat output/*
(9) For questions about the details in this post, send me a private message or contact me on WeChat (yl4757).
(10) The source code for this test can be downloaded at https://download.csdn.net/download/yancie_/11982110.