Compiling and Running MapReduce Programs in Eclipse

Original

2020-05-07 22:41

1. Environment

Ubuntu 16, Hadoop 2.7.1

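2. Installing Eclipse

You can install Eclipse by searching for it in Ubuntu Software Center (on the launcher at the left of the desktop). However, after I installed it that way Eclipse would not open, so I followed the installation steps on this site instead:
Eclipse installation steps
For the JDK, see the "Installing the Java environment" section of the same site.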

3. Installing and Configuring Hadoop-Eclipse-Plugin

To compile and run MapReduce programs in Eclipse, you need to install hadoop-eclipse-plugin. You can download hadoop2x-eclipse-plugin from GitHub (mirror download: http://pan.baidu.com/s/1i4ikIoP).

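After downloading, copy hadoop-eclipse-plugin-2.6.0.jar from the release directory (builds for 2.2.0 and 2.4.1 are also provided) into the plugins folder of the Eclipse installation directory, then run eclipse -clean to restart Eclipse. This command only needs to be run once after adding the plugin; afterwards, start Eclipse the normal way.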

unzip -qo ~/下載/hadoop2x-eclipse-plugin-master.zip -d ~/下載    # extract into ~/下載 (the Downloads directory)
sudo cp ~/下載/hadoop2x-eclipse-plugin-master/release/hadoop-eclipse-plugin-2.6.0.jar /usr/lib/eclipse/plugins/    # copy into the plugins directory of the Eclipse installation
/usr/lib/eclipse/eclipse -clean    # needed once after adding the plugin so that it takes effect

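Before continuing, make sure Hadoop is running

Start Eclipse and choose Preferences from the Window menu. In the dialog that opens, a new Hadoop Map/Reduce entry appears on the left. Click it and select the Hadoop installation directory (for example /usr/local/hadoop; browsing to the directory is awkward on Ubuntu, so just type the path).

Switch to the Map/Reduce perspective: choose Window -> Open Perspective -> Other, and select Map/Reduce in the dialog that opens.

Set up the connection to the Hadoop cluster: in the Map/Reduce Locations panel at the bottom right of the Eclipse window, right-click and choose New Hadoop Location.

In the General tab of the dialog that opens, the settings must match the Hadoop configuration. The two Host values are usually the same; for pseudo-distributed mode, enter localhost. I use a pseudo-distributed configuration with fs.defaultFS set to hdfs://localhost:9000, so the DFS Master Port must be changed to 9000. The Map/Reduce(V2) Master Port can keep its default, and Location Name can be anything.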
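
To make the mapping from fs.defaultFS to the DFS Master fields concrete, here is a minimal sketch in plain Java (no Hadoop dependency). The class name DfsMasterFields and its helper method are hypothetical, introduced only for illustration; the URI value is the one from the configuration described above.

```java
import java.net.URI;

// Hypothetical helper: splits an fs.defaultFS value into the Host and Port
// that the plugin's General tab asks for under "DFS Master".
final class DfsMasterFields {

    /** Returns the host part of a value such as hdfs://localhost:9000. */
    static String host(String defaultFs) {
        return URI.create(defaultFs).getHost();
    }

    /** Returns the port part, or -1 if the URI does not specify one. */
    static int port(String defaultFs) {
        return URI.create(defaultFs).getPort();
    }

    public static void main(String[] args) {
        String defaultFs = "hdfs://localhost:9000"; // value of fs.defaultFS in core-site.xml
        System.out.println("DFS Master Host: " + host(defaultFs));
        System.out.println("DFS Master Port: " + port(defaultFs));
    }
}
```

If the port printed here differs from the one entered in the dialog, the plugin will presumably fail to list HDFS contents under DFS Locations.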

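The Advanced parameters tab configures Hadoop parameters; in effect you fill in the Hadoop configuration properties (those in the configuration files under /usr/local/hadoop/etc/hadoop). For example, since I configured hadoop.tmp.dir, I would have to change it here accordingly. Doing this by hand is tedious, but it can be avoided by copying the configuration files, as described later.

In short, configuring General is enough. Click Finish and the Map/Reduce Location is created.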

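4. Creating a MapReduce Project in Eclipse

Open the File menu, choose New -> Project…, select Map/Reduce Project, and click Next.

Enter WordCount as the Project name and click Finish to create the project.

The new project now appears in Project Explorer on the left.
Next, right-click the WordCount project and choose New -> Class.
Two fields need to be filled in: enter org.apache.hadoop.examples for Package and WordCount for Name.
Once the class is created, WordCount.java appears under src in the project. Copy the following WordCount code into that file.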

package org.apache.hadoop.examples;
 
import java.io.IOException;
import java.util.Iterator;
import java.util.StringTokenizer;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.util.GenericOptionsParser;
 
public class WordCount {
    public WordCount() {
    }
 
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        String[] otherArgs = (new GenericOptionsParser(conf, args)).getRemainingArgs();
        if(otherArgs.length < 2) {
            System.err.println("Usage: wordcount <in> [<in>...] <out>");
            System.exit(2);
        }
 
        Job job = Job.getInstance(conf, "word count");
        job.setJarByClass(WordCount.class);
        job.setMapperClass(WordCount.TokenizerMapper.class);
        job.setCombinerClass(WordCount.IntSumReducer.class);
        job.setReducerClass(WordCount.IntSumReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
 
        for(int i = 0; i < otherArgs.length - 1; ++i) {
            FileInputFormat.addInputPath(job, new Path(otherArgs[i]));
        }
 
        FileOutputFormat.setOutputPath(job, new Path(otherArgs[otherArgs.length - 1]));
        System.exit(job.waitForCompletion(true)?0:1);
    }
 
    public static class IntSumReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
        private IntWritable result = new IntWritable();
 
        public IntSumReducer() {
        }
 
        public void reduce(Text key, Iterable<IntWritable> values, Reducer<Text, IntWritable, Text, IntWritable>.Context context) throws IOException, InterruptedException {
            int sum = 0;
 
            // Sum every count emitted for this key.
            for (IntWritable val : values) {
                sum += val.get();
            }
 
            this.result.set(sum);
            context.write(key, this.result);
        }
    }
 
    public static class TokenizerMapper extends Mapper<Object, Text, Text, IntWritable> {
        private static final IntWritable one = new IntWritable(1);
        private Text word = new Text();
 
        public TokenizerMapper() {
        }
 
        public void map(Object key, Text value, Mapper<Object, Text, Text, IntWritable>.Context context) throws IOException, InterruptedException {
            StringTokenizer itr = new StringTokenizer(value.toString());
 
            while(itr.hasMoreTokens()) {
                this.word.set(itr.nextToken());
                context.write(this.word, one);
            }
 
        }
    }
}
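
To see what the job computes without starting Hadoop, the following is a minimal sketch of the same logic in plain Java. It is not part of the project above: the class LocalWordCount and its count method are hypothetical names, and everything runs in a single process rather than as map and reduce tasks.

```java
import java.util.Arrays;
import java.util.Map;
import java.util.StringTokenizer;
import java.util.TreeMap;

// Hypothetical local stand-in for the WordCount job (no Hadoop classes involved).
final class LocalWordCount {

    /** Tokenizes each line on whitespace and sums the occurrences of every token. */
    static Map<String, Integer> count(Iterable<String> lines) {
        Map<String, Integer> counts = new TreeMap<>(); // sorted by word
        for (String line : lines) {
            // Same tokenizer as TokenizerMapper: splits on whitespace.
            StringTokenizer itr = new StringTokenizer(line);
            while (itr.hasMoreTokens()) {
                // Equivalent to emitting (word, 1) and summing as IntSumReducer does.
                counts.merge(itr.nextToken(), 1, Integer::sum);
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        Map<String, Integer> counts = count(Arrays.asList("hello hadoop", "hello eclipse"));
        counts.forEach((word, n) -> System.out.println(word + "\t" + n));
    }
}
```

Because the tokenizer splits only on whitespace, "Hello" and "hello," would be counted as different words, both here and in the Hadoop version.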

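Before running the MapReduce program, one more important step is needed (the configuration-file copy mentioned above, which takes care of the parameter settings): copy the modified configuration files from /usr/local/hadoop/etc/hadoop (for pseudo-distributed mode, core-site.xml and hdfs-site.xml), together with log4j.properties, into the src folder of the WordCount project (~/workspace/WordCount/src):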

cp /usr/local/hadoop/etc/hadoop/core-site.xml ~/workspace/WordCount/src
cp /usr/local/hadoop/etc/hadoop/hdfs-site.xml ~/workspace/WordCount/src
cp /usr/local/hadoop/etc/hadoop/log4j.properties ~/workspace/WordCount/src
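
The copy matters because Hadoop's Configuration looks for core-site.xml on the classpath, and files placed in src normally end up there when Eclipse builds the project. As a rough illustration of what gets picked up, here is a sketch in plain Java that reads one property from a file in the same name/value format. The class SiteXmlReader is hypothetical and the XML is an inline sample, not the author's actual file; only the fs.defaultFS value is taken from the text above.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

// Hypothetical reader for Hadoop-style *-site.xml content (name/value pairs).
final class SiteXmlReader {

    /** Returns the value of the named property, or null if it is not present. */
    static String property(String xml, String name) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            NodeList props = doc.getElementsByTagName("property");
            for (int i = 0; i < props.getLength(); i++) {
                Element p = (Element) props.item(i);
                NodeList names = p.getElementsByTagName("name");
                NodeList values = p.getElementsByTagName("value");
                if (names.getLength() > 0 && values.getLength() > 0
                        && names.item(0).getTextContent().trim().equals(name)) {
                    return values.item(0).getTextContent().trim();
                }
            }
            return null;
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse configuration XML", e);
        }
    }

    public static void main(String[] args) {
        // Inline sample in the layout of core-site.xml (assumed content).
        String sample = "<configuration><property>"
                + "<name>fs.defaultFS</name><value>hdfs://localhost:9000</value>"
                + "</property></configuration>";
        System.out.println("fs.defaultFS = " + property(sample, "fs.defaultFS"));
    }
}
```

Without the copied files, the job would likely fall back to the built-in defaults and use the local file system instead of HDFS.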

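After copying, be sure to right-click WordCount and choose Refresh (Eclipse does not refresh automatically). The copied files then appear under src in the project tree.

Click the Run icon in the toolbar, or right-click WordCount.java in Project Explorer and choose Run As -> Run on Hadoop, to run the MapReduce program. Because no arguments have been specified yet, the run prints "Usage: wordcount <in> [<in>...] <out>", so the run arguments need to be set in Eclipse.

Right-click WordCount.java and choose Run As -> Run Configurations to set the run-time arguments (if WordCount is not listed under Java Application, double-click Java Application first). Switch to the "Arguments" tab and enter "input output" in Program arguments.
Note: the input directory must exist in your HDFS file system, not on the local disk.
With the arguments set, run the program again. It should report success, and after refreshing DFS Locations the output folder is visible.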
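
The way "input output" is interpreted follows from the loop in main: every argument except the last is an input path, and the last one is the output path. A minimal sketch of that split in plain Java follows; the class RunArguments and its methods are hypothetical names used only for illustration.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical helper mirroring how WordCount.main treats its arguments.
final class RunArguments {

    /** All arguments except the last are input paths. */
    static List<String> inputs(String[] args) {
        requireAtLeastTwo(args);
        return Arrays.asList(args).subList(0, args.length - 1);
    }

    /** The last argument is the output path. */
    static String output(String[] args) {
        requireAtLeastTwo(args);
        return args[args.length - 1];
    }

    private static void requireAtLeastTwo(String[] args) {
        if (args.length < 2) {
            throw new IllegalArgumentException("Usage: wordcount <in> [<in>...] <out>");
        }
    }

    public static void main(String[] args) {
        String[] programArguments = "input output".split(" "); // as typed in Run Configurations
        System.out.println("inputs: " + inputs(programArguments));
        System.out.println("output: " + output(programArguments));
    }
}
```

Relative paths such as input and output are resolved against the user's HDFS home directory (/user/<username>). Hadoop also refuses to run a job whose output directory already exists, so delete output from HDFS before rerunning.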

At this point you can conveniently develop MapReduce programs in Eclipse.
