Setting up Cygwin and Hadoop on Windows, with Eclipse integration

The whole process drew on the following articles:

1. http://cw550284.iteye.com/blog/1064844

2. http://lirenjuan.iteye.com/blog/1280729

As everyone knows, debugging MapReduce programs is hard. Fortunately Cygwin, together with the matching Eclipse plugin, makes life much easier. Below I summarize my installation and configuration process:

I. Installing Cygwin

There is not much to say here: download the installer from the Cygwin site and run the online setup. Download address: http://cygwin.com/install.html

Once Cygwin is installed, set the environment variable CYGWIN_HOME and add %CYGWIN_HOME%\bin to PATH.
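For example, from a Windows command prompt (the C:\cygwin path below is an assumption; substitute your actual install directory):

:: assumed install location; adjust to your own
setx CYGWIN_HOME "C:\cygwin"
:: caution: setx truncates values longer than 1024 characters,
:: so for a long PATH the System Properties dialog is safer
setx PATH "%PATH%;%CYGWIN_HOME%\bin"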

II. Installing and configuring Hadoop

The first step is to set up passwordless ssh login. The procedure is covered in the first article above, so here I will only describe the problems I ran into while setting it up.

If passwordless login fails and the error message is not enough to locate the problem, turn on ssh's debug output:
ssh -vv localhost
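For reference, the standard OpenSSH passwordless-login setup (a generic sketch, not specific to the articles above) is run from a Cygwin shell:

# generate a passphrase-less RSA key pair
ssh-keygen -t rsa -P '' -f ~/.ssh/id_rsa
# authorize the new public key for local logins
cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
chmod 600 ~/.ssh/authorized_keys
# this should now log in without prompting for a password
ssh localhost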
Problem 1:

The sshd service is not running.
Fix: start the sshd service: cygrunsrv -S sshd
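If the sshd service has never been installed at all, it must first be configured; on Cygwin this is typically done with the ssh-host-config script, run from a shell with administrator rights (the -y flag answers yes to its prompts):

# install and configure sshd as a Windows service (typical invocation)
ssh-host-config -y
# then start it
cygrunsrv -S sshd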


Problem 2:

ssh: Permission denied
You cannot log in to your account even though you set a password with `passwd`.
Fix: sometimes Cygwin does not create your local user in the /etc/passwd file, so regenerate it:
mkpasswd.exe -c > /etc/passwd
Your Windows user should now appear in the passwd file. Then run `passwd` again to set a password for the Cygwin user; note that this is separate from your Windows password.

With passwordless ssh in place, install Hadoop next; the steps are the same as on Linux:

1. Edit the hadoop-env.sh configuration;

2. Edit the core-site.xml configuration (a minimal example follows this list);

3. Edit the mapred-site.xml configuration;

4. Set the environment variable HADOOP_HOME and add %HADOOP_HOME%\bin to PATH;

5. Start Hadoop and verify that it is running (see the commands after this list).
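For reference, a minimal single-node configuration of the kind these steps produce looks like the following; the port numbers and the E:\data\tmp path are illustrative assumptions, so substitute your own. In core-site.xml:

<property>
  <name>fs.default.name</name>
  <value>hdfs://localhost:9000</value>
</property>
<property>
  <name>hadoop.tmp.dir</name>
  <value>E:\data\tmp</value>
</property>

and in mapred-site.xml:

<property>
  <name>mapred.job.tracker</name>
  <value>localhost:9001</value>
</property>

To start and verify (step 5), from a Cygwin shell in %HADOOP_HOME%:

bin/hadoop namenode -format
bin/start-all.sh
jps

If everything came up, jps should list NameNode, DataNode, SecondaryNameNode, JobTracker and TaskTracker.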

III. Eclipse integration

First, download the Eclipse plugin: hadoop-eclipse-plugin-0.20.3-SNAPSHOT.jar

With a plain Eclipse install, proceed as follows:

Installing hadoop-eclipse-plugin:
a. Create a folder named links under the Eclipse installation directory.
b. In it, create a link file, hadoop.link, with the content: path=E:\\eclipsePlugins\\hadoop
c. Under that path create a plugins folder and put hadoop-eclipse-plugin-0.20.3-SNAPSHOT.jar into it, i.e. E:\eclipsePlugins\hadoop\plugins. Be sure to use this plugin build with hadoop-0.20.2; with the plugin bundled in 0.20.2 itself, the Run on Hadoop dialog never appears when debugging in Eclipse.
d. Delete the org.eclipse.update folder under E:\Program Files\eclipse\configuration.
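Putting steps a through d together, the resulting layout (with the example paths above) is:

E:\Program Files\eclipse\links\hadoop.link   (contains: path=E:\\eclipsePlugins\\hadoop)
E:\eclipsePlugins\hadoop\plugins\hadoop-eclipse-plugin-0.20.3-SNAPSHOT.jar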

I use the SpringSource build of Eclipse (STS), where installing hadoop-eclipse-plugin is simpler: just drop hadoop-eclipse-plugin-0.20.3-SNAPSHOT.jar into sts-2.3.2.RELEASE\plugins and restart Eclipse.

With everything deployed, try running a MapReduce job. Along the way I ran into the following problems:

Problem 1:

While running a map task:
12/03/07 14:56:13 INFO mapred.JobClient: Task Id : attempt_201203071039_0011_m_000001_2, Status : FAILED
java.io.FileNotFoundException: File E:/cygdrive/e/data/tmp/mapred/local/taskTracker/jobcache/job_201203071039_0011/attempt_201203071039_0011_m_000001_2/work/tmp does not exist.
	at org.apache.hadoop.fs.RawLocalFileSystem.getFileStatus(RawLocalFileSystem.java:361)
	at org.apache.hadoop.fs.FilterFileSystem.getFileStatus(FilterFileSystem.java:245)
	at org.apache.hadoop.mapred.TaskRunner.setupWorkDir(TaskRunner.java:519)
	at org.apache.hadoop.mapred.Child.main(Child.java:155)
Fix: the map phase needs a temporary directory to store intermediate map results, and this error means that directory cannot be found. Change the mapred.child.tmp setting in mapred-site.xml to an absolute path, for example:
<property>
  <name>mapred.child.tmp</name>
  <value>E:\Apache\Hadoop\Run\tmp</value>
  <description> To set the value of tmp directory for map and reduce tasks.
  If the value is an absolute path, it is directly assigned. Otherwise, it is
  prepended with task's working directory. The java tasks are executed with
  option -Djava.io.tmpdir='the absolute path of the tmp dir'. Pipes and
  streaming are set with environment variable,
   TMPDIR='the absolute path of the tmp dir'
  </description>
</property>
Problem 2:

While running a MapReduce job:
java.lang.IllegalArgumentException: Can't read partitions file
       at org.apache.hadoop.hbase.mapreduce.hadoopbackport.TotalOrderPartitioner.setConf(TotalOrderPartitioner.java:111)
       at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:62)
       at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:117)
       at org.apache.hadoop.mapred.MapTask$NewOutputCollector.<init>(MapTask.java:560)
       at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:639)
       at org.apache.hadoop.mapred.MapTask.run(MapTask.java:323)
       at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:210)
Caused by: java.io.FileNotFoundException: File _partition.lst does not exist.
       at org.apache.hadoop.fs.RawLocalFileSystem.getFileStatus(RawLocalFileSystem.java:383)
       at org.apache.hadoop.fs.FilterFileSystem.getFileStatus(FilterFileSystem.java:251)
       at org.apache.hadoop.fs.FileSystem.getLength(FileSystem.java:776)
       at org.apache.hadoop.io.SequenceFile$Reader.<init>(SequenceFile.java:1424)
       at org.apache.hadoop.io.SequenceFile$Reader.<init>(SequenceFile.java:1419)
       at org.apache.hadoop.hbase.mapreduce.hadoopbackport.TotalOrderPartitioner.readPartitions(TotalOrderPartitioner.java:296)
Fix: see http://hbase.apache.org/book/trouble.mapreduce.html. (Note from the LocalJobRunner frame in the stack trace that the job was running in local mode, which is the situation that page discusses.)
