Installing and configuring Hadoop on Windows with Cygwin

    Reposted from: http://www.zihou.me/html/2010/02/19/1525.html

   This article covers installing and configuring Hadoop on Windows, using Cygwin to emulate a Unix environment.

    I have only just started working with Hadoop myself. Following some instructions found online, my first attempt at configuring it succeeded, but some parts still felt unclear, so I went through the whole process again and wrote it down, hoping to give the procedure a clear outline. Corrections are welcome.

1. Required software

1.1. Cygwin (the latest version as of this writing is 2.685)

Download: http://www.cygwin.com/setup.exe

1.2. JDK 1.6.x

1.3. hadoop-0.20.1

Download: http://apache.freelamp.com/hadoop/core/hadoop-0.20.1/hadoop-0.20.1.tar.gz

2. Installation

2.1. For Cygwin installation instructions, see: http://www.zihou.me/2010/02/19/1506/

A supplement: copy and paste do not work in Cygwin's bash console, which is inconvenient, so you may want to use PuTTY instead. Download it from:

http://www.linuxboy.net/linux/rc/puttycyg.zip, extract puttycyg.zip, and place the three .exe files in the bin directory under the Cygwin installation directory (HOME_PATH below). Then edit the Cygwin.bat file in HOME_PATH (Notepad works fine): comment out the line bash --login -i by prefixing it with rem (i.e. rem bash --login -i) or with :: (:: bash --login -i), and add the line start putty -cygterm - instead.
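After the edit, Cygwin.bat might look something like this (a sketch; the C:\cygwin path is an assumption, use your own installation directory):

```bat
@echo off
C:
chdir C:\cygwin\bin
rem bash --login -i
start putty -cygterm -
```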

With this, copy and paste work. Note, however, that the default starting directory is Cygwin's HOME_PATH; to reach any other location you go through the system root, which on my machine is /cygdrive. For example, to get to drive E:, use /cygdrive/e.

2.2. JDK installation is omitted here.

2.3. Installing hadoop-0.20.1

Unpack hadoop-0.20.1.tar.gz; the extracted directory will be something like hadoop-0.20.1. Assume it is placed on drive E:, i.e. E:\hadoop-0.20.1. Edit conf/hadoop-env.sh and change the export JAVA_HOME line to point at the JDK installation directory on your machine, e.g. /cygdrive/d/tools/jdk1.6.0_03 (/cygdrive is the root under which Cygwin exposes the Windows drives once installed).
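The relevant line in conf/hadoop-env.sh would then look like this (the JDK path is just the example from above; substitute your own):

```shell
# conf/hadoop-env.sh
# The Java implementation to use; adjust the path to your own JDK install.
export JAVA_HOME=/cygdrive/d/tools/jdk1.6.0_03
```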

3. Installing and configuring SSH

3.1. Installation

Run the following from the Cygwin shell:

$ chmod +r /etc/group

$ chmod +r /etc/passwd

$ chmod +rwx /var

$ ssh-host-config

*** Info: Generating /etc/ssh_host_key

*** Info: Generating /etc/ssh_host_rsa_key

*** Info: Generating /etc/ssh_host_dsa_key

*** Info: Creating default /etc/ssh_config file

*** Info: Creating default /etc/sshd_config file

*** Info: Privilege separation is set to yes by default since OpenSSH 3.3.

*** Info: However, this requires a non-privileged account called 'sshd'.

*** Info: For more info on privilege separation read /usr/share/doc/openssh/README.privsep.

*** Query: Should privilege separation be used? (yes/no) yes

*** Info: Note that creating a new user requires that the current account have

*** Info: Administrator privileges.  Should this script attempt to create a

*** Query: new local account 'sshd'? (yes/no) yes

*** Info: Updating /etc/sshd_config file

*** Info: Added ssh to C:\WINDOWS\system32\drivers\etc\services

*** Info: Creating default /etc/inetd.d/sshd-inetd file

*** Info: Updated /etc/inetd.d/sshd-inetd

*** Warning: The following functions require administrator privileges!

*** Query: Do you want to install sshd as a service?

*** Query: (Say "no" if it is already installed as a service) (yes/no) yes

*** Query: Enter the value of CYGWIN for the daemon: [] cygwin (Note: the value entered here can be anything)

*** Info: The sshd service has been installed under the LocalSystem

*** Info: account (also known as SYSTEM). To start the service now, call

*** Info: `net start sshd' or `cygrunsrv -S sshd'.  Otherwise, it

*** Info: will start automatically after the next reboot.

*** Info: Host configuration finished. Have fun!

Answer yes wherever a yes/no question appears, and sshd is installed.

3.2. Configuration

3.2.1. Start the sshd service

net start sshd

The CYGWIN sshd service is starting.

The CYGWIN sshd service was started successfully.

3.2.2. $ ssh localhost

Try connecting to the local machine. Note that if the sshd service has not been started, this connection will certainly fail! For more on that error, see:
http://www.zihou.me/2010/02/19/1521/

If all is well, you will see output like this:

The authenticity of host 'localhost (127.0.0.1)' can't be established.

RSA key fingerprint is 08:03:20:43:48:39:29:66:6e:c5:61:ba:77:b2:2f:55.

Are you sure you want to continue connecting (yes/no)? yes

Warning: Permanently added 'localhost' (RSA) to the list of known hosts.

zihou@localhost's password:

You will be prompted for your machine's login password; once it is entered correctly, an ASCII-art welcome message appears, something like:

The Hippo says: Welcome to

3.2.3. Set up passwordless SSH login

$ ssh-keygen -t dsa -P '' -f ~/.ssh/id_dsa

Generating public/private dsa key pair.

Your identification has been saved in /home/zihou/.ssh/id_dsa.

Your public key has been saved in /home/zihou/.ssh/id_dsa.pub.

The key fingerprint is:

6d:64:8e:a6:38:73:ab:c5:ce:71:cd:df:a1:ca:63:54 zihou@PC-04101515

The key's randomart image is:

+--[ DSA 1024]----+

|                 |

|                 |

|          o      |

|         *  E    |

|        S +.     |

|     o o +.      |

|    + * ..o   .  |

|     B + .o. o . |

|    ..+  .ooo .  |

+-----------------+

$ cat ~/.ssh/id_dsa.pub >> ~/.ssh/authorized_keys

Now run $ ssh localhost again; if it succeeds without problems, sshd is fully configured.
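If ssh localhost still asks for a password after this, a common culprit on Cygwin is file permissions: with StrictModes enabled, sshd ignores an authorized_keys file (or .ssh directory) that is accessible to other users. A quick fix, assuming the default paths:

```shell
# sshd rejects key files with loose permissions; tighten them
mkdir -p ~/.ssh
touch ~/.ssh/authorized_keys
chmod 700 ~/.ssh
chmod 600 ~/.ssh/authorized_keys
```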

4. Configuring Hadoop

Edit conf/hadoop-site.xml and add the following:

<configuration>

<property>

<name>fs.default.name</name>

<value>localhost:9000</value>

</property>

<property>

<name>mapred.job.tracker</name>

<value>localhost:9001</value>

</property>

<property>

<name>dfs.replication</name>

<value>1</value>

</property>

</configuration>

5. Running Hadoop

Go to E:\hadoop-0.20.1, which under Cygwin is /cygdrive/e/hadoop-0.20.1, and execute:

bin/hadoop namenode -format

This formats a new distributed file system. The output is as follows:

10/02/19 17:32:26 WARN conf.Configuration: DEPRECATED: hadoop-site.xml found in the classpath.
Usage of hadoop-site.xml is deprecated. Instead use core-site.xml, mapred-site.xml and hdfs-site.xml
to override properties of core-default.xml, mapred-default.xml and hdfs-default.xml respectively

(I am not yet entirely clear on this warning; I am using the latest version.)

10/02/19 17:32:26 INFO namenode.NameNode: STARTUP_MSG:

/************************************************************

STARTUP_MSG: Starting NameNode

STARTUP_MSG:   host = PC-04101515/192.168.0.14

STARTUP_MSG:   args = [-format]

STARTUP_MSG:   version = 0.20.1

STARTUP_MSG:   build =

http://svn.apache.org/repos/asf/hadoop/common/tags/release-0.20.1-rc1 -r 810220; compiled by 'oom' on Tue Sep  1 20:55:56 UTC 2009

************************************************************/

10/02/19 17:32:27 INFO namenode.FSNamesystem:

fsOwner=zihou,None,root,Administrators,Users

10/02/19 17:32:27 INFO namenode.FSNamesystem: supergroup=supergroup

10/02/19 17:32:27 INFO namenode.FSNamesystem: isPermissionEnabled=true

10/02/19 17:32:28 INFO common.Storage: Image file of size 102 saved in 0 seconds.

10/02/19 17:32:28 INFO common.Storage: Storage directory \tmp\hadoop-SYSTEM\dfs\name has been successfully formatted.

10/02/19 17:32:28 INFO namenode.NameNode: SHUTDOWN_MSG:

/************************************************************

SHUTDOWN_MSG: Shutting down NameNode at PC-04101515/192.168.0.14

************************************************************/
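The DEPRECATED warning at the top of the format output is harmless on 0.20.x, but it describes the newer configuration layout: the single hadoop-site.xml was split into core-site.xml, hdfs-site.xml and mapred-site.xml. A sketch of how the three properties from section 4 would be distributed across those files (same values as above):

```xml
<!-- conf/core-site.xml -->
<configuration>
  <property>
    <name>fs.default.name</name>
    <value>localhost:9000</value>
  </property>
</configuration>

<!-- conf/hdfs-site.xml -->
<configuration>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
</configuration>

<!-- conf/mapred-site.xml -->
<configuration>
  <property>
    <name>mapred.job.tracker</name>
    <value>localhost:9001</value>
  </property>
</configuration>
```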
6. Starting the Hadoop daemons

$ bin/start-all.sh

starting namenode, logging to

/cygdrive/e/hadoop-0.20.1/bin/../logs/hadoop-zihou-namenode-PC-04101515.out

localhost: datanode running as process 5200. Stop it first.

localhost: secondarynamenode running as process 1664. Stop it first.

starting jobtracker, logging to

/cygdrive/e/hadoop-0.20.1/bin/../logs/hadoop-zihou-jobtracker-PC-04101515.out

localhost: starting tasktracker, logging to

/cygdrive/e/hadoop-0.20.1/bin/../logs/hadoop-zihou-tasktracker-PC-04101515.out

(Note: if this is your first start, the messages may differ somewhat from the above; I re-ran the command while writing this article, so some daemons were already running.)

7. Testing

Running in standalone mode

The example below copies the unpacked conf directory to use as input, then finds and displays every match of the given regular expression; output is written to the specified output directory. (Note: the working directory is the Hadoop root directory.)

$ mkdir input

$ cp conf/*.xml input

$ bin/hadoop jar hadoop-*-examples.jar grep input output 'dfs[a-z.]+'

$ cat output/*

Run $ bin/hadoop dfs -ls to check whether the *.xml files were copied into input; the result looks like this:

Found 1 items

drwxr-xr-x   - zihou supergroup          0 2010-02-19 17:44 /user/zihou/input

which shows the copy succeeded.
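What the grep example computes can be mimicked locally with ordinary shell tools: the job extracts every match of the regular expression from the input files and counts each distinct match. A plain-shell sketch of the same idea (the sample data here is invented for illustration):

```shell
# Emulate Hadoop's grep example with plain shell tools:
# extract every match of dfs[a-z.]+ and count the distinct matches.
printf 'dfs.replication\ndfs.name.dir\ndfs.replication\n' > /tmp/sample.xml
grep -oE 'dfs[a-z.]+' /tmp/sample.xml | sort | uniq -c | sort -rn
```

Hadoop's version does the same thing, only with the matching spread over map tasks and the counting done in the reduce phase.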

Running in pseudo-distributed mode

bin/hadoop jar hadoop-*-examples.jar grep input output 'dfs[a-z.]+'

If there are no errors, a stream of log messages follows, such as:
10/02/19 14:56:07 WARN conf.Configuration: DEPRECATED: hadoop-site.xml found in the classpath. Usage of hadoop-site.xml is deprecated. Instead use core-site

.xml, mapred-site.xml and hdfs-site.xml to override properties of core-default.xml, mapred-default.xml and hdfs-default.xml respectively

10/02/19 14:56:08 INFO jvm.JvmMetrics: Initializing JVM Metrics with processName=JobTracker, sessionId=

10/02/19 14:56:09 INFO mapred.FileInputFormat: Total input paths to process : 5

10/02/19 14:56:10 INFO mapred.JobClient: Running job: job_local_0001

10/02/19 14:56:10 INFO mapred.FileInputFormat: Total input paths to process : 5

10/02/19 14:56:10 INFO mapred.MapTask: numReduceTasks: 1

10/02/19 14:56:10 INFO mapred.MapTask: io.sort.mb = 100

10/02/19 14:56:10 INFO mapred.MapTask: data buffer = 79691776/99614720

10/02/19 14:56:10 INFO mapred.MapTask: record buffer = 262144/327680

...

With that, Hadoop has been configured successfully!

Notes

Hadoop documentation in Chinese: http://hadoop.apache.org/common/docs/r0.18.2/cn/

Quick-start guide: http://hadoop.apache.org/common/docs/r0.18.2/cn/quickstart.html

About Hadoop

Hadoop is an open-source Apache project built around a distributed file system. A distributed file system (DFS) provides remote file access and manages files scattered across a network transparently, so a client does not need to know where a file is physically stored. Hadoop originally lived inside Nutch; the NDFS and MapReduce code implemented in Nutch was later split out into a new open-source project, and that project became Hadoop.

