This is an old draft that sat untouched for a long time; I got busy and never finished it. Someone asked about it recently, so I'm publishing it as-is. It's not particularly well written.
1. Download Hadoop
Download from http://hadoop.apache.org/releases.html and pick whichever version you need.
Alternatively, hadoop-1.2.1-bin.tar.gz can be downloaded here: http://download.csdn.net/detail/jack5261314/6896011
2. Installation
*Hadoop here assumes the Sun (Oracle) JDK, not OpenJDK. OpenJDK is fine for running packaged software that depends on it, but it is not suitable for this development setup.
2.1) Uninstall OpenJDK
[root@localhost ~]# rpm -qa|grep jdk
java-1.6.0-openjdk-1.6.0.0-1.50.1.11.5.el6_3.x86_64
java-1.7.0-openjdk-1.7.0.9-2.3.4.1.el6_3.x86_64
[root@localhost ~]# rpm -qa|grep gcj
java-1.4.2-gcj-compat-1.4.2.0-40jpp.115
libgcj-4.1.2-48.el5
[root@localhost ~]# yum -y remove java java-1.6.0-openjdk-1.6.0.0-1.50.1.11.5.el6_3.x86_64
[root@localhost ~]# yum -y remove java java-1.7.0-openjdk-1.7.0.9-2.3.4.1.el6_3.x86_64
[root@localhost ~]# yum -y remove java java-1.4.2-gcj-compat-1.4.2.0-40jpp.115
[root@localhost ~]# yum -y remove libgcj-4.1.2-48.el5
Now java -version prints nothing, and Eclipse (which depended on OpenJDK) is gone as well.
2.2) Install the Sun JDK
rpm -ivh jdk-7-linux-x64.rpm    (the RPM package is recommended; use the filename of the version you actually downloaded — the rest of this article assumes jdk1.8.0_45)
The JDK installs to /usr/java by default.
2.3) Configure the Sun JDK
vim /etc/profile
export JAVA_HOME=/usr/java/jdk1.8.0_45
export CLASSPATH=.:$JAVA_HOME/lib/dt.jar:$JAVA_HOME/lib/tools.jar
export PATH=$PATH:$JAVA_HOME/bin
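A quick sanity check, runnable in any shell, that the three export lines expand the way we intend (the jdk1.8.0_45 path is the version assumed in this article; adjust it to match your install):

```shell
# Reproduce the profile's export lines and confirm the expansion.
export JAVA_HOME=/usr/java/jdk1.8.0_45
export CLASSPATH=.:$JAVA_HOME/lib/dt.jar:$JAVA_HOME/lib/tools.jar
export PATH=$PATH:$JAVA_HOME/bin
echo "$CLASSPATH"
# On the real machine, run 'source /etc/profile' afterwards so the
# current shell picks the variables up; 'java -version' should then
# report the Sun JDK instead of printing nothing.
```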
2.4) Install Hadoop
tar -xzf hadoop-1.2.1-bin.tar.gz -C /usr/local    (extract the archive)
ln -s /usr/local/hadoop-1.2.1 /opt/hadoop    (symlink to a more convenient path)
vim /etc/profile
export HADOOP_HOME=/opt/hadoop
export JAVA_HOME=/usr/java/jdk1.8.0_45
export CLASSPATH=.:$JAVA_HOME/lib/dt.jar:$JAVA_HOME/lib/tools.jar
export PATH=$PATH:$HADOOP_HOME/bin:$JAVA_HOME/bin
source /etc/profile    (make the configuration take effect)
The basic installation is now done, but you will notice a warning when running hadoop. It is harmless and does not affect development:
[root@kong Desktop]# java -version
java version "1.8.0_45"
Java(TM) SE Runtime Environment (build 1.8.0_45-b14)
Java HotSpot(TM) 64-Bit Server VM (build 25.45-b02, mixed mode)
[root@kong Desktop]# hadoop
Warning: $HADOOP_HOME is deprecated.
Usage: hadoop [--config confdir] COMMAND
where COMMAND is one of:
  namenode -format     format the DFS filesystem
  secondarynamenode    run the DFS secondary namenode
  namenode             run the DFS namenode
  datanode             run a DFS datanode
  dfsadmin             run a DFS admin client
  mradmin              run a Map-Reduce admin client
  fsck                 run a DFS filesystem checking utility
  fs                   run a generic filesystem user client
  balancer             run a cluster balancing utility
  oiv                  apply the offline fsimage viewer to an fsimage
  fetchdt              fetch a delegation token from the NameNode
The warning can be removed if you want. Edit /etc/profile again and re-source it:
vim /etc/profile
export HADOOP_HOME=/usr/local/hadoop-1.2.1
export HADOOP_HOME_WARN_SUPPRESS=1    (this is the key line)
export JAVA_HOME=/usr/java/jdk1.8.0_45
export CLASSPATH=.:$JAVA_HOME/lib/dt.jar:$JAVA_HOME/lib/tools.jar
export PATH=$PATH:$HADOOP_HOME/bin:$JAVA_HOME/bin
source /etc/profile
The warning is gone:
[root@kong Desktop]# hadoop
Usage: hadoop [--config confdir] COMMAND
where COMMAND is one of:
  namenode -format     format the DFS filesystem
  secondarynamenode    run the DFS secondary namenode
  namenode             run the DFS namenode
  datanode             run a DFS datanode
  dfsadmin             run a DFS admin client
  mradmin              run a Map-Reduce admin cli
3. A simple test
3.1) Estimating pi
hadoop jar /opt/hadoop/hadoop-examples-1.2.1.jar pi 4 1000
This job launches 4 map tasks, each drawing 1000 samples, to estimate pi. It is a simple MapReduce job.
[root@localhost home]# hadoop jar /opt/hadoop/hadoop-examples-1.2.1.jar pi 4 1000
Number of Maps  = 4
Samples per Map = 1000
15/05/17 21:49:45 INFO util.NativeCodeLoader: Loaded the native-hadoop library
Wrote input for Map #0
Wrote input for Map #1
Wrote input for Map #2
Wrote input for Map #3
Starting Job
15/05/17 21:49:45 INFO mapred.FileInputFormat: Total input paths to process : 4
15/05/17 21:49:46 INFO mapred.JobClient: Running job: job_local630184568_0001
15/05/17 21:49:46 INFO mapred.LocalJobRunner: Waiting for map tasks
15/05/17 21:49:46 INFO mapred.LocalJobRunner: Starting task: attempt_local630184568_0001_m_000000_0
15/05/17 21:49:46 INFO util.ProcessTree: setsid exited with exit code 0
15/05/17 21:49:46 INFO mapred.Task: Using ResourceCalculatorPlugin : org.apache.hadoop.util.LinuxResourceCalculatorPlugin@45f1cdad
15/05/17 21:49:46 INFO mapred.MapTask: Processing split: file:/home/PiEstimator_TMP_3_141592654/in/part0:0+118
15/05/17 21:49:46 INFO mapred.MapTask: numReduceTasks: 1
15/05/17 21:49:46 INFO mapred.MapTask: io.sort.mb = 100
15/05/17 21:49:46 INFO mapred.MapTask: data buffer = 79691776/99614720
15/05/17 21:49:46 INFO mapred.MapTask: record buffer = 262144/327680
15/05/17 21:49:46 INFO mapred.MapTask: Starting flush of map output
15/05/17 21:49:46 INFO mapred.MapTask: Finished spill 0
15/05/17 21:49:46 INFO mapred.Task: Task:attempt_local630184568_0001_m_000000_0 is done. And is in the process of commiting
15/05/17 21:49:46 INFO mapred.LocalJobRunner: Generated 1000 samples.
15/05/17 21:49:46 INFO mapred.Task: Task 'attempt_local630184568_0001_m_000000_0' done.
(map tasks 000001 through 000003 print near-identical output while JobClient progress climbs through map 25%, 50%, and 75%)
15/05/17 21:49:50 INFO mapred.LocalJobRunner: Map task executor complete.
15/05/17 21:49:50 INFO mapred.Task: Using ResourceCalculatorPlugin : org.apache.hadoop.util.LinuxResourceCalculatorPlugin@47588226
15/05/17 21:49:50 INFO mapred.Merger: Merging 4 sorted segments
15/05/17 21:49:50 INFO mapred.Merger: Down to the last merge-pass, with 4 segments left of total size: 96 bytes
15/05/17 21:49:50 INFO mapred.Task: Task:attempt_local630184568_0001_r_000000_0 is done. And is in the process of commiting
15/05/17 21:49:50 INFO mapred.Task: Task attempt_local630184568_0001_r_000000_0 is allowed to commit now
15/05/17 21:49:50 INFO mapred.FileOutputCommitter: Saved output of task 'attempt_local630184568_0001_r_000000_0' to file:/home/PiEstimator_TMP_3_141592654/out
15/05/17 21:49:50 INFO mapred.LocalJobRunner: reduce > reduce
15/05/17 21:49:50 INFO mapred.Task: Task 'attempt_local630184568_0001_r_000000_0' done.
15/05/17 21:49:51 INFO mapred.JobClient:  map 100% reduce 100%
15/05/17 21:49:51 INFO mapred.JobClient: Job complete: job_local630184568_0001
15/05/17 21:49:51 INFO mapred.JobClient: Counters: 21
15/05/17 21:49:51 INFO mapred.JobClient:   Map-Reduce Framework
15/05/17 21:49:51 INFO mapred.JobClient:     Spilled Records=16
15/05/17 21:49:51 INFO mapred.JobClient:     Map output materialized bytes=112
15/05/17 21:49:51 INFO mapred.JobClient:     Reduce input records=8
15/05/17 21:49:51 INFO mapred.JobClient:     Virtual memory (bytes) snapshot=0
15/05/17 21:49:51 INFO mapred.JobClient:     Map input records=4
15/05/17 21:49:51 INFO mapred.JobClient:     SPLIT_RAW_BYTES=400
15/05/17 21:49:51 INFO mapred.JobClient:     Map output bytes=72
15/05/17 21:49:51 INFO mapred.JobClient:     Reduce shuffle bytes=0
15/05/17 21:49:51 INFO mapred.JobClient:     Physical memory (bytes) snapshot=0
15/05/17 21:49:51 INFO mapred.JobClient:     Map input bytes=96
15/05/17 21:49:51 INFO mapred.JobClient:     Reduce input groups=8
15/05/17 21:49:51 INFO mapred.JobClient:     Combine output records=0
15/05/17 21:49:51 INFO mapred.JobClient:     Reduce output records=0
15/05/17 21:49:51 INFO mapred.JobClient:     Map output records=8
15/05/17 21:49:51 INFO mapred.JobClient:     Combine input records=0
15/05/17 21:49:51 INFO mapred.JobClient:     CPU time spent (ms)=0
15/05/17 21:49:51 INFO mapred.JobClient:     Total committed heap usage (bytes)=1698168832
15/05/17 21:49:51 INFO mapred.JobClient:   File Input Format Counters
15/05/17 21:49:51 INFO mapred.JobClient:     Bytes Read=520
15/05/17 21:49:51 INFO mapred.JobClient:   FileSystemCounters
15/05/17 21:49:51 INFO mapred.JobClient:     FILE_BYTES_WRITTEN=981494
15/05/17 21:49:51 INFO mapred.JobClient:     FILE_BYTES_READ=721813
15/05/17 21:49:51 INFO mapred.JobClient:   File Output Format Counters
15/05/17 21:49:51 INFO mapred.JobClient:     Bytes Written=109
Job Finished in 5.846 seconds
Estimated value of Pi is 3.14000000000000000000
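For intuition, the pi example is a Monte Carlo estimate: sample points in the unit square, count how many fall inside the quarter circle, and the inside fraction approaches pi/4. A rough single-machine sketch of the same idea (Hadoop's PiEstimator actually uses a Halton quasi-random sequence; plain rand() is used here for simplicity):

```shell
# Monte Carlo estimate of pi: sample n points in the unit square and
# count those landing inside the quarter circle x^2 + y^2 <= 1.
awk 'BEGIN {
    srand(42)                     # fixed seed so reruns are repeatable
    n = 200000; inside = 0
    for (i = 0; i < n; i++) {
        x = rand(); y = rand()
        if (x*x + y*y <= 1) inside++
    }
    printf "Estimated value of Pi is %.4f\n", 4 * inside / n
}'
```

With 200000 samples the estimate typically lands within about 0.01 of 3.1416, which is why the Hadoop example's output above is only accurate to 3.14 with its 4×1000 samples.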
3.2) A note of caution
Keep the network reachable while the job runs. I once tried to build a small distributed setup out of several VMs on a host-only virtual network with no outbound access, and the job failed. So make sure the network is up.
The same steps work on CentOS, Fedora, and Red Hat.