Installing a Single-Node Pseudo-Distributed CDH Hadoop Cluster

My previous installs were all three-node clusters. Today I needed a single-node one, and after the install MapReduce jobs refused to be submitted to YARN; an afternoon of fiddling got me nowhere.

In MR1 a job is submitted to the JobTracker; under YARN it should go to the ResourceManager. Instead a local job (LocalJobRunner) was started, and the following setting turned out to have no effect:

  <property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
  </property>
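
One thing worth ruling out first (a quick check of my own, assuming the standard CDH layout with the client config under /etc/hadoop/conf) is that the hadoop command is actually reading the mapred-site.xml that carries this setting:

# the active config directory should be the first classpath entry
hadoop classpath | tr ':' '\n' | grep '/etc/hadoop'
# and it should really contain the YARN setting
grep -A1 'mapreduce.framework.name' /etc/hadoop/conf/mapred-site.xml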
The setting below no longer exists under YARN, but after reading the JobClient code I added it anyway, with the ResourceManager's address as the value:

   <property>
        <name>mapred.job.tracker</name>
        <value>com3:8031</value>
   </property>

This time the ResourceManager was reached, but every submission failed with Unknown rpc kind RPC_WRITABLE.

Inspection showed that the server side, i.e. the ResourceManager, registers each RPC kind it uses in a Map, and can only handle kinds that have been registered.

There are exactly two kinds: Google's protobuf and Hadoop's Writable.

public class ProtobufRpcEngine implements RpcEngine {
  public static final Log LOG = LogFactory.getLog(ProtobufRpcEngine.class);
  
  static { // Register the rpcRequest deserializer for WritableRpcEngine 
    org.apache.hadoop.ipc.Server.registerProtocolEngine(
        RPC.RpcKind.RPC_PROTOCOL_BUFFER, RpcRequestWritable.class,
        new Server.ProtoBufRpcInvoker());
  }
  // ... rest of the class omitted
}

But the server registers only protobuf, so it cannot accept the Writable-kind messages the client sends when submitting a job, which is what produces the error above.

After reading the client code, I explicitly set the RPC engine for JobSubmissionProtocol, the protocol the client uses to submit jobs, to protobuf:

  <property>
    <name>rpc.engine.org.apache.hadoop.mapred.JobSubmissionProtocol</name>
    <value>org.apache.hadoop.ipc.ProtobufRpcEngine</value>
  </property>

That only produced a different error:

Exception in thread "main" java.lang.NullPointerException
	at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.constructRpcRequest(ProtobufRpcEngine.java:138)
	at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:193)
	at org.apache.hadoop.mapred.$Proxy10.getStagingAreaDir(Unknown Source)
	at org.apache.hadoop.mapred.JobClient.getStagingAreaDir(JobClient.java:1340)
	at org.apache.hadoop.mapreduce.JobSubmissionFiles.getStagingDir(JobSubmissionFiles.java:102)
	at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:954)
	at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:948)
	at java.security.AccessController.doPrivileged(Native Method)
	at javax.security.auth.Subject.doAs(Subject.java:396)
	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1408)
	at org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:948)
	at org.apache.hadoop.mapreduce.Job.submit(Job.java:566)
	at org.apache.hadoop.mapreduce.Job.waitForCompletion(Job.java:596)
	at mr.ref.WordCount.main(WordCount.java:90)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
	at java.lang.reflect.Method.invoke(Method.java:597)
	at org.apache.hadoop.util.RunJar.main(RunJar.java:208)

The reason is that the ResourceManager now accepts the message, but while processing it makes wrong assumptions about its contents, so it still cannot handle it.

Without specifying the client RPC engine, the error is the original one:

14/03/31 11:16:52 ERROR security.UserGroupInformation: PriviledgedActionException as:root (auth:SIMPLE) cause:org.apache.hadoop.ipc.RemoteException(java.io.IOException): Unknown rpc kind RPC_WRITABLE
Exception in thread "main" org.apache.hadoop.ipc.RemoteException(java.io.IOException): Unknown rpc kind RPC_WRITABLE
	at org.apache.hadoop.ipc.Client.call(Client.java:1238)
	at org.apache.hadoop.ipc.WritableRpcEngine$Invoker.invoke(WritableRpcEngine.java:225)


Changing the client-side RPC engine was useless, so I turned to the server side. The default is:

<property>
	<name>yarn.ipc.rpc.class</name>
	<value>org.apache.hadoop.yarn.ipc.HadoopYarnProtoRPC</value>
</property>
I tried changing it to org.apache.hadoop.hbase.ipc.WritableRpcEngine and to other values, but every change caused its own set of problems, and there are several server-side protocol settings to keep consistent, so I had to look for another approach.

So I went back to find out why mapreduce.framework.name was not taking effect. Tracing that value led me to a JobClient whose init method differs from the JobClient in my Eclipse dependencies, and only then did I notice this comment:

  /**
   * Connect to the default cluster
   * @param conf the job configuration.
   * @throws IOException
   */
  public void init(JobConf conf) throws IOException {
    setConf(conf);
    cluster = new Cluster(conf);
    clientUgi = UserGroupInformation.getCurrentUser();
  }

The JobClient I had actually been running was still the MR1-era one. Both /usr/lib/hadoop-mapreduce/hadoop-mapreduce-client-core-2.0.0-cdh4.5.0.jar and /usr/lib/hadoop-0.20-mapreduce/hadoop-core-2.0.0-mr1-cdh4.5.0.jar contain a JobClient class, and only the former is the YARN-era one.
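
To see which of the two wins on the runtime classpath, a sketch like the following helps (my own check, not from the original post; it assumes unzip is installed and relies on the shell expanding the wildcard entries that hadoop classpath prints):

# scan every jar on the client classpath for the JobClient class
for jar in $(hadoop classpath | tr ':' '\n'); do
  case "$jar" in
    *.jar)
      unzip -l "$jar" 2>/dev/null | grep -q 'org/apache/hadoop/mapred/JobClient.class' \
        && echo "JobClient found in: $jar"
      ;;
  esac
done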

After checking the CLASSPATH used when the job is run, I corrected it by editing /usr/lib/hadoop/libexec/hadoop-layout.sh, changing

HADOOP_MAPRED_HOME=${HADOOP_MAPRED_HOME:-"/usr/lib/hadoop-0.20-mapreduce"}

to

HADOOP_MAPRED_HOME=${HADOOP_MAPRED_HOME:-"/usr/lib/hadoop-mapreduce"}
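
In a fresh shell you can then confirm that the MapReduce portion of the classpath now comes from the YARN-era directory (again my own quick check):

hadoop classpath | tr ':' '\n' | grep -i mapreduce
# expect entries under /usr/lib/hadoop-mapreduce, not /usr/lib/hadoop-0.20-mapreduce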

With that in place the right JobClient was picked up, but submitting a job still failed, and the failure was very hard to pin down: even with YARN logging turned up to DEBUG there was no ERROR anywhere, client or server. I'll come back to that problem below.


All of the above boils down to an environment-variable problem. Only later, while installing the pseudo-distributed cluster, did I find that the CDH documentation already spells it out:

To submit jobs to YARN, set:

export HADOOP_MAPRED_HOME=/usr/lib/hadoop-mapreduce
To submit jobs to the JobTracker (this is also the default), set:

export HADOOP_MAPRED_HOME=/usr/lib/hadoop-0.20-mapreduce
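
To avoid having to export it in every shell, the variable can be made permanent for the submitting user (my own addition; the CDH quick start only shows the export itself):

echo 'export HADOOP_MAPRED_HOME=/usr/lib/hadoop-mapreduce' >> ~/.bashrc
. ~/.bashrc
echo $HADOOP_MAPRED_HOME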

Back to the unsolved problem: even after today's single-node install was finished, jobs still could not be submitted to YARN, and the cause was hard to find. The DEBUG logs contained no ERROR at all, only a WARN.

After the job was submitted, the client console stopped producing output. Here is the ResourceManager log at that point; note the WARN and the FAILED state:

2014-03-31 19:50:50,870 INFO org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl: application_1396266549856_0001 State change from ACCEPTED to FAILED
2014-03-31 19:50:50,870 DEBUG org.apache.hadoop.yarn.event.AsyncDispatcher: Dispatching the event org.apache.hadoop.yarn.server.resourcemanager.scheduler.event.AppRemovedSchedulerEvent.EventType: APP_REMOVED
2014-03-31 19:50:50,870 DEBUG org.apache.hadoop.yarn.event.AsyncDispatcher: Dispatching the event org.apache.hadoop.yarn.server.resourcemanager.rmnode.RMNodeCleanAppEvent.EventType: CLEANUP_APP
2014-03-31 19:50:50,870 DEBUG org.apache.hadoop.yarn.server.resourcemanager.rmnode.RMNodeImpl: Processing com2:55147 of type CLEANUP_APP
2014-03-31 19:50:50,871 DEBUG org.apache.hadoop.yarn.server.resourcemanager.scheduler.fifo.FifoScheduler: post-assignContainers
2014-03-31 19:50:50,871 DEBUG org.apache.hadoop.yarn.server.resourcemanager.scheduler.common.fica.FiCaSchedulerApp: showRequests: application=application_1396266549856_0001 headRoom=memory: 6144 currentConsumption=0
2014-03-31 19:50:50,871 DEBUG org.apache.hadoop.yarn.server.resourcemanager.scheduler.common.fica.FiCaSchedulerApp: showRequests: application=application_1396266549856_0001 request={Priority: 0, Capability: memory: 2048}
2014-03-31 19:50:50,871 DEBUG org.apache.hadoop.yarn.server.resourcemanager.scheduler.fifo.FifoScheduler: Node after allocation com2:55147 resource = memory: 8192
2014-03-31 19:50:50,872 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.AppSchedulingInfo: Application application_1396266549856_0001 requests cleared
2014-03-31 19:50:50,872 DEBUG org.apache.hadoop.yarn.event.AsyncDispatcher: Dispatching the event org.apache.hadoop.yarn.server.resourcemanager.RMAppManagerEvent.EventType: APP_COMPLETED
2014-03-31 19:50:50,872 DEBUG org.apache.hadoop.yarn.server.resourcemanager.RMAppManager: RMAppManager processing event for application_1396266549856_0001 of type APP_COMPLETED
************************************
2014-03-31 19:50:50,872 WARN org.apache.hadoop.yarn.server.resourcemanager.RMAuditLogger: USER=root  OPERATION=Application Finished - Failed  TARGET=RMAppManager  RESULT=FAILURE  DESCRIPTION=App failed with state: FAILED  PERMISSIONS=Application application_1396266549856_0001 failed 1 times due to AM Container for appattempt_1396266549856_0001_000001 exited with exitCode: 1 due to:
.Failing this attempt.. Failing the application.  APPID=application_1396266549856_0001
2014-03-31 19:50:50,876 INFO org.apache.hadoop.yarn.server.resourcemanager.RMAppManager$ApplicationSummary: appId=application_1396266549856_0001,name=word count,user=root,queue=default,state=FAILED,trackingUrl=com2:8088/proxy/application_1396266549856_0001/,appMasterHost=N/A,startTime=1396266647295,finishTime=1396266650870
************************************
2014-03-31 19:50:51,519 DEBUG org.apache.hadoop.ipc.Server:  got #55

The NodeManager log from the same moment:

2014-03-31 19:50:50,528 WARN org.apache.hadoop.yarn.server.nodemanager.NMAuditLogger: USER=root OPERATION=Container Finished - Failed   TARGET=ContainerImpl    RESULT=FAILURE  DESCRIPTION=Container failed with state: EXITED_WITH_FAILURE    APPID=application_1396266549856_0001    CONTAINERID=container_1396266549856_0001_01_000001
2014-03-31 19:50:50,280 INFO org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor: launchContainer: [bash, /opt/data/hadoop/hadoop-yarn/nm-local-dir/usercache/root/appcache/application_1396266549856_0001/container_1396266549856_0001_01_000001/default_container_executor.sh]
2014-03-31 19:50:50,493 WARN org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor: Exit code from task is : 1
2014-03-31 19:50:50,493 INFO org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor:
2014-03-31 19:50:50,494 DEBUG org.apache.hadoop.yarn.server.nodemanager.containermanager.container.Container: Processing container_1396266549856_0001_01_000001 of type UPDATE_DIAGNOSTICS_MSG
2014-03-31 19:50:50,494 DEBUG org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch: Container container_1396266549856_0001_01_000001 completed with exit code 1
2014-03-31 19:50:50,495 WARN org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch: Container exited with a non-zero exit code 1
2014-03-31 19:50:50,495 DEBUG org.apache.hadoop.yarn.event.AsyncDispatcher: Dispatching the event org.apache.hadoop.yarn.server.nodemanager.containermanager.container.ContainerExitEvent.EventType: CONTAINER_EXITED_WITH_FAILURE
2014-03-31 19:50:50,495 DEBUG org.apache.hadoop.yarn.server.nodemanager.containermanager.container.Container: Processing container_1396266549856_0001_01_000001 of type CONTAINER_EXITED_WITH_FAILURE
2014-03-31 19:50:50,496 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.container.Container: Container container_1396266549856_0001_01_000001 transitioned from RUNNING to EXITED_WITH_FAILURE
2014-03-31 19:50:50,496 DEBUG org.apache.hadoop.yarn.event.AsyncDispatcher: Dispatching the event org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainersLauncherEvent.EventType: CLEANUP_CONTAINER
2014-03-31 19:50:50,496 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch: Cleaning up container container_1396266549856_0001_01_000001
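
If you want to dig further, the container's own stdout/stderr usually explains an exit code 1 when the daemon logs do not. A hedged sketch for finding it on the NodeManager host (the exact location depends on yarn.nodemanager.log-dirs; the application and container ids below are the ones from the logs above):

# locate the failed container's log directory
find /var/log/hadoop-yarn /opt/data/hadoop -type d -name 'container_1396266549856_0001_01_000001' 2>/dev/null
# then read its stderr / syslog
find /var/log/hadoop-yarn /opt/data/hadoop -path '*application_1396266549856_0001*' \
  \( -name stderr -o -name syslog \) -exec tail -n 40 {} \; 2>/dev/null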

I'm not going to sink more time into this for now, because I just discovered that CDH ships a ready-made configuration package, hadoop-conf-pseudo.x86_64, and installing it just works. A brief walkthrough of the installation:

Prepare the CDH repository configuration:

[cloudera-cdh4.2.1]
name=Cloudera's Distribution for Hadoop, Version 4.2.1
baseurl=http://archive-primary.cloudera.com/cdh4/redhat/6/x86_64/cdh/4.2.1/
gpgkey =  http://archive.cloudera.com/cdh4/redhat/6/x86_64/cdh/RPM-GPG-KEY-cloudera
gpgcheck = 1
The physical host runs Fedora and needs these packages:

yum -y install createrepo yum-utils

Sync the repository to the local disk and build a local repo from it:
mkdir -p /var/www/cloudera-cdh4/cdh4/4.2.1/RPMS
reposync -p /var/www/cloudera-cdh4/cdh4/4.2.1/RPMS --repoid=cloudera-cdh4.2.1
createrepo -o /var/www/cloudera-cdh4/cdh4/4.2.1 /var/www/cloudera-cdh4/cdh4/4.2.1/RPMS

Serve the repository over HTTP so the virtual machines can install from it. My Apache is 2.4.7; the version shipped with RHEL 6.x is older and its access-control directives differ slightly.

With the 2.2.15 that ships with CentOS 6.4, the Require all granted line has to be removed.

# cat /etc/httpd/conf.d/cloudera.conf
 
NameVirtualHost 192.168.3.1:80
<VirtualHost 192.168.3.1:80>
    DocumentRoot /var/www/cloudera-cdh4
    ServerName 192.168.3.1
    <Directory />
    Options All
    AllowOverride All
    Require all granted
    </Directory>
</VirtualHost>
The host where Hadoop will be installed also needs one more package, which is available on the install ISO:

yum -y install nc
Start httpd, then configure yum inside the virtual machine:

# cat /etc/yum.repos.d/cloudera-cdh4.2.1.repo 
[cloudera-cdh4.2.1]
name=cdh4.2.1
baseurl=http://192.168.3.1/cdh4/4.2.1/
gpgcheck = 0
enabled=1
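
Before installing, it's worth confirming from inside the VM that the repo is reachable (my own sanity check):

yum clean all
yum repolist
# the repo metadata should also be directly fetchable
curl -s http://192.168.3.1/cdh4/4.2.1/repodata/repomd.xml | head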
Hadoop can now be installed:

yum -y install hadoop.x86_64 hadoop-hdfs-namenode.x86_64 hadoop-hdfs-datanode.x86_64 
yum -y install hadoop-client.x86_64 hadoop-mapreduce.x86_64 hadoop-conf-pseudo.x86_64
yum -y install hadoop-yarn-resourcemanager.x86_64 hadoop-yarn-nodemanager.x86_64
The ready-made conf.pseudo is a thoughtful touch:
alternatives --install /etc/hadoop/conf hadoop-conf /etc/hadoop/conf.pseudo/ 30
alternatives --set hadoop-conf /etc/hadoop/conf.pseudo/
Format the NameNode and start HDFS:
sudo -u hdfs hdfs namenode -format
/etc/init.d/hadoop-hdfs-namenode start
/etc/init.d/hadoop-hdfs-datanode start
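
Before going on, it's worth making sure HDFS really came up (a quick check of my own):

# both daemons should be running and the DataNode should have registered
ps -ef | egrep 'NameNode|DataNode' | grep -v grep
sudo -u hdfs hdfs dfsadmin -report | head -n 20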
Create the working directories:

sudo -u hdfs hadoop fs -mkdir -p /tmp/hadoop-yarn/staging/history/done_intermediate
sudo -u hdfs hadoop fs -chown -R mapred:mapred /tmp/hadoop-yarn/staging
sudo -u hdfs hadoop fs -chmod -R 1777 /tmp
sudo -u hdfs hadoop fs -mkdir -p /var/log/hadoop-yarn
sudo -u hdfs hadoop fs -chown yarn:mapred /var/log/hadoop-yarn
Start YARN:

/etc/init.d/hadoop-yarn-resourcemanager start
/etc/init.d/hadoop-yarn-nodemanager start
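
A quick way to confirm both YARN daemons are up (my own check; 8088 and 8042 are the default web UI ports for the ResourceManager and NodeManager):

curl -s -o /dev/null -w '%{http_code}\n' http://localhost:8088/cluster
curl -s -o /dev/null -w '%{http_code}\n' http://localhost:8042/node
# expect 200 from both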
Create the user home directories:

sudo -u hdfs hadoop fs -mkdir /user/hdfs
sudo -u hdfs hadoop fs -chown hdfs /user/hdfs
sudo -u hdfs hadoop fs -mkdir /user/root
sudo -u hdfs hadoop fs -chown root /user/root
sudo -u hdfs hadoop fs -mkdir /user/mapred
sudo -u hdfs hadoop fs -chown mapred /user/mapred
sudo -u hdfs hadoop fs -mkdir /user/yarn
sudo -u hdfs hadoop fs -chown yarn /user/yarn

Installation done.

Next, create a test user and try submitting a job; as in the documentation, the user is called joe:

[root@com2 mr]# useradd joe
[root@com2 mr]# passwd joe
 
[root@com2 mr]# su joe
[joe@com2 mr]$ export HADOOP_MAPRED_HOME=/usr/lib/hadoop-mapreduce
[joe@com2 mr]# sudo -u hdfs hadoop fs -mkdir /user/joe
[joe@com2 mr]# sudo -u hdfs hadoop fs -chown joe /user/joe
 
[joe@com2 mr]$ hadoop fs -mkdir input
[joe@com2 mr]$ hadoop fs -put /etc/hadoop/conf/*.xml input
[joe@com2 mr]$ hadoop fs -ls input
Found 4 items
-rw-r--r--   1 joe supergroup       1461 2014-03-31 21:35 input/core-site.xml
-rw-r--r--   1 joe supergroup       1854 2014-03-31 21:35 input/hdfs-site.xml
-rw-r--r--   1 joe supergroup       1325 2014-03-31 21:35 input/mapred-site.xml
-rw-r--r--   1 joe supergroup       2262 2014-03-31 21:35 input/yarn-site.xml
Run a MapReduce job and check the result:
[joe@com2 mr]$ hadoop jar /usr/lib/hadoop-mapreduce/hadoop-mapreduce-examples.jar grep input output23 'dfs[a-z.]+'
[joe@com2 mr]$ hadoop fs -ls output23
Found 2 items
-rw-r--r--   1 joe supergroup          0 2014-03-31 21:37 output23/_SUCCESS
-rw-r--r--   1 joe supergroup        150 2014-03-31 21:37 output23/part-r-00000
[joe@com2 mr]$
[joe@com2 mr]$ hadoop fs -cat output23/part-r-00000 | head
1   dfs.safemode.min.datanodes
1   dfs.safemode.extension
1   dfs.replication
1   dfs.namenode.name.dir
1   dfs.namenode.checkpoint.dir
1   dfs.datanode.data.dir
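
To double-check that this run really went through YARN rather than the LocalJobRunner (my own check; it assumes the ResourceManager web services are on the default port 8088), the application should be listed by the ResourceManager:

curl -s http://localhost:8088/ws/v1/cluster/apps | head -c 800; echo
# the grep job should appear with state FINISHED and finalStatus SUCCEEDED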



Reference:

http://www.cloudera.com/content/cloudera-content/cloudera-docs/CDH4/latest/CDH4-Quick-Start/cdh4qs_topic_3_3.html

