Preliminary setup
# The java implementation to use.
export JAVA_HOME=/usr/lib/jvm/jre
Standalone
In this mode, only the local file system can be operated on.
<!-- default, from core-default.xml -->
<property>
<name>fs.defaultFS</name>
<value>file:///</value>
<description>The name of the default file system. A URI whose
scheme and authority determine the FileSystem implementation. The
uri's scheme determines the config property (fs.SCHEME.impl) naming
the FileSystem implementation class. The uri's authority is used to
determine the host, port, etc. for a filesystem.</description>
</property>
Example usage
$ bin/hdfs dfs -ls .
Found 7 items
drwxr-xr-x - 501 wheel 4096 2020-05-17 00:53 bin
drwxr-xr-x - 501 wheel 4096 2020-05-17 00:53 etc
drwxr-xr-x - 501 wheel 4096 2020-05-17 00:53 include
drwxr-xr-x - 501 wheel 4096 2020-05-17 00:53 lib
drwxr-xr-x - 501 wheel 4096 2020-05-17 00:53 libexec
drwxr-xr-x - 501 wheel 4096 2020-05-17 00:53 sbin
drwxr-xr-x - 501 wheel 4096 2014-08-06 17:46 share
$ mkdir input
$ cp etc/hadoop/*.xml input
# The output directory must not exist before the job runs
# Usage of hadoop-mapreduce-examples:
# bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples.jar
$ bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples.jar grep input output 'dfs[a-z.]+'
$ cat output/*
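The example job applies the regular expression `dfs[a-z.]+` to every line of the input files and counts the matches. As a quick local illustration of what that pattern extracts (plain GNU grep here, not the MapReduce job), with some made-up sample lines:

```shell
# Local illustration only: show what the pattern 'dfs[a-z.]+' matches.
# The real job does the same matching and counting across the input files.
matches=$(printf 'dfs.replication=1\nmapreduce.framework.name=yarn\ndfs.namenode.name.dir=/tmp\n' \
  | grep -oE 'dfs[a-z.]+' | sort)
echo "$matches"
```

Only the two property names that begin with `dfs` survive the filter; `output/part-r-00000` holds the same kind of match/count pairs.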
Pseudo-Distributed
HDFS environment setup
etc/hadoop/core-site.xml
<!-- default -->
<property>
<name>hadoop.tmp.dir</name>
<value>/tmp/hadoop-${user.name}</value>
<description>A base for other temporary directories.</description>
</property>
<!-- change to a persistent directory (/tmp may be cleared on reboot) -->
<property>
<name>hadoop.tmp.dir</name>
<value>/root/hadoop/tmp</value>
</property>
<property>
<name>fs.defaultFS</name>
<value>hdfs://localhost:9000</value>
</property>
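To double-check which URI is actually in effect, the value can be parsed back out of the file; on a live installation `bin/hdfs getconf -confKey fs.defaultFS` does the same. A minimal sketch against a sample file mirroring the configuration above:

```shell
# Sketch: pull fs.defaultFS back out of a core-site.xml with grep/sed.
# On a real install, point the command at etc/hadoop/core-site.xml instead.
sample=$(mktemp)
cat > "$sample" <<'EOF'
<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://localhost:9000</value>
  </property>
</configuration>
EOF
fs_default=$(grep -A1 '<name>fs.defaultFS</name>' "$sample" \
  | sed -n 's:.*<value>\(.*\)</value>.*:\1:p')
echo "$fs_default"
rm -f "$sample"
```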
etc/hadoop/hdfs-site.xml
<!-- default -->
<property>
<name>dfs.replication</name>
<value>3</value>
<description>Default block replication.
The actual number of replications can be specified when the file is created.
The default is used if replication is not specified in create time.
</description>
</property>
<!-- set to 1, since there is only one DataNode -->
<property>
<name>dfs.replication</name>
<value>1</value>
</property>
Configure passwordless SSH access (skip this step if it is already set up)
$ ssh-keygen -t rsa -P '' -f ~/.ssh/id_rsa
$ cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
$ chmod 0600 ~/.ssh/authorized_keys
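All three commands are non-interactive, so they can be dry-run in a scratch directory without touching the real `~/.ssh` (assumes `ssh-keygen` is installed):

```shell
# Dry-run of the key setup in a throwaway directory (does not touch ~/.ssh).
scratch=$(mktemp -d)
ssh-keygen -q -t rsa -P '' -f "$scratch/id_rsa"
cat "$scratch/id_rsa.pub" >> "$scratch/authorized_keys"
chmod 0600 "$scratch/authorized_keys"
# GNU stat first, BSD stat as a fallback; either way the mode should be 600.
key_mode=$(stat -c '%a' "$scratch/authorized_keys" 2>/dev/null \
  || stat -f '%Lp' "$scratch/authorized_keys")
echo "authorized_keys mode: $key_mode"
rm -rf "$scratch"
```

On the actual machine, `ssh localhost` should now log in without prompting for a password.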
# Format the NameNode
$ bin/hdfs namenode -format
# Start HDFS
$ sbin/start-dfs.sh
HDFS environment setup complete.
Test
$ bin/hdfs dfs -put input /input
$ bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.5.0.jar grep /input /output 'dfs[a-z.]+'
# View the results
$ bin/hdfs dfs -ls -R /output
-rw-r--r-- 1 root supergroup 0 2020-05-17 01:50 /output/_SUCCESS
-rw-r--r-- 1 root supergroup 11 2020-05-17 01:50 /output/part-r-00000
# Delete the output directory before re-running the job:
# $ bin/hdfs dfs -rm -r /output
YARN environment setup
etc/hadoop/mapred-site.xml
<!--default-->
<property>
<name>mapreduce.framework.name</name>
<value>local</value>
<description>The runtime framework for executing MapReduce jobs.
Can be one of local, classic or yarn.
</description>
</property>
<!--change to yarn-->
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
etc/hadoop/yarn-site.xml
<!--default-->
<property>
<description>the valid service name should only contain a-zA-Z0-9_ and can not start with numbers</description>
<name>yarn.nodemanager.aux-services</name>
<value></value>
<!--<value>mapreduce_shuffle</value>-->
</property>
<!-- auxiliary shuffle service the NodeManager runs for MapReduce jobs -->
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
YARN environment setup complete.
Test
$ sbin/start-yarn.sh
$ bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.5.0.jar grep /input /output 'dfs[a-z.]+'
JobHistory
etc/hadoop/yarn-site.xml
<!-- enable YARN log aggregation -->
<property>
<description>Whether to enable log aggregation. Log aggregation collects
each container's logs and moves these logs onto a file-system, for e.g.
HDFS, after the application completes. Users can configure the
"yarn.nodemanager.remote-app-log-dir" and
"yarn.nodemanager.remote-app-log-dir-suffix" properties to determine
where these logs are moved to. Users can access the logs via the
Application Timeline Server.
</description>
<name>yarn.log-aggregation-enable</name>
<value>true</value>
</property>
Restart and test
$ sbin/stop-yarn.sh
$ sbin/start-yarn.sh
$ sbin/mr-jobhistory-daemon.sh start historyserver
Run the test job again, and its history can be viewed at:
http://single-node:8088/cluster
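Besides the ResourceManager page above, the history server has its own web UI; port 19888 is the Hadoop default for `mapreduce.jobhistory.webapp.address` (hostname assumed to be localhost for this single-node setup):

```shell
# Default web UI endpoints for this single-node setup.
RM_UI="http://localhost:8088/cluster"          # ResourceManager: running/finished applications
HISTORY_UI="http://localhost:19888/jobhistory" # JobHistory server: completed MapReduce jobs
echo "$RM_UI"
echo "$HISTORY_UI"
# With log aggregation enabled, a finished application's logs can also be
# fetched from the CLI (replace the placeholder with a real id, taken from
# the RM UI or from `bin/yarn application -list`):
#   bin/yarn logs -applicationId <application-id>
```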