Hadoop HDFS: a record of the pitfalls I hit

After several days of digging I finally got HDFS Java API calls working. The struggle made me want to give up more than once, but I stuck with it. The experience of these few days is worth keeping, so I'm writing it down before I forget. I used version 2.10.0; if you ask why I chose that version, the honest answer is that I don't know. It was simply the first entry in the download list on the official site.
1. Download

wget https://mirror.bit.edu.cn/apache/hadoop/common/hadoop-2.10.0/hadoop-2.10.0.tar.gz

2. Install
I used a Linux system (I tried Windows but could not get it running). "Installing" here really just means unpacking the tarball:

tar -zxvf hadoop-2.10.0.tar.gz

3. Deploy
Just follow the official docs: https://hadoop.apache.org/docs/r2.10.0/hadoop-project-dist/hadoop-common/SingleCluster.html
(1) Set JAVA_HOME (the docs do this in etc/hadoop/hadoop-env.sh; if JAVA_HOME is already set in your environment, this step can usually be skipped):

 # set to the root of your Java installation
  export JAVA_HOME=/usr/java/latest

(2) Check that the unpacked distribution runs:

dave@ubuntu:~/d/opt/hadoop-2.10.0$ ./bin/hadoop

(3) Try a standalone (local) run:

dave@ubuntu:~/d/opt/hadoop-2.10.0$ mkdir input
dave@ubuntu:~/d/opt/hadoop-2.10.0$ cp etc/hadoop/*.xml input
dave@ubuntu:~/d/opt/hadoop-2.10.0$ bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.10.0.jar grep input output 'dfs[a-z.]+'
dave@ubuntu:~/d/opt/hadoop-2.10.0$ cat output/*

(4) Pseudo-distributed operation

Configuration:
etc/hadoop/core-site.xml:

<configuration>
    <property>
        <name>fs.defaultFS</name>
        <value>hdfs://localhost:9000</value>
    </property>
</configuration>

etc/hadoop/hdfs-site.xml:

<configuration>
    <property>
        <name>dfs.replication</name>
        <value>1</value>
    </property>
</configuration>

Set up passphrase-less SSH to localhost (without this, every start/stop script will prompt for a password; afterwards, ssh localhost should log in without asking for one):

dave@ubuntu:~/d/opt/hadoop-2.10.0$ ssh-keygen -t rsa -P '' -f ~/.ssh/id_rsa
dave@ubuntu:~/d/opt/hadoop-2.10.0$ cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
dave@ubuntu:~/d/opt/hadoop-2.10.0$ chmod 0600 ~/.ssh/authorized_keys

Format the filesystem:

dave@ubuntu:~/d/opt/hadoop-2.10.0$ bin/hdfs namenode -format

Start the NameNode and DataNode daemons (you can check with jps that they are running):

dave@ubuntu:~/d/opt/hadoop-2.10.0$ sbin/start-dfs.sh

Try it in a browser:
http://192.168.137.162:50070 (the official docs use localhost; I opened it from another Windows machine, the same one I later wrote the Java code on, to confirm it was reachable from there.)
It came up without any problem:
[Screenshot: NameNode web UI]
Now try some operations (run these on the host where HDFS is deployed; note that relative paths such as input resolve under /user/<username>, i.e. /user/dave/input here):

dave@ubuntu:~/d/opt/hadoop-2.10.0$ ./bin/hdfs dfs -mkdir /user
dave@ubuntu:~/d/opt/hadoop-2.10.0$ ./bin/hdfs dfs -mkdir /user/dave
dave@ubuntu:~/d/opt/hadoop-2.10.0$ ./bin/hdfs dfs -put etc/hadoop input

Check the uploaded files in the browser:
[Screenshot: the uploaded files listed in the NameNode web UI file browser]
Everything went surprisingly smoothly, so let's do the same thing through the Java API.

4. Calling the Java API
Create a Maven project and add two dependencies:

<!-- https://mvnrepository.com/artifact/org.apache.hadoop/hadoop-hdfs -->
<dependency>
    <groupId>org.apache.hadoop</groupId>
    <artifactId>hadoop-hdfs</artifactId>
    <version>2.10.0</version>
</dependency>
<!-- https://mvnrepository.com/artifact/org.apache.hadoop/hadoop-common -->
<dependency>
    <groupId>org.apache.hadoop</groupId>
    <artifactId>hadoop-common</artifactId>
    <version>2.10.0</version>
</dependency>

Create a test class:

package com.luoye.hadoop;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

import java.io.IOException;

public class Application {
    public static void main(String[] args) {
        Configuration configuration=new Configuration();
        configuration.set("fs.defaultFS","hdfs://192.168.137.162:9000");
        try {
            FileSystem fileSystem=FileSystem.newInstance(configuration);

            // create the target directory, then upload a local file into it
            fileSystem.mkdirs(new Path("/dave"));
            fileSystem.copyFromLocalFile(new Path("C:\\Users\\dave\\Desktop\\suwei\\terminal.ini"), new Path("/dave/terminal.ini"));
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
}

Note that fs.defaultFS here must use the IP address of the host running the NameNode, because the client is no longer running on that machine.
Everything is ready, so run it; surely nothing should go wrong:

log4j:WARN No appenders could be found for logger (org.apache.hadoop.util.Shell).
log4j:WARN Please initialize the log4j system properly.
log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig for more info.
java.net.ConnectException: Call From DESKTOP-JE4MI3O/169.254.47.191 to 192.168.137.162:9000 failed on connection exception: java.net.ConnectException: Connection refused: no further information; For more details see:  http://wiki.apache.org/hadoop/ConnectionRefused
	at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
	at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
	at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
	at java.lang.reflect.Constructor.newInstance(Constructor.java:422)
	at org.apache.hadoop.net.NetUtils.wrapWithMessage(NetUtils.java:824)
	at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:754)
	at org.apache.hadoop.ipc.Client.getRpcResponse(Client.java:1544)
	at org.apache.hadoop.ipc.Client.call(Client.java:1486)
	at org.apache.hadoop.ipc.Client.call(Client.java:1385)
	at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:232)
	at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:118)
	at com.sun.proxy.$Proxy10.mkdirs(Unknown Source)
	at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.mkdirs(ClientNamenodeProtocolTranslatorPB.java:587)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:497)
	at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:422)
	at org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invokeMethod(RetryInvocationHandler.java:165)
	at org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invoke(RetryInvocationHandler.java:157)
	at org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invokeOnce(RetryInvocationHandler.java:95)
	at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:359)
	at com.sun.proxy.$Proxy11.mkdirs(Unknown Source)
	at org.apache.hadoop.hdfs.DFSClient.primitiveMkdir(DFSClient.java:2475)
	at org.apache.hadoop.hdfs.DFSClient.mkdirs(DFSClient.java:2450)
	at org.apache.hadoop.hdfs.DistributedFileSystem$27.doCall(DistributedFileSystem.java:1242)
	at org.apache.hadoop.hdfs.DistributedFileSystem$27.doCall(DistributedFileSystem.java:1239)
	at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
	at org.apache.hadoop.hdfs.DistributedFileSystem.mkdirsInternal(DistributedFileSystem.java:1239)
	at org.apache.hadoop.hdfs.DistributedFileSystem.mkdirs(DistributedFileSystem.java:1231)
	at org.apache.hadoop.fs.FileSystem.mkdirs(FileSystem.java:2216)
	at com.luoye.hadoop.Application.main(Application.java:17)
Caused by: java.net.ConnectException: Connection refused: no further information
	at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
	at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:717)
	at org.apache.hadoop.net.SocketIOWithTimeout.connect(SocketIOWithTimeout.java:206)
	at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:531)
	at org.apache.hadoop.ipc.Client$Connection.setupConnection(Client.java:701)
	at org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:805)
	at org.apache.hadoop.ipc.Client$Connection.access$3700(Client.java:423)
	at org.apache.hadoop.ipc.Client.getConnection(Client.java:1601)
	at org.apache.hadoop.ipc.Client.call(Client.java:1432)
	... 24 more

But reality hits you hard just when you think everything is fine.
The program threw an error, and the web UI confirmed that the file was not uploaded.
Why did it fail? Look closely at the error message:
java.net.ConnectException: Call From DESKTOP-JE4MI3O/169.254.47.191 to 192.168.137.162:9000 failed on connection
It can't connect!!!
Why not? Everything worked when operating directly on the deployment host.
Could the address that port 9000 is bound to be the problem?
Back to the deployment host to check:

dave@ubuntu:~/d/opt/hadoop-2.10.0$ netstat -anp|grep LISTEN

Sure enough:

tcp        0      0 127.0.0.1:9000          0.0.0.0:*               LISTEN      10958/java 

The NameNode derives its RPC address from fs.defaultFS, which we set to hdfs://localhost:9000, so it only listens on 127.0.0.1. Can it be configured not to bind to 127.0.0.1? Time to dig through the official configuration reference:
http://hadoop.apache.org/docs/r2.10.0/hadoop-project-dist/hadoop-hdfs/hdfs-default.xml
The effort paid off; I finally found this property:

<property>
  <name>dfs.namenode.rpc-bind-host</name>
  <value></value>
  <description>
    The actual address the RPC server will bind to. If this optional address is
    set, it overrides only the hostname portion of dfs.namenode.rpc-address.
    It can also be specified per name node or name service for HA/Federation.
    This is useful for making the name node listen on all interfaces by
    setting it to 0.0.0.0.
  </description>
</property>

So let's set it in etc/hadoop/hdfs-site.xml on the deployment host, roughly as sketched below.
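A sketch of the property to add, assuming, as the description above suggests, that we want the NameNode RPC server to listen on all interfaces; it goes inside the existing <configuration> element of hdfs-site.xml:

<property>
  <name>dfs.namenode.rpc-bind-host</name>
  <value>0.0.0.0</value>
</property>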
Then restart:

dave@ubuntu:~/d/opt/hadoop-2.10.0$ ./sbin/stop-dfs.sh
dave@ubuntu:~/d/opt/hadoop-2.10.0$ ./sbin/start-dfs.sh

OK, let's try the Java API call again:

log4j:WARN No appenders could be found for logger (org.apache.hadoop.util.Shell).
log4j:WARN Please initialize the log4j system properly.
log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig for more info.
org.apache.hadoop.ipc.RemoteException(java.io.IOException): File /dave/terminal.ini could only be replicated to 0 nodes instead of minReplication (=1).  There are 1 datanode(s) running and 1 node(s) are excluded in this operation.
	at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.chooseTarget4NewBlock(BlockManager.java:1832)
	at org.apache.hadoop.hdfs.server.namenode.FSDirWriteFileOp.chooseTargetForNewBlock(FSDirWriteFileOp.java:265)
	at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:2591)
	at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.addBlock(NameNodeRpcServer.java:880)
	at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.addBlock(ClientNamenodeProtocolServerSideTranslatorPB.java:517)
	at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
	at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:507)
	at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1034)
	at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:994)
	at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:922)
	at java.security.AccessController.doPrivileged(Native Method)
	at javax.security.auth.Subject.doAs(Subject.java:415)
	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1893)
	at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2833)

	at org.apache.hadoop.ipc.Client.getRpcResponse(Client.java:1540)
	at org.apache.hadoop.ipc.Client.call(Client.java:1486)
	at org.apache.hadoop.ipc.Client.call(Client.java:1385)
	at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:232)
	at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:118)
	at com.sun.proxy.$Proxy10.addBlock(Unknown Source)
	at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.addBlock(ClientNamenodeProtocolTranslatorPB.java:448)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:497)
	at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:422)
	at org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invokeMethod(RetryInvocationHandler.java:165)
	at org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invoke(RetryInvocationHandler.java:157)
	at org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invokeOnce(RetryInvocationHandler.java:95)
	at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:359)
	at com.sun.proxy.$Proxy11.addBlock(Unknown Source)
	at org.apache.hadoop.hdfs.DataStreamer.locateFollowingBlock(DataStreamer.java:1846)
	at org.apache.hadoop.hdfs.DataStreamer.nextBlockOutputStream(DataStreamer.java:1645)
	at org.apache.hadoop.hdfs.DataStreamer.run(DataStreamer.java:710)

Still an error!!!!
What kind of error is this? I was completely lost.
Nothing to do but search online.
Piecing together scattered bits of information, the cause is roughly this: when the client receives the list of DataNodes from the NameNode, it checks whether each DataNode is reachable, and unreachable ones are excluded from the list. So the error saying the block "could only be replicated to 0 nodes" really means the client cannot reach the DataNode.
That is strange. Why would the DataNode be unreachable?
It was maddening; another round of searching online turned up nothing.
If only there were more log output, I thought.
Then I noticed the first few lines of the output: log4j was complaining that it had no configuration. Fine, let's configure it. I put a log4j.xml under the resources directory; a minimal sketch follows.
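A minimal sketch of such a log4j 1.x configuration: a single console appender at DEBUG level. The pattern layout below is my own choice for illustration, not necessarily the exact one used at the time:

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE log4j:configuration SYSTEM "log4j.dtd">
<log4j:configuration xmlns:log4j="http://jakarta.apache.org/log4j/">
    <appender name="console" class="org.apache.log4j.ConsoleAppender">
        <layout class="org.apache.log4j.PatternLayout">
            <param name="ConversionPattern" value="%d{yyyy-MM-dd HH:mm:ss,SSS} %-5p %c{2} %L %M - %m%n"/>
        </layout>
    </appender>
    <root>
        <level value="DEBUG"/>
        <appender-ref ref="console"/>
    </root>
</log4j:configuration>

With log4j configured, running again produced output like this: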

2020-05-14 02:19:06,573 DEBUG ipc.ProtobufRpcEngine 253 invoke - Call: addBlock took 4ms
2020-05-14 02:19:06,580 DEBUG hdfs.DataStreamer 1686 createBlockOutputStream - pipeline = [DatanodeInfoWithStorage[127.0.0.1:50010,DS-a1ac1217-f93d-486f-bb01-3183e58bad87,DISK]]
2020-05-14 02:19:06,580 DEBUG hdfs.DataStreamer 255 createSocketForPipeline - Connecting to datanode 127.0.0.1:50010
2020-05-14 02:19:07,600 INFO  hdfs.DataStreamer 1763 createBlockOutputStream - Exception in createBlockOutputStream
java.net.ConnectException: Connection refused: no further information
	at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
	at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:717)
	at org.apache.hadoop.net.SocketIOWithTimeout.connect(SocketIOWithTimeout.java:206)
	at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:531)
	at org.apache.hadoop.hdfs.DataStreamer.createSocketForPipeline(DataStreamer.java:259)
	at org.apache.hadoop.hdfs.DataStreamer.createBlockOutputStream(DataStreamer.java:1699)
	at org.apache.hadoop.hdfs.DataStreamer.nextBlockOutputStream(DataStreamer.java:1655)
	at org.apache.hadoop.hdfs.DataStreamer.run(DataStreamer.java:710)
2020-05-14 02:19:07,601 WARN  hdfs.DataStreamer 1658 nextBlockOutputStream - Abandoning BP-111504192-127.0.1.1-1589381622124:blk_1073741864_1040

Hold on, why is it connecting to 127.0.0.1?
Some configuration must be missing, so back to the official configuration reference again.
I could not find any property that tells the client to use the deployment host's IP address for the DataNode, but there is one that looked promising:

<property>
  <name>dfs.client.use.datanode.hostname</name>
  <value>false</value>
  <description>Whether clients should use datanode hostnames when
    connecting to datanodes.
  </description>
</property>

Have the client connect to DataNodes by hostname; let's try that.
First, add this in the code:

configuration.set("dfs.client.use.datanode.hostname","true");

Then, on the machine where the client code runs, add a hostname-to-IP mapping to its hosts file (on the deployment host the mapping is already in place):

192.168.137.162 ubuntu

Here ubuntu is the hostname of my deployment host and 192.168.137.162 is its IP address.
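Putting the two client-side settings together, the Configuration in the test class now looks like this (the same code as above, consolidated for clarity):

Configuration configuration = new Configuration();
// the NameNode's RPC address, reachable from the Windows client
configuration.set("fs.defaultFS", "hdfs://192.168.137.162:9000");
// connect to DataNodes by hostname instead of the 127.0.0.1 address
// the NameNode reports; the hostname resolves via the local hosts file
configuration.set("dfs.client.use.datanode.hostname", "true");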
OK, try again:

2020-05-14 02:29:04,888 DEBUG hdfs.DataStreamer 873 waitForAckedSeqno - Waiting for ack for: 1
2020-05-14 02:29:04,900 DEBUG ipc.Client 1138 run - IPC Client (428566321) connection to /192.168.137.162:9000 from dave sending #3 org.apache.hadoop.hdfs.protocol.ClientProtocol.addBlock
2020-05-14 02:29:04,903 DEBUG ipc.Client 1192 receiveRpcResponse - IPC Client (428566321) connection to /192.168.137.162:9000 from dave got value #3
2020-05-14 02:29:04,903 DEBUG ipc.ProtobufRpcEngine 253 invoke - Call: addBlock took 3ms
2020-05-14 02:29:04,913 DEBUG hdfs.DataStreamer 1686 createBlockOutputStream - pipeline = [DatanodeInfoWithStorage[127.0.0.1:50010,DS-a1ac1217-f93d-486f-bb01-3183e58bad87,DISK]]
2020-05-14 02:29:04,913 DEBUG hdfs.DataStreamer 255 createSocketForPipeline - Connecting to datanode ubuntu:50010
2020-05-14 02:29:04,915 DEBUG hdfs.DataStreamer 267 createSocketForPipeline - Send buf size 65536
2020-05-14 02:29:04,915 DEBUG sasl.SaslDataTransferClient 239 checkTrustAndSend - SASL encryption trust check: localHostTrusted = false, remoteHostTrusted = false
2020-05-14 02:29:04,916 DEBUG ipc.Client 1138 run - IPC Client (428566321) connection to /192.168.137.162:9000 from dave sending #4 org.apache.hadoop.hdfs.protocol.ClientProtocol.getServerDefaults
2020-05-14 02:29:04,935 DEBUG ipc.Client 1192 receiveRpcResponse - IPC Client (428566321) connection to /192.168.137.162:9000 from dave got value #4
2020-05-14 02:29:04,936 DEBUG ipc.ProtobufRpcEngine 253 invoke - Call: getServerDefaults took 21ms
2020-05-14 02:29:04,943 DEBUG sasl.SaslDataTransferClient 279 send - SASL client skipping handshake in unsecured configuration for addr = ubuntu/192.168.137.162, datanodeId = DatanodeInfoWithStorage[127.0.0.1:50010,DS-a1ac1217-f93d-486f-bb01-3183e58bad87,DISK]
2020-05-14 02:29:05,057 DEBUG hdfs.DataStreamer 617 initDataStreaming - nodes [DatanodeInfoWithStorage[127.0.0.1:50010,DS-a1ac1217-f93d-486f-bb01-3183e58bad87,DISK]] storageTypes [DISK] storageIDs [DS-a1ac1217-f93d-486f-bb01-3183e58bad87]
2020-05-14 02:29:05,058 DEBUG hdfs.DataStreamer 766 run - DataStreamer block BP-111504192-127.0.1.1-1589381622124:blk_1073741865_1041 sending packet packet seqno: 0 offsetInBlock: 0 lastPacketInBlock: false lastByteOffsetInBlock: 456
2020-05-14 02:29:05,118 DEBUG hdfs.DataStreamer 1095 run - DFSClient seqno: 0 reply: SUCCESS downstreamAckTimeNanos: 0 flag: 0
2020-05-14 02:29:05,119 DEBUG hdfs.DataStreamer 766 run - DataStreamer block BP-111504192-127.0.1.1-1589381622124:blk_1073741865_1041 sending packet packet seqno: 1 offsetInBlock: 456 lastPacketInBlock: true lastByteOffsetInBlock: 456
2020-05-14 02:29:05,124 DEBUG hdfs.DataStreamer 1095 run - DFSClient seqno: 1 reply: SUCCESS downstreamAckTimeNanos: 0 flag: 0

At last, no more errors, and the file shows up in the browser:
[Screenshot: the uploaded file visible in the NameNode web UI]
One final note: on the deployment host it is best to also set one more property in etc/hadoop/hdfs-site.xml:

<property>
  <name>dfs.permissions.enabled</name>
  <value>false</value>
  <description>
    If "true", enable permission checking in HDFS.
    If "false", permission checking is turned off,
    but all other behavior is unchanged.
    Switching from one parameter value to the other does not change the mode,
    owner or group of files or directories.
  </description>
</property>

This turns permission checking off; otherwise operations from the client may fail with permission errors, since the remote client does not identify as the user that owns the HDFS directories.
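
As an alternative to disabling permission checking cluster-wide, the client can identify itself as the user that owns the target directories. A minimal sketch, assuming the directories belong to the user dave and reusing the client settings from above; the three-argument FileSystem.get overload takes the user name to act as, and the path below is only an example:

import java.net.URI;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class AsUserExample {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        conf.set("dfs.client.use.datanode.hostname", "true");
        // Connect to the remote NameNode as HDFS user "dave" rather than the local Windows user.
        try (FileSystem fs = FileSystem.get(URI.create("hdfs://192.168.137.162:9000"), conf, "dave")) {
            fs.mkdirs(new Path("/user/dave/from-client"));
        }
    }
}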

One last thought: the root cause of all these problems is that my understanding of HDFS is still shallow, so there is clearly more studying to do.
