Resolving rapid disk space growth after HBase data lands on HDFS

  1. Scenario

I inserted roughly 200 million rows of server log data into HBase via flume + the HBase Thrift interface. When running hbase org.apache.hadoop.hbase.mapreduce.Export, large numbers of ScannerTimeoutException appeared,
so I hit ctrl+c to cancel the export to HDFS.
The HDFS cluster has 3 datanodes, each with 2 TB of disk space.

$ bin/hbase org.apache.hadoop.hbase.mapreduce.Export log.server1 /tom/log.server1

Error: org.apache.hadoop.hbase.client.ScannerTimeoutException: 61669ms passed since the last invocation, timeout is currently set to 60000
        at org.apache.hadoop.hbase.client.ClientScanner.loadCache(ClientScanner.java:434)
        at org.apache.hadoop.hbase.client.ClientScanner.next(ClientScanner.java:364)
        at org.apache.hadoop.hbase.mapreduce.TableRecordReaderImpl.nextKeyValue(TableRecordReaderImpl.java:205)
        at org.apache.hadoop.hbase.mapreduce.TableRecordReader.nextKeyValue(TableRecordReader.java:147)
        at org.apache.hadoop.hbase.mapreduce.TableInputFormatBase$1.nextKeyValue(TableInputFormatBase.java:216)
        at org.apache.hadoop.mapred.MapTask$NewTrackingRecordReader.nextKeyValue(MapTask.java:556)
        at org.apache.hadoop.mapreduce.task.MapContextImpl.nextKeyValue(MapContextImpl.java:80)
        at org.apache.hadoop.mapreduce.lib.map.WrappedMapper$Context.nextKeyValue(WrappedMapper.java:91)
        at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:145)
        at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:787)
        at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341)
        at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:168)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:415)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
        at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:162)
Caused by: org.apache.hadoop.hbase.UnknownScannerException: org.apache.hadoop.hbase.UnknownScannerException: Name: 29, already closed?
        at org.apache.hadoop.hbase.regionserver.RSRpcServices.scan(RSRpcServices.java:2224)
        at org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:32205)
        at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2114)
        at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:101)
        at org.apache.hadoop.hbase.ipc.RpcExecutor.consumerLoop(RpcExecutor.java:130)
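
The timeout can also be worked around on the Export side by fetching fewer rows per scanner RPC, so each mapper reports back within the lease period. A minimal sketch, assuming the standard hbase.client.scanner.* properties of HBase 1.x; the values are illustrative:

# lower the scan caching so each next() RPC returns quickly, and raise the
# client-side scanner timeout for this Export job only
bin/hbase org.apache.hadoop.hbase.mapreduce.Export \
    -D hbase.client.scanner.caching=100 \
    -D hbase.client.scanner.timeout.period=120000 \
    log.server1 /tom/log.server1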

Back in the HDFS web UI at http://hdfs_address:50070/dfshealth.html#tab-overview, DFS Used was still climbing rapidly, at roughly 1 GB of writes per second.
After logging in to the HDFS namenode, top showed hdfs and yarn still consuming a lot of CPU, and iostat showed very heavy disk writes.

hadoop fs -du -s -h /tom/log.server1

The directory already occupied more than 1.5 TB, while the HDFS web UI showed DFS Used above 3 TB and still growing.

Running desc 'table_name' in the hbase shell showed COMPRESSION => 'NONE', i.e. no compression was configured on the table.
REPLICATION_SCOPE was already set to 0. Checking the cluster configuration showed dfs.replication set to 3:

<property>
    <name>dfs.replication.max</name>
    <value>6</value>
    <source>hdfs-site.xml</source>
</property>
<property>
    <name>dfs.replication</name>
    <value>3</value>
    <source>hdfs-site.xml</source>
</property>

dfs.replication is the number of replicas written when data lands on the HDFS filesystem. With 3 datanodes here, 3 replicas basically meets the need. If you have only 3 datanodes but specify a replication factor of 4, the fourth replica is never actually written, because each datanode can hold only one replica of any given block.
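
Note that dfs.replication only sets the default for newly written files; the factor actually recorded on each existing file is visible in the second column of hadoop fs -ls. A quick sketch on the same directory measured above:

# the second column of the listing is the per-file replication factor (directories show '-')
hadoop fs -ls -R /tom/log.server1 | head -n 20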

Since the replication factor is 3 and the raw data on disk is about 1.5 TB, 1.5 TB * 3 = 4.5 TB, meaning DFS Used still had roughly another 1.5 TB to write.
Could the data be compressed as it is written, to cut the disk usage? The official documentation includes compression benchmarks:

        Supported compression codecs: GZIP, LZO, Snappy (LZO or Snappy recommended)
            Algorithm   % remaining Encoding    Decoding
            GZIP        13.4%       21 MB/s     118 MB/s
            LZO         20.5%       135 MB/s    410 MB/s
            Snappy      22.2%       172 MB/s    409 MB/s

From the table, GZIP compresses the most but encodes slowly, while Snappy has the fastest encoding and good decoding speed. Snappy is already installed here, so Snappy it is.
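
Before altering the table it is worth confirming that the regionservers can actually load the Snappy native libraries; HBase ships a small test utility for this. A sketch with an illustrative test path:

# writes and reads back a test file with the given codec; it should end with SUCCESS
hbase org.apache.hadoop.hbase.util.CompressionTest file:///tmp/snappy-test snappy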

Disable the log.server1 table, switch the column family to Snappy compression, then re-enable it and trigger a major compaction:

disable 'log.server1'
alter 'log.server1', NAME => 'cf1', COMPRESSION => 'snappy'     # change the compression
enable 'log.server1'                                            # enabling the table alone does not compress existing data; a major compaction is needed for it to take effect
major_compact 'log.server1'                                     # this command runs for a very long time and puts heavy CPU and IO load on the whole cluster
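
Because the compaction runs for hours, its progress can be watched from the command line. A sketch, assuming the default hbase.rootdir layout of HBase 1.x (/hbase/data/&lt;namespace&gt;/&lt;table&gt;); on newer shells, compaction_state 'log.server1' in the hbase shell also reports whether the major compaction is still running:

# the table directory shrinks as compacted, Snappy-encoded HFiles replace the old ones
watch -n 300 "hadoop fs -du -s -h /hbase/data/default/log.server1; hadoop dfsadmin -report | grep 'DFS Used'"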

A few hours later, disk usage, IO, and CPU had all come back down. Each datanode dropped from 1.5 TB to about 160 GB, and total DFS Used fell to about 480 GB. Apparently all the data had landed and been compressed.
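
As a final check that the schema change took effect, describe in the hbase shell should now show the codec on the column family (a one-line sketch):

describe 'log.server1'     # cf1 should list COMPRESSION => 'SNAPPY'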

To avoid the scanner timeout in the future, modify hbase-site.xml and add the following parameters (raising the regionserver lease period above the 60000 ms timeout hit earlier):

<property>
    <name>hbase.regionserver.lease.period</name>
    <value>120000</value>
</property>
<property>
    <name>zookeeper.session.timeout</name>
    <value>90000</value>
    <description>ZooKeeper session timeout.</description>
</property>
<property>
    <name>hbase.regionserver.restart.on.zk.expire</name>
    <value>true</value>
    <description>A ZooKeeper session expiry normally forces the regionserver to exit; enabling this makes the regionserver restart instead.</description>
</property>
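
These settings only take effect after the regionservers are restarted. HBase ships a rolling-restart script for doing that one node at a time; a sketch, assuming a tarball install with $HBASE_HOME set:

# restart only the regionservers, one at a time, so the cluster stays available
$HBASE_HOME/bin/rolling-restart.sh --rs-only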

Check the replication status of the Hadoop cluster:

hadoop fsck /

Minimally replicated blocks: 7580 (100.0 %)
Over-replicated blocks: 0 (0.0 %)
Under-replicated blocks: 0 (0.0 %)
Mis-replicated blocks: 0 (0.0 %)
Default replication factor: 1
Average block replication: 3.0
Corrupt blocks: 0
Missing replicas: 0 (0.0 %)
Number of data-nodes: 3
Number of racks: 1

You can see that Average block replication is still 3.

The replication factor of the files already in HDFS needs to be changed.

hadoop fs -setrep -w 2 -R /tom/ sets the replication factor of every file under /tom/ to 2. Here it is applied to the entire filesystem:

sudo -u hdfs hadoop fs -setrep -R 2 /

If fsck then reports errors, it is usually because some files' replicas are in an abnormal state; Hadoop's balancer tool can be used to rebalance the data across the nodes:

hadoop balancer
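
The balancer accepts a threshold, i.e. the allowed percentage difference between a datanode's usage and the cluster average before blocks are moved (default 10). A sketch with a tighter value:

# move blocks until every datanode is within 5% of the cluster-average usage
hadoop balancer -threshold 5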

Check the disk usage of each node again:

hadoop dfsadmin -report

Configured Capacity: 6073208496384 (5.52 TB)
Present Capacity: 5980433230156 (5.44 TB)
DFS Remaining: 5541538604318 (5.04 TB)
DFS Used: 524630220680 (488.60 GB)
DFS Used%: 8.85%
Under replicated blocks: 0
Blocks with corrupt replicas: 0
Missing blocks: 0
Missing blocks (with replication factor 1): 0
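
As a last step, fsck can be re-run to confirm the average block replication has dropped to the new factor and nothing is under-replicated; a quick sketch:

# average block replication should now be close to 2, with no under-replicated blocks
sudo -u hdfs hadoop fsck / | grep -E 'Average block replication|Under-replicated blocks'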

Everything is back to normal.
