HDFS storage strategy for multi-disk nodes

http://hi.baidu.com/thinkdifferent/blog/item/95de0e2416c4da3fc89559b8.html

 

I had never found a solid reference for how HDFS handles storage on nodes with multiple disks; the Hadoop website only says, roughly, that nodes with multiple disks should be managed internally. Today I came across a blog post that pastes the relevant code snippet directly, so I'm reposting it here:

from: kzk's blog

To use multiple disks in Hadoop DataNode, you should add comma-separated directories to dfs.data.dir in hdfs-site.xml. The following is an example of using four disks.

  <property>
    <name>dfs.data.dir</name>
    <value>/disk1, /disk2, /disk3, /disk4</value>
  </property>

But how does Hadoop use these disks? I found the following code snippet in ./hdfs/org/apache/hadoop/hdfs/server/datanode/FSDataset.java in hadoop-0.20.1.

  synchronized FSVolume getNextVolume(long blockSize) throws IOException {
    int startVolume = curVolume;
    while (true) {
      FSVolume volume = volumes[curVolume];
      // Advance the round-robin pointer, wrapping around the volume list.
      curVolume = (curVolume + 1) % volumes.length;
      if (volume.getAvailable() > blockSize) { return volume; }
      // We have checked every volume and none has room for this block.
      if (curVolume == startVolume) {
        throw new DiskOutOfSpaceException("Insufficient space for an additional block");
      }
    }
  }

FSVolume represents a single directory specified in dfs.data.dir. This code places blocks across the disks in round-robin fashion, skipping any volume that does not have enough free space for the block.

One more thing: if disk utilization reaches 100%, other important data (e.g. error logs) can no longer be written. To prevent this, Hadoop provides the "dfs.datanode.du.reserved" setting. When Hadoop calculates a disk's available capacity, this value is always subtracted from the real capacity. Setting it to several hundred megabytes should be safe.
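For example, an hdfs-site.xml entry like the following reserves space on each data directory; the value is given in bytes, and the 500 MB figure below is only an illustrative choice, not a recommendation from the original post:

  <property>
    <name>dfs.datanode.du.reserved</name>
    <!-- Reserved space in bytes per volume; 524288000 bytes = 500 MB (illustrative value) -->
    <value>524288000</value>
  </property>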

This is Hadoop's default strategy, but I think taking the disk load average into account would be better: if one disk is busy, Hadoop should avoid using it. However, with that method the block distribution would no longer be even across the disks, so read performance would drop. This is a difficult trade-off. Can you come up with a better strategy?
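As a rough sketch of that idea (this is not Hadoop code: the Volume interface, getLoad(), and LOAD_THRESHOLD below are hypothetical), a load-aware chooser could keep the round-robin order but skip volumes whose recent I/O load is above a threshold, falling back to a busy disk only when every disk is busy:

  import java.io.IOException;
  import java.util.List;

  /** Hypothetical volume abstraction for illustration only (not Hadoop's FSVolume). */
  interface Volume {
    long getAvailable();   // free bytes on this disk
    double getLoad();      // recent utilization of this disk, 0.0 - 1.0 (hypothetical metric)
  }

  class LoadAwareVolumeChooser {
    /** Skip disks busier than this fraction, unless all of them are. Illustrative value. */
    private static final double LOAD_THRESHOLD = 0.8;

    private final List<Volume> volumes;
    private int curVolume = 0;

    LoadAwareVolumeChooser(List<Volume> volumes) {
      this.volumes = volumes;
    }

    synchronized Volume getNextVolume(long blockSize) throws IOException {
      Volume fallback = null;          // first busy volume that still has enough space
      int startVolume = curVolume;
      while (true) {
        Volume volume = volumes.get(curVolume);
        curVolume = (curVolume + 1) % volumes.size();
        if (volume.getAvailable() > blockSize) {
          if (volume.getLoad() < LOAD_THRESHOLD) {
            return volume;             // enough space and not too busy
          }
          if (fallback == null) {
            fallback = volume;         // remember a busy-but-usable disk
          }
        }
        if (curVolume == startVolume) {
          if (fallback != null) {
            return fallback;           // every usable disk is busy; take one anyway
          }
          throw new IOException("Insufficient space for an additional block");
        }
      }
    }
  }

When all disks are equally loaded this behaves just like the round-robin code above, but as soon as it starts skipping busy disks the block distribution skews, which is exactly the read-performance trade-off described above.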
