Source Code Walkthrough (2): hbase-examples BufferedMutator Example

Source Code Walkthrough (1): HBase client source code http://aperise.iteye.com/blog/2372350
Source Code Walkthrough (2): hbase-examples BufferedMutator Example http://aperise.iteye.com/blog/2372505
Source Code Walkthrough (3): hbase-examples MultiThreadedClientExample http://aperise.iteye.com/blog/2372534

1.Dropping HTable and creating its underlying BufferedMutator directly works fine

    In the earlier analysis of the HBase client source code, we created the client like this:

//the default connection implementation is org.apache.hadoop.hbase.client.ConnectionManager.HConnectionImplementation
Connection connection = ConnectionFactory.createConnection(configuration);
//the default table implementation is org.apache.hadoop.hbase.client.HTable
Table table = connection.getTable(TableName.valueOf("tableName"));
  1. By default we get the connection implementation org.apache.hadoop.hbase.client.ConnectionManager.HConnectionImplementation. The point to note is that its setupRegistry() method installs org.apache.hadoop.hbase.client.ZooKeeperRegistry, the class that handles all later interaction with ZooKeeper
  2. Through the connection we then get the table implementation org.apache.hadoop.hbase.client.HTable
  3. Finally, org.apache.hadoop.hbase.client.HTable turns out to hold a field named mutator of type BufferedMutatorImpl, and its buffered writes (put) all go through that mutator

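    So when writing to HBase from the client we can drop the HTable object entirely, build a BufferedMutator directly, and work with that. As expected, the hbase-examples module in the HBase source tree shows exactly this usage. The key code is: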

Configuration configuration = HBaseConfiguration.create();      
configuration.set("hbase.zookeeper.property.clientPort", "2181");      
configuration.set("hbase.client.write.buffer", "2097152");      
configuration.set("hbase.zookeeper.quorum","192.168.199.31,192.168.199.32,192.168.199.33,192.168.199.34,192.168.199.35");

BufferedMutatorParams params = new BufferedMutatorParams(TableName.valueOf("tableName"));

//3177 is not made up: it is 2 * hbase.client.write.buffer / put.heapSize()
int bestBatchPutSize = 3177;

//try-with-resources (new in JDK 1.7): try (resources implementing java.lang.AutoCloseable, which java.io.Closeable extends) {} catch (Exception e) {}
//It acts like a finally block that calls close() on each resource in reverse order of declaration, i.e. mutator.close() and then conn.close()
try(
  //the default connection implementation is org.apache.hadoop.hbase.client.ConnectionManager.HConnectionImplementation
  Connection conn = ConnectionFactory.createConnection(configuration);
  //the default mutator implementation is org.apache.hadoop.hbase.client.BufferedMutatorImpl
  BufferedMutator mutator = conn.getBufferedMutator(params);
){
  List<Put> putLists = new ArrayList<Put>();
  for(int count=0;count<100000;count++){
    //placeholder row key, assumed here for illustration
    String rowkey = "row-" + count;
    Put put = new Put(rowkey.getBytes());
    put.addImmutable("columnFamily1".getBytes(), "columnName1".getBytes(), "columnValue1".getBytes());
    put.addImmutable("columnFamily1".getBytes(), "columnName2".getBytes(), "columnValue2".getBytes());
    put.addImmutable("columnFamily1".getBytes(), "columnName3".getBytes(), "columnValue3".getBytes());
    //SKIP_WAL trades durability for speed: unflushed writes are lost if a region server crashes
    put.setDurability(Durability.SKIP_WAL);
    putLists.add(put);

    if(putLists.size()==bestBatchPutSize){
      //the best batch size has been reached, so submit right away
      mutator.mutate(putLists);
      mutator.flush();
      putLists.clear();
    }
  }
  //submit whatever is left over in one final batch
  mutator.mutate(putLists);
  mutator.flush();
}catch(IOException e) {
  LOG.info("exception while creating/destroying Connection or BufferedMutator", e);
}
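The figure 3177 in the comment above comes from dividing twice the write buffer by the heap size of one Put. Here is a minimal sketch of that arithmetic. It assumes a Put of this shape reports about 1320 bytes from heapSize(), a value inferred backwards from 3177 rather than measured, and `BatchSizing` and `bestBatchSize` are illustrative names, not HBase API:

```java
final class BatchSizing {
    /** Number of Puts that fit in twice the write buffer, rounded down. */
    static int bestBatchSize(long writeBufferBytes, long putHeapSizeBytes) {
        if (writeBufferBytes <= 0 || putHeapSizeBytes <= 0) {
            throw new IllegalArgumentException("sizes must be positive");
        }
        return (int) (2 * writeBufferBytes / putHeapSizeBytes);
    }

    public static void main(String[] args) {
        long writeBuffer = 2097152L;  // hbase.client.write.buffer from the example (2 MB)
        long assumedHeapSize = 1320L; // ASSUMPTION: stand-in for put.heapSize()
        System.out.println(bestBatchSize(writeBuffer, assumedHeapSize)); // prints 3177
    }
}
```

In a real client the second argument would come from calling heapSize() on a representative Put, since the value depends on the row key, column names and values.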

 

2.BufferedMutatorParams

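    BufferedMutatorParams collects the parameters needed to construct a BufferedMutator: the HBase table name, the client write buffer size, the maximum size of a single KeyValue (cell), the thread pool, and a callback listener for HBase operations (for example, to be notified when a write fails)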

package org.apache.hadoop.hbase.client;

import java.util.concurrent.ExecutorService;

import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.classification.InterfaceAudience;
import org.apache.hadoop.hbase.classification.InterfaceStability;

/**
 * BufferedMutatorParams, the parameter holder used to construct a BufferedMutator
 */
@InterfaceAudience.Public
@InterfaceStability.Evolving
public class BufferedMutatorParams {

  static final int UNSET = -1;

  private final TableName tableName;//HBase table
  private long writeBufferSize = UNSET;//client write buffer size
  private int maxKeyValueSize = UNSET;//maximum size of a single KeyValue (cell)
  private ExecutorService pool = null;//thread pool
  private BufferedMutator.ExceptionListener listener = new BufferedMutator.ExceptionListener() {//callback listener for HBase operations, e.g. when a write fails; the default simply rethrows
    @Override
    public void onException(RetriesExhaustedWithDetailsException exception,
        BufferedMutator bufferedMutator)
        throws RetriesExhaustedWithDetailsException {
      throw exception;
    }
  };

  public BufferedMutatorParams(TableName tableName) {//constructor
    this.tableName = tableName;
  }

  public TableName getTableName() {//get the table name
    return tableName;
  }

  public long getWriteBufferSize() {//get the write buffer size
    return writeBufferSize;
  }

  /**
   * Fluent setter that overrides the write buffer size
   */
  public BufferedMutatorParams writeBufferSize(long writeBufferSize) {
    this.writeBufferSize = writeBufferSize;
    return this;
  }

  public int getMaxKeyValueSize() {//get the maximum KeyValue size
    return maxKeyValueSize;
  }

  /**
   * Fluent setter that overrides the maximum KeyValue size
   */
  public BufferedMutatorParams maxKeyValueSize(int maxKeyValueSize) {
    this.maxKeyValueSize = maxKeyValueSize;
    return this;
  }

  public ExecutorService getPool() {//get the thread pool
    return pool;
  }
  
  public BufferedMutatorParams pool(ExecutorService pool) {//fluent setter for the thread pool
    this.pool = pool;
    return this;
  }

  public BufferedMutator.ExceptionListener getListener() {//get the listener
    return listener;
  }
  
  public BufferedMutatorParams listener(BufferedMutator.ExceptionListener listener) {//fluent setter for the listener
    this.listener = listener;
    return this;
  }
}
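The UNSET sentinel is what lets the BufferedMutatorImpl constructor in section 4 tell "not configured" apart from a real value and fall back to the connection-level configuration. Below is a stripped-down sketch of that pattern. `Params` and `Resolver` are hypothetical stand-ins written for this illustration, not the HBase classes:

```java
final class Params {
    static final int UNSET = -1;
    private long writeBufferSize = UNSET;

    /** Fluent setter: returns this so calls can be chained. */
    Params writeBufferSize(long size) {
        this.writeBufferSize = size;
        return this;
    }

    long getWriteBufferSize() {
        return writeBufferSize;
    }
}

final class Resolver {
    /** Use the explicit value when one was set, otherwise the configured default. */
    static long effectiveWriteBufferSize(Params p, long configuredDefault) {
        return p.getWriteBufferSize() != Params.UNSET
                ? p.getWriteBufferSize()
                : configuredDefault;
    }
}
```

With this shape, a caller who never touches writeBufferSize(...) gets whatever the configuration supplies, while an explicit call overrides it for that one mutator only.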

3.BufferedMutator

    BufferedMutator is an interface that declares the following abstract methods:

public interface BufferedMutator extends Closeable {
  TableName getName();//get the table name
  Configuration getConfiguration();//get the Hadoop Configuration object
  void mutate(Mutation mutation) throws IOException;//add one mutation to the buffer
  void mutate(List<? extends Mutation> mutations) throws IOException;//add a batch of mutations to the buffer
  @Override
  void close() throws IOException;//from Closeable, so try-with-resources (JDK 1.7) can close the object without an explicit finally
  void flush() throws IOException;//send the buffered data to the HBase server
  long getWriteBufferSize();//get the write buffer size
  @InterfaceAudience.Public
  @InterfaceStability.Evolving
  interface ExceptionListener {//listener
    public void onException(RetriesExhaustedWithDetailsException exception,
        BufferedMutator mutator) throws RetriesExhaustedWithDetailsException;
  }
}

4.BufferedMutatorImpl

package org.apache.hadoop.hbase.client;

import com.google.common.annotations.VisibleForTesting;
import org.apache.commons.logging.Log;
import org.apache.commons.logging.LogFactory;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.classification.InterfaceAudience;
import org.apache.hadoop.hbase.classification.InterfaceStability;
import org.apache.hadoop.hbase.ipc.RpcControllerFactory;

import java.io.IOException;
import java.io.InterruptedIOException;
import java.util.Arrays;
import java.util.LinkedList;
import java.util.List;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicLong;

/**
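 * BufferedMutatorImpl has been in use since HBase 1.0.0
 * It is mainly used to write to the same table from multiple threads
 * Note that when several threads share one BufferedMutator, an error in one thread also surfaces in the others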
 */
@InterfaceAudience.Private
@InterfaceStability.Evolving
public class BufferedMutatorImpl implements BufferedMutator {

  private static final Log LOG = LogFactory.getLog(BufferedMutatorImpl.class);
  
  private final ExceptionListener listener;//callback listener notified when client operations fail

  protected ClusterConnection connection; //the connection this mutator holds
  private final TableName tableName;//HBase table
  private volatile Configuration conf;//Hadoop Configuration object
  @VisibleForTesting
  final ConcurrentLinkedQueue<Mutation> writeAsyncBuffer = new ConcurrentLinkedQueue<Mutation>();//queue that serves as the client-side write buffer
  @VisibleForTesting
  AtomicLong currentWriteBufferSize = new AtomicLong(0);//thread-safe long that tracks the total size of the data currently buffered

  private long writeBufferSize;//client write buffer size
  private final int maxKeyValueSize;//maximum size of a single KeyValue (cell)
  private boolean closed = false;//whether this mutator has been closed
  private final ExecutorService pool;//thread pool used by the client

  @VisibleForTesting
  protected AsyncProcess ap; //object that performs the client's asynchronous operations

  BufferedMutatorImpl(ClusterConnection conn, RpcRetryingCallerFactory rpcCallerFactory,
      RpcControllerFactory rpcFactory, BufferedMutatorParams params) {
    if (conn == null || conn.isClosed()) {
      throw new IllegalArgumentException("Connection is null or closed.");
    }

    this.tableName = params.getTableName();
    this.connection = conn;
    this.conf = connection.getConfiguration();
    this.pool = params.getPool();
    this.listener = params.getListener();

    //Build a ConnectionConfiguration from the conf passed in; settings the client did not set fall back to their defaults
    ConnectionConfiguration tableConf = new ConnectionConfiguration(conf);
    //Set the write buffer size
    this.writeBufferSize = params.getWriteBufferSize() != BufferedMutatorParams.UNSET ? params.getWriteBufferSize() : tableConf.getWriteBufferSize();
    //Set the maximum KeyValue size
    this.maxKeyValueSize = params.getMaxKeyValueSize() != BufferedMutatorParams.UNSET ? params.getMaxKeyValueSize() : tableConf.getMaxKeyValueSize();

    //object that performs the client's asynchronous operations
    ap = new AsyncProcess(connection, conf, pool, rpcCallerFactory, true, rpcFactory);
  }

  @Override
  public TableName getName() {//get the table name
    return tableName;
  }

  @Override
  public Configuration getConfiguration() {//get the Hadoop Configuration, here the conf the client passed in
    return conf;
  }

  @Override
  public void mutate(Mutation m) throws InterruptedIOException,
      RetriesExhaustedWithDetailsException {//add a single mutation to the buffer
    mutate(Arrays.asList(m));
  }

  @Override
  public void mutate(List<? extends Mutation> ms) throws InterruptedIOException, RetriesExhaustedWithDetailsException {
    //If this BufferedMutatorImpl has already been closed, throw IllegalStateException
    if (closed) {
      throw new IllegalStateException("Cannot put when the BufferedMutator is closed.");
    }

    //Validate each Put and sum the heap size of the submitted mutations into toAddSize
    long toAddSize = 0;
    for (Mutation m : ms) {
      if (m instanceof Put) {
        validatePut((Put) m);
      }
      toAddSize += m.heapSize();
    }

    // This behavior is highly non-intuitive... it does not protect us against
    // 94-incompatible behavior, which is a timing issue because hasError, the below code
    // and setter of hasError are not synchronized. Perhaps it should be removed.
    if (ap.hasError()) {
      //Add toAddSize to the running total of buffered bytes
      currentWriteBufferSize.addAndGet(toAddSize);
      //Append the mutations to writeAsyncBuffer; they stay there until they have been submitted
      writeAsyncBuffer.addAll(ms);
      //An earlier asynchronous submit has recorded an error, so flush synchronously to surface it
      backgroundFlushCommits(true);
    } else {
      //Add toAddSize to the running total of buffered bytes
      currentWriteBufferSize.addAndGet(toAddSize);
      //Append the mutations to writeAsyncBuffer; they stay there until they have been submitted
      writeAsyncBuffer.addAll(ms);
    }

    // Now try and queue what needs to be queued.
    // While the total buffered size exceeds hbase.client.write.buffer (2 MB by default), call backgroundFlushCommits(false)
    // If it is at or below that limit, return without sending anything
    while (currentWriteBufferSize.get() > writeBufferSize) {
      backgroundFlushCommits(false);
    }
  }

  // Validate a Put
  public void validatePut(final Put put) throws IllegalArgumentException {
    HTable.validatePut(put, maxKeyValueSize);
  }

  @Override
  public synchronized void close() throws IOException {
    try {
      if (this.closed) {//already closed, return immediately
        return;
      }

      //do one final flush before closing
      backgroundFlushCommits(true);
      this.pool.shutdown();//shut down the thread pool
      boolean terminated;
      int loopCnt = 0;
      do {
        // wait until the pool has terminated
        terminated = this.pool.awaitTermination(60, TimeUnit.SECONDS);
        loopCnt += 1;
        if (loopCnt >= 10) {
          LOG.warn("close() failed to terminate pool after 10 minutes. Abandoning pool.");
          break;
        }
      } while (!terminated);

    } catch (InterruptedException e) {
      LOG.warn("waitForTermination interrupted");

    } finally {
      this.closed = true;
    }
  }

  @Override
  public synchronized void flush() throws InterruptedIOException, RetriesExhaustedWithDetailsException {
    //explicit flush: send the buffered data to the HBase server
    backgroundFlushCommits(true);
  }

  private void backgroundFlushCommits(boolean synchronous) throws InterruptedIOException, RetriesExhaustedWithDetailsException {
    LinkedList<Mutation> buffer = new LinkedList<>();
    // Keep track of the size so that this thread doesn't spin forever
    long dequeuedSize = 0;

    try {
      //Drain buffered mutations from writeAsyncBuffer (Put is one Mutation implementation)
      Mutation m;
      //Keep polling while writeAsyncBuffer still has entries and at least one of these holds:
      //  writeBufferSize <= 0, dequeuedSize < writeBufferSize * 2, or synchronous is true
      //Each polled mutation adds its size to dequeuedSize
      //and subtracts it from currentWriteBufferSize
      while ((writeBufferSize <= 0 || dequeuedSize < (writeBufferSize * 2) || synchronous) && (m = writeAsyncBuffer.poll()) != null) {
        buffer.add(m);
        long size = m.heapSize();
        dequeuedSize += size;
        currentWriteBufferSize.addAndGet(-size);
      }

      //With backgroundFlushCommits(false), return right away if nothing was dequeued
      if (!synchronous && dequeuedSize == 0) {
        return;
      }

      //With backgroundFlushCommits(false), this branch is taken and does not wait for the result
      if (!synchronous) {
        //submit without waiting for the result
        ap.submit(tableName, buffer, true, null, false);
        if (ap.hasError()) {
          LOG.debug(tableName + ": One or more of the operations have failed -"
              + " waiting for all operation in progress to finish (successfully or not)");
        }
      }
      //With backgroundFlushCommits(true), or after an error, this branch is taken and waits for the result
      if (synchronous || ap.hasError()) {
        while (!buffer.isEmpty()) {
          ap.submit(tableName, buffer, true, null, false);
        }
        //block until all in-flight operations have finished
        RetriesExhaustedWithDetailsException error = ap.waitForAllPreviousOpsAndReset(null);
        if (error != null) {
          if (listener == null) {
            throw error;
          } else {
            this.listener.onException(error, this);
          }
        }
      }
    } finally {
      //Anything still left in buffer was not submitted: put it back into writeAsyncBuffer
      //and restore the size counter so that a later flush can send it
      for (Mutation mut : buffer) {
        long size = mut.heapSize();
        currentWriteBufferSize.addAndGet(size);
        dequeuedSize -= size;
        writeAsyncBuffer.add(mut);
      }
    }
  }

  /**
   * Set the size of the client write buffer
   */
  @Deprecated
  public void setWriteBufferSize(long writeBufferSize) throws RetriesExhaustedWithDetailsException,
      InterruptedIOException {
    this.writeBufferSize = writeBufferSize;
    if (currentWriteBufferSize.get() > writeBufferSize) {
      flush();
    }
  }

  /**
   * Get the write buffer size
   */
  @Override
  public long getWriteBufferSize() {
    return this.writeBufferSize;
  }


  @Deprecated
  public List<Row> getWriteBuffer() {
    return Arrays.asList(writeAsyncBuffer.toArray(new Row[0]));
  }
}
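The interplay between mutate() and backgroundFlushCommits() is easier to follow in a reduced model. The sketch below is not the HBase implementation. It is a hypothetical `SizeBoundedBuffer` that keeps only the queue, the size counter, the "flush while over the limit" loop and the "dequeue at most twice the limit unless flushing everything" rule, with byte arrays standing in for mutations and a list of batch sizes standing in for the RPC. It is meant for single-threaded demonstration only:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.atomic.AtomicLong;

final class SizeBoundedBuffer {
    private final ConcurrentLinkedQueue<byte[]> queue = new ConcurrentLinkedQueue<>();
    private final AtomicLong currentSize = new AtomicLong(0);
    private final long writeBufferSize;
    // Records how many items each simulated send contained.
    private final List<Integer> flushedBatchSizes = new ArrayList<>();

    SizeBoundedBuffer(long writeBufferSize) {
        this.writeBufferSize = writeBufferSize;
    }

    /** Buffer one record, then drain for as long as the buffered bytes exceed the limit. */
    void add(byte[] record) {
        currentSize.addAndGet(record.length);
        queue.add(record);
        while (currentSize.get() > writeBufferSize) {
            drain(false);
        }
    }

    /** Explicit flush: drain everything regardless of size. */
    void flush() {
        drain(true);
    }

    private void drain(boolean everything) {
        List<byte[]> batch = new ArrayList<>();
        long dequeued = 0;
        byte[] r;
        // Stop early only in the non-flush case, once twice the limit has been dequeued.
        while ((everything || writeBufferSize <= 0 || dequeued < writeBufferSize * 2)
                && (r = queue.poll()) != null) {
            batch.add(r);
            dequeued += r.length;
            currentSize.addAndGet(-r.length);
        }
        if (!batch.isEmpty()) {
            flushedBatchSizes.add(batch.size()); // stand-in for sending the batch to the server
        }
    }

    long bufferedBytes() {
        return currentSize.get();
    }

    List<Integer> flushedBatchSizes() {
        return flushedBatchSizes;
    }
}
```

With a limit of 10 bytes, two 4-byte records stay buffered, and the third pushes the total to 12 and triggers one send of all three. This matches what the while loop at the end of mutate() does: nothing is sent until the running total exceeds the limit.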

  

5.BufferedMutatorExample

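    The hbase-examples module in the HBase source tree provides examples of using the HBase client. One of them is the Java class BufferedMutatorExample, which shows another way to write to HBase from the client. Its code is as follows: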

 

import org.apache.commons.logging.Log;
import org.apache.commons.logging.LogFactory;
import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.BufferedMutator;
import org.apache.hadoop.hbase.client.BufferedMutatorParams;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.RetriesExhaustedWithDetailsException;
import org.apache.hadoop.hbase.util.Bytes;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;

import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

/**
 * An example of using the {@link BufferedMutator} interface.
 */
public class BufferedMutatorExample extends Configured implements Tool {

	private static final Log LOG = LogFactory.getLog(BufferedMutatorExample.class);

	private static final int POOL_SIZE = 10;// thread pool size
	private static final int TASK_COUNT = 100;// number of tasks
	private static final TableName TABLE = TableName.valueOf("foo");// HBase table foo
	private static final byte[] FAMILY = Bytes.toBytes("f");// column family f of table foo

	/**
	 * Overrides Tool.run(String[] args); it receives the String[] args of the main method
	 */
	@Override
    public int run(String[] args) throws InterruptedException, ExecutionException, TimeoutException {

        /** An asynchronous callback listener, triggered when an HBase write fails. */
        final BufferedMutator.ExceptionListener listener = new BufferedMutator.ExceptionListener() {
            @Override
            public void onException(RetriesExhaustedWithDetailsException e, BufferedMutator mutator) {
                for (int i = 0; i < e.getNumExceptions(); i++) {
                    LOG.info("Failed to sent put " + e.getRow(i) + ".");
                }
            }
        };
        /**
         * BufferedMutatorParams, the parameter object used to construct the BufferedMutator.
         * BufferedMutatorParams has these fields:
         *   TableName tableName
         *   long writeBufferSize
         *   int maxKeyValueSize
         *   ExecutorService pool
         *   BufferedMutator.ExceptionListener listener
         * Only tableName and listener are set here
         */
        BufferedMutatorParams params = new BufferedMutatorParams(TABLE).listener(listener);
        
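        /**
         * step 1: create one Connection and one BufferedMutator, shared by all threads in the pool.
         *              This uses try-with-resources, new in JDK 1.7: try (resources implementing
         *              java.lang.AutoCloseable, which java.io.Closeable extends) {} catch (Exception e) {}.
         *              When the block exits, close() is called automatically on each resource,
         *              in reverse order of declaration, so it acts like an implicit
         *              finally{
         *              	mutator.close();
         *              	conn.close();
         *              }
         */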
        try (
                final Connection conn = ConnectionFactory.createConnection(getConf());
                final BufferedMutator mutator = conn.getBufferedMutator(params)
        ) {
            /** Worker thread pool of size 10 whose threads write to the BufferedMutator */
            final ExecutorService workerPool = Executors.newFixedThreadPool(POOL_SIZE);
            List<Future<Void>> futures = new ArrayList<>(TASK_COUNT);

            /** Create tasks and hand them to the thread pool, 100 tasks in total */
            for (int i = 0; i < TASK_COUNT; i++) {
                futures.add(workerPool.submit(new Callable<Void>() {
                    @Override
                    public Void call() throws Exception {
                        /**
                         * step 2: every task sends its data to the BufferedMutator's buffer.
                         *              All tasks share the BufferedMutator's buffer (hbase.client.write.buffer),
                         *              and all tasks share the callback listener and the thread pool
                         */

                        /**
                         * Build the Put object here
                         */
                        Put p = new Put(Bytes.toBytes("someRow"));
                        p.addColumn(FAMILY, Bytes.toBytes("someQualifier"), Bytes.toBytes("some value"));
                        /**
                         * Add the data to the BufferedMutator's buffer (hbase.client.write.buffer).
                         * Nothing is sent to the HBase server immediately; data is sent automatically
                         * only once the buffered size exceeds hbase.client.write.buffer
                         */
                        mutator.mutate(p);

                        /**
                         * TODO
                         * Before this task exits you can call mutator.flush() yourself to send the data to the HBase server
                         * mutator.flush();
                         */
                        return null;
                    }
                }));
            }

            /**
             * step 3: iterate over the Future of every task, waiting up to 5 minutes for each one to finish
             */
            for (Future<Void> f : futures) {
                f.get(5, TimeUnit.MINUTES);
            }
            /**
             * Finally, shut down the thread pool
             */
            workerPool.shutdown();
        } catch (IOException e) {
            // exception while creating/destroying Connection or BufferedMutator
            LOG.info("exception while creating/destroying Connection or BufferedMutator", e);
        }
        /**
         * No finally block is needed here, because the try-with-resources statement above
         * calls close() on its resources when the block exits, i.e. mutator.close() and then conn.close()
         */
        return 0;
    }

	public static void main(String[] args) throws Exception {
		//Use the ToolRunner utility to invoke run() on BufferedMutatorExample, which implements Tool; String[] args is passed through to run()
		ToolRunner.run(new BufferedMutatorExample(), args);
	}
}
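The example relies on try-with-resources to close both the Connection and the BufferedMutator. Resources are closed in the reverse order of their declaration, so the mutator (which flushes on close) is closed before the connection it depends on. The following sketch demonstrates that ordering with a hypothetical `Named` resource that only records its name, since the real HBase classes need a running cluster:

```java
import java.util.ArrayList;
import java.util.List;

final class CloseOrderDemo {
    /** A resource that appends its name to a shared log when closed. */
    static final class Named implements AutoCloseable {
        private final String name;
        private final List<String> log;

        Named(String name, List<String> log) {
            this.name = name;
            this.log = log;
        }

        @Override
        public void close() {
            log.add(name);
        }
    }

    /** Returns the order in which the body and the two close() calls ran. */
    static List<String> closeOrder() {
        List<String> log = new ArrayList<>();
        try (Named conn = new Named("conn", log);
             Named mutator = new Named("mutator", log)) {
            log.add("body");
        }
        return log;
    }

    public static void main(String[] args) {
        System.out.println(closeOrder()); // prints [body, mutator, conn]
    }
}
```

Declaring the connection first and the mutator second, as BufferedMutatorExample does, is therefore what guarantees the final flush happens while the connection is still open.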

 

6.Takeaways from the source

  •     BufferedMutator can be used on its own to write to HBase from the client;
  •     BufferedMutator can be shared by multiple threads;
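The second point depends on the buffer being a thread-safe queue. As a rough illustration of the same task structure as BufferedMutatorExample (fixed pool, Callable tasks, Future.get with a timeout), the sketch below has many tasks append to one shared ConcurrentLinkedQueue. `SharedBufferDemo` is a hypothetical stand-in that uses only the JDK, and the queue of strings takes the place of the mutator's internal buffer:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;

final class SharedBufferDemo {
    /** Runs `tasks` workers that each append `perTask` items to one shared queue; returns the final queue size. */
    static int run(final int poolSize, final int tasks, final int perTask) {
        final ConcurrentLinkedQueue<String> shared = new ConcurrentLinkedQueue<>();
        ExecutorService pool = Executors.newFixedThreadPool(poolSize);
        try {
            List<Future<Void>> futures = new ArrayList<>(tasks);
            for (int t = 0; t < tasks; t++) {
                final int id = t;
                futures.add(pool.submit(new Callable<Void>() {
                    @Override
                    public Void call() {
                        for (int i = 0; i < perTask; i++) {
                            shared.add("task" + id + "-row" + i);
                        }
                        return null;
                    }
                }));
            }
            // Wait for every task, with a bounded timeout per Future.
            for (Future<Void> f : futures) {
                f.get(1, TimeUnit.MINUTES);
            }
        } catch (Exception e) {
            throw new IllegalStateException("worker failed or timed out", e);
        } finally {
            pool.shutdown(); // always release the worker threads
        }
        return shared.size();
    }
}
```

Calling run(10, 100, 50) should leave 5000 entries in the queue with none lost, which is the property that makes sharing a single buffer across worker threads safe.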