Source code reading -- (1) The HBase client source code | http://aperise.iteye.com/blog/2372350 |
Source code reading -- (2) hbase-examples BufferedMutator Example | http://aperise.iteye.com/blog/2372505 |
Source code reading -- (3) hbase-examples MultiThreadedClientExample | http://aperise.iteye.com/blog/2372534 |
1. Dropping HTable: operating the HBase client directly through the BufferedMutator that HTable wraps is entirely feasible
In the earlier analysis of the HBase client source code, we created the client as follows:
//The default Connection implementation is org.apache.hadoop.hbase.client.ConnectionManager.HConnectionImplementation
Connection connection = ConnectionFactory.createConnection(configuration);
//The default Table implementation is org.apache.hadoop.hbase.client.HTable
Table table = connection.getTable(TableName.valueOf("tableName"));
- By default we get the Connection implementation org.apache.hadoop.hbase.client.ConnectionManager.HConnectionImplementation. The important detail is that its setupRegistry() call installs org.apache.hadoop.hbase.client.ZookeeperRegistry, the class that handles all subsequent interaction with ZooKeeper.
- Through that Connection we then obtain the Table implementation org.apache.hadoop.hbase.client.HTable.
- Finally, org.apache.hadoop.hbase.client.HTable ultimately just holds a field mutator of type BufferedMutatorImpl, and every subsequent operation goes through that mutator.
So when talking to HBase we can drop the HTable object entirely, construct a BufferedMutator directly, and operate on HBase with it. Sure enough, the hbase-examples module in the HBase source tree demonstrates exactly this usage; the key code is as follows:
Configuration configuration = HBaseConfiguration.create();
configuration.set("hbase.zookeeper.property.clientPort", "2181");
configuration.set("hbase.client.write.buffer", "2097152");
configuration.set("hbase.zookeeper.quorum","192.168.199.31,192.168.199.32,192.168.199.33,192.168.199.34,192.168.199.35");
BufferedMutatorParams params = new BufferedMutatorParams(TableName.valueOf("tableName"));
//3177 is not made up: it is 2 * hbase.client.write.buffer / put.heapSize()
int bestBatchPutSize = 3177;
//JDK 1.7 try-with-resources: every resource declared here must implement java.io.Closeable,
//and its close() is called automatically on exit, exactly as a finally block would do
//(i.e. conn.close() and mutator.close() are both invoked)
try (
    //The default Connection implementation is org.apache.hadoop.hbase.client.ConnectionManager.HConnectionImplementation
    Connection conn = ConnectionFactory.createConnection(configuration);
    //The default BufferedMutator implementation is org.apache.hadoop.hbase.client.BufferedMutatorImpl
    BufferedMutator mutator = conn.getBufferedMutator(params);
) {
  List<Put> putLists = new ArrayList<Put>();
  for (int count = 0; count < 100000; count++) {
    Put put = new Put(("rowkey" + count).getBytes());//one rowkey per record
    put.addImmutable("columnFamily1".getBytes(), "columnName1".getBytes(), "columnValue1".getBytes());
    put.addImmutable("columnFamily1".getBytes(), "columnName2".getBytes(), "columnValue2".getBytes());
    put.addImmutable("columnFamily1".getBytes(), "columnName3".getBytes(), "columnValue3".getBytes());
    put.setDurability(Durability.SKIP_WAL);
    putLists.add(put);
    if (putLists.size() == bestBatchPutSize) {
      //The batch has reached the optimal size; submit it right away
      mutator.mutate(putLists);
      mutator.flush();
      putLists.clear();
    }
  }
  //Submit whatever is left over in one final batch
  mutator.mutate(putLists);
  mutator.flush();
} catch (IOException e) {
  LOG.info("exception while creating/destroying Connection or BufferedMutator", e);
}
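The figure 3177 above can be reproduced with a few lines of plain Java (no HBase dependency). The 1320-byte per-Put heap size is an assumed stand-in for what `put.heapSize()` would return for the three-column Put in the loop; a real program should measure it on a representative `Put`:

```java
// Sketch of the bestBatchPutSize arithmetic: batch roughly twice the client
// write buffer per mutate(List) call. The 1320-byte per-Put heap size is an
// assumed estimate standing in for put.heapSize().
public class BatchSizeEstimate {
  static int bestBatchPutSize(long writeBufferSize, long putHeapSize) {
    return (int) (2 * writeBufferSize / putHeapSize);
  }

  public static void main(String[] args) {
    long writeBufferSize = 2097152L;   // hbase.client.write.buffer (2 MB)
    long putHeapSize = 1320L;          // assumed put.heapSize() of one Put
    // 2 * 2097152 / 1320 = 3177 (integer division)
    System.out.println(bestBatchPutSize(writeBufferSize, putHeapSize));
  }
}
```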
2. BufferedMutatorParams
BufferedMutatorParams collects the parameters needed to construct a BufferedMutator: the HBase table name, the client write buffer size, the maximum size of a single KeyValue, the thread pool, and the callback listener for HBase operations (for example, to be notified of failed writes).
package org.apache.hadoop.hbase.client;
import java.util.concurrent.ExecutorService;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.classification.InterfaceAudience;
import org.apache.hadoop.hbase.classification.InterfaceStability;
/**
 * Parameter holder used to construct a BufferedMutator
 */
@InterfaceAudience.Public
@InterfaceStability.Evolving
public class BufferedMutatorParams {
  static final int UNSET = -1;
  private final TableName tableName;//HBase table name
  private long writeBufferSize = UNSET;//client-side write buffer size
  private int maxKeyValueSize = UNSET;//maximum size of a single KeyValue
  private ExecutorService pool = null;//thread pool
  private BufferedMutator.ExceptionListener listener = new BufferedMutator.ExceptionListener() {//callback listener invoked when an HBase operation fails, e.g. a failed write
    @Override
    public void onException(RetriesExhaustedWithDetailsException exception,
        BufferedMutator bufferedMutator)
        throws RetriesExhaustedWithDetailsException {
      throw exception;
    }
  };
  public BufferedMutatorParams(TableName tableName) {//constructor
    this.tableName = tableName;
  }
  public TableName getTableName() {//get the table name
    return tableName;
  }
  public long getWriteBufferSize() {//get the write buffer size
    return writeBufferSize;
  }
  /**
   * Fluent setter for the write buffer size
   */
  public BufferedMutatorParams writeBufferSize(long writeBufferSize) {
    this.writeBufferSize = writeBufferSize;
    return this;
  }
  public int getMaxKeyValueSize() {//get the maximum size of a single KeyValue
    return maxKeyValueSize;
  }
  /**
   * Fluent setter for the maximum size of a single KeyValue
   */
  public BufferedMutatorParams maxKeyValueSize(int maxKeyValueSize) {
    this.maxKeyValueSize = maxKeyValueSize;
    return this;
  }
  public ExecutorService getPool() {//get the thread pool
    return pool;
  }
  public BufferedMutatorParams pool(ExecutorService pool) {//fluent setter for the thread pool
    this.pool = pool;
    return this;
  }
  public BufferedMutator.ExceptionListener getListener() {//get the listener
    return listener;
  }
  public BufferedMutatorParams listener(BufferedMutator.ExceptionListener listener) {//fluent setter for the listener
    this.listener = listener;
    return this;
  }
}
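Note how each setter above returns `this`; that is what allows call chains such as `new BufferedMutatorParams(TABLE).writeBufferSize(...).listener(...)`. A minimal standalone sketch of this fluent-setter idiom (plain Java; the `FluentParams` class is hypothetical, with no HBase dependency):

```java
// Minimal sketch of the fluent-setter idiom used by BufferedMutatorParams:
// each setter mutates a field and returns `this` so calls can be chained.
public class FluentParams {
  static final int UNSET = -1;
  private final String tableName;       // required, set in the constructor
  private long writeBufferSize = UNSET; // optional, settable fluently
  private int maxKeyValueSize = UNSET;  // optional, settable fluently

  FluentParams(String tableName) {
    this.tableName = tableName;
  }

  FluentParams writeBufferSize(long writeBufferSize) {
    this.writeBufferSize = writeBufferSize;
    return this; // returning `this` is what enables chaining
  }

  FluentParams maxKeyValueSize(int maxKeyValueSize) {
    this.maxKeyValueSize = maxKeyValueSize;
    return this;
  }

  String getTableName() { return tableName; }
  long getWriteBufferSize() { return writeBufferSize; }
  int getMaxKeyValueSize() { return maxKeyValueSize; }

  public static void main(String[] args) {
    FluentParams p = new FluentParams("foo")
        .writeBufferSize(4194304L)
        .maxKeyValueSize(10485760);
    System.out.println(p.getTableName() + " " + p.getWriteBufferSize());
  }
}
```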
3. BufferedMutator
BufferedMutator is an interface that mainly defines the following abstract methods:
public interface BufferedMutator extends Closeable {
  TableName getName();//get the table name
  Configuration getConfiguration();//get the Hadoop Configuration object
  void mutate(Mutation mutation) throws IOException;//add a single mutation to the buffer
  void mutate(List<? extends Mutation> mutations) throws IOException;//add a batch of mutations to the buffer
  @Override
  void close() throws IOException;//implements Closeable, so try-with-resources (JDK 1.7) can close the object without an explicit finally
  void flush() throws IOException;//submit the buffered data to the HBase server
  long getWriteBufferSize();//get the write buffer size
  @InterfaceAudience.Public
  @InterfaceStability.Evolving
  interface ExceptionListener {//callback listener
    public void onException(RetriesExhaustedWithDetailsException exception,
        BufferedMutator mutator) throws RetriesExhaustedWithDetailsException;
  }
}
4. BufferedMutatorImpl
package org.apache.hadoop.hbase.client;
import com.google.common.annotations.VisibleForTesting;
import org.apache.commons.logging.Log;
import org.apache.commons.logging.LogFactory;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.classification.InterfaceAudience;
import org.apache.hadoop.hbase.classification.InterfaceStability;
import org.apache.hadoop.hbase.ipc.RpcControllerFactory;
import java.io.IOException;
import java.io.InterruptedIOException;
import java.util.Arrays;
import java.util.LinkedList;
import java.util.List;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicLong;
/**
 * BufferedMutatorImpl was introduced in HBase 1.0.0.
 * It is mainly intended for operating on a single table from multiple threads.
 * Note that when multiple threads share one BufferedMutator, an error raised
 * in one thread surfaces to the other threads as well.
 */
@InterfaceAudience.Private
@InterfaceStability.Evolving
public class BufferedMutatorImpl implements BufferedMutator {
  private static final Log LOG = LogFactory.getLog(BufferedMutatorImpl.class);
  private final ExceptionListener listener;//callback listener invoked for each client operation
  protected ClusterConnection connection; //the connection held by this mutator
  private final TableName tableName;//HBase table name
  private volatile Configuration conf;//Hadoop Configuration object
  @VisibleForTesting
  final ConcurrentLinkedQueue<Mutation> writeAsyncBuffer = new ConcurrentLinkedQueue<Mutation>();//client-side buffer queue
  @VisibleForTesting
  AtomicLong currentWriteBufferSize = new AtomicLong(0);//thread-safe counter tracking the heap size of the data currently in the buffer
  private long writeBufferSize;//client-side write buffer size
  private final int maxKeyValueSize;//maximum size of a single KeyValue
  private boolean closed = false;//whether this mutator has been closed
  private final ExecutorService pool;//thread pool used by the client
  @VisibleForTesting
  protected AsyncProcess ap; //asynchronous operation handler
  BufferedMutatorImpl(ClusterConnection conn, RpcRetryingCallerFactory rpcCallerFactory,
      RpcControllerFactory rpcFactory, BufferedMutatorParams params) {
    if (conn == null || conn.isClosed()) {
      throw new IllegalArgumentException("Connection is null or closed.");
    }
    this.tableName = params.getTableName();
    this.connection = conn;
    this.conf = connection.getConfiguration();
    this.pool = params.getPool();
    this.listener = params.getListener();
    //Build a ConnectionConfiguration from the supplied conf; any setting the
    //client did not provide falls back to its default value
    ConnectionConfiguration tableConf = new ConnectionConfiguration(conf);
    //Set the write buffer size
    this.writeBufferSize = params.getWriteBufferSize() != BufferedMutatorParams.UNSET
        ? params.getWriteBufferSize() : tableConf.getWriteBufferSize();
    //Set the maximum size of a single KeyValue
    this.maxKeyValueSize = params.getMaxKeyValueSize() != BufferedMutatorParams.UNSET
        ? params.getMaxKeyValueSize() : tableConf.getMaxKeyValueSize();
    //Asynchronous operation handler
    ap = new AsyncProcess(connection, conf, pool, rpcCallerFactory, true, rpcFactory);
  }
  @Override
  public TableName getName() {//get the table name
    return tableName;
  }
  @Override
  public Configuration getConfiguration() {//get the Hadoop Configuration object, i.e. the conf passed in by the client
    return conf;
  }
  @Override
  public void mutate(Mutation m) throws InterruptedIOException,
      RetriesExhaustedWithDetailsException {//add a single mutation to the buffer
    mutate(Arrays.asList(m));
  }
  @Override
  public void mutate(List<? extends Mutation> ms) throws InterruptedIOException, RetriesExhaustedWithDetailsException {
    //If this BufferedMutatorImpl has already been closed, fail immediately
    if (closed) {
      throw new IllegalStateException("Cannot put when the BufferedMutator is closed.");
    }
    //First loop over the submitted mutations and accumulate their heap size into toAddSize
    long toAddSize = 0;
    for (Mutation m : ms) {
      if (m instanceof Put) {
        validatePut((Put) m);
      }
      toAddSize += m.heapSize();
    }
    // This behavior is highly non-intuitive... it does not protect us against
    // 94-incompatible behavior, which is a timing issue because hasError, the below code
    // and setter of hasError are not synchronized. Perhaps it should be removed.
    if (ap.hasError()) {
      //Add toAddSize to the running total of buffered heap size
      currentWriteBufferSize.addAndGet(toAddSize);
      //Append the submitted mutations to the buffer queue writeAsyncBuffer;
      //they stay there until the flush completes
      writeAsyncBuffer.addAll(ms);
      //An earlier error was detected, so force a synchronous flush now
      backgroundFlushCommits(true);
    } else {
      //Add toAddSize to the running total of buffered heap size
      currentWriteBufferSize.addAndGet(toAddSize);
      //Append the submitted mutations to the buffer queue writeAsyncBuffer;
      //they stay there until the flush completes
      writeAsyncBuffer.addAll(ms);
    }
    // Now try and queue what needs to be queued.
    // If the buffered heap size exceeds hbase.client.write.buffer (2 MB by default),
    // backgroundFlushCommits is invoked right away;
    // otherwise the method simply returns without doing anything more
    while (currentWriteBufferSize.get() > writeBufferSize) {
      backgroundFlushCommits(false);
    }
  }
  // Validate a Put
  public void validatePut(final Put put) throws IllegalArgumentException {
    HTable.validatePut(put, maxKeyValueSize);
  }
  @Override
  public synchronized void close() throws IOException {
    try {
      if (this.closed) {//already closed: nothing to do
        return;
      }
      //Flush one last time before closing
      backgroundFlushCommits(true);
      this.pool.shutdown();//shut down the thread pool
      boolean terminated;
      int loopCnt = 0;
      do {
        // wait until the pool has terminated
        terminated = this.pool.awaitTermination(60, TimeUnit.SECONDS);
        loopCnt += 1;
        if (loopCnt >= 10) {
          LOG.warn("close() failed to terminate pool after 10 minutes. Abandoning pool.");
          break;
        }
      } while (!terminated);
    } catch (InterruptedException e) {
      LOG.warn("waitForTermination interrupted");
    } finally {
      this.closed = true;
    }
  }
  @Override
  public synchronized void flush() throws InterruptedIOException, RetriesExhaustedWithDetailsException {
    //Explicitly push the buffered data to the HBase server
    backgroundFlushCommits(true);
  }
  private void backgroundFlushCommits(boolean synchronous) throws InterruptedIOException, RetriesExhaustedWithDetailsException {
    LinkedList<Mutation> buffer = new LinkedList<>();
    // Keep track of the size so that this thread doesn't spin forever
    long dequeuedSize = 0;
    try {
      //Drain the submitted mutations (Put is an implementation of Mutation)
      Mutation m;
      //While (writeBufferSize <= 0 || dequeuedSize < writeBufferSize * 2 || synchronous)
      //and writeAsyncBuffer still holds Mutation objects, keep dequeueing:
      //dequeuedSize grows by each mutation's heap size and currentWriteBufferSize
      //shrinks by the same amount
      while ((writeBufferSize <= 0 || dequeuedSize < (writeBufferSize * 2) || synchronous) && (m = writeAsyncBuffer.poll()) != null) {
        buffer.add(m);
        long size = m.heapSize();
        dequeuedSize += size;
        currentWriteBufferSize.addAndGet(-size);
      }
      //For backgroundFlushCommits(false), this early return is only taken if nothing was dequeued
      if (!synchronous && dequeuedSize == 0) {
        return;
      }
      //backgroundFlushCommits(false) enters here and does not wait for the results
      if (!synchronous) {
        //Submit without waiting for the results
        ap.submit(tableName, buffer, true, null, false);
        if (ap.hasError()) {
          LOG.debug(tableName + ": One or more of the operations have failed -"
              + " waiting for all operation in progress to finish (successfully or not)");
        }
      }
      //backgroundFlushCommits(true) enters here and waits for the results
      if (synchronous || ap.hasError()) {
        while (!buffer.isEmpty()) {
          ap.submit(tableName, buffer, true, null, false);
        }
        //Wait for all outstanding operations to finish
        RetriesExhaustedWithDetailsException error = ap.waitForAllPreviousOpsAndReset(null);
        if (error != null) {
          if (listener == null) {
            throw error;
          } else {
            this.listener.onException(error, this);
          }
        }
      }
    } finally {
      //Anything still left in the local buffer is pushed back so a later flush can submit it
      for (Mutation mut : buffer) {
        long size = mut.heapSize();
        currentWriteBufferSize.addAndGet(size);
        dequeuedSize -= size;
        writeAsyncBuffer.add(mut);
      }
    }
  }
  /**
   * Set the client-side write buffer size
   */
  @Deprecated
  public void setWriteBufferSize(long writeBufferSize) throws RetriesExhaustedWithDetailsException,
      InterruptedIOException {
    this.writeBufferSize = writeBufferSize;
    if (currentWriteBufferSize.get() > writeBufferSize) {
      flush();
    }
  }
  /**
   * Get the write buffer size
   */
  @Override
  public long getWriteBufferSize() {
    return this.writeBufferSize;
  }
  @Deprecated
  public List<Row> getWriteBuffer() {
    return Arrays.asList(writeAsyncBuffer.toArray(new Row[0]));
  }
}
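The interplay between mutate() and backgroundFlushCommits() above boils down to a piece of bookkeeping: each mutation's heap size is added to an AtomicLong, and once the total exceeds writeBufferSize the queue is drained in passes of at most roughly twice the buffer size. A plain-Java sketch of just that accounting (the queued long values stand in for Mutation.heapSize(); incrementing flushCount stands in for ap.submit()):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.atomic.AtomicLong;

// Sketch of BufferedMutatorImpl's buffer accounting. Each queued long stands
// in for one Mutation's heapSize(); flushCount stands in for ap.submit(...).
public class BufferSketch {
  final ConcurrentLinkedQueue<Long> writeAsyncBuffer = new ConcurrentLinkedQueue<>();
  final AtomicLong currentWriteBufferSize = new AtomicLong(0);
  final long writeBufferSize;
  int flushCount = 0;

  BufferSketch(long writeBufferSize) {
    this.writeBufferSize = writeBufferSize;
  }

  // Mirrors mutate(): enqueue, bump the counter, then drain while over the threshold.
  void mutate(long heapSize) {
    currentWriteBufferSize.addAndGet(heapSize);
    writeAsyncBuffer.add(heapSize);
    while (currentWriteBufferSize.get() > writeBufferSize) {
      backgroundFlush(false);
    }
  }

  // Mirrors backgroundFlushCommits(): dequeue at most ~2x the buffer size per
  // pass (unless synchronous), decrementing the counter as items leave the queue.
  void backgroundFlush(boolean synchronous) {
    long dequeuedSize = 0;
    List<Long> batch = new ArrayList<>();
    Long m;
    while ((writeBufferSize <= 0 || dequeuedSize < writeBufferSize * 2 || synchronous)
        && (m = writeAsyncBuffer.poll()) != null) {
      batch.add(m);
      dequeuedSize += m;
      currentWriteBufferSize.addAndGet(-m);
    }
    if (!batch.isEmpty()) {
      flushCount++; // a real implementation would submit the batch here
    }
  }

  public static void main(String[] args) {
    BufferSketch b = new BufferSketch(100);  // tiny buffer for illustration
    for (int i = 0; i < 10; i++) {
      b.mutate(30);  // threshold flushes fire on the 4th and 8th call
    }
    b.backgroundFlush(true);  // final flush(), drains the remaining items
    System.out.println(b.flushCount + " " + b.currentWriteBufferSize.get());
  }
}
```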
5. BufferedMutatorExample
The hbase-examples module of the HBase source tree ships a sample client in the Java class BufferedMutatorExample, which demonstrates this alternative way of driving the HBase client. Its code is as follows:
import org.apache.commons.logging.Log;
import org.apache.commons.logging.LogFactory;
import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.BufferedMutator;
import org.apache.hadoop.hbase.client.BufferedMutatorParams;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.RetriesExhaustedWithDetailsException;
import org.apache.hadoop.hbase.util.Bytes;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;
/**
* An example of using the {@link BufferedMutator} interface.
*/
public class BufferedMutatorExample extends Configured implements Tool {
  private static final Log LOG = LogFactory.getLog(BufferedMutatorExample.class);
  private static final int POOL_SIZE = 10;// thread pool size
  private static final int TASK_COUNT = 100;// number of tasks
  private static final TableName TABLE = TableName.valueOf("foo");// HBase table foo
  private static final byte[] FAMILY = Bytes.toBytes("f");// column family f of table foo
  /**
   * Override Tool.run(String[] args); it receives the String[] args of the main method
   */
  @Override
  public int run(String[] args) throws InterruptedException, ExecutionException, TimeoutException {
    /** An asynchronous callback listener, triggered when an HBase write fails. */
    final BufferedMutator.ExceptionListener listener = new BufferedMutator.ExceptionListener() {
      @Override
      public void onException(RetriesExhaustedWithDetailsException e, BufferedMutator mutator) {
        for (int i = 0; i < e.getNumExceptions(); i++) {
          LOG.info("Failed to sent put " + e.getRow(i) + ".");
        }
      }
    };
    /**
     * BufferedMutatorParams, the parameter object from which the BufferedMutator is built.
     * Its parameters are:
     *   TableName tableName
     *   long writeBufferSize
     *   int maxKeyValueSize
     *   ExecutorService pool
     *   BufferedMutator.ExceptionListener listener
     * Only tableName and listener are set here.
     */
    BufferedMutatorParams params = new BufferedMutatorParams(TABLE).listener(listener);
    /**
     * step 1: create one Connection and one BufferedMutator, shared by all threads in the pool.
     * JDK 1.7 try-with-resources is used here: each declared resource must implement
     * java.io.Closeable, and its close() method is called automatically afterwards,
     * which amounts to
     * finally {
     *   conn.close();
     *   mutator.close();
     * }
     */
    try (
        final Connection conn = ConnectionFactory.createConnection(getConf());
        final BufferedMutator mutator = conn.getBufferedMutator(params)
    ) {
      /** A worker pool of 10 threads that all operate on the shared BufferedMutator. */
      final ExecutorService workerPool = Executors.newFixedThreadPool(POOL_SIZE);
      List<Future<Void>> futures = new ArrayList<>(TASK_COUNT);
      /** Keep creating tasks and submitting them to the pool, 100 in total */
      for (int i = 0; i < TASK_COUNT; i++) {
        futures.add(workerPool.submit(new Callable<Void>() {
          @Override
          public Void call() throws Exception {
            /**
             * step 2: every task writes into the shared BufferedMutator buffer
             * (hbase.client.write.buffer); the callback listener and the thread
             * pool are likewise shared by all tasks.
             */
            //Build the Put object
            Put p = new Put(Bytes.toBytes("someRow"));
            p.addColumn(FAMILY, Bytes.toBytes("someQualifier"), Bytes.toBytes("some value"));
            /**
             * Add the data to the BufferedMutator buffer (hbase.client.write.buffer).
             * This does not submit the data to the HBase server immediately; a flush
             * is only triggered once the buffered size exceeds hbase.client.write.buffer.
             */
            mutator.mutate(p);
            /**
             * TODO
             * Before leaving this task you could also call mutator.flush() yourself
             * to push the data to the HBase server:
             * mutator.flush();
             */
            return null;
          }
        }));
      }
      /**
       * step 3: iterate over every task's Future; wait up to 5 minutes for each unfinished one
       */
      for (Future<Void> f : futures) {
        f.get(5, TimeUnit.MINUTES);
      }
      /**
       * Finally shut down the worker pool
       */
      workerPool.shutdown();
    } catch (IOException e) {
      // exception while creating/destroying Connection or BufferedMutator
      LOG.info("exception while creating/destroying Connection or BufferedMutator", e);
    }
    /**
     * No finally block is needed here: try-with-resources (JDK 1.7) automatically calls
     * close() on each declared resource, i.e. conn.close() and mutator.close()
     */
    return 0;
  }
  public static void main(String[] args) throws Exception {
    //ToolRunner invokes run() on the Tool implementation BufferedMutatorExample,
    //passing String[] args through to it
    ToolRunner.run(new BufferedMutatorExample(), args);
  }
}
6. Takeaways from the source code
- BufferedMutator is fully usable on its own for operating the HBase client;
- a BufferedMutator can be shared by multiple threads.