HBase Multi-Versioning

HBase supports multi-version storage: the versions of a single cell are distinguished by their timestamp.
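When a put does not specify a timestamp, the cell timestamp is the write time in epoch milliseconds. The sketch below converts two of the timestamps that appear in the shell output later in this post into readable times. It assumes the "CST" shown in the WAL dump means China Standard Time (UTC+8); the helper name `cell_ts_to_cst` is made up for illustration.

```python
from datetime import datetime, timedelta, timezone

# Assumption: "CST" in the WAL dump is China Standard Time (UTC+8).
CST = timezone(timedelta(hours=8))


def cell_ts_to_cst(ts_ms: int) -> str:
    """Convert a cell timestamp (epoch milliseconds) to a UTC+8 time string."""
    seconds = ts_ms // 1000  # drop the millisecond part
    return datetime.fromtimestamp(seconds, tz=CST).strftime("%Y-%m-%d %H:%M:%S")


# Two versions of rk1114 / f:q1 from the test table
print(cell_ts_to_cst(1586916990598))  # version holding v40
print(cell_ts_to_cst(1586916991906))  # version holding v42
```

The two results fall one second apart, which lines up with the write timestamps that the WAL dump reports for the first and last put on rk1114.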

Configuring multiple versions

  1. Create a test table
hbase(main):032:0* create 'tmp_mutilversion', {NAME => 'f', VERSIONS => 5}
0 row(s) in 2.2860 seconds

hbase(main):006:0> desc 'tmp_mutilversion'
Table tmp_mutilversion is ENABLED                                                                                                                                                                                                                                             
tmp_mutilversion                                                                                                                                                                                                                                                              
COLUMN FAMILIES DESCRIPTION                                                                                                                                                                                                                                                   
{NAME => 'f', BLOOMFILTER => 'ROW', VERSIONS => '5', IN_MEMORY => 'false', KEEP_DELETED_CELLS => 'FALSE', DATA_BLOCK_ENCODING => 'NONE', TTL => 'FOREVER', COMPRESSION => 'NONE', MIN_VERSIONS => '0', BLOCKCACHE => 'true', BLOCKSIZE => '65536', REPLICATION_SCOPE => '0'}  
1 row(s) in 0.0290 seconds

  2. Put the test data
put 'tmp_mutilversion','rk1111','f:q1','v10'
put 'tmp_mutilversion','rk1112','f:q1','v20'
put 'tmp_mutilversion','rk1113','f:q1','v30'
put 'tmp_mutilversion','rk1114','f:q1','v40'
put 'tmp_mutilversion','rk1115','f:q1','v50'
put 'tmp_mutilversion','rk1116','f:q1','v60'
put 'tmp_mutilversion','rk1112','f:q1','v21'
put 'tmp_mutilversion','rk1113','f:q1','v31'
put 'tmp_mutilversion','rk1114','f:q1','v41'
put 'tmp_mutilversion','rk1117','f:q1','v70'
put 'tmp_mutilversion','rk1118','f:q1','v80'
put 'tmp_mutilversion','rk1119','f:q1','v90'
put 'tmp_mutilversion','rk1113','f:q1','v32'
put 'tmp_mutilversion','rk1114','f:q1','v42'
  3. Scan the table. A plain scan does not show multiple versions; it returns only the latest value of each cell.
hbase(main):047:0> scan 'tmp_mutilversion'
ROW                                                                  COLUMN+CELL                                                                                                                                                                                              
 rk1111                                                              column=f:q1, timestamp=1586916990543, value=v10                                                                                                                                                          
 rk1112                                                              column=f:q1, timestamp=1586916990644, value=v21                                                                                                                                                          
 rk1113                                                              column=f:q1, timestamp=1586916990718, value=v32                                                                                                                                                          
 rk1114                                                              column=f:q1, timestamp=1586916991906, value=v42                                                                                                                                                          
 rk1115                                                              column=f:q1, timestamp=1586916990615, value=v50                                                                                                                                                          
 rk1116                                                              column=f:q1, timestamp=1586916990630, value=v60                                                                                                                                                          
 rk1117                                                              column=f:q1, timestamp=1586916990683, value=v70                                                                                                                                                          
 rk1118                                                              column=f:q1, timestamp=1586916990694, value=v80                                                                                                                                                          
 rk1119                                                              column=f:q1, timestamp=1586916990707, value=v90                                                                                                                                                          
9 row(s) in 0.0350 seconds

  4. View multiple versions

hbase(main):001:0> get 'tmp_mutilversion','rk1114',{COLUMN => 'f:q1', VERSIONS => 2}
COLUMN                                                               CELL                                                                                                                                                                                                     
 f:q1                                                                timestamp=1586916991906, value=v42                                                                                                                                                                       
 f:q1                                                                timestamp=1586916990671, value=v41                                                                                                                                                                       
1 row(s) in 0.2300 seconds

hbase(main):002:0> get 'tmp_mutilversion','rk1114',{COLUMN => 'f:q1', VERSIONS => 5}
COLUMN                                                               CELL                                                                                                                                                                                                     
 f:q1                                                                timestamp=1586916991906, value=v42                                                                                                                                                                       
 f:q1                                                                timestamp=1586916990671, value=v41                                                                                                                                                                       
 f:q1                                                                timestamp=1586916990598, value=v40                                                                                                                                                                       
1 row(s) in 0.0090 seconds

  5. Inspect the WAL
Sequence=4 , region=0b085a4a81c0fa739c4117970f75ac7c at write timestamp=Wed Apr 15 10:16:30 CST 2020
row=rk1111, column=f:q1
Sequence=5 , region=0b085a4a81c0fa739c4117970f75ac7c at write timestamp=Wed Apr 15 10:16:30 CST 2020
row=rk1112, column=f:q1
Sequence=6 , region=0b085a4a81c0fa739c4117970f75ac7c at write timestamp=Wed Apr 15 10:16:30 CST 2020
row=rk1113, column=f:q1
Sequence=7 , region=0b085a4a81c0fa739c4117970f75ac7c at write timestamp=Wed Apr 15 10:16:30 CST 2020
row=rk1114, column=f:q1
Sequence=8 , region=0b085a4a81c0fa739c4117970f75ac7c at write timestamp=Wed Apr 15 10:16:30 CST 2020
row=rk1115, column=f:q1
Sequence=9 , region=0b085a4a81c0fa739c4117970f75ac7c at write timestamp=Wed Apr 15 10:16:30 CST 2020
row=rk1116, column=f:q1
Sequence=10 , region=0b085a4a81c0fa739c4117970f75ac7c at write timestamp=Wed Apr 15 10:16:30 CST 2020
row=rk1112, column=f:q1
Sequence=11 , region=0b085a4a81c0fa739c4117970f75ac7c at write timestamp=Wed Apr 15 10:16:30 CST 2020
row=rk1113, column=f:q1
Sequence=12 , region=0b085a4a81c0fa739c4117970f75ac7c at write timestamp=Wed Apr 15 10:16:30 CST 2020
row=rk1114, column=f:q1
Sequence=13 , region=0b085a4a81c0fa739c4117970f75ac7c at write timestamp=Wed Apr 15 10:16:30 CST 2020
row=rk1117, column=f:q1
Sequence=14 , region=0b085a4a81c0fa739c4117970f75ac7c at write timestamp=Wed Apr 15 10:16:30 CST 2020
row=rk1118, column=f:q1
Sequence=15 , region=0b085a4a81c0fa739c4117970f75ac7c at write timestamp=Wed Apr 15 10:16:30 CST 2020
row=rk1119, column=f:q1
Sequence=16 , region=0b085a4a81c0fa739c4117970f75ac7c at write timestamp=Wed Apr 15 10:16:30 CST 2020
row=rk1113, column=f:q1
Sequence=17 , region=0b085a4a81c0fa739c4117970f75ac7c at write timestamp=Wed Apr 15 10:16:31 CST 2020
row=rk1114, column=f:q1

  6. Inspect the HFile (you may need to flush the table manually first)
K: rk1111/f:q1/1586916990543/Put/vlen=3/seqid=4 V: v10
K: rk1112/f:q1/1586916990644/Put/vlen=3/seqid=10 V: v21
K: rk1112/f:q1/1586916990566/Put/vlen=3/seqid=5 V: v20
K: rk1113/f:q1/1586916990718/Put/vlen=3/seqid=16 V: v32
K: rk1113/f:q1/1586916990659/Put/vlen=3/seqid=11 V: v31
K: rk1113/f:q1/1586916990583/Put/vlen=3/seqid=6 V: v30
K: rk1114/f:q1/1586916991906/Put/vlen=3/seqid=17 V: v42
K: rk1114/f:q1/1586916990671/Put/vlen=3/seqid=12 V: v41
K: rk1114/f:q1/1586916990598/Put/vlen=3/seqid=7 V: v40
K: rk1115/f:q1/1586916990615/Put/vlen=3/seqid=8 V: v50
K: rk1116/f:q1/1586916990630/Put/vlen=3/seqid=9 V: v60
K: rk1117/f:q1/1586916990683/Put/vlen=3/seqid=13 V: v70
K: rk1118/f:q1/1586916990694/Put/vlen=3/seqid=14 V: v80
K: rk1119/f:q1/1586916990707/Put/vlen=3/seqid=15 V: v90
Scanned kv count -> 14
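The dump above lists keys by row in ascending order and, within the same row and column, by timestamp in descending order, so the newest version comes first. A minimal sketch of that ordering, using the rk1113 and rk1114 cells from the dump (the tuple layout and the name `hfile_order` are illustrative, not HBase's own comparator):

```python
# (row, column, timestamp_ms, value) tuples, deliberately shuffled
cells = [
    ("rk1114", "f:q1", 1586916990598, "v40"),
    ("rk1113", "f:q1", 1586916990718, "v32"),
    ("rk1114", "f:q1", 1586916991906, "v42"),
    ("rk1113", "f:q1", 1586916990583, "v30"),
    ("rk1114", "f:q1", 1586916990671, "v41"),
    ("rk1113", "f:q1", 1586916990659, "v31"),
]


def hfile_order(cell):
    """Sort key: row ascending, column ascending, timestamp descending."""
    row, column, ts, _value = cell
    return (row, column, -ts)


ordered = sorted(cells, key=hfile_order)
values = [value for _row, _column, _ts, value in ordered]
print(values)
```

Because the newest version sorts first, a reader that only wants the latest value can stop at the first key it meets for a given row and column.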

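From this test, with VERSIONS set to 5, the versions of the same cell that differ only in timestamp are stored in the same HFile (here all the puts went into a single flush).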
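The two `get ... VERSIONS => n` calls in the test can be pictured as "return up to n versions, newest first". The sketch below is a toy model of that read, not the HBase client API; `get_versions` and the tuple layout are hypothetical.

```python
# The three stored versions of rk1114 / f:q1 from the test table
cells = [
    ("rk1114", "f:q1", 1586916990598, "v40"),
    ("rk1114", "f:q1", 1586916990671, "v41"),
    ("rk1114", "f:q1", 1586916991906, "v42"),
]


def get_versions(cells, row, column, versions):
    """Return up to `versions` (timestamp, value) pairs, newest first."""
    matching = [(ts, value) for r, c, ts, value in cells if r == row and c == column]
    matching.sort(reverse=True)  # highest timestamp first
    return matching[:versions]


two = [value for _ts, value in get_versions(cells, "rk1114", "f:q1", 2)]
five = [value for _ts, value in get_versions(cells, "rk1114", "f:q1", 5)]
print(two)
print(five)
```

Asking for 5 versions yields only 3 results because only 3 versions were ever written, which matches the shell output.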

So, if a table is created with the default of a single version, can multiple values still be written to the same row key?

  1. Create a test table
hbase(main):009:0* create 'tmp_singleversion', {NAME => 'f', VERSIONS => 1}
0 row(s) in 2.3990 seconds

=> Hbase::Table - tmp_singleversion
hbase(main):010:0> desc 'tmp_singleversion'
Table tmp_singleversion is ENABLED                                                                                                                                                                                                                                            
tmp_singleversion                                                                                                                                                                                                                                                             
COLUMN FAMILIES DESCRIPTION                                                                                                                                                                                                                                                   
{NAME => 'f', BLOOMFILTER => 'ROW', VERSIONS => '1', IN_MEMORY => 'false', KEEP_DELETED_CELLS => 'FALSE', DATA_BLOCK_ENCODING => 'NONE', TTL => 'FOREVER', COMPRESSION => 'NONE', MIN_VERSIONS => '0', BLOCKCACHE => 'true', BLOCKSIZE => '65536', REPLICATION_SCOPE => '0'}  
1 row(s) in 0.0210 seconds
  2. Put the same data
put 'tmp_singleversion','rk1111','f:q1','v10'
put 'tmp_singleversion','rk1112','f:q1','v20'
put 'tmp_singleversion','rk1113','f:q1','v30'
put 'tmp_singleversion','rk1114','f:q1','v40'
put 'tmp_singleversion','rk1115','f:q1','v50'
put 'tmp_singleversion','rk1116','f:q1','v60'
put 'tmp_singleversion','rk1112','f:q1','v21'
put 'tmp_singleversion','rk1113','f:q1','v31'
put 'tmp_singleversion','rk1114','f:q1','v41'
put 'tmp_singleversion','rk1117','f:q1','v70'
put 'tmp_singleversion','rk1118','f:q1','v80'
put 'tmp_singleversion','rk1119','f:q1','v90'
put 'tmp_singleversion','rk1113','f:q1','v32'
put 'tmp_singleversion','rk1114','f:q1','v42'
  3. Scan as before
hbase(main):025:0> scan 'tmp_singleversion'
ROW                                                                  COLUMN+CELL                                                                                                                                                                                              
 rk1111                                                              column=f:q1, timestamp=1586918408170, value=v10                                                                                                                                                          
 rk1112                                                              column=f:q1, timestamp=1586918408288, value=v21                                                                                                                                                          
 rk1113                                                              column=f:q1, timestamp=1586918408533, value=v32                                                                                                                                                          
 rk1114                                                              column=f:q1, timestamp=1586918409641, value=v42                                                                                                                                                          
 rk1115                                                              column=f:q1, timestamp=1586918408257, value=v50                                                                                                                                                          
 rk1116                                                              column=f:q1, timestamp=1586918408271, value=v60                                                                                                                                                          
 rk1117                                                              column=f:q1, timestamp=1586918408350, value=v70                                                                                                                                                          
 rk1118                                                              column=f:q1, timestamp=1586918408367, value=v80                                                                                                                                                          
 rk1119                                                              column=f:q1, timestamp=1586918408505, value=v90                                                                                                                                                          
9 row(s) in 0.0150 seconds

  4. Try to view multiple versions
hbase(main):028:0> get 'tmp_singleversion','rk1114',{COLUMN => 'f:q1', VERSIONS => 5}
COLUMN                                                               CELL                                                                                                                                                                                                     
 f:q1                                                                timestamp=1586918409641, value=v42                                                                                                                                                                       
1 row(s) in 0.0040 seconds

hbase(main):029:0> get 'tmp_singleversion','rk1114',{COLUMN => 'f:q1', VERSIONS => 2}
COLUMN                                                               CELL                                                                                                                                                                                                     
 f:q1                                                                timestamp=1586918409641, value=v42                                                                                                                                                                       
1 row(s) in 0.0040 seconds

  5. Inspect the WAL
Sequence=4 , region=76c57383a915f168383bff0c26d59251 at write timestamp=Wed Apr 15 10:40:08 CST 2020
row=rk1111, column=f:q1
Sequence=5 , region=76c57383a915f168383bff0c26d59251 at write timestamp=Wed Apr 15 10:40:08 CST 2020
row=rk1112, column=f:q1
Sequence=6 , region=76c57383a915f168383bff0c26d59251 at write timestamp=Wed Apr 15 10:40:08 CST 2020
row=rk1113, column=f:q1
Sequence=7 , region=76c57383a915f168383bff0c26d59251 at write timestamp=Wed Apr 15 10:40:08 CST 2020
row=rk1114, column=f:q1
Sequence=8 , region=76c57383a915f168383bff0c26d59251 at write timestamp=Wed Apr 15 10:40:08 CST 2020
row=rk1115, column=f:q1
Sequence=9 , region=76c57383a915f168383bff0c26d59251 at write timestamp=Wed Apr 15 10:40:08 CST 2020
row=rk1116, column=f:q1
Sequence=10 , region=76c57383a915f168383bff0c26d59251 at write timestamp=Wed Apr 15 10:40:08 CST 2020
row=rk1112, column=f:q1
Sequence=11 , region=76c57383a915f168383bff0c26d59251 at write timestamp=Wed Apr 15 10:40:08 CST 2020
row=rk1113, column=f:q1
Sequence=12 , region=76c57383a915f168383bff0c26d59251 at write timestamp=Wed Apr 15 10:40:08 CST 2020
row=rk1114, column=f:q1
Sequence=13 , region=76c57383a915f168383bff0c26d59251 at write timestamp=Wed Apr 15 10:40:08 CST 2020
row=rk1117, column=f:q1
Sequence=14 , region=76c57383a915f168383bff0c26d59251 at write timestamp=Wed Apr 15 10:40:08 CST 2020
row=rk1118, column=f:q1
Sequence=15 , region=76c57383a915f168383bff0c26d59251 at write timestamp=Wed Apr 15 10:40:08 CST 2020
row=rk1119, column=f:q1
Sequence=16 , region=76c57383a915f168383bff0c26d59251 at write timestamp=Wed Apr 15 10:40:08 CST 2020
row=rk1113, column=f:q1
Sequence=17 , region=76c57383a915f168383bff0c26d59251 at write timestamp=Wed Apr 15 10:40:09 CST 2020
row=rk1114, column=f:q1

  6. Inspect the HFile

K: rk1111/f:q1/1586918408170/Put/vlen=3/seqid=4 V: v10
K: rk1112/f:q1/1586918408288/Put/vlen=3/seqid=10 V: v21
K: rk1113/f:q1/1586918408533/Put/vlen=3/seqid=16 V: v32
K: rk1114/f:q1/1586918409641/Put/vlen=3/seqid=17 V: v42
K: rk1115/f:q1/1586918408257/Put/vlen=3/seqid=8 V: v50
K: rk1116/f:q1/1586918408271/Put/vlen=3/seqid=9 V: v60
K: rk1117/f:q1/1586918408350/Put/vlen=3/seqid=13 V: v70
K: rk1118/f:q1/1586918408367/Put/vlen=3/seqid=14 V: v80
K: rk1119/f:q1/1586918408505/Put/vlen=3/seqid=15 V: v90
Scanned kv count -> 9

So what happened to the extra versions?

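The region server log shows that the store file that was finally committed contains 9 entries. So although all 14 puts were written to the WAL, the excess versions were dropped somewhere between the MemStore and the flush, and the resulting HFile holds only these 9 entries. The exact mechanism still needs to be confirmed by reading the source code.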

2020-04-15 10:50:05,940 DEBUG org.apache.hadoop.hbase.regionserver.HRegionFileSystem: Committing store file hdfs://hbase/data/default/tmp_singleversion/76c57383a915f168383bff0c26d59251/.tmp/360a0fbdaa5a46c4bf94da028c6a0679 as hdfs://hbase/data/default/tmp_singleversion/76c57383a915f168383bff0c26d59251/f/360a0fbdaa5a46c4bf94da028c6a0679
2020-04-15 10:50:05,962 INFO org.apache.hadoop.hbase.regionserver.HStore: Added hdfs://hbase/data/default/tmp_singleversion/76c57383a915f168383bff0c26d59251/f/360a0fbdaa5a46c4bf94da028c6a0679, entries=9, sequenceid=19, filesize=5.1 K
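The observed result (14 puts in, 9 entries out) is consistent with a flush that keeps at most VERSIONS cells per row and column. The following is a toy model of that idea, not HBase's actual flush code; `flush`, the tuple layout, and the use of the put order as a stand-in timestamp are all assumptions made for illustration.

```python
from collections import defaultdict

# The 14 puts from the test, in the order they were issued
puts = [
    ("rk1111", "v10"), ("rk1112", "v20"), ("rk1113", "v30"), ("rk1114", "v40"),
    ("rk1115", "v50"), ("rk1116", "v60"), ("rk1112", "v21"), ("rk1113", "v31"),
    ("rk1114", "v41"), ("rk1117", "v70"), ("rk1118", "v80"), ("rk1119", "v90"),
    ("rk1113", "v32"), ("rk1114", "v42"),
]

# Assumption: the put order stands in for the real millisecond timestamps.
memstore = [(row, "f:q1", ts, value) for ts, (row, value) in enumerate(puts)]


def flush(memstore, max_versions):
    """Keep only the newest `max_versions` cells per (row, column)."""
    by_cell = defaultdict(list)
    for row, column, ts, value in memstore:
        by_cell[(row, column)].append((ts, value))
    hfile = []
    for row, column in sorted(by_cell):
        newest_first = sorted(by_cell[(row, column)], reverse=True)
        for ts, value in newest_first[:max_versions]:
            hfile.append((row, column, ts, value))
    return hfile


single = flush(memstore, max_versions=1)
multi = flush(memstore, max_versions=5)
print(len(single), len(multi))
```

With a limit of 1 the model keeps one cell per row key, and with a limit of 5 it keeps every put, which mirrors the "Scanned kv count" values seen for the two test tables.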

Another case, which is closer to real-world usage

  1. Create a test table
hbase(main):037:0* create 'tmp_single', {NAME => 'f', VERSIONS => 1}
0 row(s) in 2.3840 seconds

=> Hbase::Table - tmp_single
hbase(main):038:0> desc 'tmp_single'
Table tmp_single is ENABLED                                                                                           
tmp_single                                                                                                            
COLUMN FAMILIES DESCRIPTION                                                                                           
{NAME => 'f', BLOOMFILTER => 'ROW', VERSIONS => '1', IN_MEMORY => 'false', KEEP_DELETED_CELLS => 'FALSE', DATA_BLOCK_ENCODING => 'NONE', TTL => 'FOREVER', COMPRESSION => 'NONE', MIN_VERSIONS => '0', BLOCKCACHE => 'true', BLOCKSIZE => '65536', REPLICATION_SCOPE => '0'}
1 row(s) in 0.0370 seconds

  2. Put the first batch of data
put 'tmp_single','rk1111','f:q1','v10'
put 'tmp_single','rk1112','f:q1','v20'
put 'tmp_single','rk1113','f:q1','v30'
put 'tmp_single','rk1114','f:q1','v40'
put 'tmp_single','rk1115','f:q1','v50'
put 'tmp_single','rk1116','f:q1','v60'
put 'tmp_single','rk1117','f:q1','v70'
put 'tmp_single','rk1118','f:q1','v80'
put 'tmp_single','rk1119','f:q1','v90'
  3. Flush to generate an HFile
K: rk1111/f:q1/1586920978622/Put/vlen=3/seqid=4 V: v10
K: rk1112/f:q1/1586920978659/Put/vlen=3/seqid=5 V: v20
K: rk1113/f:q1/1586920978674/Put/vlen=3/seqid=6 V: v30
K: rk1114/f:q1/1586920978688/Put/vlen=3/seqid=7 V: v40
K: rk1115/f:q1/1586920978701/Put/vlen=3/seqid=8 V: v50
K: rk1116/f:q1/1586920978717/Put/vlen=3/seqid=9 V: v60
K: rk1117/f:q1/1586920978733/Put/vlen=3/seqid=10 V: v70
K: rk1118/f:q1/1586920978748/Put/vlen=3/seqid=11 V: v80
K: rk1119/f:q1/1586920980686/Put/vlen=3/seqid=12 V: v90
Scanned kv count -> 9

  4. Put the data that adds further versions
put 'tmp_single','rk1112','f:q1','v21'
put 'tmp_single','rk1113','f:q1','v31'
put 'tmp_single','rk1114','f:q1','v41'
put 'tmp_single','rk1113','f:q1','v32'
put 'tmp_single','rk1114','f:q1','v42'

  5. Scan the table

hbase(main):065:0> scan 'tmp_single'
ROW                                                                  COLUMN+CELL                                                                                                                                                                                              
 rk1111                                                              column=f:q1, timestamp=1586920978622, value=v10                                                                                                                                                          
 rk1112                                                              column=f:q1, timestamp=1586921446825, value=v21                                                                                                                                                          
 rk1113                                                              column=f:q1, timestamp=1586921446867, value=v32                                                                                                                                                          
 rk1114                                                              column=f:q1, timestamp=1586921448047, value=v42                                                                                                                                                          
 rk1115                                                              column=f:q1, timestamp=1586920978701, value=v50                                                                                                                                                          
 rk1116                                                              column=f:q1, timestamp=1586920978717, value=v60                                                                                                                                                          
 rk1117                                                              column=f:q1, timestamp=1586920978733, value=v70                                                                                                                                                          
 rk1118                                                              column=f:q1, timestamp=1586920978748, value=v80                                                                                                                                                          
 rk1119                                                              column=f:q1, timestamp=1586920980686, value=v90                                                                                                                                                          
9 row(s) in 0.0420 seconds

  6. Flush again, which generates a new HFile, and inspect it

K: rk1112/f:q1/1586921446825/Put/vlen=3/seqid=17 V: v21
K: rk1113/f:q1/1586921446867/Put/vlen=3/seqid=20 V: v32
K: rk1114/f:q1/1586921448047/Put/vlen=3/seqid=21 V: v42
Scanned kv count -> 3

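So the behavior appears to be as follows. For a given row key in a table, versions beyond the configured limit are dropped within a single flush. Versions of the same row key that are written in different flushes are each kept in the HFile generated by that flush. The versions spread across multiple store files are then merged during a major compaction, which discards the excess and, with VERSIONS => 1, leaves only the latest one.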
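The conclusion above can be sketched as a toy merge of the two store files from the last test. This only models the expected outcome described in the text; it is not HBase's compaction code, the post did not actually run a major compaction, and `major_compact` is a hypothetical helper.

```python
from collections import defaultdict

# First flush: one cell per row key (timestamps copied from the HFile dump)
hfile_1 = [
    ("rk1111", "f:q1", 1586920978622, "v10"), ("rk1112", "f:q1", 1586920978659, "v20"),
    ("rk1113", "f:q1", 1586920978674, "v30"), ("rk1114", "f:q1", 1586920978688, "v40"),
    ("rk1115", "f:q1", 1586920978701, "v50"), ("rk1116", "f:q1", 1586920978717, "v60"),
    ("rk1117", "f:q1", 1586920978733, "v70"), ("rk1118", "f:q1", 1586920978748, "v80"),
    ("rk1119", "f:q1", 1586920980686, "v90"),
]
# Second flush: only the newest version of each rewritten row key survived
hfile_2 = [
    ("rk1112", "f:q1", 1586921446825, "v21"),
    ("rk1113", "f:q1", 1586921446867, "v32"),
    ("rk1114", "f:q1", 1586921448047, "v42"),
]


def major_compact(store_files, max_versions):
    """Merge all store files, keeping the newest `max_versions` per cell."""
    merged = defaultdict(list)
    for hfile in store_files:
        for row, column, ts, value in hfile:
            merged[(row, column)].append((ts, value))
    result = []
    for row, column in sorted(merged):
        newest_first = sorted(merged[(row, column)], reverse=True)
        for ts, value in newest_first[:max_versions]:
            result.append((row, column, ts, value))
    return result


before = len(hfile_1) + len(hfile_2)
compacted = major_compact([hfile_1, hfile_2], max_versions=1)
print(before, len(compacted))
```

Before the merge the two files together hold 12 cells, so the old values v20, v30 and v40 still exist on disk even though a scan no longer returns them; in this model they disappear only once the files are merged.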
