Usage of scan and get
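1. get help output
As the get help output below shows, get takes a table name and a rowkey, optionally followed by columns and options such as TIMESTAMP, TIMERANGE, VERSIONS, or FILTER. However, get cannot fetch all the data in a table in one go; for that you need another command, scan.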
    hbase(main):214:0> help 'get'
    Get row or cell contents; pass table name, row, and optionally
    a dictionary of column(s), timestamp, timerange and versions. Examples:

      hbase> get 'ns1:t1', 'r1'
      hbase> get 't1', 'r1'
      hbase> get 't1', 'r1', {TIMERANGE => [ts1, ts2]}
      hbase> get 't1', 'r1', {COLUMN => 'c1'}
      hbase> get 't1', 'r1', {COLUMN => ['c1', 'c2', 'c3']}
      hbase> get 't1', 'r1', {COLUMN => 'c1', TIMESTAMP => ts1}
      hbase> get 't1', 'r1', {COLUMN => 'c1', TIMERANGE => [ts1, ts2], VERSIONS => 4}
      hbase> get 't1', 'r1', {COLUMN => 'c1', TIMESTAMP => ts1, VERSIONS => 4}
      hbase> get 't1', 'r1', {FILTER => "ValueFilter(=, 'binary:abc')"}
      hbase> get 't1', 'r1', 'c1'
      hbase> get 't1', 'r1', 'c1', 'c2'
      hbase> get 't1', 'r1', ['c1', 'c2']
      hbase> get 't1','r1', {COLUMN => 'c1', ATTRIBUTES => {'mykey'=>'myvalue'}}
      hbase> get 't1','r1', {COLUMN => 'c1', AUTHORIZATIONS => ['PRIVATE','SECRET']}
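To make the shape of get concrete, here is a minimal Python sketch. It assumes a toy in-memory table, which is a dict mapping each rowkey to {'family:qualifier': value}, and it leaves out timestamps and versions. The functions `get` and `column_selected` are hypothetical illustrations, not the HBase client API. The sketch reflects the point made above: a rowkey is always required, and column specs only narrow the result within that one row.

```python
from typing import Dict, Iterable, List, Optional, Tuple

# Toy model only (not HBase): rowkey -> {"family:qualifier": value}.
# Timestamps and versions are deliberately left out.
Table = Dict[str, Dict[str, str]]

TABLE: Table = {
    "1001": {"info:name": "liupeng", "info:age": "34",
             "contect:phone": "15962459503", "group:number": "1"},
    "1002": {"info:name": "Jack_Ma", "contect:phone": "15977634464"},
}


def column_selected(col: str, specs: Iterable[str]) -> bool:
    """True if col equals a 'family:qualifier' spec or falls under a 'family' / 'family:' spec."""
    family = col.split(":", 1)[0]
    return any(spec == col or spec.rstrip(":") == family for spec in specs)


def get(table: Table, rowkey: str,
        columns: Optional[Iterable[str]] = None) -> List[Tuple[str, str]]:
    """Roughly mimic get 't', 'row' [, columns]: the rowkey is mandatory."""
    row = table.get(rowkey, {})
    cells = sorted(row.items())  # cells come back ordered by column name
    if columns is None:
        return cells
    specs = list(columns)
    return [(col, val) for col, val in cells if column_selected(col, specs)]


# There is no way to ask this function for "all rows": a rowkey must be given.
print(get(TABLE, "1001", ["contect:phone"]))  # -> [('contect:phone', '15962459503')]
```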
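2. scan help output
scan is far more flexible. It can read an entire table, or it can return only the data we need when we specify conditions. Many of those conditions are expressed with filters, which are demonstrated one by one below.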
    hbase(main):221:0> help 'scan'
    Scan a table; pass table name and optionally a dictionary of scanner
    specifications.  Scanner specifications may include one or more of:
    TIMERANGE, FILTER, LIMIT, STARTROW, STOPROW, TIMESTAMP, MAXLENGTH,
    or COLUMNS, CACHE

    If no columns are specified, all columns will be scanned.
    To scan all members of a column family, leave the qualifier empty as in
    'col_family:'.

    The filter can be specified in two ways:
    1. Using a filterString - more information on this is available in the
    Filter Language document attached to the HBASE-4176 JIRA
    2. Using the entire package name of the filter.

    Some examples:

      hbase> scan 'hbase:meta'
      hbase> scan 'hbase:meta', {COLUMNS => 'info:regioninfo'}
      hbase> scan 'ns1:t1', {COLUMNS => ['c1', 'c2'], LIMIT => 10, STARTROW => 'xyz'}
      hbase> scan 't1', {COLUMNS => ['c1', 'c2'], LIMIT => 10, STARTROW => 'xyz'}
      hbase> scan 't1', {COLUMNS => 'c1', TIMERANGE => [1303668804, 1303668904]}
      hbase> scan 't1', {REVERSED => true}
      hbase> scan 't1', {FILTER => "(PrefixFilter ('row2') AND (QualifierFilter (>=, 'binary:xyz'))) AND (TimestampsFilter ( 123, 456))"}
      hbase> scan 't1', {FILTER => org.apache.hadoop.hbase.filter.ColumnPaginationFilter.new(1, 0)}

    For setting the Operation Attributes
      hbase> scan 't1', { COLUMNS => ['c1', 'c2'], ATTRIBUTES => {'mykey' => 'myvalue'}}
      hbase> scan 't1', { COLUMNS => ['c1', 'c2'], AUTHORIZATIONS => ['PRIVATE','SECRET']}
    For experts, there is an additional option -- CACHE_BLOCKS -- which
    switches block caching for the scanner on (true) or off (false).  By
    default it is enabled.  Examples:

      hbase> scan 't1', {COLUMNS => ['c1', 'c2'], CACHE_BLOCKS => false}

    Also for experts, there is an advanced option -- RAW -- which instructs the
    scanner to return all cells (including delete markers and uncollected deleted
    cells). This option cannot be combined with requesting specific COLUMNS.
    Disabled by default.  Example:

      hbase> scan 't1', {RAW => true, VERSIONS => 10}

    Besides the default 'toStringBinary' format, 'scan' supports custom formatting
    by column.  A user can define a FORMATTER by adding it to the column name in
    the scan specification.  The FORMATTER can be stipulated:

     1. either as a org.apache.hadoop.hbase.util.Bytes method name (e.g, toInt, toString)
     2. or as a custom class followed by method name: e.g. 'c(MyFormatterClass).format'.

    Example formatting cf:qualifier1 and cf:qualifier2 both as Integers:
      hbase> scan 't1', {COLUMNS => ['cf:qualifier1:toInt', 'cf:qualifier2:c(org.apache.hadoop.hbase.util.Bytes).toInt'] }

    Note that you can specify a FORMATTER by column only (cf:qualifier).  You cannot
    specify a FORMATTER for all columns of a column family.

    Scan can also be used directly from a table, by first getting a reference to a
    table, like such:

      hbase> t = get_table 't'
      hbase> t.scan

    Note in the above situation, you can still provide all the filtering, columns,
    options, etc as described above.
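The core scan options from the help text (STARTROW, STOPROW, LIMIT, COLUMNS) can be pictured with a toy Python model. This is a sketch under stated assumptions. The table is an in-memory dict of rowkey to {'family:qualifier': value}, timestamps are ignored, and the `scan` function is hypothetical, not the HBase API. The sketch captures three behaviors:

* rows come back sorted by rowkey;
* STARTROW is inclusive and STOPROW is exclusive;
* in the shell, LIMIT counts rows, not cells.

```python
from typing import Dict, Iterable, List, Optional, Tuple

# Toy model only (not HBase): rowkey -> {"family:qualifier": value}, no timestamps.
Table = Dict[str, Dict[str, str]]

TABLE: Table = {
    "1003": {"info:name": "kevin_shi"},
    "1001": {"info:name": "liupeng", "info:age": "34"},
    "1002": {"info:name": "Jack_Ma"},
}


def column_selected(col: str, specs: List[str]) -> bool:
    """Match 'family:qualifier' exactly, or a whole family given as 'family' or 'family:'."""
    family = col.split(":", 1)[0]
    return any(spec == col or spec.rstrip(":") == family for spec in specs)


def scan(table: Table, startrow: str = "", stoprow: Optional[str] = None,
         columns: Optional[Iterable[str]] = None,
         limit: Optional[int] = None) -> List[Tuple[str, str, str]]:
    """Return (rowkey, column, value) triples, one per line of scan output."""
    specs = list(columns) if columns is not None else None
    out: List[Tuple[str, str, str]] = []
    rows_returned = 0
    # HBase sorts raw bytes; plain ASCII strings sort the same way in Python.
    for rowkey in sorted(table):
        if rowkey < startrow:                              # STARTROW is inclusive
            continue
        if stoprow is not None and rowkey >= stoprow:      # STOPROW is exclusive
            break
        if limit is not None and rows_returned >= limit:   # LIMIT counts rows
            break
        cells = [(c, v) for c, v in sorted(table[rowkey].items())
                 if specs is None or column_selected(c, specs)]
        if cells:  # rows with no selected cells produce no output
            rows_returned += 1
            out.extend((rowkey, c, v) for c, v in cells)
    return out
```

For example, `scan(TABLE, startrow="1002", stoprow="1003")` plays the role of `scan 't1', {STARTROW => '1002', STOPROW => '1003'}` and returns only row 1002.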
3. Fetching a specific rowkey with get and scan
1. Fetching a rowkey with get vs. with scan
=================================================================================================================
[get]
    hbase(main):011:0> get 'liupeng:employee','1001'
    COLUMN               CELL
     contect:mail        timestamp=1522202414649, [email protected]
     contect:phone       timestamp=1522202430196, value=15962459503
     group:number        timestamp=1522202455929, value=1
     info:age            timestamp=1522202371257, value=34
     info:name           timestamp=1522202364156, value=liupeng
[scan]
    hbase(main):010:0> scan 'liupeng:employee',FILTER=>"PrefixFilter('1001')"
    ROW      COLUMN+CELL
     1001    column=contect:mail, timestamp=1522202414649, [email protected]
     1001    column=contect:phone, timestamp=1522202430196, value=15962459503
     1001    column=group:number, timestamp=1522202455929, value=1
     1001    column=info:age, timestamp=1522202371257, value=34
     1001    column=info:name, timestamp=1522202364156, value=liupeng
    1 row(s) in 0.0590 seconds
Summary: the outputs of the two commands differ in three ways.
* scan prints the rowkey itself on every line, while get does not show the rowkey at all.
* get prints COLUMN and CELL as two separate columns, while scan merges them into a single COLUMN+CELL column.
* The PrefixFilter passed in scan's FILTER selects rows by rowkey prefix. It would therefore also return a row such as 10011 if one existed, whereas get is an exact lookup.
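A small Python model can illustrate both observations above: scan repeats the rowkey on every line, and PrefixFilter is a prefix match rather than an exact lookup. The model assumes an in-memory dict standing in for the table. The functions `scan_prefix` and `get_row` are hypothetical. Row "10011" is an invented row, added only to show the prefix behavior.

```python
from typing import Dict, List, Tuple

# Toy model only (not HBase): rowkey -> {"family:qualifier": value}.
Table = Dict[str, Dict[str, str]]

# Rows 1001/1002 use values from the transcripts; "10011" is hypothetical,
# added only to show that PrefixFilter matches by prefix.
TABLE: Table = {
    "1001": {"contect:phone": "15962459503", "info:name": "liupeng"},
    "10011": {"info:name": "hypothetical"},
    "1002": {"info:name": "Jack_Ma"},
}


def scan_prefix(table: Table, prefix: str) -> List[Tuple[str, str, str]]:
    """Model of scan 't', FILTER => "PrefixFilter('<prefix>')".

    Each tuple carries the rowkey, just as every line of scan output does.
    """
    return [(row, col, val)
            for row in sorted(table) if row.startswith(prefix)
            for col, val in sorted(table[row].items())]


def get_row(table: Table, rowkey: str) -> List[Tuple[str, str]]:
    """Model of get 't', '<rowkey>': exact match, rowkey not repeated per cell."""
    return sorted(table.get(rowkey, {}).items())
```

With this data, `scan_prefix(TABLE, "1001")` returns cells from both 1001 and 10011, while `get_row(TABLE, "1001")` returns only the two cells of row 1001.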
4. How get and scan differ when fetching a single cell from a table
[Similarity]
    hbase(main):229:0> get "liupeng:employee",'1001','info:phone'
    COLUMN               CELL
     info:phone          timestamp=1527914569028, value=15962459503
    1 row(s) in 0.0320 seconds

    hbase(main):230:0> scan "liupeng:employee",FILTER=>"PrefixFilter('1001')AND ValueFilter(=,'substring:159')"
    ROW      COLUMN+CELL
     1001    column=info:phone, timestamp=1527914569028, value=15962459503
    1 row(s) in 0.1010 seconds
[Difference]
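##Note: both commands above return the cell in rowkey 1001 whose value contains '159', but they reach it in completely different ways. get names the exact column, 'info:phone'. scan instead pins the rowkey with PrefixFilter and then ANDs it with a ValueFilter that does a substring match on '159'. If you do not already know that the value is stored in info:phone, the scan approach is clearly more convenient. It is not necessarily faster, though, because a ValueFilter still has to examine every cell the scan reads.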
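The combined PrefixFilter AND ValueFilter query can be modeled in a few lines of Python. The sketch assumes a toy in-memory dict in place of the table and uses a hypothetical function, `scan_prefix_and_value`. It also reflects one detail worth knowing: HBase's SubstringComparator (the `substring:` prefix) compares case-insensitively, which the model imitates by lowercasing both sides.

```python
from typing import Dict, List, Tuple

# Toy model only (not HBase): rowkey -> {"family:qualifier": value}.
Table = Dict[str, Dict[str, str]]

# Sample cells taken from the transcripts in this article (timestamps dropped).
TABLE: Table = {
    "1001": {"info:name": "liupeng", "info:age": "34", "info:phone": "15962459503"},
    "1002": {"info:name": "Jack_Ma", "contect:phone": "15977634464"},
}


def scan_prefix_and_value(table: Table, prefix: str,
                          substring: str) -> List[Tuple[str, str, str]]:
    """Model of FILTER => "PrefixFilter('<prefix>') AND ValueFilter(=,'substring:<s>')".

    A cell is returned only if its rowkey starts with the prefix AND its value
    contains the substring, so the column name never has to be known in advance.
    """
    needle = substring.lower()  # SubstringComparator ignores case
    return [(row, col, val)
            for row in sorted(table) if row.startswith(prefix)
            for col, val in sorted(table[row].items())
            if needle in val.lower()]
```

Calling `scan_prefix_and_value(TABLE, "1001", "159")` finds the `info:phone` cell without the caller ever naming that column.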
Moreover, a scan query like the following cannot be expressed with get at all. For example:
=================================================================================================================
    hbase(main):026:0> scan 'liupeng:employee',FILTER=>"ValueFilter(=,'substring:159')"
    ROW      COLUMN+CELL
     1001    column=contect:phone, timestamp=1522202430196, value=15962459503
     1002    column=contect:phone, timestamp=1522202527866, value=15977634464
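##Explanation: this query does a substring match and displays every cell, in any row, whose value contains 159. get, by contrast, can only look up columns once a rowkey has been given, as its syntax below shows. Querying this way with get would therefore require rowkeys that were designed carefully up front. From this point of view, I personally find scan more convenient than get for retrieving information.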
    hbase> t.get 'r1'
    hbase> t.get 'r1', {TIMERANGE => [ts1, ts2]}
    hbase> t.get 'r1', {COLUMN => 'c1'}
    hbase> t.get 'r1', {COLUMN => ['c1', 'c2', 'c3']}
    hbase> t.get 'r1', {COLUMN => 'c1', TIMESTAMP => ts1}
    hbase> t.get 'r1', {COLUMN => 'c1', TIMERANGE => [ts1, ts2], VERSIONS => 4}
    hbase> t.get 'r1', {COLUMN => 'c1', TIMESTAMP => ts1, VERSIONS => 4}
    hbase> t.get 'r1', {FILTER => "ValueFilter(=, 'binary:abc')"}
    hbase> t.get 'r1', 'c1'
    hbase> t.get 'r1', 'c1', 'c2'
    hbase> t.get 'r1', ['c1', 'c2']
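The contrast can be sketched in Python, assuming a toy in-memory dict in place of the table. Both function names below are hypothetical. `scan_value_substring` tests every cell of every row. `get_with_value` first needs a rowkey and only then applies the same value test, as a get with a FILTER would.

```python
from typing import Dict, List, Tuple

# Toy model only (not HBase): rowkey -> {"family:qualifier": value}.
Table = Dict[str, Dict[str, str]]

# Sample cells taken from the transcripts in this article (timestamps dropped).
TABLE: Table = {
    "1001": {"contect:phone": "15962459503", "info:name": "liupeng"},
    "1002": {"contect:phone": "15977634464", "info:name": "Jack_Ma"},
    "1003": {"contect:phone": "18665851263", "info:name": "kevin_shi"},
}


def scan_value_substring(table: Table, substring: str) -> List[Tuple[str, str, str]]:
    """Model of scan 't', FILTER => "ValueFilter(=,'substring:<s>')".

    No rowkey is given: every cell of every row is tested against the value.
    """
    needle = substring.lower()  # SubstringComparator is case-insensitive
    return [(row, col, val)
            for row in sorted(table)
            for col, val in sorted(table[row].items())
            if needle in val.lower()]


def get_with_value(table: Table, rowkey: str, substring: str) -> List[Tuple[str, str]]:
    """With get, the rowkey must be known first; the value test applies only inside it."""
    needle = substring.lower()
    return [(col, val) for col, val in sorted(table.get(rowkey, {}).items())
            if needle in val.lower()]
```

`scan_value_substring(TABLE, "159")` reproduces the two cells in the transcript above. With `get_with_value`, a caller would have to already know, or loop over, rowkeys 1001 and 1002.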
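5. scan can look up values without specifying a rowkey. More precisely, it can return the values of a particular column across all rows without knowing in advance which rows hold them. get cannot do this.
ColumnPrefixFilter('qualifier prefix')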
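    hbase(main):038:0> scan 'liupeng:employee',FILTER=>"ColumnPrefixFilter('name')"
    ROW      COLUMN+CELL
     1001    column=info:name, timestamp=1522202364156, value=liupeng
     1002    column=info:name, timestamp=1522202474669, value=Jack_Ma
     1003    column=info:name, timestamp=1522202561029, value=kevin_shi
    3 row(s) in 0.0210 seconds
##Note: ColumnPrefixFilter selects cells whose column qualifier starts with the given string. The qualifier is the part after the family, for example name in info:name. Because the match is on a prefix, 'name' would also match a qualifier such as name2, and it applies in every column family.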
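A minimal Python model of ColumnPrefixFilter makes the rule explicit: only the qualifier is compared, and no rowkey or value is involved. It assumes a toy in-memory dict in place of the table, and the function name `scan_column_prefix` is hypothetical.

```python
from typing import Dict, List, Tuple

# Toy model only (not HBase): rowkey -> {"family:qualifier": value}.
Table = Dict[str, Dict[str, str]]

# Sample cells from the transcripts in this article (timestamps dropped).
TABLE: Table = {
    "1001": {"info:name": "liupeng", "contect:phone": "15962459503"},
    "1002": {"info:name": "Jack_Ma", "contect:phone": "15977634464"},
    "1003": {"info:name": "kevin_shi", "contect:phone": "18665851263"},
}


def scan_column_prefix(table: Table, prefix: str) -> List[Tuple[str, str, str]]:
    """Model of scan 't', FILTER => "ColumnPrefixFilter('<prefix>')".

    Only the qualifier (the text after 'family:') is compared, in every family.
    """
    return [(row, col, val)
            for row in sorted(table)
            for col, val in sorted(table[row].items())
            if col.split(":", 1)[1].startswith(prefix)]
```

Note that `scan_column_prefix(TABLE, "info")` returns nothing here, because "info" is a family name, not a qualifier prefix.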
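6. What makes scan convenient is that you can filter freely on rowkeys, columns, and values, and combine the conditions with AND, OR, and so on to find exactly the data you want.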
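The statement below combines AND and OR. In the HBase filter language, AND has higher precedence than OR. This filter is therefore evaluated as (ColumnPrefixFilter AND the first ValueFilter) OR the second ValueFilter, so any cell containing 186 would be returned, whatever its column. Add parentheses when a different grouping is intended. Chaining the same operator is also allowed: the TimestampsFilter example in the scan help output of section 2 uses two ANDs in one filter string.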
    hbase(main):060:0> scan 'liupeng:employee',FILTER=>"ColumnPrefixFilter('ph')AND ValueFilter(=,'substring:15962')OR ValueFilter(=,'substring:186')"
    ROW      COLUMN+CELL
     1001    column=contect:phone, timestamp=1522202430196, value=15962459503
     1003    column=contect:phone, timestamp=1522202605976, value=18665851263
    2 row(s) in 0.0170 seconds
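Operator precedence can be demonstrated with a toy Python model in which each filter is a per-cell predicate. This is a simplification: real HBase FilterLists involve more machinery, although for cell-level filters such as ColumnPrefixFilter and ValueFilter the per-cell view is close enough to show grouping. All function names are hypothetical. Row 1009 and its `info:remark` cell are invented data, added only to expose the difference between the two groupings.

```python
from typing import Callable, Dict, List, Tuple

# Toy model only (not HBase): rowkey -> {"family:qualifier": value}.
Table = Dict[str, Dict[str, str]]
CellPredicate = Callable[[str, str], bool]  # (column, value) -> keep the cell?

# Rows 1001/1003 come from the transcript; row 1009 is hypothetical.
TABLE: Table = {
    "1001": {"contect:phone": "15962459503", "info:name": "liupeng"},
    "1003": {"contect:phone": "18665851263", "info:name": "kevin_shi"},
    "1009": {"info:remark": "ref-186"},
}


def column_prefix(prefix: str) -> CellPredicate:
    return lambda col, val: col.split(":", 1)[1].startswith(prefix)


def value_substring(s: str) -> CellPredicate:
    return lambda col, val: s.lower() in val.lower()


def and_(f: CellPredicate, g: CellPredicate) -> CellPredicate:
    return lambda col, val: f(col, val) and g(col, val)


def or_(f: CellPredicate, g: CellPredicate) -> CellPredicate:
    return lambda col, val: f(col, val) or g(col, val)


def run(table: Table, pred: CellPredicate) -> List[Tuple[str, str, str]]:
    return [(row, col, val) for row in sorted(table)
            for col, val in sorted(table[row].items()) if pred(col, val)]


A = column_prefix("ph")
B = value_substring("15962")
C = value_substring("186")

as_written = or_(and_(A, B), C)      # "A AND B OR C": AND binds tighter
parenthesized = and_(A, or_(B, C))   # "A AND (B OR C)"
```

Under `as_written`, the hypothetical `info:remark` cell in row 1009 is returned even though its qualifier does not start with "ph". Under `parenthesized`, only the two phone cells are returned.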
7. Using SingleColumnValueFilter to test one column's value and list every column and value of the matching rows
    hbase(main):242:0> scan "liupeng:employee",{FILTER=>"SingleColumnValueFilter('info','age',=,'substring:30')"}
    ROW      COLUMN+CELL
     1005    column=contect:mail, timestamp=1528420218800, [email protected]
     1005    column=info:age, timestamp=1528439967493, value=30
     1005    column=info:name, timestamp=1528420218800, value=zhangsan
     1008    column=contect:mail, timestamp=1528681786126, [email protected]
     1008    column=info:age, timestamp=1528681786126, value=30
     1008    column=info:name, timestamp=1528681786126, value=kevin
    2 row(s) in 0.0110 seconds
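Here is a toy Python model of SingleColumnValueFilter, assuming an in-memory dict in place of the table. The function name and the `filter_if_missing` parameter are hypothetical, but the behavior they mimic is documented for HBase: a row that lacks the tested column is emitted by default. To drop such rows, set filterIfMissing, which the filter language exposes as an optional fifth boolean argument. Also note that 'substring:30' would match a value like 300 too; a `binary:` comparator is the exact-match alternative. Row 1010 below is hypothetical.

```python
from typing import Dict, List, Tuple

# Toy model only (not HBase): rowkey -> {"family:qualifier": value}.
Table = Dict[str, Dict[str, str]]

# Rows 1001/1005/1008 use values from the transcripts (mail cells omitted);
# row 1010 is hypothetical and has no info:age column at all.
TABLE: Table = {
    "1001": {"info:name": "liupeng", "info:age": "34"},
    "1005": {"info:name": "zhangsan", "info:age": "30"},
    "1008": {"info:name": "kevin", "info:age": "30"},
    "1010": {"info:name": "no_age"},
}


def scan_single_column_value(table: Table, family: str, qualifier: str,
                             substring: str,
                             filter_if_missing: bool = False) -> List[Tuple[str, str, str]]:
    """Model of SingleColumnValueFilter('<family>','<qualifier>',=,'substring:<s>').

    The test looks at ONE column, but a passing row is emitted with ALL its cells.
    """
    target = f"{family}:{qualifier}"
    needle = substring.lower()  # SubstringComparator is case-insensitive
    out: List[Tuple[str, str, str]] = []
    for row in sorted(table):
        cells = table[row]
        if target in cells:
            keep = needle in cells[target].lower()
        else:
            # Default: a row without the column is kept unless filterIfMissing is set.
            keep = not filter_if_missing
        if keep:
            out.extend((row, col, val) for col, val in sorted(cells.items()))
    return out
```

With the default, row 1010 appears in the result alongside 1005 and 1008. With `filter_if_missing=True` it is dropped.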
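8. SingleColumnValueFilter also supports regular-expression matching through the regexstring comparator, so pattern matching can be used to find the corresponding rowkeys, columns, and values.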
    hbase(main):244:0> scan "liupeng:employee",{FILTER=>"SingleColumnValueFilter('info','name',=,'regexstring:liu')"}
    ROW      COLUMN+CELL
     1001    column=contect:mail, timestamp=1527231141046, [email protected]
     1001    column=info:address, timestamp=1527753987327, value=shanghai
     1001    column=info:age, timestamp=1527231097033, value=34
     1001    column=info:name, timestamp=1527231081262, value=liupeng
     1001    column=info:phone, timestamp=1527914569028, value=15962459503
     1004    column=contect:mail, timestamp=1527473497956, [email protected]
     1004    column=info:address, timestamp=1527755135174, value=shenzhen
     1004    column=info:age, timestamp=1527473477124, value=40
     1004    column=info:name, timestamp=1527415665182, value=liuqiangdong
    2 row(s) in 0.0080 seconds
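The result above shows that 'regexstring:liu' matched both liupeng and liuqiangdong. HBase's RegexStringComparator uses Java's `Matcher.find()`, so a pattern can match anywhere in the value unless it is anchored. The Python sketch below models this with `re.search`, which has the same find-anywhere behavior. It assumes a toy in-memory dict in place of the table, and the function names are hypothetical. Python and Java regex syntax agree on the simple patterns used here but differ in some advanced features.

```python
import re
from typing import Dict, List, Tuple

# Toy model only (not HBase): rowkey -> {"family:qualifier": value}.
Table = Dict[str, Dict[str, str]]

# Sample values from the transcripts (mail/address cells omitted).
TABLE: Table = {
    "1001": {"info:name": "liupeng", "info:age": "34"},
    "1004": {"info:name": "liuqiangdong", "info:age": "40"},
    "1005": {"info:name": "zhangsan", "info:age": "30"},
}


def scan_single_column_regex(table: Table, family: str, qualifier: str,
                             pattern: str,
                             filter_if_missing: bool = False) -> List[Tuple[str, str, str]]:
    """Model of SingleColumnValueFilter('<f>','<q>',=,'regexstring:<pattern>').

    re.search, like Java's Matcher.find(), matches anywhere in the value
    unless the pattern is anchored with ^ and $.
    """
    target = f"{family}:{qualifier}"
    regex = re.compile(pattern)
    out: List[Tuple[str, str, str]] = []
    for row in sorted(table):
        cells = table[row]
        if target in cells:
            keep = regex.search(cells[target]) is not None
        else:
            keep = not filter_if_missing  # rows lacking the column pass by default
        if keep:
            out.extend((row, col, val) for col, val in sorted(cells.items()))
    return out


def matching_rows(pattern: str) -> List[str]:
    """Rowkeys whose info:name matches the pattern."""
    return sorted({r for r, _, _ in scan_single_column_regex(TABLE, "info", "name", pattern)})
```

For instance, `matching_rows("liu")` gives rows 1001 and 1004, while the anchored `matching_rows("^liu$")` gives none.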