Creating HBase Secondary Indexes with Phoenix

Phoenix's secondary indexing documentation: http://phoenix.apache.org/secondary_indexing.html

1 Global Indexes

Global indexes are suited to read-heavy, write-light workloads. The official documentation describes them as follows:

Global indexing targets read heavy uses cases. With global indexes, all the performance penalties for indexes occur at write time. We intercept the data table updates on write (DELETE, UPSERT VALUES and UPSERT SELECT), build the index update and then send any necessary updates to all interested index tables. At read time, Phoenix will select the index table to use that will produce the fastest query time and directly scan it just like any other HBase table. By default, unless hinted, an index will not be used for a query that references a column that isn’t part of the index.

Preparation

CREATE TABLE test.test_salt (id INTEGER PRIMARY KEY, name VARCHAR, age INTEGER, address VARCHAR) SALT_BUCKETS = 20;

UPSERT INTO TEST.TEST_SALT VALUES(1, 'test01', 22, 'PEK');
UPSERT INTO TEST.TEST_SALT VALUES(2, 'test02', 18, 'SHA');
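SALT_BUCKETS = 20 makes Phoenix prepend a single salt byte to every rowkey, computed from a hash of the key modulo the bucket count, which spreads writes evenly across 20 ranges. The idea can be sketched in Python (the rolling hash below is an illustrative stand-in, not Phoenix's actual salt function):

```python
def salt_byte(rowkey: bytes, buckets: int) -> int:
    """Toy stand-in for Phoenix's salt computation: hash the
    rowkey and take it modulo the bucket count."""
    h = 0
    for b in rowkey:
        h = (h * 31 + b) & 0xFFFFFFFF  # simple rolling hash
    return h % buckets

def salted_key(rowkey: bytes, buckets: int = 20) -> bytes:
    # The salt byte is prepended, so rows scatter across the
    # bucket ranges [0] - [19] seen later in the EXPLAIN plans.
    return bytes([salt_byte(rowkey, buckets)]) + rowkey

key = salted_key(b"\x80\x00\x00\x01")
```

The salt byte is why the EXPLAIN plans below show 20 parallel chunks: every query fans out over all bucket ranges.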

Create a global index

CREATE INDEX testsalt_idx ON test.TEST_SALT(name, age);

Query data through the index

0: jdbc:phoenix:hadoop002:2181> select * from TEST.testsalt_idx;
+---------+--------+------+
| 0:NAME  | 0:AGE  | :ID  |
+---------+--------+------+
| test02  | 18     | 2    |
| test01  | 22     | 1    |
+---------+--------+------+
2 rows selected (0.862 seconds)

The 0 prefix on the result columns indicates that, by default, columns are stored in column family 0:

hbase(main):001:0> desc 'TEST:TEST_SALT'
Table TEST:TEST_SALT is ENABLED                                                                                                                                         
TEST:TEST_SALT, {TABLE_ATTRIBUTES => {coprocessor$1 => '|org.apache.phoenix.coprocessor.ScanRegionObserver|805306366|', coprocessor$2 => '|org.apache.phoenix.coprocesso
r.UngroupedAggregateRegionObserver|805306366|', coprocessor$3 => '|org.apache.phoenix.coprocessor.GroupedAggregateRegionObserver|805306366|', coprocessor$4 => '|org.apa
che.phoenix.coprocessor.ServerCachingEndpointImpl|805306366|', coprocessor$5 => '|org.apache.phoenix.hbase.index.Indexer|805306366|org.apache.hadoop.hbase.index.codec.c
lass=org.apache.phoenix.index.PhoenixIndexCodec,index.builder=org.apache.phoenix.index.PhoenixIndexBuilder'}                                                            
COLUMN FAMILIES DESCRIPTION                                                                                                                                             
{NAME => '0', BLOOMFILTER => 'ROW', VERSIONS => '1', IN_MEMORY => 'false', KEEP_DELETED_CELLS => 'FALSE', DATA_BLOCK_ENCODING => 'FAST_DIFF', TTL => 'FOREVER', COMPRESS
ION => 'NONE', MIN_VERSIONS => '0', BLOCKCACHE => 'true', BLOCKSIZE => '65536', REPLICATION_SCOPE => '0'}                                                               
1 row(s) in 0.2980 seconds

Drop a global index

DROP INDEX testsalt_idx ON TEST.TEST_SALT;

How to use an index effectively: each of the queries below results in a FULL SCAN

0: jdbc:phoenix:hadoop002:2181> explain  SELECT * FROM TEST.TEST_SALT
. . . . . . . . . . . . . . . > ;
+----------------------------------------------------------------------------+-----------------+----------------+--------------+
|                                    PLAN                                    | EST_BYTES_READ  | EST_ROWS_READ  | EST_INFO_TS  |
+----------------------------------------------------------------------------+-----------------+----------------+--------------+
| CLIENT 20-CHUNK PARALLEL 20-WAY ROUND ROBIN FULL SCAN OVER TEST:TEST_SALT  | null            | null           | null         |
+----------------------------------------------------------------------------+-----------------+----------------+--------------+
1 row selected (0.03 seconds)
0: jdbc:phoenix:hadoop002:2181>  explain SELECT /* + INDEX(TEST.TEST_SALT  TEST.TESTSALT_IDX) */ * FROM TEST.TEST_SALT ;
+----------------------------------------------------------------------------+-----------------+----------------+--------------+
|                                    PLAN                                    | EST_BYTES_READ  | EST_ROWS_READ  | EST_INFO_TS  |
+----------------------------------------------------------------------------+-----------------+----------------+--------------+
| CLIENT 20-CHUNK PARALLEL 20-WAY ROUND ROBIN FULL SCAN OVER TEST:TEST_SALT  | null            | null           | null         |
+----------------------------------------------------------------------------+-----------------+----------------+--------------+
1 row selected (0.025 seconds)
0: jdbc:phoenix:hadoop002:2181> explain SELECT ADDRESS FROM TEST.TEST_SALT WHERE AGE = 18;
+----------------------------------------------------------------------------+-----------------+----------------+--------------+
|                                    PLAN                                    | EST_BYTES_READ  | EST_ROWS_READ  | EST_INFO_TS  |
+----------------------------------------------------------------------------+-----------------+----------------+--------------+
| CLIENT 20-CHUNK PARALLEL 20-WAY ROUND ROBIN FULL SCAN OVER TEST:TEST_SALT  | null            | null           | null         |
|     SERVER FILTER BY AGE = 18                                              | null            | null           | null         |
+----------------------------------------------------------------------------+-----------------+----------------+--------------+
0: jdbc:phoenix:hadoop002:2181> explain SELECT name, ADDRESS FROM TEST.TEST_SALT WHERE AGE = 18;
+----------------------------------------------------------------------------+-----------------+----------------+--------------+
|                                    PLAN                                    | EST_BYTES_READ  | EST_ROWS_READ  | EST_INFO_TS  |
+----------------------------------------------------------------------------+-----------------+----------------+--------------+
| CLIENT 20-CHUNK PARALLEL 20-WAY ROUND ROBIN FULL SCAN OVER TEST:TEST_SALT  | null            | null           | null         |
|     SERVER FILTER BY AGE = 18                                              | null            | null           | null         |
+----------------------------------------------------------------------------+-----------------+----------------+--------------+

The index is used only when every queried column is covered by the index

0: jdbc:phoenix:hadoop002:2181> explain SELECT name FROM TEST.TEST_SALT WHERE AGE = 18;
+-------------------------------------------------------------------------------------------+-----------------+----------------+--------------+
|                                           PLAN                                            | EST_BYTES_READ  | EST_ROWS_READ  | EST_INFO_TS  |
+-------------------------------------------------------------------------------------------+-----------------+----------------+--------------+
| CLIENT 20-CHUNK PARALLEL 20-WAY ROUND ROBIN RANGE SCAN OVER TEST:TESTSALT_IDX [0] - [19]  | null            | null           | null         |
|     SERVER FILTER BY FIRST KEY ONLY AND TO_INTEGER("AGE") = 18                            | null            | null           | null         |
+-------------------------------------------------------------------------------------------+-----------------+----------------+--------------+
2 rows selected (0.037 seconds)
0: jdbc:phoenix:hadoop002:2181> 

Force index usage with a hint

0: jdbc:phoenix:hadoop002:2181> explain SELECT /*+ INDEX(TEST.TEST_SALT TESTSALT_IDX) */ ADDRESS FROM TEST.TEST_SALT WHERE AGE = 18;
+---------------------------------------------------------------------------------------------------+-----------------+----------------+--------------+
|                                               PLAN                                                | EST_BYTES_READ  | EST_ROWS_READ  | EST_INFO_TS  |
+---------------------------------------------------------------------------------------------------+-----------------+----------------+--------------+
| CLIENT 20-CHUNK PARALLEL 20-WAY ROUND ROBIN FULL SCAN OVER TEST:TEST_SALT                         | null            | null           | null         |
|     SKIP-SCAN-JOIN TABLE 0                                                                        | null            | null           | null         |
|         CLIENT 20-CHUNK PARALLEL 20-WAY ROUND ROBIN RANGE SCAN OVER TEST:TESTSALT_IDX [0] - [19]  | null            | null           | null         |
|             SERVER FILTER BY FIRST KEY ONLY AND TO_INTEGER("AGE") = 18                            | null            | null           | null         |
|     DYNAMIC SERVER FILTER BY "TEST.TEST_SALT.ID" IN ($12.$14)                                     | null            | null           | null         |
+---------------------------------------------------------------------------------------------------+-----------------+----------------+--------------+
5 rows selected (0.128 seconds)

Summary: a global index is essentially a separate table. To use it, the columns in both WHERE and SELECT must all be index columns; if the WHERE columns are all index columns but some SELECT columns are not, a hint is required to force the index.

2 Local Indexes

Local indexes are suited to write-heavy, read-light workloads. The official documentation describes them as follows:

Local indexing targets write heavy, space constrained use cases. Just like with global indexes, Phoenix will automatically select whether or not to use a local index at query-time. With local indexes, index data and table data co-reside on same server preventing any network overhead during writes. Local indexes can be used even when the query isn’t fully covered (i.e. Phoenix automatically retrieve the columns not in the index through point gets against the data table). Unlike global indexes, all local indexes of a table are stored in a single, separate shared table prior to 4.8.0 version. From 4.8.0 onwards we are storing all local index data in the separate shadow column families in the same data table. At read time when the local index is used, every region must be examined for the data as the exact region location of index data cannot be predetermined. Thus some overhead occurs at read-time.

Prepare the data

CREATE TABLE test.test_local (id INTEGER PRIMARY KEY, name VARCHAR, age INTEGER, address VARCHAR) split ON (1,2,3);
UPSERT INTO TEST.test_local VALUES(1, 'test01', 22, 'PEK');
UPSERT INTO TEST.test_local VALUES(2, 'test02', 18, 'SHA');

Create a local index

CREATE LOCAL INDEX testlocal_idx ON test.TEST_LOCAL(name, age);

Before the local index is created

hbase(main):002:0> desc 'TEST:TEST_LOCAL'
Table TEST:TEST_LOCAL is ENABLED                                                                                                                                        
TEST:TEST_LOCAL, {TABLE_ATTRIBUTES => {coprocessor$1 => '|org.apache.phoenix.coprocessor.ScanRegionObserver|805306366|', coprocessor$2 => '|org.apache.phoenix.coprocess
or.UngroupedAggregateRegionObserver|805306366|', coprocessor$3 => '|org.apache.phoenix.coprocessor.GroupedAggregateRegionObserver|805306366|', coprocessor$4 => '|org.ap
ache.phoenix.coprocessor.ServerCachingEndpointImpl|805306366|', coprocessor$5 => '|org.apache.phoenix.hbase.index.Indexer|805306366|org.apache.hadoop.hbase.index.codec.
class=org.apache.phoenix.index.PhoenixIndexCodec,index.builder=org.apache.phoenix.index.PhoenixIndexBuilder'}                                                           
COLUMN FAMILIES DESCRIPTION                                                                                                                                             
{NAME => '0', BLOOMFILTER => 'ROW', VERSIONS => '1', IN_MEMORY => 'false', KEEP_DELETED_CELLS => 'FALSE', DATA_BLOCK_ENCODING => 'FAST_DIFF', TTL => 'FOREVER', COMPRESS
ION => 'NONE', MIN_VERSIONS => '0', BLOCKCACHE => 'true', BLOCKSIZE => '65536', REPLICATION_SCOPE => '0'}                                                               
1 row(s) in 0.2970 seconds

After the local index is created

hbase(main):003:0> desc 'TEST:TEST_LOCAL'
Table TEST:TEST_LOCAL is ENABLED                                                                                                                                        
TEST:TEST_LOCAL, {TABLE_ATTRIBUTES => {coprocessor$1 => '|org.apache.phoenix.coprocessor.ScanRegionObserver|805306366|', coprocessor$2 => '|org.apache.phoenix.coprocess
or.UngroupedAggregateRegionObserver|805306366|', coprocessor$3 => '|org.apache.phoenix.coprocessor.GroupedAggregateRegionObserver|805306366|', coprocessor$4 => '|org.ap
ache.phoenix.coprocessor.ServerCachingEndpointImpl|805306366|', coprocessor$5 => '|org.apache.phoenix.hbase.index.Indexer|805306366|org.apache.hadoop.hbase.index.codec.
class=org.apache.phoenix.index.PhoenixIndexCodec,index.builder=org.apache.phoenix.index.PhoenixIndexBuilder', coprocessor$6 => '|org.apache.hadoop.hbase.regionserver.In
dexHalfStoreFileReaderGenerator|805306366|', METADATA => {'DATA_TABLE_NAME' => 'TEST.TEST_LOCAL', 'SPLIT_POLICY' => 'org.apache.phoenix.hbase.index.IndexRegionSplitPoli
cy'}}                                                                                                                                                                   
COLUMN FAMILIES DESCRIPTION                                                                                                                                             
{NAME => '0', BLOOMFILTER => 'ROW', VERSIONS => '1', IN_MEMORY => 'false', KEEP_DELETED_CELLS => 'FALSE', DATA_BLOCK_ENCODING => 'FAST_DIFF', TTL => 'FOREVER', COMPRESS
ION => 'NONE', MIN_VERSIONS => '0', BLOCKCACHE => 'true', BLOCKSIZE => '65536', REPLICATION_SCOPE => '0'}                                                               
{NAME => 'L#0', BLOOMFILTER => 'NONE', VERSIONS => '1', IN_MEMORY => 'false', KEEP_DELETED_CELLS => 'FALSE', DATA_BLOCK_ENCODING => 'FAST_DIFF', TTL => 'FOREVER', COMPR
ESSION => 'NONE', MIN_VERSIONS => '0', BLOCKCACHE => 'true', BLOCKSIZE => '65536', REPLICATION_SCOPE => '0'}                                                            
2 row(s) in 0.0310 seconds

As shown above, a local index adds an extra column family, L#0; this is the key difference between local and global indexes at the HBase level. A local index is not a separate table but a column family of the data table, so it does not appear in the HBase shell's list command. The index data is written directly into the data table's rowkeys, and the column family cells carry no actual payload.

hbase(main):005:0> scan 'TEST:TEST_LOCAL'
ROW                                         COLUMN+CELL                                                                                                                 
 \x80\x00\x00\x01                           column=0:\x00\x00\x00\x00, timestamp=1585991153644, value=x                                                                 
 \x80\x00\x00\x01                           column=0:\x80\x0B, timestamp=1585991153644, value=test01                                                                    
 \x80\x00\x00\x01                           column=0:\x80\x0C, timestamp=1585991153644, value=\x80\x00\x00\x16                                                          
 \x80\x00\x00\x01                           column=0:\x80\x0D, timestamp=1585991153644, value=PEK                                                                       
 \x80\x00\x00\x01\x00\x00test01\x00\xC1\x17 column=L#0:\x00\x00\x00\x00, timestamp=1585991153644, value=\x00\x00\x00\x00                                                
 \x00\x80\x00\x00\x01                                                                                                                                                   
 \x80\x00\x00\x02                           column=0:\x00\x00\x00\x00, timestamp=1585991167298, value=x                                                                 
 \x80\x00\x00\x02                           column=0:\x80\x0B, timestamp=1585991167298, value=test02                                                                    
 \x80\x00\x00\x02                           column=0:\x80\x0C, timestamp=1585991167298, value=\x80\x00\x00\x12                                                          
 \x80\x00\x00\x02                           column=0:\x80\x0D, timestamp=1585991167298, value=SHA                                                                       
 \x80\x00\x00\x02\x00\x00test02\x00\xC1\x13 column=L#0:\x00\x00\x00\x00, timestamp=1585991167298, value=\x00\x00\x00\x00                                                
 \x00\x80\x00\x00\x02                                                                                                                                                   
4 row(s) in 0.0970 seconds

hbase(main):006:0> 

Reading the table data shows how the local index rowkey is laid out (using the second local-index row above as an example):

\x80\x00\x00\x02 + \x00\x00 + test02 + \x00 + \xC1\x13 + \x00 + \x80\x00\x00\x02

That is: region start key of the data + "\x00" + secondary index column 1 + "\x00" + secondary index column 2 (for composite indexes) + ... + "\x00" + original rowkey.

1. Leading with the region start key guarantees that the index data and the original data live in the same region; once the secondary index entry is located, the original rowkey immediately yields the rest of the row from the same region, reducing network overhead and lookup cost.
2. At query time, each region resolves the secondary index columns locally, then uses the recovered original rowkey to fetch the remaining columns within that same region.
3. Because this design writes only the index data, it saves considerable space; the other columns are fetched from the data row via the recovered rowkey. This is why local indexes target write-heavy, read-light workloads.
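The rowkey layout described above can be sketched in Python. This is only an illustration of the byte layout (region start key, indexed values, and original rowkey joined by 0x00 separators); Phoenix's actual value encoding adds extra bytes, such as the \xC1\x13 integer encoding seen in the scan, which are not reproduced here:

```python
def local_index_rowkey(region_start: bytes,
                       indexed_values: list,
                       data_rowkey: bytes) -> bytes:
    """Sketch of the local-index rowkey layout: region start key,
    then each indexed value, then the original rowkey, joined by
    a 0x00 separator. Illustrative only; not Phoenix's encoder."""
    parts = [region_start] + list(indexed_values) + [data_rowkey]
    return b"\x00".join(parts)

# Second row from the scan above: region start \x80\x00\x00\x02,
# NAME = 'test02', AGE shown as \xC1\x13 in the scan, and the
# original rowkey \x80\x00\x00\x02 at the end.
key = local_index_rowkey(b"\x80\x00\x00\x02",
                         [b"test02", b"\xc1\x13"],
                         b"\x80\x00\x00\x02")
```

Because the key starts with the region start key, index entries sort into the same region as the rows they point to.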
Check index usage through the query plan

0: jdbc:phoenix:hadoop002:2181> explain  SELECT * FROM TEST.TEST_LOCAL ;
+---------------------------------------------------------------------------+-----------------+----------------+--------------+
|                                   PLAN                                    | EST_BYTES_READ  | EST_ROWS_READ  | EST_INFO_TS  |
+---------------------------------------------------------------------------+-----------------+----------------+--------------+
| CLIENT 4-CHUNK PARALLEL 1-WAY ROUND ROBIN FULL SCAN OVER TEST:TEST_LOCAL  | null            | null           | null         |
+---------------------------------------------------------------------------+-----------------+----------------+--------------+
1 row selected (0.039 seconds)
0: jdbc:phoenix:hadoop002:2181> 
0: jdbc:phoenix:hadoop002:2181> explain SELECT /*+ INDEX(TEST.TEST_SALT  TEST.TESTSALT_IDX) */ * FROM TEST.TEST_LOCAL ;
+---------------------------------------------------------------------------+-----------------+----------------+--------------+
|                                   PLAN                                    | EST_BYTES_READ  | EST_ROWS_READ  | EST_INFO_TS  |
+---------------------------------------------------------------------------+-----------------+----------------+--------------+
| CLIENT 4-CHUNK PARALLEL 1-WAY ROUND ROBIN FULL SCAN OVER TEST:TEST_LOCAL  | null            | null           | null         |
+---------------------------------------------------------------------------+-----------------+----------------+--------------+
1 row selected (0.035 seconds)
0: jdbc:phoenix:hadoop002:2181> 
0: jdbc:phoenix:hadoop002:2181> explain SELECT ADDRESS FROM TEST.TEST_LOCAL WHERE AGE = 18;
+--------------------------------------------------------------------------------+-----------------+----------------+--------------+
|                                      PLAN                                      | EST_BYTES_READ  | EST_ROWS_READ  | EST_INFO_TS  |
+--------------------------------------------------------------------------------+-----------------+----------------+--------------+
| CLIENT 4-CHUNK PARALLEL 4-WAY ROUND ROBIN RANGE SCAN OVER TEST:TEST_LOCAL [1]  | null            | null           | null         |
|     SERVER FILTER BY FIRST KEY ONLY AND TO_INTEGER("AGE") = 18                 | null            | null           | null         |
+--------------------------------------------------------------------------------+-----------------+----------------+--------------+
2 rows selected (0.039 seconds)
0: jdbc:phoenix:hadoop002:2181> 
0: jdbc:phoenix:hadoop002:2181> explain SELECT name, ADDRESS FROM TEST.TEST_LOCAL WHERE AGE = 18;
+--------------------------------------------------------------------------------+-----------------+----------------+--------------+
|                                      PLAN                                      | EST_BYTES_READ  | EST_ROWS_READ  | EST_INFO_TS  |
+--------------------------------------------------------------------------------+-----------------+----------------+--------------+
| CLIENT 4-CHUNK PARALLEL 4-WAY ROUND ROBIN RANGE SCAN OVER TEST:TEST_LOCAL [1]  | null            | null           | null         |
|     SERVER FILTER BY FIRST KEY ONLY AND TO_INTEGER("AGE") = 18                 | null            | null           | null         |
+--------------------------------------------------------------------------------+-----------------+----------------+--------------+
2 rows selected (0.03 seconds)
0: jdbc:phoenix:hadoop002:2181> 
0: jdbc:phoenix:hadoop002:2181> explain SELECT /*+ INDEX(TEST.TEST_SALT TESTSALT_IDX) */ ADDRESS FROM TEST.TEST_LOCAL WHERE AGE = 18;
+--------------------------------------------------------------------------------+-----------------+----------------+--------------+
|                                      PLAN                                      | EST_BYTES_READ  | EST_ROWS_READ  | EST_INFO_TS  |
+--------------------------------------------------------------------------------+-----------------+----------------+--------------+
| CLIENT 4-CHUNK PARALLEL 4-WAY ROUND ROBIN RANGE SCAN OVER TEST:TEST_LOCAL [1]  | null            | null           | null         |
|     SERVER FILTER BY FIRST KEY ONLY AND TO_INTEGER("AGE") = 18                 | null            | null           | null         |
+--------------------------------------------------------------------------------+-----------------+----------------+--------------+
2 rows selected (0.033 seconds)
0: jdbc:phoenix:hadoop002:2181> 

Summary: the query needs a WHERE clause on indexed columns, and should not SELECT *. Since there is no separate index table, any query meeting these conditions will use the index.

3 Covering Indexes

The data of the covered columns is also stored in the index table, so queries read results directly from the index without scanning the data table, saving overhead.

Prepare the data

CREATE TABLE test.test_include (id INTEGER PRIMARY KEY, name VARCHAR, age INTEGER, address VARCHAR, sal integer) split ON (1,2,3);

UPSERT INTO TEST.test_include VALUES(1, 'test01', 22, 'PEK', 3300);
UPSERT INTO TEST.test_include VALUES(2, 'test02', 18, 'SHA', 3200);

Create the index:

CREATE LOCAL INDEX testinclude_idx ON test.test_include(name, age) include(address);

Index usage at query time

0: jdbc:phoenix:hadoop002:2181> explain  SELECT * FROM TEST.TEST_INCLUDE ;
+-----------------------------------------------------------------------------+-----------------+----------------+--------------+
|                                    PLAN                                     | EST_BYTES_READ  | EST_ROWS_READ  | EST_INFO_TS  |
+-----------------------------------------------------------------------------+-----------------+----------------+--------------+
| CLIENT 4-CHUNK PARALLEL 1-WAY ROUND ROBIN FULL SCAN OVER TEST:TEST_INCLUDE  | null            | null           | null         |
+-----------------------------------------------------------------------------+-----------------+----------------+--------------+
1 row selected (0.035 seconds)
0: jdbc:phoenix:hadoop002:2181> 
0: jdbc:phoenix:hadoop002:2181> explain SELECT /*+ INDEX(TEST.TEST_INCLUDE  TEST.TESTINCLUDE_IDX) */ SAL FROM TEST.TEST_INCLUDE ;
+-----------------------------------------------------------------------------+-----------------+----------------+--------------+
|                                    PLAN                                     | EST_BYTES_READ  | EST_ROWS_READ  | EST_INFO_TS  |
+-----------------------------------------------------------------------------+-----------------+----------------+--------------+
| CLIENT 4-CHUNK PARALLEL 1-WAY ROUND ROBIN FULL SCAN OVER TEST:TEST_INCLUDE  | null            | null           | null         |
+-----------------------------------------------------------------------------+-----------------+----------------+--------------+
1 row selected (0.032 seconds)
0: jdbc:phoenix:hadoop002:2181> explain SELECT ADDRESS FROM TEST.TEST_INCLUDE WHERE AGE = 18;
+----------------------------------------------------------------------------------+-----------------+----------------+--------------+
|                                       PLAN                                       | EST_BYTES_READ  | EST_ROWS_READ  | EST_INFO_TS  |
+----------------------------------------------------------------------------------+-----------------+----------------+--------------+
| CLIENT 4-CHUNK PARALLEL 4-WAY ROUND ROBIN RANGE SCAN OVER TEST:TEST_INCLUDE [1]  | null            | null           | null         |
|     SERVER FILTER BY TO_INTEGER("AGE") = 18                                      | null            | null           | null         |
+----------------------------------------------------------------------------------+-----------------+----------------+--------------+
2 rows selected (0.026 seconds)
0: jdbc:phoenix:hadoop002:2181> 
0: jdbc:phoenix:hadoop002:2181> explain SELECT NAME, ADDRESS FROM TEST.TEST_INCLUDE WHERE AGE = 18;
+----------------------------------------------------------------------------------+-----------------+----------------+--------------+
|                                       PLAN                                       | EST_BYTES_READ  | EST_ROWS_READ  | EST_INFO_TS  |
+----------------------------------------------------------------------------------+-----------------+----------------+--------------+
| CLIENT 4-CHUNK PARALLEL 4-WAY ROUND ROBIN RANGE SCAN OVER TEST:TEST_INCLUDE [1]  | null            | null           | null         |
|     SERVER FILTER BY TO_INTEGER("AGE") = 18                                      | null            | null           | null         |
+----------------------------------------------------------------------------------+-----------------+----------------+--------------+
2 rows selected (0.022 seconds)
0: jdbc:phoenix:hadoop002:2181> 
0: jdbc:phoenix:hadoop002:2181> explain SELECT /*+ INDEX(TEST.TEST_INCLUDE TESTINCLUDE_IDX) */ ADDRESS FROM TEST.TEST_INCLUDE WHERE AGE = 18;
+----------------------------------------------------------------------------------+-----------------+----------------+--------------+
|                                       PLAN                                       | EST_BYTES_READ  | EST_ROWS_READ  | EST_INFO_TS  |
+----------------------------------------------------------------------------------+-----------------+----------------+--------------+
| CLIENT 4-CHUNK PARALLEL 4-WAY ROUND ROBIN RANGE SCAN OVER TEST:TEST_INCLUDE [1]  | null            | null           | null         |
|     SERVER FILTER BY TO_INTEGER("AGE") = 18                                      | null            | null           | null         |
+----------------------------------------------------------------------------------+-----------------+----------------+--------------+
2 rows selected (0.032 seconds)
0: jdbc:phoenix:hadoop002:2181> 
0: jdbc:phoenix:hadoop002:2181> explain SELECT /*+ INDEX(TEST.TEST_INCLUDE TESTINCLUDE_IDX) */ SAL FROM TEST.TEST_INCLUDE WHERE AGE = 18;
+----------------------------------------------------------------------------------+-----------------+----------------+--------------+
|                                       PLAN                                       | EST_BYTES_READ  | EST_ROWS_READ  | EST_INFO_TS  |
+----------------------------------------------------------------------------------+-----------------+----------------+--------------+
| CLIENT 4-CHUNK PARALLEL 4-WAY ROUND ROBIN RANGE SCAN OVER TEST:TEST_INCLUDE [1]  | null            | null           | null         |
|     SERVER FILTER BY FIRST KEY ONLY AND TO_INTEGER("AGE") = 18                   | null            | null           | null         |
+----------------------------------------------------------------------------------+-----------------+----------------+--------------+
2 rows selected (0.038 seconds)
0: jdbc:phoenix:hadoop002:2181> 

4 Functional Indexes

Functional indexes (available in 4.3 and above) allow you to create an index not just on columns, but on an arbitrary expressions. Then when a query uses that expression, the index may be used to retrieve the results instead of the data table. For example, you could create an index on UPPER(FIRST_NAME||‘ ’||LAST_NAME) to allow you to do case insensitive searches on the combined first name and last name of a person.

For example, the following would create this functional index:

CREATE INDEX UPPER_NAME_IDX ON EMP (UPPER(FIRST_NAME||' '||LAST_NAME))

With this index in place, when the following query is issued, the index would be used instead of the data table to retrieve the results:

SELECT EMP_ID FROM EMP WHERE UPPER(FIRST_NAME||' '||LAST_NAME)='JOHN DOE'
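What makes this work is that the index key is the evaluated expression rather than the raw columns. The mapping can be sketched in Python (a conceptual illustration, not Phoenix's expression evaluator):

```python
def functional_index_key(first_name: str, last_name: str) -> str:
    """Evaluate UPPER(FIRST_NAME || ' ' || LAST_NAME): this
    computed value is what is stored as the index key."""
    return (first_name + " " + last_name).upper()

# Any mixed-case spelling maps to the same index key, which is
# why the search above is case insensitive.
key1 = functional_index_key("John", "Doe")
key2 = functional_index_key("jOhN", "dOe")
```

Both keys equal 'JOHN DOE', so the WHERE predicate in the query above becomes a simple key lookup against the index.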

Phoenix supports two types of indexing techniques: global and local indexing. Each are useful in different scenarios and have their own failure profiles and performance characteristics.

5 Mutable Indexes

For non transactional mutable tables, we maintain index update durability by adding the index updates to the Write-Ahead-Log (WAL) entry of the primary table row. Only after the WAL entry is successfully synced to disk do we attempt to make the index/primary table updates. We write the index updates in parallel by default, leading to very high throughput. If the server crashes while we are writing the index updates, we replay all the index updates to the index tables in the WAL recovery process and rely on the idempotence of the updates to ensure correctness. Therefore, indexes on non transactional mutable tables are only ever a single batch of edits behind the primary table.

It’s important to note several points:

  • For non transactional tables, you could see the index table out of sync with the primary table.
  • As noted above, this is ok as we are only a very small bit behind and out of sync for very short periods
  • Each data row and its index row(s) are guaranteed to be written or lost - we never see partial updates as this is part of the atomicity guarantees of HBase.
  • Data is first written to the table followed by the index tables (the reverse is true if the WAL is disabled).
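The ordering in these bullets can be sketched as a toy write path. Everything below (the WAL list, the table dicts, the function name) is a hypothetical stand-in used only to illustrate the sequence, not an HBase API:

```python
# Toy illustration of the write ordering for mutable-table
# index maintenance; all names here are hypothetical.
wal = []          # stand-in for the Write-Ahead-Log
data_table = {}   # stand-in for the primary table
index_table = {}  # stand-in for the index table

def write_row(rowkey, row, index_key):
    # 1. Append one WAL entry carrying BOTH the data update and
    #    the derived index update, then "sync" it to disk.
    wal.append({"row": (rowkey, row), "index": (index_key, rowkey)})
    # 2. Only after the WAL entry is durable, apply the data write...
    data_table[rowkey] = row
    # 3. ...followed by the index write. A crash between steps 2
    #    and 3 is repaired at WAL replay, relying on the updates
    #    being idempotent.
    index_table[index_key] = rowkey

write_row("r1", {"name": "test01"}, "test01")
```

The index can therefore lag the primary table by at most the one batch whose WAL entry was synced but whose index write had not yet landed.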

6 Immutable Indexes

For a table in which the data is only written once and never updated in-place, certain optimizations may be made to reduce the write-time overhead for incremental maintenance. This is common with time-series data such as log or event data, where once a row is written, it will never be updated. To take advantage of these optimizations, declare your table as immutable by adding the IMMUTABLE_ROWS=true property to your DDL statement:

CREATE TABLE my_table (k VARCHAR PRIMARY KEY, v VARCHAR) IMMUTABLE_ROWS=true

All indexes on a table declared with IMMUTABLE_ROWS=true are considered immutable (note that by default, tables are considered mutable). For global immutable indexes, the index is maintained entirely on the client-side with the index table being generated as changes to the data table occur. Local immutable indexes, on the other hand, are maintained on the server-side. Note that no safeguards are in-place to enforce that a table declared as immutable doesn’t actually mutate data (as that would negate the performance gain achieved). If that was to occur, the index would no longer be in sync with the table.

If you have an existing table that you’d like to switch from immutable indexing to mutable indexing, use the ALTER TABLE command as shown below:

ALTER TABLE my_table SET IMMUTABLE_ROWS=false

Indexes on non transactional, immutable tables have no mechanism in place to automatically deal with a commit failure. Maintaining consistency between the table and index is left to the client to handle. Because the updates are idempotent, the simplest solution is for the client to continue retrying the batch of mutations until they succeed.
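Since the mutations are idempotent, the retry loop the client needs is straightforward. A hedged sketch, where commit_batch is a hypothetical stand-in for the client's commit call:

```python
import time

def commit_with_retry(commit_batch, mutations,
                      max_attempts=5, backoff_s=0.0):
    """Keep retrying the same batch until it commits; this is safe
    because replaying idempotent mutations cannot corrupt the index."""
    for attempt in range(1, max_attempts + 1):
        try:
            commit_batch(mutations)  # hypothetical commit call
            return attempt
        except Exception:
            if attempt == max_attempts:
                raise                # give up after the last attempt
            time.sleep(backoff_s * attempt)  # simple linear backoff

# Usage with a commit that fails twice, then succeeds:
calls = {"n": 0}
def flaky_commit(batch):
    calls["n"] += 1
    if calls["n"] < 3:
        raise RuntimeError("commit failed")

attempts = commit_with_retry(flaky_commit, ["upsert1"], backoff_s=0)
```

The same batch is re-submitted unchanged on every attempt, which is exactly the pattern the documentation recommends.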
