Step 1: start Hadoop with ./start-all.sh
Step 2: start Hive with:
./hive --auxpath /home/dream-victor/hive-0.6.0/lib/hive_hbase-handler.jar,/home/dream-victor/hive-0.6.0/lib/hbase-0.20.3.jar,/home/dream-victor/hive-0.6.0/lib/zookeeper-3.2.2.jar -hiveconf hbase.master=127.0.0.1:60000
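If the jar paths have already been added to HIVE_AUX_JARS_PATH in hive-env.sh, you can simply run hive -hiveconf hbase.master=127.0.0.1:60000 instead.
Here, -hiveconf hbase.master= should be set to the value of hbase.master in your own hbase-site.xml.
Step 3: start HBase with ./start-hbase.sh
Step 4: create the mapping table. The table we want to query already exists in HBase, so we use CREATE EXTERNAL TABLE, as follows: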
- CREATE EXTERNAL TABLE hbase_table_2(key string, value string)
- STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
- WITH SERDEPROPERTIES ("hbase.columns.mapping" = ":key,data:1")
- TBLPROPERTIES("hbase.table.name" = "test");
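hbase.columns.mapping lists the HBase columns, in family:qualifier form, that the Hive columns map to. For several columns in one family, use data:1,data:2; for several column families, use data1:1,data2:1. The :key entry is a fixed token that stands for the HBase row key, so the values in the Hive column mapped to it must be unique (for the pokes table, that is the foo field).
hbase.table.name names the HBase table to map.
hbase_table_2(key string, value string) is the mapping table on the Hive side.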
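To make the positional pairing concrete, here is a minimal Java sketch that splits a mapping string into its entries and lines them up with the Hive columns. It is a simplified illustration, not Hive's own parser: the class name ColumnMappingSketch and its Entry type are hypothetical, and it only handles :key and plain family:qualifier entries.

```java
import java.util.ArrayList;
import java.util.List;

public class ColumnMappingSketch {
    /** One entry of hbase.columns.mapping: the row key, or a family:qualifier pair. */
    static final class Entry {
        final boolean rowKey;
        final String family;    // null when this entry is the row key
        final String qualifier; // null when this entry is the row key

        Entry(boolean rowKey, String family, String qualifier) {
            this.rowKey = rowKey;
            this.family = family;
            this.qualifier = qualifier;
        }

        @Override
        public String toString() {
            return rowKey ? "row key" : "family=" + family + ", qualifier=" + qualifier;
        }
    }

    /** Splits a mapping such as ":key,data:1" into entries, in order. */
    static List<Entry> parse(String mapping) {
        List<Entry> entries = new ArrayList<>();
        for (String part : mapping.split(",")) {
            String p = part.trim();
            if (p.equals(":key")) {
                entries.add(new Entry(true, null, null));
            } else {
                int colon = p.indexOf(':');
                if (colon < 0) {
                    throw new IllegalArgumentException("expected family:qualifier, got " + p);
                }
                entries.add(new Entry(false, p.substring(0, colon), p.substring(colon + 1)));
            }
        }
        return entries;
    }

    public static void main(String[] args) {
        // Hive columns from hbase_table_2(key string, value string)
        String[] hiveColumns = {"key", "value"};
        List<Entry> entries = parse(":key,data:1");
        // The i-th Hive column pairs with the i-th mapping entry.
        for (int i = 0; i < entries.size(); i++) {
            System.out.println(hiveColumns[i] + " -> " + entries.get(i));
        }
    }
}
```

The point of the sketch is that the pairing is by position: the first Hive column goes with the first mapping entry, the second with the second, and so on.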
Let's look at the structure of the HBase table we want to query:
- hbase(main):001:0> describe 'test'
- DESCRIPTION ENABLED
- {NAME => 'test', FAMILIES => [{NAME => 'data', COMPRESSION => 'NONE', true
- VERSIONS => '3', TTL => '2147483647', BLOCKSIZE => '65536', IN_MEMORY
- => 'false', BLOCKCACHE => 'true'}]}
- 1 row(s) in 0.0810 seconds
- hbase(main):002:0>
Now look at the data in the table:
- hbase(main):002:0> scan 'test'
- ROW COLUMN+CELL
- row1 column=data:1, timestamp=1300847098583, value=value1
- row12 column=data:1, timestamp=1300849056637, value=value3
- row2 column=data:2, timestamp=1300847106880, value=value2
- 3 row(s) in 0.0160 seconds
- hbase(main):003:0>
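Column family: data, with two columns, data:1 and data:2
Keys: row1, row12, row2
Values: value1, value3, value2
In hbase_table_2(key string, value string), key corresponds to the row key of the test table, and value corresponds to the value stored in column data:1 of the test table.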
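The correspondence described above can be modeled with plain Java collections. This is only an in-memory sketch of the idea (the class ProjectionSketch and its methods are hypothetical and use no HBase or Hive API): each row key maps to its cells, and the Hive-side view keeps the rows that have a cell in the mapped column data:1.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.TreeMap;

public class ProjectionSketch {
    /** In-memory stand-in for the HBase test table: row key -> (column -> value). */
    static Map<String, Map<String, String>> sampleTable() {
        // TreeMap keeps row keys sorted, similar to the order of an HBase scan.
        Map<String, Map<String, String>> table = new TreeMap<>();
        put(table, "row1", "data:1", "value1");
        put(table, "row12", "data:1", "value3");
        put(table, "row2", "data:2", "value2");
        return table;
    }

    static void put(Map<String, Map<String, String>> table,
                    String row, String column, String value) {
        table.computeIfAbsent(row, k -> new TreeMap<>()).put(column, value);
    }

    /** Returns row key -> value for every row that has a cell in the given column. */
    static Map<String, String> project(Map<String, Map<String, String>> table, String column) {
        Map<String, String> result = new LinkedHashMap<>();
        for (Map.Entry<String, Map<String, String>> row : table.entrySet()) {
            String value = row.getValue().get(column);
            if (value != null) {
                result.put(row.getKey(), value);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Prints "row1 value1" and "row12 value3"; row2 has only data:2 and is skipped.
        for (Map.Entry<String, String> e : project(sampleTable(), "data:1").entrySet()) {
            System.out.println(e.getKey() + " " + e.getValue());
        }
    }
}
```

In this model row2 drops out because it has no data:1 cell, which is consistent with the Hive query output for this table.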
OK, now let's look at some query results.
First, query hbase_table_2 from the Hive command line:
- hive> select * from hbase_table_2;
- OK
- row1 value1
- row12 value3
- Time taken: 0.197 seconds
- hive>
Compare this with the data in column data:1 of the test table:
- row1 column=data:1, timestamp=1300847098583, value=value1
- row12 column=data:1, timestamp=1300849056637, value=value3
These match the query result, so all is well. Next, add one more row to column data:1 in HBase:
- hbase(main):003:0> put 'test','row13','data:1','value4'
- 0 row(s) in 0.0050 seconds
- hbase(main):004:0>
Query hbase_table_2 again:
- hive> select * from hbase_table_2;
- OK
- row1 value1
- row12 value3
- row13 value4
- Time taken: 0.165 seconds
- hive>
The new value value4 shows up, which confirms that the HBase test table can be queried through hbase_table_2.
Next, query the test table for rows whose value is value3:
- hive> select * From hbase_table_2 where value='value3';
- Total MapReduce jobs = 1
- Launching Job 1 out of 1
- Number of reduce tasks is set to 0 since there's no reduce operator
- Starting Job = job_201103231022_0001, Tracking URL = http://localhost:50030/jobdetails.jsp?jobid=job_201103231022_0001
- Kill Command = /home/dream-victor/hadoop-0.20.2/bin/hadoop job -Dmapred.job.tracker=localhost:9001 -kill job_201103231022_0001
- 2011-03-23 11:23:27,807 Stage-1 map = 0%, reduce = 0%
- 2011-03-23 11:23:30,824 Stage-1 map = 100%, reduce = 0%
- 2011-03-23 11:23:33,854 Stage-1 map = 100%, reduce = 100%
- Ended Job = job_201103231022_0001
- OK
- row12 value3
- Time taken: 11.929 seconds
- hive>
Compare with the test table in HBase:
- row12 column=data:1, timestamp=1300849056637, value=value3
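OK, so now we can query HBase with SQL.
Everything above was done from the command line. Our real goal is to fetch useful data from Java code, and that turns out to be simple too.
First, the command for starting Hive changes slightly. Use the following: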
- ./hive --service hiveserver
Here we use the embedded Derby database as the default metastore, which you can see in hive-site.xml:
- <property>
- <name>javax.jdo.option.ConnectionURL</name>
- <value>jdbc:derby:;databaseName=metastore_db;create=true</value> <!-- sets the default database name and location -->
- </property>
- <property>
- <name>javax.jdo.option.ConnectionDriverName</name>
- <value>org.apache.derby.jdbc.EmbeddedDriver</value>
- </property>
For the JDBC connection, the default URL can be used: jdbc:hive://localhost:10000/default
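If the Hive server runs on another host or port, the URL has to change accordingly. The sketch below shows how the URL is put together, assuming it has the shape jdbc:hive://host:port/database; the class HiveUrlSketch and the method hiveUrl are hypothetical helpers, not part of the Hive driver.

```java
public class HiveUrlSketch {
    /** Composes a jdbc:hive:// URL from host, port and database name. */
    static String hiveUrl(String host, int port, String database) {
        if (port < 1 || port > 65535) {
            throw new IllegalArgumentException("port out of range: " + port);
        }
        return "jdbc:hive://" + host + ":" + port + "/" + database;
    }

    public static void main(String[] args) {
        // Prints the default URL used in this walkthrough:
        // jdbc:hive://localhost:10000/default
        System.out.println(hiveUrl("localhost", 10000, "default"));
    }
}
```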
With that in place, we can read the data from Java, as follows:
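- import java.sql.Connection;
- import java.sql.DriverManager;
- import java.sql.ResultSet;
- import java.sql.SQLException;
- import java.sql.Statement;
- import junit.framework.TestCase;
- public class HiveTest extends TestCase {
- // JDBC driver class for the Hive server started with --service hiveserver
- private static String driverName = "org.apache.hadoop.hive.jdbc.HiveDriver";
- private Connection con;
- private boolean standAloneServer = true;
- public void testSelect() throws SQLException {
- Statement stmt = con.createStatement();
- ResultSet res = stmt.executeQuery("select * from hbase_table_2");
- // Print every row as key,value
- while (res.next()) {
- System.out.println(res.getString(1)+","+res.getString(2));
- }
- }
- @Override
- protected void setUp() throws Exception {
- super.setUp();
- // Load the driver and connect before each test
- Class.forName(driverName);
- con = DriverManager.getConnection(
- "jdbc:hive://localhost:10000/default", "", "");
- }
- @Override
- protected void tearDown() throws Exception {
- // Release the connection after each test
- if (con != null) con.close();
- super.tearDown();
- }
- }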
The result:
- row1,value1
- row12,value3
- row13,value4
- row14,test
Check the data in HBase:
- ROW COLUMN+CELL
- row1 column=data:1, timestamp=1300847098583, value=value1
- row12 column=data:1, timestamp=1300849056637, value=value3
- row13 column=data:1, timestamp=1300850443699, value=value4
- row14 column=data:1, timestamp=1300867550502, value=test
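The same approach also works outside JUnit. As a rough sketch, the program below runs the earlier filter on value from a plain main method. It assumes the driver class and URL shown above, a Hive server listening on localhost:10000, and the Hive JDBC jars on the classpath. The class HiveJdbcSketch and the helper selectByValue are hypothetical; the helper simply rejects values containing quotes or backslashes rather than trying to escape them.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class HiveJdbcSketch {
    static final String DRIVER = "org.apache.hadoop.hive.jdbc.HiveDriver";
    static final String URL = "jdbc:hive://localhost:10000/default";

    /** Builds "select * from <table> where value='<value>'" for simple literal values. */
    static String selectByValue(String table, String value) {
        // Keep the sketch safe: refuse characters that would need escaping.
        if (value.contains("'") || value.contains("\\")) {
            throw new IllegalArgumentException("unsupported character in value: " + value);
        }
        return "select * from " + table + " where value='" + value + "'";
    }

    public static void main(String[] args) throws Exception {
        Class.forName(DRIVER);
        // try-with-resources closes the result set, statement and connection in turn.
        try (Connection con = DriverManager.getConnection(URL, "", "");
             Statement stmt = con.createStatement();
             ResultSet res = stmt.executeQuery(selectByValue("hbase_table_2", "value3"))) {
            while (res.next()) {
                System.out.println(res.getString(1) + "," + res.getString(2));
            }
        }
    }
}
```

As the command-line session showed, a query with a where clause launches a MapReduce job, so expect it to take noticeably longer than a plain select *.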
OK, perfect. Still, I hope requirements like this stay rare; after all, HBase was not originally designed to support structured queries.