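Hive Partitions

I. Theoretical Basics

1. Background of Hive partitions

A Hive SELECT query normally scans the entire table, which wastes a lot of time on unnecessary work. Often only a subset of the data is of interest, so Hive lets you declare partitions when you create a table.

2. What a Hive partition actually is

Hive is an abstraction over data stored on HDFS, so each partition of a Hive table corresponds to an HDFS directory. The partition column is not an actual field stored in the data files.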
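To make the partition-to-directory mapping concrete, here is a minimal Python sketch (a simulation, not Hive code). It assumes the table lives in the default database under the default warehouse root `/user/hive/warehouse`; your cluster may be configured differently, and the helper name `partition_dir` is hypothetical.

```python
# Sketch: how a partition spec maps to an HDFS directory name.
# WAREHOUSE_ROOT is an assumption (the default value of
# hive.metastore.warehouse.dir); partition_dir is a hypothetical helper,
# not part of any Hive API.
WAREHOUSE_ROOT = "/user/hive/warehouse"


def partition_dir(table, spec):
    """Return the directory of one partition of a table in the default database.

    spec is a list of (column, value) pairs in PARTITIONED BY order.
    """
    levels = ["{}={}".format(col, val) for col, val in spec]
    return "/".join([WAREHOUSE_ROOT, table] + levels)


print(partition_dir("student", [("classroom", "002")]))
```

Each partition column contributes one `column=value` directory level, which is why the value never needs to be stored inside the data file itself.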

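3. Why Hive partitions matter

Partitions assist queries by narrowing the range of data that must be scanned, which speeds up retrieval. They also let you manage data according to defined rules and conditions.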
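The narrowing effect can be pictured as choosing directories before reading any file. The following Python sketch only simulates that idea; the directory names are illustrative and `prune` is a hypothetical helper, not how Hive's planner is actually implemented.

```python
# Sketch of partition pruning: a filter on the partition column selects
# directories before any data file is opened. Directory names are illustrative.
def prune(partition_dirs, column, wanted):
    """Keep only the directories that contain the level 'column=wanted'."""
    target = "{}={}".format(column, wanted)
    return [d for d in partition_dirs if target in d.split("/")]


dirs = ["classroom=001", "classroom=002", "classroom=003"]
# Only one of the three directories needs to be scanned.
print(prune(dirs, "classroom", "002"))
```

A filter on a regular column cannot be answered this way, because regular column values are only known after the files have been read.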

4. Common partitioning schemes

Data in Hive tables is typically partitioned along dimensions such as time, region, or category.
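For the time dimension, a common approach is to derive one partition value per day from each record's timestamp. The sketch below assumes a partition column named `dt` and a `YYYY-MM-DD` format; both are illustrative conventions, not something this article prescribes.

```python
# Sketch: deriving a daily partition value from a timestamp.
# The column name "dt" and the date format are assumptions for illustration.
from datetime import datetime


def date_partition(ts, column="dt"):
    """Return the 'column=value' partition name for the day containing ts."""
    return "{}={}".format(column, ts.strftime("%Y-%m-%d"))


print(date_partition(datetime(2024, 1, 15, 9, 30)))
```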

II. Partition Operations

(I) Static partitions

1. Single-level partitions

(1) Create the table

hive> create table student(id string,name string) partitioned by(classRoom string) row format delimited fields terminated by ',';
OK
Time taken: 0.259 seconds

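Note: the partitioned by() clause must come before row format...; the partition columns listed in partitioned by() must not repeat any of the table's regular columns, otherwise Hive reports an error.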
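The second rule in the note can be expressed as a small check. This Python sketch mimics the rule only; `check_partition_columns` is a hypothetical helper and the error text is made up, not Hive's actual message.

```python
# Sketch: the "partition column must not repeat a table column" rule.
# check_partition_columns is hypothetical and the message is illustrative.
def check_partition_columns(columns, partition_columns):
    """Raise ValueError if a partition column repeats a regular column.

    Names are compared case-insensitively, because Hive treats column
    names as case-insensitive (classRoom is stored as classroom).
    """
    regular = {c.lower() for c in columns}
    clashes = [p for p in partition_columns if p.lower() in regular]
    if clashes:
        raise ValueError(
            "partition column repeats a table column: " + ", ".join(clashes)
        )


check_partition_columns(["id", "name"], ["classRoom"])  # accepted
try:
    check_partition_columns(["id", "name"], ["name"])  # rejected
except ValueError as err:
    print(err)
```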

(2) Load data

hive> load data local inpath '/home/test/stu.txt' into table student partition(classroom='002');
Loading data to table default.student partition (classroom=002)
OK
Time taken: 1.102 seconds

(3) View partitions

hive> show partitions student;
OK
classroom=002
Time taken: 0.071 seconds, Fetched: 1 row(s)

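(4) Partition layout in HDFS

In HDFS the partition appears as a directory named classroom=002 under the directory of the student table.

(5) Load another batch of data into a new partition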

hive> load data local inpath '/home/test/stu.txt' into table student partition(classroom='003');
Loading data to table default.student partition (classroom=003)
OK
Time taken: 0.722 seconds
hive> select * from student;
OK
001	xiaohong	002
002	xiaolan	002
001	xiaohong	003
002	xiaolan	003
Time taken: 0.097 seconds, Fetched: 4 row(s)
hive> show partitions student;
OK
classroom=002
classroom=003
Time taken: 0.071 seconds, Fetched: 2 row(s)
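The query output above shows the same two records twice, differing only in the last column. That column is not read from stu.txt; it comes from the partition directory. The Python sketch below simulates this. The contents of stu.txt are an assumption inferred from the query output and the ',' delimiter, and `read_partitioned` is a hypothetical helper.

```python
# Sketch: the partition value is taken from the directory name and
# appended as the last column of every row read from that directory.
def read_partitioned(table_files):
    """table_files maps a partition directory such as 'classroom=002'
    to the lines of the data file stored in it."""
    rows = []
    for directory in sorted(table_files):
        value = directory.split("=", 1)[1]
        for line in table_files[directory]:
            rows.append(line.split(",") + [value])
    return rows


# Assumed contents of stu.txt, inferred from the query output.
stu_txt = ["001,xiaohong", "002,xiaolan"]
files = {"classroom=002": stu_txt, "classroom=003": stu_txt}
for row in read_partitioned(files):
    print("\t".join(row))
```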

2. Multi-level partitions

(1) Create the table

hive> create table stu(id string,name string) partitioned by(school string,classRoom string) row format delimited fields terminated by ',';
OK
Time taken: 0.074 seconds

hive> desc stu;
OK
id                  	string              	                    
name                	string              	                    
school              	string              	                    
classroom           	string              	                    
	 	 
# Partition Information	 	 
# col_name            	data_type           	comment             
	 	 
school              	string              	                    
classroom           	string              	                    
Time taken: 0.03 seconds, Fetched: 10 row(s)

(2) Load data

hive> load  data local inpath '/home/test/stu.txt' into table stu partition(school='AA',classroom='005');
Loading data to table default.stu partition (school=AA, classroom=005)
OK
Time taken: 0.779 seconds

hive> select * from stu;
OK
001	xiaohong	AA	005
002	xiaolan	AA	005
Time taken: 0.087 seconds, Fetched: 2 row(s)

(3) View partitions

hive> show partitions stu;
OK
school=AA/classroom=005
Time taken: 0.048 seconds, Fetched: 1 row(s)

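Note: the partition is a nested directory, with school as the outer level and classroom inside it.

(4) Partition layout in HDFS

In HDFS the partition appears as the nested directory school=AA/classroom=005 under the directory of the stu table.

(5) Effect of loading more data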

hive> load  data local inpath '/home/test/stu.txt' into table stu partition(school='BB',classroom='001');
Loading data to table default.stu partition (school=BB, classroom=001)
OK
Time taken: 0.272 seconds
hive> load  data local inpath '/home/test/stu.txt' into table stu partition(school='AA',classroom='001');
Loading data to table default.stu partition (school=AA, classroom=001)
OK
Time taken: 0.268 seconds
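After these loads the stu table holds three partitions, spread over two school directories. The Python sketch below groups the partition names from the load statements into the nested layout one would expect to find in HDFS; it is a simulation of the directory tree, and `partition_tree` is a hypothetical helper.

```python
# Sketch: grouping two-level partition names into a nested directory tree.
from collections import defaultdict


def partition_tree(partition_names):
    """Group names like 'school=AA/classroom=005' by their outer level."""
    tree = defaultdict(list)
    for name in partition_names:
        outer, inner = name.split("/", 1)
        tree[outer].append(inner)
    return {outer: sorted(inner) for outer, inner in sorted(tree.items())}


# Partition names taken from the three load statements, in load order.
loaded = [
    "school=AA/classroom=005",
    "school=BB/classroom=001",
    "school=AA/classroom=001",
]
for outer, inners in partition_tree(loaded).items():
    print(outer)
    for inner in inners:
        print("  " + inner)
```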

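(II) Dynamic partitions

The main difference between static and dynamic partitioning is that static partitions are specified by hand, whereas dynamic partitions are determined from the data. More precisely, the values of static partition columns are fixed at compile time from what the user supplies, while the values of dynamic partition columns can only be determined when the SQL statement executes.

1. Enable Hive dynamic partitioning

Set two parameters in the Hive session: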

hive> set hive.exec.dynamic.partition=true;
hive> set hive.exec.dynamic.partition.mode=nonstrict;
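The second parameter matters because in strict mode Hive requires at least one static partition column in the partition clause, so a fully dynamic insert is only accepted in nonstrict mode. The Python sketch below models just that rule; `check_insert` is a hypothetical helper and the error text is illustrative, not Hive's actual message.

```python
# Sketch: the strict / nonstrict rule for dynamic partition inserts.
# A value of None marks a dynamic partition column; a string marks a
# static one. check_insert is hypothetical, not a Hive API.
def check_insert(partition_spec, mode="strict"):
    """Raise ValueError if the partition spec is not allowed under mode."""
    if mode not in ("strict", "nonstrict"):
        raise ValueError("unknown mode: " + mode)
    has_static = any(value is not None for _, value in partition_spec)
    if mode == "strict" and not has_static:
        raise ValueError(
            "strict mode requires at least one static partition column"
        )


fully_dynamic = [("school", None), ("classroom", None)]
check_insert(fully_dynamic, mode="nonstrict")  # accepted
try:
    check_insert(fully_dynamic, mode="strict")  # rejected
except ValueError as err:
    print(err)
```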

2. Create the tables

(1) First, prepare a table that already has static partitions

hive> select * from stu;
OK
001	xiaohong	AA	001
002	xiaolan	AA	001
001	xiaohong	AA	005
002	xiaolan	AA	005
001	xiaohong	BB	001
002	xiaolan	BB	001
Time taken: 0.105 seconds, Fetched: 6 row(s)

(2) Create a second table with the same structure

hive> create table stu01 like stu;
OK
Time taken: 0.068 seconds
hive> desc stu;
OK
id                  	string              	                    
name                	string              	                    
school              	string              	                    
classroom           	string              	                    
	 	 
# Partition Information	 	 
# col_name            	data_type           	comment             
	 	 
school              	string              	                    
classroom           	string              	                    
Time taken: 0.022 seconds, Fetched: 10 row(s)

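(3) Load the data; the partitions are created successfully

Do not specify a particular school or classroom; let Hive assign the partitions automatically: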

hive> insert overwrite table stu01 partition(school,classroom) 
    > select * from stu;

hive> select * from stu;
OK
001	xiaohong	AA	001
002	xiaolan	AA	001
001	xiaohong	AA	005
002	xiaolan	AA	005
001	xiaohong	BB	001
002	xiaolan	BB	001
Time taken: 0.091 seconds, Fetched: 6 row(s)
hive> select * from stu01;
OK
001	xiaohong	AA	001
002	xiaolan	AA	001
001	xiaohong	AA	005
002	xiaolan	AA	005
001	xiaohong	BB	001
002	xiaolan	BB	001
Time taken: 0.081 seconds, Fetched: 6 row(s)
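In a dynamic partition insert, Hive takes the partition values from the trailing columns of the select, matched by position. With select * on stu those are school and classroom. The Python sketch below simulates the routing using the six rows shown above; `dynamic_insert` is a hypothetical helper, not Hive code.

```python
# Sketch: routing rows to partitions by their trailing column values,
# as a dynamic partition insert does.
def dynamic_insert(rows, partition_columns):
    """Return a dict mapping each partition name to its data rows.

    The last len(partition_columns) values of every row are used, by
    position, as the partition values; the rest is the stored data.
    """
    n = len(partition_columns)
    partitions = {}
    for row in rows:
        data, values = row[:-n], row[-n:]
        name = "/".join(
            "{}={}".format(col, val)
            for col, val in zip(partition_columns, values)
        )
        partitions.setdefault(name, []).append(data)
    return partitions


# The six rows of stu, as shown in the query output.
stu = [
    ("001", "xiaohong", "AA", "001"),
    ("002", "xiaolan", "AA", "001"),
    ("001", "xiaohong", "AA", "005"),
    ("002", "xiaolan", "AA", "005"),
    ("001", "xiaohong", "BB", "001"),
    ("002", "xiaolan", "BB", "001"),
]
result = dynamic_insert(stu, ["school", "classroom"])
for name in sorted(result):
    print(name, len(result[name]))
```

Because the mapping is positional, the order of the trailing select columns must match the order of the columns in the partition clause.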
