How much do you know about Hive partitions?

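I. Theory

1. Background of Hive partitions

A Hive SELECT query normally scans the entire table, which wastes a lot of time on unnecessary work. Often only a subset of the table's data is of interest, so Hive lets you declare partitions when you create a table.

2. What a Hive partition really is

Hive is an abstraction over data stored on HDFS. Each partition of a table corresponds to a directory on HDFS; the partition column is not an actual field stored in the data files.

3. Why partitions matter

Partitions support queries by narrowing the range of data that has to be scanned, which speeds up retrieval. They also let you query data by fixed criteria and make the data easier to manage.

4. Common partitioning schemes

Data in Hive tables is usually partitioned by dimensions such as time, region, or category.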
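
As a rough illustration of partitioning by time and region, a table could be declared along these lines. The table name access_log, its columns, and the partition values are hypothetical and do not come from the examples in this post.

```sql
-- Hypothetical table: one partition per day and per region.
create table if not exists access_log (
    user_id  int
   ,url      string
)
partitioned by (dt string, region string)
row format delimited
fields terminated by ','
;

-- A query that filters on both partition columns only needs the data
-- stored for that day and that region.
select count(*) from access_log where dt = '20170101' and region = 'beijing';
```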

II. Working with a single partition column

1. Create a partitioned table

create table if not exists t1(
    id      int
   ,name    string
   ,hobby   array<string>
   ,add     map<string,string>
)
partitioned by (pt_d string)
row format delimited
fields terminated by ','
collection items terminated by '-'
map keys terminated by ':'
;

Note: a partition column must not have the same name as a column of the table.
If it does, the statement fails, as in this example:

create table t10(
    id      int
   ,name    string
   ,hobby   array<string>
   ,add     map<string,string>
)
partitioned by (id int)
row format delimited
fields terminated by ','
collection items terminated by '-'
map keys terminated by ':'
;

Error message: FAILED: SemanticException [Error 10035]: Column repeated in partitioning columns

2. Load data

The file to load contains:

1,xiaoming,book-TV-code,beijing:chaoyang-shagnhai:pudong
2,lilei,book-code,nanjing:jiangning-taiwan:taibei
3,lihua,music-book,heilongjiang:haerbin

Run load data:

load data local inpath '/home/hadoop/Desktop/data' overwrite into table t1 partition ( pt_d = '201701');

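3. View the data and the partitions

To query the data of a partition, use the partition column the same way as an ordinary column.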

select * from t1 where pt_d = '201701';

Result:

1   xiaoming    ["book","TV","code"]    {"beijing":"chaoyang","shagnhai":"pudong"}  201701
2   lilei   ["book","code"] {"nanjing":"jiangning","taiwan":"taibei"}   201701
3   lihua   ["music","book"]    {"heilongjiang":"haerbin"}  201701

List the partitions:

show partitions t1;
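
If you only want part of the list, or want to see where a partition is stored, the following two statements may help. This is a small sketch that assumes the t1 table and the 201701 partition loaded above; the exact output depends on your cluster.

```sql
-- List only the partitions that match a partition spec.
show partitions t1 partition (pt_d = '201701');

-- Show the metadata of one partition, including its Location on HDFS.
describe formatted t1 partition (pt_d = '201701');
```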

4. Load into another partition

Create another copy of the data and load it into the partition pt_d = '000000':

load data local inpath '/home/hadoop/Desktop/data' overwrite into table t1 partition ( pt_d = '000000');

Check the data:

select * from t1;

1   xiaoming    ["book","TV","code"]    {"beijing":"chaoyang","shagnhai":"pudong"}  000000
2   lilei   ["book","code"] {"nanjing":"jiangning","taiwan":"taibei"}   000000
3   lihua   ["music","book"]    {"heilongjiang":"haerbin"}  000000
1   xiaoming    ["book","TV","code"]    {"beijing":"chaoyang","shagnhai":"pudong"}  201701
2   lilei   ["book","code"] {"nanjing":"jiangning","taiwan":"taibei"}   201701
3   lihua   ["music","book"]    {"heilongjiang":"haerbin"}  201701
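
Because the partition column behaves like an ordinary column in queries, it can also be used for grouping. A minimal sketch on t1, assuming the two loads above: each partition should then report the same three rows.

```sql
-- Count the rows per partition; pt_d is used like a regular column.
select pt_d, count(*) as cnt
from t1
group by pt_d;
```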

5. Look at the files on HDFS

Browse the table directory on HDFS:

http://namenode:50070/explorer.html#/user/hive/warehouse/test.db/t1

As you can see, the files are stored separately per partition: each new partition is a new directory under the table directory.
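
The same layout can be inspected from the Hive CLI with the dfs command. This is only a sketch: the path is taken from the NameNode URL above and has to be adjusted if your warehouse directory is different.

```sql
-- Run inside the Hive CLI; the path is assumed from the NameNode URL above.
dfs -ls /user/hive/warehouse/test.db/t1;

-- Each partition appears as a sub-directory named <column>=<value>,
-- for example .../t1/pt_d=201701. List one of them to see its files.
dfs -ls /user/hive/warehouse/test.db/t1/pt_d=201701;
```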

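Query the data of one partition:

select * from t1 where pt_d = '000000';

Add a partition, which creates a new partition directory:

alter table t1 add partition (pt_d = '333333');

Drop a partition, which also deletes the corresponding partition directory:

Note that for an external table, drop partition does not delete the files on HDFS, and msck repair table table_name can register the partitions that are still present on HDFS again.

alter table test1 drop partition (pt_d = '20170101');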

III. Working with multiple partition columns

1. Create a partitioned table

create table t10(
    id      int
   ,name    string
   ,hobby   array<string>
   ,add     map<string,string>
)
partitioned by (pt_d string,sex string)
row format delimited
fields terminated by ','
collection items terminated by '-'
map keys terminated by ':'
;

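2. Load data (every partition column must be specified)

load data local inpath '/home/hadoop/Desktop/data' overwrite into table t10 partition ( pt_d = '0');

Specifying only one of the partition columns fails with: FAILED: SemanticException [Error 10006]: Line 1:88 Partition not found ''0''

Specify both partition columns instead: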

load data local inpath '/home/hadoop/Desktop/data' overwrite into table t10 partition ( pt_d = '0',sex='male');
load data local inpath '/home/hadoop/Desktop/data' overwrite into table t10 partition ( pt_d = '0',sex='female');

Looking at the files on HDFS, you can see that the partition columns are nested in order, much like a tree of folders on Windows.
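
The nesting can also be seen from Hive itself. A short sketch, assuming the two loads into t10 shown above:

```sql
-- All partitions of t10, printed as pt_d=.../sex=... paths.
show partitions t10;

-- Only the sex partitions under pt_d = '0' (partial partition spec).
show partitions t10 partition (pt_d = '0');

-- Filtering on both partition columns reads a single leaf directory.
select * from t10 where pt_d = '0' and sex = 'male';
```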

IV. Adding, dropping, repairing, and listing table partitions

1. Add partitions

Here we create a partitioned external table:

create external table testljb (
    id int
) partitioned by (age int);

Add partitions

Syntax from the official documentation:

ALTER TABLE table_name ADD [IF NOT EXISTS] PARTITION partition_spec [LOCATION 'location'][, PARTITION partition_spec [LOCATION 'location'], ...];

partition_spec:
  : (partition_column = partition_col_value, partition_column = partition_col_value, ...)

Examples

Add one partition at a time:

alter table testljb add partition (age=2);

Add several partitions of the same level (same partition column) in one statement:

alter table testljb add partition(age=3) partition(age=4);

Note: never write it like this:

alter table testljb add partition(age=5,age=6);

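If we then run show partitions table_name, we find that only the age=6 partition was added.

My guess at the reason: this syntax is meant for adding one partition to a table that has several partition columns. Here the same column is written twice, but the table does not have two age partition columns, so Hive ends up adding only one of the two values, apparently arbitrarily.

Adding nested (parent/child) partitions:

For example, suppose a table has two partition columns, age and sex. To add the partition with age 1 and sex male, write: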

alter table testljb add partition(age=1,sex='male');

2. Drop partitions

Drop the partition age=1:

alter table testljb drop partition(age=1);

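Note: suppose the table testljb has two partition columns, partitioned by (age int, sex string). As mentioned above, multiple partition columns are ordered like a tree of folders on Windows. Dropping an age partition (the first partition column) therefore also drops every sex partition underneath it.

3. Repair partitions

Repairing partitions means re-synchronizing the partition information found on HDFS into Hive's metadata.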

msck repair table table_name;
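
One situation where a repair is needed is when a partition directory is created directly on HDFS, bypassing Hive. The sketch below assumes that testljb is stored under /user/hive/warehouse/test.db/testljb, which is a guess based on the warehouse path seen earlier; check the real Location with describe formatted testljb. The value age=10 is also just an example.

```sql
-- Assumed table location; confirm it with: describe formatted testljb;
dfs -mkdir -p /user/hive/warehouse/test.db/testljb/age=10;

-- The new directory is not yet known to the metastore, so it is not listed.
show partitions testljb;

-- Scan the table location and register the partition directories found there.
msck repair table testljb;

-- age=10 should now appear in the list.
show partitions testljb;
```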

4. List partitions

show partitions table_name;
