Goal: convert an existing LZO-compressed table in Hive to ORC format without losing any data.
Execution and test procedure:
1. Create the LZO table (verification step, can be skipped):
create external table test_lzo(
id int
) partitioned by (`date_par` string)
ROW FORMAT SERDE
'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe'
STORED AS INPUTFORMAT
'com.hadoop.mapred.DeprecatedLzoTextInputFormat'
OUTPUTFORMAT
'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat';
2. Insert test data (verification step, can be skipped):
insert into table test_lzo partition (date_par='20190820') values(111);
insert into table test_lzo partition (date_par='20190820') values(222);
insert into table test_lzo partition (date_par='tttt') values(123);
3. Set the following parameters before querying the data (verification step, can be skipped):
set hive.exec.compress.output = true;
set mapred.output.compression.codec=com.hadoop.compression.lzo.LzopCodec;
4. Create the replacement table, pointing its LOCATION at the original table's path:
create external table test_lzo_new(
id int
) partitioned by (`date_par` string)
STORED AS ORC
LOCATION
'hdfs://hacluster/user/hive/warehouse/test_lzo';
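Before going further, it is worth confirming that the replacement table really points at the original table's directory. A quick check (using the table and path from this example):

```sql
-- The Location field in the output should show the original table's path,
-- i.e. hdfs://hacluster/user/hive/warehouse/test_lzo
desc formatted test_lzo_new;
```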
5. For partitioned tables, the following settings must be enabled (only relevant to partitioned tables):
-- Allow dynamic partitions; check the current value with: set hive.exec.dynamic.partition;
set hive.exec.dynamic.partition=true;
-- Required when all partition columns are dynamic
set hive.exec.dynamic.partition.mode=nonstrict;
-- The job fails if the total number of dynamic partitions exceeds this limit
set hive.exec.max.dynamic.partitions=100000;
-- Maximum number of dynamic partitions each mapper/reducer node may create
set hive.exec.max.dynamic.partitions.pernode=100000;
6. Use an insert overwrite statement to rewrite the data into the replacement table:
- Partitioned table: insert overwrite table test_lzo_new partition(date_par) select * from test_lzo;
- Non-partitioned table: insert overwrite table test_lzo_new select * from test_lzo;
After the insert, the new table's partitions hold the data correctly, and the old table's data files have been replaced.
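Since the stated goal is no data loss, it helps to record per-partition row counts from the source table before the overwrite and compare them against the new table afterwards. A sketch (note that once the shared directory has been rewritten with ORC files, the old LZO table can no longer be queried reliably, so run the first query before step 6):

```sql
-- Run BEFORE the overwrite: record the source counts per partition
select date_par, count(*) from test_lzo group by date_par;
-- Run AFTER the overwrite: the new table should report the same counts
select date_par, count(*) from test_lzo_new group by date_par;
```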
7. Drop the old table (it is an external table, so dropping it does not delete the data files):
drop table test_lzo;
8. Rename the new table:
alter table test_lzo_new rename to test_lzo;
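After the rename, a final sanity check (using this example's table and partitions) can confirm the partitions and data survived the swap:

```sql
-- All original partitions (e.g. date_par=20190820, date_par=tttt) should still be listed
show partitions test_lzo;
-- Spot-check that rows are readable from the ORC files
select * from test_lzo limit 10;
```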
9. Once everything above is done, remove the LZO-related configuration if necessary (Hive must be restarted after the removal):
core-hive.xml
<property>
<name>io.compression.codecs</name>
<value>com.hadoop.compression.lzo.LzoCodec,com.hadoop.compression.lzo.LzopCodec</value>
</property>
<property>
<name>io.compression.codec.lzo.class</name>
<value>com.hadoop.compression.lzo.LzoCodec</value>
</property>
hive-site.xml
<property>
<name>hive.aux.jars.path</name>
<value>file:///opt/huawei/Bigdata/hive-0.13.1/hive-0.13.1/lib/hadoop-lzo-0.4.15.jar</value>
</property>