Goal: convert an existing LZO-compressed table in Hive to ORC format without losing any data.
Execution and test procedure:
1. Create the LZO table (verification step, can be skipped):
create external table test_lzo(
id int
) partitioned by (`date_par` string)
ROW FORMAT SERDE
'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe'
STORED AS INPUTFORMAT
'com.hadoop.mapred.DeprecatedLzoTextInputFormat'
OUTPUTFORMAT
'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat';
2. Insert test data (verification step, can be skipped):
insert into table test_lzo partition (date_par='20190820') values(111);
insert into table test_lzo partition (date_par='20190820') values(222);
insert into table test_lzo partition (date_par='tttt') values(123);
3. Set the following parameters before querying the data (verification step, can be skipped):
set hive.exec.compress.output = true;
set mapred.output.compression.codec=com.hadoop.compression.lzo.LzopCodec;
4. Create the replacement table, pointing its LOCATION at the original table's path:
create external table test_lzo_new(
id int
) partitioned by (`date_par` string)
STORED AS ORC
LOCATION
'hdfs://hacluster/user/hive/warehouse/test_lzo';
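Before going further, it is worth confirming that the replacement table really points at the original table's directory. A quick check (using the table and path from this example):

```sql
-- The Location field in the output should show the original table's path,
-- i.e. hdfs://hacluster/user/hive/warehouse/test_lzo
desc formatted test_lzo_new;
```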
5. For partitioned tables, the following settings must be enabled (only relevant to partitioned tables):
-- Allow dynamic partitions; check the current value with: set hive.exec.dynamic.partition;
set hive.exec.dynamic.partition=true;
-- Required when all partition columns are dynamic
set hive.exec.dynamic.partition.mode=nonstrict;
-- The job fails if the total number of dynamic partitions exceeds this limit
set hive.exec.max.dynamic.partitions=100000;
-- Maximum number of dynamic partitions each mapper/reducer node may create
set hive.exec.max.dynamic.partitions.pernode=100000;
6. Use an insert overwrite statement to rewrite the data into the replacement table:
- Partitioned table: insert overwrite table test_lzo_new partition(date_par) select * from test_lzo;
- Non-partitioned table: insert overwrite table test_lzo_new select * from test_lzo;
After the insert, the new table's partitions hold the data correctly, and the old table's data files have been replaced.
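Since the stated goal is no data loss, it helps to record per-partition row counts from the source table before the overwrite and compare them against the new table afterwards. A sketch (note that once the shared directory has been rewritten with ORC files, the old LZO table can no longer be queried reliably, so run the first query before step 6):

```sql
-- Run BEFORE the overwrite: record the source counts per partition
select date_par, count(*) from test_lzo group by date_par;
-- Run AFTER the overwrite: the new table should report the same counts
select date_par, count(*) from test_lzo_new group by date_par;
```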
7. Drop the old table (it is an external table, so dropping it does not delete the data files):
drop table test_lzo;
8. Rename the new table:
alter table test_lzo_new rename to test_lzo;
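After the rename, a final sanity check (using this example's table and partitions) can confirm the partitions and data survived the swap:

```sql
-- All original partitions (e.g. date_par=20190820, date_par=tttt) should still be listed
show partitions test_lzo;
-- Spot-check that rows are readable from the ORC files
select * from test_lzo limit 10;
```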
9. Once everything above is done, remove the LZO-related configuration if necessary (Hive must be restarted after the removal):
core-hive.xml
<property>
<name>io.compression.codecs</name>
<value>com.hadoop.compression.lzo.LzoCodec,com.hadoop.compression.lzo.LzopCodec</value>
</property>
<property>
<name>io.compression.codec.lzo.class</name>
<value>com.hadoop.compression.lzo.LzoCodec</value>
</property>
hive-site.xml
<property>
<name>hive.aux.jars.path</name>
<value>file:///opt/huawei/Bigdata/hive-0.13.1/hive-0.13.1/lib/hadoop-lzo-0.4.15.jar</value>
</property>