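Background: when working with Hive, the data source is sometimes a CSV text file. If the file is loaded into Hive as-is, its header row (the column names) is loaded as a data row too, which leaves dirty data in the table.
The solution is as follows: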
New in Hive 0.13:
- Add tblproperties when creating the table
tblproperties(
"skip.header.line.count"="n", -- skip the first n lines of the file
"skip.footer.line.count"="n" -- skip the last n lines of the file
)
- Example
CREATE TABLE `ods_callcenter.sdm_call`(
`id` string,
`call_uuid` string,
`transaction_id` string,
`type` string,
`status` string,
`caller_number` string,
`dest_number` string,
`call_create_time` string,
`call_answer_time` string,
`call_hangup_time` string,
`ring_time` string,
`duration` string,
`bill_sec` string,
`talk_time` string,
`hangup_cause` string,
`hangup_cause_detail` string,
`record_key` string,
`read_record_key` string,
`write_record_key` string,
`create_time` string,
`update_time` string)
PARTITIONED BY (`dt` string)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
STORED AS TEXTFILE
TBLPROPERTIES("skip.header.line.count"="1");
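If the table already exists, the same property can usually be set afterwards instead of recreating the table. A minimal sketch, assuming a hypothetical existing table named `my_db.my_csv_table` (the name is for illustration only and does not come from the example above):

```sql
-- my_db.my_csv_table is a hypothetical table name, used only for illustration.
-- Sets the same header-skipping property on a table that was created without it.
ALTER TABLE my_db.my_csv_table
SET TBLPROPERTIES ("skip.header.line.count"="1");
```

As far as I know the property takes effect when the table is read, so files that are already in the table should not need to be reloaded, but it is worth checking this with a quick query on your own Hive version.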
- Load the file
load data local inpath '/home/yqg/liuzhiwei/query-hive-199215.csv' into table ods_callcenter.sdm_call partition (dt='20200601');
select * from ods_callcenter.sdm_call where dt = 'xxxx' limit 1;
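To check that the header line was really skipped, one option is to count the rows whose first column still holds the header text. A minimal sketch, assuming the CSV header names its first column `id` (the source does not show the file, so this is a guess) and using the `20200601` partition from the load statement:

```sql
-- Assumption: the first field of the header row is the literal string 'id'.
-- A result of 0 suggests the header line was skipped on read.
SELECT count(*) AS header_rows
FROM ods_callcenter.sdm_call
WHERE dt = '20200601'
  AND id = 'id';
```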