Hive SQL: Skipping the First Line When Loading a CSV File

Original

流风雨情

2020-06-03 14:21

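Background: when working with Hive, the data source is sometimes a text file in CSV format. If the file is loaded into Hive as-is, the CSV header row (the schema) is loaded along with the data and ends up as a dirty record.

The fix is as follows.

A feature introduced in Hive 0.13:

Add tblproperties when creating the table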

 tblproperties(
"skip.header.line.count"="n",  -- skip the first n lines of the file
"skip.footer.line.count"="n"   -- skip the last n lines of the file
)
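
For a table that already exists, the same property can likely be set without recreating the table. A minimal sketch, assuming a hypothetical table name `my_db.my_csv_table` that does not appear in the original post:

```sql
-- Hypothetical table name, for illustration only.
-- Tells Hive to skip the first line of each file when reading the table.
ALTER TABLE my_db.my_csv_table
SET TBLPROPERTIES ("skip.header.line.count"="1");
```

The property is applied when the files are read, so the header line stays in the underlying file and is only skipped at query time.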

Example

CREATE TABLE `ods_callcenter.sdm_call`(
	  `id` string, 
	  `call_uuid` string, 
	  `transaction_id` string, 
	  `type` string, 
	  `status` string, 
	  `caller_number` string, 
	  `dest_number` string, 
	  `call_create_time` string, 
	  `call_answer_time` string, 
	  `call_hangup_time` string, 
	  `ring_time` string, 
	  `duration` string, 
	  `bill_sec` string, 
	  `talk_time` string, 
	  `hangup_cause` string, 
	  `hangup_cause_detail` string, 
	  `record_key` string, 
	  `read_record_key` string, 
	  `write_record_key` string, 
	  `create_time` string, 
	  `update_time` string)
	PARTITIONED BY (`dt` string)
	  ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t' 
	  STORED AS TEXTFILE 
	  TBLPROPERTIES("skip.header.line.count"="1");

Load the file

load data local inpath '/home/yqg/liuzhiwei/query-hive-199215.csv' into table ods_callcenter.sdm_call partition (dt='20200601');

select * from ods_callcenter.sdm_call where dt = 'xxxx' limit 1;
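
One way to check that the header was actually skipped is to look for a row whose first column still holds the column name. This is only a sketch: it assumes the file's header literally names its first column `id`, and that the file was loaded into partition `dt='20200601'` as in the load statement above.

```sql
-- Expect 0 if the header line was skipped.
-- A non-zero count means a header row leaked into the table.
SELECT count(*) AS header_rows
FROM ods_callcenter.sdm_call
WHERE dt = '20200601'
  AND id = 'id';
```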
