Hive 1.2.2 + Hadoop 2.7.3: Importing the Miqi Test Logs and Optimizing the Data (Part 5)

Original

texture_texture

2020-06-25 22:47

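Hive is a data warehouse component that runs on top of Hadoop. It provides SQL-like create, delete, update, and query operations over data stored in Hadoop.

A Hive table generally corresponds to files under an HDFS path. The most common Hive commands are as follows: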

Start the Hive shell:

./bin/hive shell

List all tables:

show tables;

Create a table:

create table t_1(a int, b int, c int) row format delimited fields terminated by '\t';

Alter a table:

alter table t_1 add columns(d String);

Load data from the local filesystem:

load data local inpath '/testdata/words.txt' overwrite into table t_1;

Load a file that is already in HDFS:

load data inpath 'hdfs://master:9000/testdata/words.txt' overwrite into table t_1;

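And so on.

Next, we import the KPI data and other statistics computed from the Miqi test server's access logs into Hive tables.

(1) Download link for the program that computes the KPIs from the Miqi access logs: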

http://download.csdn.net/detail/cafebar123/9889939

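(2) Create the Hive tables

First create two tables: t_ip, which holds the number of visits per IP address, and t_remote_user, which holds the number of visits per referring link (the page the visitor came from).

Then load the data generated by the Hadoop statistics job: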
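The post does not show the DDL for these two tables. A minimal sketch might look like the following. The column names (ip, referer, cnt) are assumptions made for illustration, and the tab delimiter assumes the statistics job wrote its output with the default MapReduce text output format, which separates key and value with a tab.

```sql
-- Hypothetical schemas: column names and types are assumptions,
-- not taken from the original post.
create table if not exists t_ip(ip String, cnt int)
row format delimited fields terminated by '\t';

create table if not exists t_remote_user(referer String, cnt int)
row format delimited fields terminated by '\t';
```

If the job writes a different separator, the terminated by clause has to match it, otherwise every column after the first is read as NULL.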

load data inpath 'hdfs://master:9000/user/hadoop/ipCountOutput/part-r-00000' overwrite into table t_ip;

As shown in the figure:

At this point, the table t_ip corresponds to the t_ip folder in HDFS. t_remote_user is handled in the same way.
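One way to check this mapping from inside the Hive CLI is sketched below. The path /user/hive/warehouse/t_ip assumes the default warehouse directory; on a cluster with a different hive.metastore.warehouse.dir, use the location reported by the first statement instead.

```sql
-- Show table metadata; the "Location:" row is the HDFS directory backing t_ip.
describe formatted t_ip;

-- List that directory without leaving the Hive CLI.
-- Assumed default warehouse path; substitute the Location printed above.
dfs -ls /user/hive/warehouse/t_ip;

-- Quick sanity check of the loaded rows.
select * from t_ip limit 10;
```

Note that load data inpath moves the source file into the table's directory rather than copying it, so part-r-00000 no longer appears under ipCountOutput after the load.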

(3) Table optimization

1) Next, try a partitioned table, and try loading all of the Miqi test server's logs into it.

Create a new table with a partition column:

create table t_log(ip String,remote_user String,block1 String,local_time String,time_field String,time_zone String,request_type String,request String,req_status String,resp_status int,body_bytes_sent String,http_referer String,user_agent String,req_language String) partitioned by(req_month String) row format delimited fields terminated by ' ';

The table has 14 columns, plus req_month as the partition column.

Load the log data:

load data inpath 'hdfs://master:9000/user/hadoop/miqiLog10000Input/miqizuche10000.log' overwrite into table t_log partition(req_month='0709');

Result:

Error:

ValidationFailureSemanticException table is not partitioned but partition spec exists

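This happens because the table has no such partition column. If the table was created without a partition column whose name matches the one in the partition spec, loading into or adding a partition raises this error.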
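A hedged way to diagnose and reproduce the working case is sketched below. The table t_log_demo and its single line column are hypothetical and exist only to keep the example short; the HDFS path is the one used above.

```sql
-- 1. Check how the existing table was defined.
--    A partitioned table prints a PARTITIONED BY clause here.
show create table t_log;

-- 2. Hypothetical demo table that stores each raw log line in one column.
create table if not exists t_log_demo(line String)
partitioned by (req_month String);

-- 3. Load into a named partition. The value is quoted because req_month is a String.
load data inpath 'hdfs://master:9000/user/hadoop/miqiLog10000Input/miqizuche10000.log'
overwrite into table t_log_demo partition(req_month='0709');

-- 4. Confirm the partition exists, then query only that partition.
show partitions t_log_demo;
select count(*) from t_log_demo where req_month='0709';
```

If step 1 shows no PARTITIONED BY clause for t_log, the table has to be dropped and recreated with one, since partition columns cannot be added to an existing unpartitioned table with alter table ... add columns. Dropping a managed table also deletes its data.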
