Hive DML and Partitioned Tables

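1. Hive is a data warehouse built on top of Hadoop
sql ==> Hive ==> MapReduce
However, some simple queries do not launch a MapReduce job, for example a plain select with no grouping or aggregation.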
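
As a rough illustration (a sketch, assuming the ruoze_emp table that appears later in these notes), the first query below is usually served by a simple fetch task, while the second has to be compiled into a MapReduce job. Which queries skip MapReduce is controlled by the hive.fetch.task.conversion setting, so the exact behaviour depends on the Hive version and configuration.

```sql
-- Plain projection with a limit: usually answered without MapReduce
select empno, ename from ruoze_emp limit 5;

-- Aggregation with group by: compiled into a MapReduce job
select deptno, count(1) from ruoze_emp group by deptno;

-- Print the current fetch-conversion setting (none, minimal or more)
set hive.fetch.task.conversion;
```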

2. Grouping rule: every column in the select list must either appear in the group by clause or be wrapped in an aggregate function.
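
A minimal sketch of the rule, assuming the ruoze_emp table (with deptno, ename and sal columns) used later in these notes:

```sql
-- Valid: deptno is in group by, sal is inside an aggregate
select deptno, avg(sal) as avg_sal
from ruoze_emp
group by deptno;

-- Invalid: ename is neither grouped nor aggregated, so Hive rejects the query
-- select deptno, ename, avg(sal) from ruoze_emp group by deptno;
```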

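3. count(1) and count(column)
The main difference between the two:
(1) count(1) counts every row in the table, including rows where the column is null.
(2) count(column) counts only the rows where that column is not null; rows with a null in that column are ignored.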
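
The difference can be seen side by side. This sketch assumes the ruoze_emp table from later in these notes, and assumes its comm column is null for some employees:

```sql
-- count(1) counts every row; count(comm) skips rows where comm is null
select count(1)    as all_rows,
       count(comm) as rows_with_comm
from ruoze_emp;
```

If comm contains nulls, rows_with_comm comes out smaller than all_rows.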

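4. (case when then else end) works like if-else: it returns a column holding the value of the then branch that matched, or the else value when none matched.
union all stacks the rows of two result sets on top of each other, without removing duplicates.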
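
A short sketch of both constructs, again assuming the ruoze_emp table from later in these notes. The salary thresholds and the labels are made up for illustration.

```sql
-- case when: derive a label column from sal
select ename, sal,
       case
         when sal < 2000 then 'low'
         when sal < 4000 then 'middle'
         else 'high'
       end as sal_level
from ruoze_emp;

-- union all: stack two result sets with the same column layout
select empno, ename from ruoze_emp where deptno = 10
union all
select empno, ename from ruoze_emp where deptno = 20;
```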

5. Listing the functions available in Hive

hive (default)> show functions;

desc function extended xxx shows what a function does

Convert a value to another type; if the conversion fails, the result is null

cast(value as TYPE)
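
For instance (a small sketch with literal values chosen for illustration):

```sql
-- A numeric string converts cleanly
select cast('123' as int);

-- A non-numeric string cannot be converted, so the result is NULL
select cast('abc' as int);
```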

Extract a substring, given a start position (counted from 1) and a length

substr(str,pos,len)
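
A quick sketch with a literal string, to show that positions are counted from 1 and that the length argument is optional:

```sql
-- Three characters starting at position 1: returns had
select substr('hadoop', 1, 3);

-- From position 4 to the end of the string: returns oop
select substr('hadoop', 4);
```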

Join strings with a separator, here a dot

concat_ws('.','www','asd') returns www.asd

Return the length; works on both strings and numbers

length()

Turn an array into multiple rows, one element per row

explode()
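
A minimal sketch using a literal array, so it runs without any table:

```sql
-- One input value holding a three-element array becomes three output rows
select explode(array('a','b','c')) as item;
```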

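Split on either of two delimiters, a or d. The pattern is a regular expression, and a comma inside the brackets would itself be treated as a delimiter, so it is left out here.

split('asd.sdf','[ad]')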

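To split on a dot, the dot must be escaped, because an unescaped dot matches any character in a regular expression

hive (default)> select split('asd.asd','\\.');
OK
["asd","asd"]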

6. A wordcount using Hive functions

Data
asd,dsa,asd
asd,das

create table ruoze_wc(
sentence string
);

select word, count(1) as c
from
(
select explode(split(sentence,",")) as word from ruoze_wc
) t group by word
order by c desc;

After split, each row becomes an array:
["asd","dsa","asd"]
["asd","das"]

After explode, the result is 5 rows by 1 column
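
The same wordcount can also be written without a subquery by using lateral view, which joins each input row to the rows produced by explode. This is an alternative sketch, not the form used in the notes above; the aliases t and word are arbitrary.

```sql
select word, count(1) as c
from ruoze_wc
lateral view explode(split(sentence, ',')) t as word
group by word
order by c desc;
```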

7. Creating a table with an array column

1,doudou,化學:物理:數學:語文
2,dasheng,化學:數學:生物:生理:衛生
3,rachel,化學:語文:英語:體育:生物

create table ruoze_student(
id int,
name string,
-- an array holding strings
subjects array<string>
)row format delimited fields terminated by ','
-- array elements are separated by ':'
COLLECTION ITEMS TERMINATED BY ':';



load data local inpath '/home/hadoop/data/student.txt' into table ruoze_student;

hive (default)> select * from ruoze_student;
OK
1	doudou	["化學","物理","數學","語文"]
2	dasheng	["化學","數學","生物","生理","衛生"]
3	rachel	["化學","語文","英語","體育","生物"]
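
Once loaded, the array column can be indexed, measured, searched and flattened. The queries below are a sketch of common access patterns against ruoze_student; the aliases are illustrative.

```sql
-- First subject (array indexes start at 0) and number of subjects per student
select name, subjects[0] as first_subject, size(subjects) as subject_count
from ruoze_student;

-- Students whose subjects array contains a given value
select name from ruoze_student where array_contains(subjects, '語文');

-- One row per (student, subject) pair
select name, subject
from ruoze_student
lateral view explode(subjects) s as subject;
```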

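8. Partitioned tables

A partitioned table splits a table's data according to the values of one or more columns.
Problem solved: a full table scan is slow, whereas a scan restricted to the relevant partition is fast.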

create table order_partition(
orderNumber string,
event_time string
-- partitioned by event_month
)PARTITIONED BY(event_month string)
row format delimited fields terminated by '\t';

Load into a specific partition; the table gains an extra column for the partition key

load data local inpath '/home/hadoop/data/order.txt' into table order_partition PARTITION (event_month='2014-05');
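
After the load, the partition key behaves like an ordinary column in queries. A minimal sketch: filtering on event_month lets Hive read only the matching partition instead of the whole table.

```sql
-- event_month is not stored in the data file; it comes from the partition
select orderNumber, event_time, event_month
from order_partition
where event_month = '2014-05';
```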

If the load fails with a "key too long" error, the character set of the metastore tables has to be changed in MySQL

use ruoze_d5;
alter table PARTITIONS convert to character set latin1;
alter table PARTITION_KEYS convert to character set latin1;

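If a partition directory is created by hand with hdfs dfs, the metastore has no record of it, so Hive cannot find the partition.
MSCK REPAIR TABLE can register it, but it has to scan every partition of the table and performs poorly, so it is not used here.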
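
For reference, the repair statement looks like the sketch below. It assumes the hand-made directory sits under the table's location and follows the event_month=value naming that Hive uses for partition directories.

```sql
-- Rebuild metastore entries from the directories found under the table location
msck repair table order_partition;
```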

To add a partition:

alter table order_partition add partition(event_month='2014-07');

To list a table's partitions:

show partitions order_partition;

To see the statement that created a table:

show create table xxx;

9. Multi-level partitioned tables

create table order_mulit_partition(
orderNumber string,
event_time string
)PARTITIONED BY(event_month string, step string)
row format delimited fields terminated by '\t';

load data local inpath '/home/hadoop/data/order.txt' into table order_mulit_partition PARTITION (event_month='2014-05',step='1');
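
With two partition keys, each distinct (event_month, step) pair is a separate partition, nested as one directory level per key. A short sketch of inspecting and querying the table after the load above:

```sql
-- Lists one entry per (event_month, step) combination
show partitions order_mulit_partition;

-- Filtering on both keys narrows the scan to a single partition
select * from order_mulit_partition
where event_month = '2014-05' and step = '1';
```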

10. Dynamic partitions

Requirement: write rows into a partitioned table according to the deptno column

CREATE TABLE `ruoze_emp_partition`(
  `empno` int, 
  `ename` string, 
  `job` string, 
  `mgr` int, 
  `hiredate` string, 
  `sal` double, 
  `comm` double)
partitioned by(`deptno` int)
ROW FORMAT DELIMITED 
FIELDS TERMINATED BY '\t';

Static insert

insert into table ruoze_emp_partition PARTITION(deptno=10)
select empno,ename,job,mgr,hiredate,sal,comm from ruoze_emp where deptno=10;

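With 1000 distinct deptno values, this would take 1000 insert statements.

Dynamic insert

The partition column deptno must come last in the select list; then a single statement does the job.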

insert overwrite table ruoze_emp_partition PARTITION(deptno)
select empno,ename,job,mgr,hiredate,sal,comm,deptno from ruoze_emp;

Enable dynamic partitioning

hive> set hive.exec.dynamic.partition=true;
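
Putting the pieces together, a session might look like the sketch below. One extra setting is assumed to be needed: in many Hive versions hive.exec.dynamic.partition.mode defaults to strict, which requires at least one static partition column, so a fully dynamic insert also needs nonstrict mode.

```sql
set hive.exec.dynamic.partition=true;
-- strict mode requires at least one static partition column
set hive.exec.dynamic.partition.mode=nonstrict;

insert overwrite table ruoze_emp_partition PARTITION(deptno)
select empno,ename,job,mgr,hiredate,sal,comm,deptno from ruoze_emp;

-- One partition should now exist per distinct deptno value
show partitions ruoze_emp_partition;
```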
