Hive Advanced Aggregation: GROUPING SETS / ROLLUP / CUBE / Grouping_ID

1. GROUPING SETS

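GROUPING SETS lets a single query perform several GROUP BY operations over the same data set. Logically, a GROUPING SETS query is shorthand for a UNION ALL of several GROUP BY queries, but Hive evaluates it in a single stage instead of one stage per GROUP BY. An empty set () inside the GROUPING SETS clause means aggregation over the whole table (a grand total). In the rows produced for a given grouping set, every GROUP BY column that is not part of that set is returned as NULL.
Example: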
select name, work_space[0] as main_place, count(employee_id) as emp_id_cnt
from employee
group by name, work_space[0]
GROUPING SETS((name,work_space[0]), name, ());
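
-- The statement above is equivalent to the following.
-- Columns missing from a grouping set must be returned as NULL
-- (assumes name and work_space[0] are STRING):

select name, work_space[0] as main_place, count(employee_id) as emp_id_cnt
from employee
group by name, work_space[0]
UNION ALL
select name, cast(null as string) as main_place, count(employee_id) as emp_id_cnt
from employee
group by name
UNION ALL
select cast(null as string) as name, cast(null as string) as main_place, count(employee_id) as emp_id_cnt
from employee;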
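The UNION ALL equivalence can be modeled in a few lines. The following is a minimal Python sketch of the semantics only, not of how Hive executes the query: each grouping set is run as its own GROUP BY, columns outside the set come out as None (standing in for SQL NULL), and the per-set results are concatenated the way UNION ALL would. The helper name `grouping_sets`, the dict-per-row layout, and the flattened `main_place` column (standing in for `work_space[0]`) are all hypothetical.

```python
def grouping_sets(rows, columns, sets, measure):
    """Emulate GROUP BY <columns> GROUPING SETS(<sets>) with COUNT(<measure>).

    rows    -- list of dicts, one per input row (hypothetical layout)
    columns -- every GROUP BY column, in output order
    sets    -- list of tuples of column names; () is the grand total
    measure -- column passed to COUNT; None values are not counted
    """
    result = []
    for gset in sets:  # one GROUP BY per grouping set ...
        counts = {}  # dicts keep insertion order (Python 3.7+)
        for row in rows:
            # columns outside this grouping set are reported as NULL (None)
            key = tuple(row[c] if c in gset else None for c in columns)
            counts.setdefault(key, 0)
            if row[measure] is not None:
                counts[key] += 1
        # ... and the per-set results are concatenated, like UNION ALL
        result.extend(key + (n,) for key, n in counts.items())
    return result


# Made-up sample rows; main_place stands in for work_space[0].
employees = [
    {"name": "alice", "main_place": "Montreal", "employee_id": 1},
    {"name": "alice", "main_place": "Toronto", "employee_id": 2},
    {"name": "bob", "main_place": "New York", "employee_id": 3},
]

rows = grouping_sets(
    employees,
    ["name", "main_place"],
    [("name", "main_place"), ("name",), ()],
    "employee_id",
)
for r in rows:
    print(r)
```

Single-column sets are written as one-element tuples such as `("name",)`, so that the membership test checks column names rather than substrings. Unlike this sketch, Hive does not guarantee any particular row order in the output.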

2. ROLLUP

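ROLLUP extends GROUPING SETS for hierarchical aggregation. GROUP BY a, b, c WITH ROLLUP generates the grouping sets obtained by dropping columns from right to left, so n columns yield n + 1 sets, ending with the grand total ().

Example: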

select a, b, c, count(*) from tbl group by a, b, c WITH ROLLUP;
-- equivalent to the following statement
select a, b, c, count(*) from tbl group by a, b, c
GROUPING SETS((a,b,c),(a,b),(a),());
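The rule behind WITH ROLLUP is simple enough to state as code: the grouping sets are the prefixes of the GROUP BY column list, longest first, down to the empty set. This Python sketch just enumerates those sets; the function name `rollup_sets` is hypothetical and nothing here talks to Hive.

```python
def rollup_sets(columns):
    """Grouping sets produced by GROUP BY <columns> WITH ROLLUP:
    every prefix of the column list, longest first, ending with ()."""
    cols = tuple(columns)
    # slice lengths n, n-1, ..., 0 give (a,b,c), (a,b), (a,), ()
    return [cols[:i] for i in range(len(cols), -1, -1)]


print(rollup_sets(["a", "b", "c"]))
```

Because only prefixes are kept, the column order in the GROUP BY clause matters for ROLLUP: `group by b, a WITH ROLLUP` would produce (b), not (a), as its single-column level.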

3. CUBE

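CUBE extends GROUPING SETS by aggregating over every combination of the grouping columns. GROUP BY a, b, c WITH CUBE generates all 2^n subsets of the n columns (8 sets for three columns), including the grand total ().

Example: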

select a, b, c, count(*) from tbl group by a, b, c WITH CUBE;
-- equivalent to the following statement
select a, b, c, count(*) from tbl group by a, b, c
GROUPING SETS((a,b,c),(a,b),(a,c),(b,c),(a),(b),(c),());
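Where ROLLUP keeps only prefixes, WITH CUBE keeps every subset of the columns. A minimal Python sketch (hypothetical helper `cube_sets`) that enumerates them, largest subsets first, reproduces exactly the list of grouping sets in the equivalent statement:

```python
from itertools import combinations


def cube_sets(columns):
    """Grouping sets produced by GROUP BY <columns> WITH CUBE:
    every subset of the columns (2**n of them), largest first."""
    cols = tuple(columns)
    # combinations(cols, 0) yields the single empty tuple, i.e. the grand total ()
    return [s for r in range(len(cols), -1, -1) for s in combinations(cols, r)]


print(cube_sets(["a", "b", "c"]))
```

The count doubles with every extra column (16 sets for four columns), which is worth keeping in mind before applying CUBE to many columns.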

4. Aggregate conditions: HAVING

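HAVING filters the groups produced by GROUP BY, using conditions on aggregate values. Unlike WHERE, which filters individual rows before grouping, HAVING is applied after aggregation.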

select cid, max(price) mx from orders group by cid having max(price) > 1000;
-- equivalent subquery form
select t.cid, t.mx from (
        select cid, max(price) mx from orders group by cid
    ) t
where t.mx > 1000;
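The order of operations, aggregate first and filter groups second, is what both forms of the query express. This Python sketch models that query on a made-up list of `(cid, price)` pairs. It assumes no NULL prices; Hive's `max` would simply skip them. The function name `max_price_having` is hypothetical.

```python
def max_price_having(orders, threshold):
    """Model of:
    SELECT cid, max(price) mx FROM orders GROUP BY cid HAVING max(price) > threshold

    orders -- list of (cid, price) pairs; NULL prices are assumed absent
    """
    mx = {}
    for cid, price in orders:
        # GROUP BY cid with MAX(price): every row contributes to its group ...
        mx[cid] = price if cid not in mx else max(mx[cid], price)
    # ... and only afterwards does HAVING drop whole groups by their aggregate
    return {cid: m for cid, m in mx.items() if m > threshold}


orders = [(1, 500), (1, 1500), (2, 800), (3, 2000)]  # made-up sample data
print(max_price_having(orders, 1000))
```

Customer 1 is kept even though one of its orders is below 1000, because the condition is tested against the group's maximum rather than against individual rows.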

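5. Grouping_ID

In Hive this is the virtual column GROUPING__ID (with two underscores). Rows produced by GROUPING SETS, ROLLUP and CUBE fill the omitted columns with NULL. GROUPING__ID identifies which grouping set each output row belongs to, so it also separates those generated NULLs from NULLs that are actually present in the data.

Details: https://blog.csdn.net/wen_2/article/details/65446971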
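Hive's GROUPING__ID is a bit vector over the GROUP BY columns that tells you which grouping set produced a row. The sketch below assumes the convention described in the Hive language manual for Hive 2.3.0 and later: a bit is 1 when that column has been aggregated away in the row (it is not in the grouping set) and 0 otherwise, with the first GROUP BY column as the most significant bit. Older Hive releases used a different convention, so check the behavior on your version. The function name `grouping_id` is hypothetical.

```python
def grouping_id(columns, gset):
    """Bitmask for one grouping set, assuming the Hive 2.3.0+ convention:
    bit = 1 if the column is aggregated away (not in gset),
    first GROUP BY column = most significant bit."""
    gid = 0
    for col in columns:
        gid = (gid << 1) | (0 if col in gset else 1)
    return gid


cols = ["key", "value"]
# the three grouping sets of GROUP BY key, value WITH ROLLUP
for gset in [("key", "value"), ("key",), ()]:
    print(gset, grouping_id(cols, gset))
```

Under this convention the full grouping set always gets 0 and the grand total gets 2^n - 1 (3 for two columns), so filtering on GROUPING__ID is a way to pick out a single aggregation level.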
