Hive Interview Questions: A Summary

Contents

Sorting

Grouping

JOIN

Window functions

References


Sorting

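1. The table users holds 100 million users, with columns uid (user ID), age (user age), and total (the user's total spending); uid uniquely identifies a user. Sort the users by age from oldest to youngest, and among users of the same age by total spending from lowest to highest.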

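This is a total-ordering (global sort) problem. First estimate the memory it needs: 100 million users × (8 B uid + 4 B age + 8 B total) ≈ 2 GB, which current hardware can hold entirely in memory, so no special optimization is needed.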
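The estimate above is plain arithmetic. A quick check in Python, counting only the raw column bytes (per-row storage and JVM object overhead are ignored, as in the estimate, and would push the real figure higher); the column types in the comment are an assumption:

```python
# Back-of-the-envelope size of the users table: raw column bytes only
NUM_USERS = 100_000_000
BYTES_UID, BYTES_AGE, BYTES_TOTAL = 8, 4, 8  # assumed types: BIGINT, INT, BIGINT/DOUBLE

def estimated_bytes(num_rows: int) -> int:
    """Raw payload size: rows x (uid + age + total)."""
    return num_rows * (BYTES_UID + BYTES_AGE + BYTES_TOTAL)

total = estimated_bytes(NUM_USERS)
print(f"{total:,} bytes = {total / 10**9:.1f} GB = {total / 2**30:.2f} GiB")
```

This prints `2,000,000,000 bytes = 2.0 GB = 1.86 GiB`.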

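-- Global sort: Hive uses a single reducer to produce one totally ordered result
SELECT *
FROM users
ORDER BY age DESC, total ASC;

-- Local (per-reducer) sort: rows with the same age go to the same reducer,
-- and each reducer sorts its own rows; the combined output is not globally ordered
SELECT *
FROM users
DISTRIBUTE BY age
SORT BY age DESC, total ASC;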
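To make the difference between the two queries concrete, here is a small plain-Python model (not Hive code) on five made-up rows. ORDER BY yields one globally sorted list, while DISTRIBUTE BY + SORT BY sorts each reducer's share independently. Routing rows by `hash(age) % num_reducers` is an assumption standing in for Hive's partitioner.

```python
from collections import defaultdict

# Hypothetical sample rows: (uid, age, total)
USERS = [(1, 30, 500), (2, 25, 100), (3, 30, 200), (4, 41, 900), (5, 25, 50)]

def sort_key(row):
    """age DESC, total ASC: negate age so one ascending sort handles both."""
    _, age, total = row
    return (-age, total)

def order_by(rows):
    """Model of ORDER BY: a single, totally ordered result."""
    return sorted(rows, key=sort_key)

def distribute_sort_by(rows, num_reducers):
    """Model of DISTRIBUTE BY age SORT BY age DESC, total ASC.

    Rows are routed to a reducer by hashing age (Python's hash() stands in
    for Hive's partitioner), and each reducer sorts only its own rows.
    """
    buckets = defaultdict(list)
    for row in rows:
        buckets[hash(row[1]) % num_reducers].append(row)
    return {r: sorted(rs, key=sort_key) for r, rs in buckets.items()}

print(order_by(USERS))
for reducer, rows in sorted(distribute_sort_by(USERS, 2).items()):
    print(reducer, rows)
```

With two reducers, reducer 0 receives only the age-30 rows and reducer 1 receives ages 41 and 25, so reading reducer 0's output first puts age 30 ahead of age 41: each file is sorted, but their concatenation is not.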

Grouping

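1. There are 100,000 stores, and every time a customer visits any store, one visit log row is written. The table is Visit, with the user ID in column uid and the visited store in column store. Compute the UV (number of unique visitors) of each store.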

SELECT store,COUNT(DISTINCT uid) uv
FROM Visit
GROUP BY store;
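What the query computes can be modeled in plain Python (a sketch on hypothetical rows, not Hive code): collect the set of distinct uids per store, then count each set.

```python
from collections import defaultdict

# Hypothetical visit log rows: (uid, store)
VISITS = [("u1", "s1"), ("u2", "s1"), ("u1", "s1"), ("u1", "s2"), ("u3", "s2")]

def uv_per_store(visits):
    """Model of SELECT store, COUNT(DISTINCT uid) ... GROUP BY store."""
    uids_by_store = defaultdict(set)
    for uid, store in visits:
        uids_by_store[store].add(uid)  # a set keeps each uid once per store
    return {store: len(uids) for store, uids in uids_by_store.items()}

print(uv_per_store(VISITS))  # {'s1': 2, 's2': 2}
```

User u1 visits s1 twice but is counted once, which is exactly what DISTINCT contributes.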

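2. The table Lifestage records life stages and has 2 columns: uid (unique user ID) and stage, a combined field of comma-separated stages such as "計劃買車,已經買房" ("planning to buy a car,already bought a house"). Count the number of users in each individual life stage.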

-- Split one row into multiple rows (explode the comma-separated list)
SELECT stage_detail, COUNT(DISTINCT uid) AS user_cnt
FROM Lifestage
LATERAL VIEW EXPLODE(SPLIT(stage, ',')) Lifestage_tmp AS stage_detail
GROUP BY stage_detail;
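The LATERAL VIEW EXPLODE step can be pictured as a nested loop: every element of the split list becomes its own row before grouping. A plain-Python model follows; users 44 and 45 are hypothetical rows added for illustration.

```python
from collections import defaultdict

# Sample rows modeled on the question: (uid, comma-separated stage list)
LIFESTAGE = [
    (43, "計劃買車,已經買房"),
    (44, "計劃買車"),
    (45, "已經買房,計劃買車"),
]

def users_per_stage(rows):
    """Model of LATERAL VIEW EXPLODE(SPLIT(stage, ',')) + COUNT(DISTINCT uid)."""
    uids_by_stage = defaultdict(set)
    for uid, stage in rows:
        for stage_detail in stage.split(","):  # one exploded row per element
            uids_by_stage[stage_detail].add(uid)
    return {s: len(uids) for s, uids in uids_by_stage.items()}

print(users_per_stage(LIFESTAGE))  # {'計劃買車': 3, '已經買房': 2}
```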

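3. The table Lifestage has 2 columns, uid (unique user ID) and stage, with one life stage per row; for example, user 43 has 2 rows: (43, 計劃買車) and (43, 已經買房). Merge all stages of the same user into one comma-separated field such as "計劃買車,已經買房".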

-- Merge multiple rows into one (rows to a single column)
SELECT uid,
    CONCAT_WS(',', COLLECT_LIST(stage)) AS stages -- if a user's stages can repeat, use COLLECT_SET(stage) to deduplicate
FROM Lifestage
GROUP BY uid;
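The difference between COLLECT_LIST and COLLECT_SET is easiest to see on a user with a repeated stage. Below is a plain-Python model; the duplicated third row is hypothetical. Python keeps input order here, but element order in Hive's result is not something to rely on.

```python
from collections import defaultdict

# Rows for user 43 from the question, plus a hypothetical duplicate stage
ROWS = [(43, "計劃買車"), (43, "已經買房"), (43, "計劃買車")]

def concat_stages(rows, dedupe=False):
    """Model of CONCAT_WS(',', COLLECT_LIST(stage)) ... GROUP BY uid.

    dedupe=True mimics COLLECT_SET by skipping stages already collected.
    """
    stages_by_uid = defaultdict(list)
    for uid, stage in rows:
        if dedupe and stage in stages_by_uid[uid]:
            continue
        stages_by_uid[uid].append(stage)
    return {uid: ",".join(stages) for uid, stages in stages_by_uid.items()}

print(concat_stages(ROWS))               # {43: '計劃買車,已經買房,計劃買車'}
print(concat_stages(ROWS, dedupe=True))  # {43: '計劃買車,已經買房'}
```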

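4. A student score table course_t has columns sid (student), course (course code), and score. Return the score data of students whose Chinese (yuwen) score is higher than their math (shuxue) score. Sample data: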

sid course score
1 yuwen 43
1 shuxue 55
2 yuwen 77
2 shuxue 88
3 yuwen 98
3 shuxue 65

-- Pivot each student's rows into one row (yuwen_score, shuxue_score), then filter
SELECT *
FROM
(
    SELECT sid,
        MAX(CASE WHEN course = 'yuwen' THEN score ELSE NULL END) AS yuwen_score,
        MAX(CASE WHEN course = 'shuxue' THEN score ELSE NULL END) AS shuxue_score
    FROM mart_fsp_security_safetmp.course_t
    GROUP BY sid
) course_tmp_t
WHERE yuwen_score > shuxue_score;

-- Sample data
CREATE TABLE mart_fsp_security_safetmp.course_t AS
SELECT 1 AS id,1 AS  sid,'yuwen' AS course,43 AS score
UNION ALL SELECT 2 AS id,1 AS  sid,'shuxue' AS course,55 AS score
UNION ALL SELECT 3 AS id,2 AS  sid,'yuwen' AS course,77 AS score
UNION ALL SELECT 4 AS id,2 AS  sid,'shuxue' AS course,88 AS score
UNION ALL SELECT 5 AS id,3 AS  sid,'yuwen' AS course,98 AS score
UNION ALL SELECT 6 AS id,3 AS  sid,'shuxue' AS course,65 AS score;
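The pivot-then-filter logic can be checked against the sample data with a plain-Python model (not Hive code). MAX(CASE WHEN ...) picks out one course's score per student, and a missing score behaves like NULL and never passes the comparison.

```python
# Sample data from the question: (sid, course, score)
COURSE_T = [
    (1, "yuwen", 43), (1, "shuxue", 55),
    (2, "yuwen", 77), (2, "shuxue", 88),
    (3, "yuwen", 98), (3, "shuxue", 65),
]

def yuwen_beats_shuxue(rows):
    """Model of the pivot query: MAX(CASE WHEN ...) per sid, then filter."""
    pivot = {}
    for sid, course, score in rows:
        scores = pivot.setdefault(sid, {"yuwen": None, "shuxue": None})
        if course in scores:
            prev = scores[course]
            scores[course] = score if prev is None else max(prev, score)
    # None plays the role of NULL: such rows are dropped by the WHERE clause
    return [
        (sid, s["yuwen"], s["shuxue"])
        for sid, s in sorted(pivot.items())
        if s["yuwen"] is not None and s["shuxue"] is not None
        and s["yuwen"] > s["shuxue"]
    ]

print(yuwen_beats_shuxue(COURSE_T))  # [(3, 98, 65)]
```

Only student 3 qualifies (98 > 65); students 1 and 2 scored higher in math.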

JOIN

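1. Flatten the hierarchical Address table below (parent_id = 0 marks a top-level region) into a table with one row per third-level place and the columns first_name, second_name, third_name.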

id name parent_id
1 北京市 0
2 山東省 0
3 昌平區 1
4 海淀區 1
5 沙閘鎮 3
6 馬池口鎮 3
7 中關村 4
8 上地 4
9 煙臺市 2
10 青島市 2
11 五通橋區 9
12 馬邊區 9
13 定文鎮 10
14 羅成鎮 10

-- Flatten (fold) the hierarchy by joining the table with itself once per level
SELECT
    first_second_t.first_name AS first_name,
    first_second_t.second_name AS second_name,
    C.name AS third_name
FROM
(
    -- Pair each level-1 region (parent_id = 0) with its level-2 children
    SELECT A.id AS first_id,
        A.name AS first_name,
        A.parent_id AS first_parent_id,
        B.id AS second_id,
        B.name AS second_name,
        B.parent_id AS second_parent_id
    FROM Address A
    JOIN Address B
    ON A.id = B.parent_id
    WHERE A.parent_id = 0
) first_second_t
-- Attach the level-3 children of each level-2 region
JOIN Address C
ON first_second_t.second_id = C.parent_id
;

-- Create the sample table
CREATE TABLE Address AS
SELECT 1 AS id,'北京市' AS name,0 AS parent_id
UNION ALL SELECT 2,'山東省',0
UNION ALL SELECT 3,'昌平區',1
UNION ALL SELECT 4,'海淀區',1
UNION ALL SELECT 5,'沙閘鎮',3
UNION ALL SELECT 6,'馬池口鎮',3
UNION ALL SELECT 7,'中關村',4
UNION ALL SELECT 8,'上地',4
UNION ALL SELECT 9,'煙臺市',2
UNION ALL SELECT 10,'青島市',2
UNION ALL SELECT 11,'五通橋區',9
UNION ALL SELECT 12,'馬邊區',9
UNION ALL SELECT 13,'定文鎮',10
UNION ALL SELECT 14,'羅成鎮',10;
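Each self-join in the query walks one level down the parent_id links. A plain-Python model (not Hive code) does the same with a parent-to-children lookup; because the joins are inner joins, a level-2 region without children would produce no row.

```python
# Sample Address rows from the question: (id, name, parent_id); 0 = top level
ADDRESS = [
    (1, "北京市", 0), (2, "山東省", 0), (3, "昌平區", 1), (4, "海淀區", 1),
    (5, "沙閘鎮", 3), (6, "馬池口鎮", 3), (7, "中關村", 4), (8, "上地", 4),
    (9, "煙臺市", 2), (10, "青島市", 2), (11, "五通橋區", 9), (12, "馬邊區", 9),
    (13, "定文鎮", 10), (14, "羅成鎮", 10),
]

def flatten_three_levels(rows):
    """Model of the two inner self-joins: level 1 -> level 2 -> level 3."""
    children = {}
    for node_id, name, parent_id in rows:
        children.setdefault(parent_id, []).append((node_id, name))
    result = []
    for first_id, first_name in children.get(0, []):               # A.parent_id = 0
        for second_id, second_name in children.get(first_id, []):  # A.id = B.parent_id
            for _, third_name in children.get(second_id, []):      # B.id = C.parent_id
                result.append((first_name, second_name, third_name))
    return result

for row in flatten_three_levels(ADDRESS):
    print(row)
```

The sample data yields 8 rows, from ('北京市', '昌平區', '沙閘鎮') to ('山東省', '青島市', '羅成鎮').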

Window functions

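1. The user visit table vist_t has columns uid (unique user ID), month (visit month), and vist_cnt (number of visits). For each user and each month, compute the largest single-month visit count up to and including that month, and the cumulative number of visits up to that month.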

Sample data:

uid month vist_cnt
A 2015-01 5
A 2015-01 15
B 2015-01 5
A 2015-01 8
B 2015-01 25
A 2015-01 5
A 2015-02 4
A 2015-02 6
B 2015-02 10
B 2015-02 5
A 2015-03 16
A 2015-03 22
B 2015-03 23
B 2015-03 10
B 2015-03 1

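Solution: first sum vist_cnt per user and month, then run window functions over each user's months in month order: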

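SELECT
    uid,
    month,
    MAX(vist_cnt_m) OVER (PARTITION BY uid ORDER BY month) AS vist_cnt_max, -- within the partition, up to the current row
    -- MAX(vist_cnt_m) OVER (PARTITION BY uid) AS vist_cnt_max,            -- alternative: over all rows of the partition
    SUM(vist_cnt_m) OVER (PARTITION BY uid ORDER BY month) AS vist_cnt_sum, -- within the partition, up to the current row
    vist_cnt_m
FROM
(
    SELECT uid,
        month,
        SUM(vist_cnt) AS vist_cnt_m
    FROM mart_fsp_security_safetmp.vist_t
    GROUP BY uid, month -- positional GROUP BY 1,2 only works when Hive's position-alias setting is enabled
) m_t;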

-- Table data
CREATE TABLE mart_fsp_security_safetmp.vist_t AS
SELECT 'A' AS uid,'2015-01' AS month,5 AS vist_cnt
UNION ALL SELECT  'A' AS uid,'2015-01' AS month,15 AS vist_cnt
UNION ALL SELECT  'B' AS uid,'2015-01' AS month,5 AS vist_cnt
UNION ALL SELECT  'A' AS uid,'2015-01' AS month,8 AS vist_cnt
UNION ALL SELECT  'B' AS uid,'2015-01' AS month,25 AS vist_cnt
UNION ALL SELECT  'A' AS uid,'2015-01' AS month,5 AS vist_cnt
UNION ALL SELECT  'A' AS uid,'2015-02' AS month,4 AS vist_cnt
UNION ALL SELECT  'A' AS uid,'2015-02' AS month,6 AS vist_cnt
UNION ALL SELECT  'B' AS uid,'2015-02' AS month,10 AS vist_cnt
UNION ALL SELECT  'B' AS uid,'2015-02' AS month,5 AS vist_cnt
UNION ALL SELECT  'A' AS uid,'2015-03' AS month,16 AS vist_cnt
UNION ALL SELECT  'A' AS uid,'2015-03' AS month,22 AS vist_cnt
UNION ALL SELECT  'B' AS uid,'2015-03' AS month,23 AS vist_cnt
UNION ALL SELECT  'B' AS uid,'2015-03' AS month,10 AS vist_cnt
UNION ALL SELECT  'B' AS uid,'2015-03' AS month,1 AS vist_cnt;
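The two-step logic (aggregate per month, then a running window per user) can be reproduced on the sample data with a plain-Python model. `itertools.accumulate` plays the role of a window whose frame runs from the first month to the current one; this is a sketch, not Hive code.

```python
from collections import defaultdict
from itertools import accumulate

# Sample data from the question: (uid, month, vist_cnt)
VIST_T = [
    ("A", "2015-01", 5), ("A", "2015-01", 15), ("B", "2015-01", 5),
    ("A", "2015-01", 8), ("B", "2015-01", 25), ("A", "2015-01", 5),
    ("A", "2015-02", 4), ("A", "2015-02", 6), ("B", "2015-02", 10),
    ("B", "2015-02", 5), ("A", "2015-03", 16), ("A", "2015-03", 22),
    ("B", "2015-03", 23), ("B", "2015-03", 10), ("B", "2015-03", 1),
]

def running_stats(rows):
    """Per uid and month: (uid, month, vist_cnt_m, running max, running sum)."""
    monthly = defaultdict(int)
    for uid, month, cnt in rows:
        monthly[(uid, month)] += cnt        # inner GROUP BY uid, month
    by_uid = defaultdict(list)
    for (uid, month), cnt in sorted(monthly.items()):
        by_uid[uid].append((month, cnt))    # PARTITION BY uid ORDER BY month
    result = []
    for uid, months in by_uid.items():
        counts = [cnt for _, cnt in months]
        maxes = accumulate(counts, max)     # MAX(...) OVER (... ORDER BY month)
        sums = accumulate(counts)           # SUM(...) OVER (... ORDER BY month)
        for (month, cnt), mx, total in zip(months, maxes, sums):
            result.append((uid, month, cnt, mx, total))
    return result

for row in running_stats(VIST_T):
    print(row)
```

For user A this gives monthly totals 33, 10, 38, running maxima 33, 33, 38, and cumulative sums 33, 43, 81.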

2. The sales table sale_t has 3 columns: merchant, month, and money (sales amount). For each merchant, compute its sales in each month and its total sales.

merchant month money
a 1 150
a 1 200
b 1 1000
b 1 800
c 1 250
c 1 220
b 1 6000
a 2 2000
a 2 3000
b 2 1000
b 2 1500
c 2 350
c 2 280
a 3 350
a 3 250
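
-- Method 1: ROLLUP (see http://lxw1234.com/archives/2015/04/190.htm)
-- Note: the values 1 and 3 assume the GROUPING__ID encoding of older Hive versions,
-- where (merchant, month) rows get 3 and merchant-only subtotal rows get 1;
-- newer Hive releases changed the encoding, so check the values on your version.
SELECT merchant,
    month,
    CASE WHEN group_id = 1 THEN total
    ELSE NULL END AS total_sale_money,
    CASE WHEN group_id = 3 THEN total
    ELSE NULL END AS month_sale_money
FROM
(
    SELECT merchant,
        month,
        GROUPING__ID AS group_id,
        SUM(money) AS total
    FROM mart_fsp_security_safetmp.sale_t
    GROUP BY merchant,
        month
    WITH ROLLUP
) sale_tmp_t
-- WHERE group_id IN (1,3) -- uncomment to drop the grand-total row
;

-- Method 2 (more suitable): window functions put both amounts on the same row,
-- and DISTINCT collapses the duplicated detail rows
SELECT DISTINCT merchant,
    month,
    SUM(money) OVER (PARTITION BY merchant, month) AS month_sale_money,
    SUM(money) OVER (PARTITION BY merchant) AS total_sale_money
FROM mart_fsp_security_safetmp.sale_t;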


-- Test data
CREATE TABLE mart_fsp_security_safetmp.sale_t AS
SELECT  'a' AS  merchant,'1' AS month,150 AS money
UNION ALL SELECT  'a' AS  merchant,'1' AS month,200 AS money
UNION ALL SELECT  'b' AS  merchant,'1' AS month,1000 AS money
UNION ALL SELECT  'b' AS  merchant,'1' AS month,800 AS money
UNION ALL SELECT  'c' AS  merchant,'1' AS month,250 AS money
UNION ALL SELECT  'c' AS  merchant,'1' AS month,220 AS money
UNION ALL SELECT  'b' AS  merchant,'1' AS month,6000 AS money
UNION ALL SELECT  'a' AS  merchant,'2' AS month,2000 AS money
UNION ALL SELECT  'a' AS  merchant,'2' AS month,3000 AS money
UNION ALL SELECT  'b' AS  merchant,'2' AS month,1000 AS money
UNION ALL SELECT  'b' AS  merchant,'2' AS month,1500 AS money
UNION ALL SELECT  'c' AS  merchant,'2' AS month,350 AS money
UNION ALL SELECT  'c' AS  merchant,'2' AS month,280 AS money
UNION ALL SELECT  'a' AS  merchant,'3' AS month,350 AS money
UNION ALL SELECT  'a' AS  merchant,'3' AS month,250 AS money;
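The expected output of method 2 on the test data can be derived with a plain-Python model (a sketch, not Hive code): one running total keyed by (merchant, month) and one keyed by merchant, joined back into one row per merchant and month.

```python
from collections import defaultdict

# Test data from the question: (merchant, month, money)
SALE_T = [
    ("a", "1", 150), ("a", "1", 200), ("b", "1", 1000), ("b", "1", 800),
    ("c", "1", 250), ("c", "1", 220), ("b", "1", 6000), ("a", "2", 2000),
    ("a", "2", 3000), ("b", "2", 1000), ("b", "2", 1500), ("c", "2", 350),
    ("c", "2", 280), ("a", "3", 350), ("a", "3", 250),
]

def month_and_total(rows):
    """(merchant, month, month_sale_money, total_sale_money), like method 2."""
    per_month = defaultdict(int)     # SUM(money) OVER (PARTITION BY merchant, month)
    per_merchant = defaultdict(int)  # SUM(money) OVER (PARTITION BY merchant)
    for merchant, month, money in rows:
        per_month[(merchant, month)] += money
        per_merchant[merchant] += money
    # One row per (merchant, month): what SELECT DISTINCT leaves behind
    return [(m, mon, amt, per_merchant[m])
            for (m, mon), amt in sorted(per_month.items())]

for row in month_and_total(SALE_T):
    print(row)
```

Merchant a, for example, sold 350, 5000 and 600 in months 1 to 3, for a total of 5950.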

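3. The sales detail table sale_t2 has 6 columns: merchant, time, customer, product, status (0 = failed, 1 = succeeded), and money (sales amount). For each merchant, output its most recent successful records, i.e. the consecutive successes newer than the merchant's latest failure; a merchant whose latest record failed outputs nothing.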

merchant time customer product status money output(y:yes,n:no)
a 2019-11-25 c1 p1 0 10 n
a 2019-11-26 c2 p2 1 10 y
b 2019-11-25 c1 p3 1 10 n
b 2019-11-27 c3 p4 0 10 n
c 2019-11-27 c4 p5 1 10 n
c 2019-11-28 c1 p6 0 10 n
b 2019-11-29 c2 p7 1 10 y
a 2019-11-29 c4 p8 1 10 y
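
-- Running average of status from the newest record backwards: it stays 1 only
-- while every record seen so far succeeded, so status_avg = 1 keeps exactly the
-- records newer than the merchant's latest failure
SELECT *
FROM
(
    SELECT *,
        AVG(status) OVER (PARTITION BY merchant ORDER BY time DESC) AS status_avg
    FROM mart_fsp_security_safetmp.sale_t2
) tmp_t
WHERE status_avg = 1;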

-- Test data
CREATE TABLE mart_fsp_security_safetmp.sale_t2 AS
SELECT 'a' AS merchant,'2019-11-25' AS time,'c1' AS customer,'p1' AS product,0 AS status,10 AS money
UNION ALL SELECT 'a' AS merchant,'2019-11-26' AS time,'c2' AS customer,'p2' AS product,1 AS status,10 AS money
UNION ALL SELECT 'b' AS merchant,'2019-11-25' AS time,'c1' AS customer,'p3' AS product,1 AS status,10 AS money
UNION ALL SELECT 'b' AS merchant,'2019-11-27' AS time,'c3' AS customer,'p4' AS product,0 AS status,10 AS money
UNION ALL SELECT 'c' AS merchant,'2019-11-27' AS time,'c4' AS customer,'p5' AS product,1 AS status,10 AS money
UNION ALL SELECT 'c' AS merchant,'2019-11-28' AS time,'c1' AS customer,'p6' AS product,0 AS status,10 AS money
UNION ALL SELECT 'b' AS merchant,'2019-11-29' AS time,'c2' AS customer,'p7' AS product,1 AS status,10 AS money
UNION ALL SELECT 'a' AS merchant,'2019-11-29' AS time,'c4' AS customer,'p8' AS product,1 AS status,10 AS money;
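The running-average trick can be checked against the y/n column with a plain-Python model (not Hive code). It walks each merchant's records from newest to oldest and keeps a record while the average status so far is exactly 1. It models a row-by-row frame; that matches Hive's default range frame here only because no merchant has two records with the same time.

```python
from collections import defaultdict

# Rows from the table in the question:
# (merchant, time, customer, product, status, money)
SALE_T2 = [
    ("a", "2019-11-25", "c1", "p1", 0, 10),
    ("a", "2019-11-26", "c2", "p2", 1, 10),
    ("b", "2019-11-25", "c1", "p3", 1, 10),
    ("b", "2019-11-27", "c3", "p4", 0, 10),
    ("c", "2019-11-27", "c4", "p5", 1, 10),
    ("c", "2019-11-28", "c1", "p6", 0, 10),
    ("b", "2019-11-29", "c2", "p7", 1, 10),
    ("a", "2019-11-29", "c4", "p8", 1, 10),
]

def last_successes(rows):
    """Model of AVG(status) OVER (PARTITION BY merchant ORDER BY time DESC) = 1."""
    by_merchant = defaultdict(list)
    for row in rows:
        by_merchant[row[0]].append(row)
    result = []
    for merchant in sorted(by_merchant):
        newest_first = sorted(by_merchant[merchant], key=lambda r: r[1], reverse=True)
        status_sum = 0
        for n, row in enumerate(newest_first, start=1):
            status_sum += row[4]
            if status_sum / n == 1:  # average is 1 only if every record so far succeeded
                result.append(row)
    return result

for row in last_successes(SALE_T2):
    print(row[0], row[1], row[3])
```

This keeps a 2019-11-29, a 2019-11-26 and b 2019-11-29, matching the rows marked y. Merchant b's 2019-11-25 success is excluded because the later 2019-11-27 failure pulls the running average below 1.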

References

  1. https://blog.csdn.net/qq_41568597/article/details/84309503
  2. https://www.cnblogs.com/qingyunzong/p/8747656.html#_label0_2