Hive practice: window function exercises

The answers below are my own. If you see things differently, feel free to discuss.

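1. Write SQL that computes, for each user, the maximum single-month visit count up to and including each month, and the cumulative total visits up to that month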

Data:

A,2015-01,5
A,2015-01,15
B,2015-01,5
A,2015-01,8
B,2015-01,25
A,2015-01,5
A,2015-02,4
A,2015-02,6
B,2015-02,10
B,2015-02,5
A,2015-03,16
A,2015-03,22
B,2015-03,23
B,2015-03,10
B,2015-03,1

Create the table:

create table if not exists visits(
userid string,
month string,
visits int
)
row format delimited fields terminated by ','
;

load data local inpath '/root/hivedata/visits.txt' overwrite into table visits;

Query:

select
userid,
month,
max(visits) over(distribute by userid sort by month),
sum(visits) over(distribute by userid sort by month),
visits
from
(select
userid,
month,
sum(visits) visits
from visits
group by userid,month) as t
;
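The following minimal Python sketch (not Hive) replays the query on the sample rows to show what the two window functions compute. It first performs the inner `group by userid,month`, then keeps a running maximum and a running sum per user in month order. The helper name `running_stats` is purely illustrative.

```python
from collections import defaultdict

# Sample rows from visits.txt: (userid, month, visits)
ROWS = [
    ("A", "2015-01", 5), ("A", "2015-01", 15), ("B", "2015-01", 5),
    ("A", "2015-01", 8), ("B", "2015-01", 25), ("A", "2015-01", 5),
    ("A", "2015-02", 4), ("A", "2015-02", 6), ("B", "2015-02", 10),
    ("B", "2015-02", 5), ("A", "2015-03", 16), ("A", "2015-03", 22),
    ("B", "2015-03", 23), ("B", "2015-03", 10), ("B", "2015-03", 1),
]


def running_stats(rows):
    """Return (userid, month, visits, running max, running sum) tuples."""
    # Inner query: sum(visits) ... group by userid, month
    monthly = defaultdict(int)
    for user, month, visits in rows:
        monthly[(user, month)] += visits

    result = []
    run_max, run_sum, current_user = None, 0, None
    # Sorting by (userid, month) mirrors "distribute by userid sort by month"
    for (user, month), total in sorted(monthly.items()):
        if user != current_user:
            # New partition: reset the running aggregates
            current_user, run_max, run_sum = user, total, 0
        run_max = max(run_max, total)
        run_sum += total
        result.append((user, month, total, run_max, run_sum))
    return result


for row in running_stats(ROWS):
    print(row)
```

For user A this gives (max, total) pairs of (33, 33), (33, 43) and (38, 81) for 2015-01 through 2015-03.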

2. Find the number of views and the total watch time for each channel

Data

video table
Uid channel min
1 1 23
2 1 12
3 1 12
4 1 32
5 1 342
6 2 13
7 2 34
8 2 13
9 2 134

Create the table

create table if not exists video(
uid int,
channel string,
min int
)
row format delimited fields terminated by ' ' ;

load data local inpath '/root/hivedata/video.txt' into table video;

Query

select 
channel,
count(channel),
sum(min)
from video
group by channel
;

3. Write SQL to count the users who logged in on at least 7 consecutive days

Data:

t1 table (loaded into login2 below)

Uid dt login_status (1 = login succeeded, 0 = error)

1 2019-07-11 1
1 2019-07-12 1
1 2019-07-13 1
1 2019-07-14 1
1 2019-07-15 1
1 2019-07-16 1
1 2019-07-17 1
1 2019-07-18 1
2 2019-07-11 1
2 2019-07-12 1
2 2019-07-13 0
2 2019-07-14 1
2 2019-07-15 1
2 2019-07-16 0
2 2019-07-17 1
2 2019-07-18 0
3 2019-07-11 1
3 2019-07-12 1
3 2019-07-13 1
3 2019-07-14 1
3 2019-07-15 1
3 2019-07-16 1
3 2019-07-17 1
3 2019-07-18 1

Create the table

create table if not exists login2(
uid int,
dt string,
login_status int
)
row format delimited fields terminated by ' ' ;

load data local inpath '/root/hivedata/login.txt' into table login2;

Query:

select
count(distinct t3.uid)
from
(select
uid
from
(select
t1.uid,
date_sub(t1.dt,t1.rm) as dt
from
(select
uid,
dt,
row_number() over(distribute by uid sort by dt) as rm
from login2
where login_status=1) t1) t2
group by t2.uid,t2.dt
having count(*) >= 7) t3
;
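The core trick is this: when one user's successful logins are sorted by date, `dt - row_number()` stays constant across a run of consecutive days, so each run collapses into a single group. Below is a Python sketch of the same idea on the sample data. The helper `count_streak_users` is hypothetical, and the sketch assumes dates in `yyyy-MM-dd` format.

```python
from collections import defaultdict
from datetime import date, timedelta

# login_status per day from 2019-07-11 to 2019-07-18 for each uid
FLAGS = {1: [1, 1, 1, 1, 1, 1, 1, 1],
         2: [1, 1, 0, 1, 1, 0, 1, 0],
         3: [1, 1, 1, 1, 1, 1, 1, 1]}
ROWS = [(uid, f"2019-07-{11 + i}", status)
        for uid, flags in FLAGS.items() for i, status in enumerate(flags)]


def count_streak_users(rows, min_days=7):
    """Number of distinct users with at least one run of min_days consecutive logins."""
    days = defaultdict(set)
    for uid, dt, status in rows:
        if status == 1:  # where login_status=1
            days[uid].add(date.fromisoformat(dt))

    qualified = set()
    for uid, user_days in days.items():
        runs = defaultdict(int)
        # row_number() over(distribute by uid sort by dt), starting at 1
        for rn, d in enumerate(sorted(user_days), start=1):
            # date_sub(dt, rn): every day in a consecutive run maps to the same anchor
            runs[d - timedelta(days=rn)] += 1
        if any(n >= min_days for n in runs.values()):
            qualified.add(uid)  # a set, so each user is counted once
    return len(qualified)


print(count_streak_users(ROWS))
```

Users 1 and 3 qualify. User 2's longest run is 2 days. Because the sketch stores days in a set, it also collapses duplicate rows for the same day. The SQL would need a `distinct` on `(uid, dt)` to get the same behavior if a user could log in more than once per day.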

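4. Write SQL to get the top three students in each class (students with equal scores share a rank), and also compute the score difference between consecutive places among those top three: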

Data

stu table

Stu_no class score

1 1901 90
2 1901 90
3 1901 83
4 1901 60
5 1902 66
6 1902 23
7 1902 99
8 1902 67
9 1902 87

Create the table

create table if not exists stu(
stu_no int,
class string,
score int
)
row format delimited fields terminated by ' '
;

load data local inpath '/root/hivedata/stu.txt' into table stu;

Query

select
t1.class,
t1.stu_no,
t1.score,
t1.rn,
t1.score - lag(t1.score) over(distribute by class sort by score desc) diff
from
(select
class,
rank() over(distribute by class sort by score desc) as rn,
stu_no,
score
from 
stu
) t1
where t1.rn <= 3
;
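The window functions in the outer `select` are evaluated after `where t1.rn <= 3`, so `lag()` only sees the rows that survived the filter. Below is a Python sketch of that order of operations; the helper `top_n_with_gap` is hypothetical. The difference is negative because it is the current score minus the higher score ranked just above it. Also, Hive does not guarantee the order of rows tied on score.

```python
# Sample rows from stu.txt: (stu_no, class, score)
ROWS = [(1, "1901", 90), (2, "1901", 90), (3, "1901", 83), (4, "1901", 60),
        (5, "1902", 66), (6, "1902", 23), (7, "1902", 99), (8, "1902", 67),
        (9, "1902", 87)]


def top_n_with_gap(rows, n=3):
    result = []
    for cls in sorted({r[1] for r in rows}):
        # distribute by class sort by score desc (stable sort keeps input order for ties)
        ordered = sorted((r for r in rows if r[1] == cls), key=lambda r: -r[2])
        ranked = []
        for pos, (stu_no, _, score) in enumerate(ordered, start=1):
            # rank(): a tie reuses the previous rank, and the next rank skips ahead
            rank = ranked[-1][2] if ranked and ranked[-1][1] == score else pos
            ranked.append((stu_no, score, rank))
        # where rn <= n happens first; lag() then runs over the remaining rows
        prev = None
        for stu_no, score, rank in (r for r in ranked if r[2] <= n):
            diff = None if prev is None else score - prev  # lag() is NULL on the first row
            result.append((cls, stu_no, score, rank, diff))
            prev = score
    return result


for row in top_n_with_gap(ROWS):
    print(row)
```

In class 1901, the two students tied at 90 both get rank 1 and the next student gets rank 3. To report the gap as a positive number, swap the operands to `lag(score) - score`.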

5. Each store's sales for the month and its cumulative sales up to that month

Data:

store,month,amount

a,01,150
a,01,200
b,01,1000
b,01,800
c,01,250
c,01,220
b,01,6000
a,02,2000
a,02,3000
b,02,1000
b,02,1500
c,02,350
c,02,280
a,03,350
a,03,250

Create the table:

create table if not exists store(
sname string,
month string,
money int
)
row format delimited fields terminated by ','
;

load data local inpath '/root/hivedata/store.txt' into table store;

Query:

select
t1.sname,
t1.month,
t1.money,
sum(t1.money) over(distribute by t1.sname sort by month)
from
(select
sname,
month,
sum(money) money
from store
group by sname,month
) t1
;
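The pattern is the same as in exercise 1: first aggregate per `(sname, month)`, then take a running sum per store. In Python the running sum is `itertools.accumulate`. This sketch (hypothetical helper `monthly_and_cumulative`) relies on the zero-padded month strings sorting correctly as text, and the Hive query relies on the same thing.

```python
from collections import defaultdict
from itertools import accumulate

# Sample rows from store.txt: (sname, month, money)
ROWS = [("a", "01", 150), ("a", "01", 200), ("b", "01", 1000), ("b", "01", 800),
        ("c", "01", 250), ("c", "01", 220), ("b", "01", 6000), ("a", "02", 2000),
        ("a", "02", 3000), ("b", "02", 1000), ("b", "02", 1500), ("c", "02", 350),
        ("c", "02", 280), ("a", "03", 350), ("a", "03", 250)]


def monthly_and_cumulative(rows):
    # Inner query: sum(money) ... group by sname, month
    monthly = defaultdict(int)
    for sname, month, money in rows:
        monthly[(sname, month)] += money

    result = {}
    for sname in sorted({s for s, _ in monthly}):
        # sort by month inside each store partition
        months = sorted(m for s, m in monthly if s == sname)
        totals = [monthly[(sname, m)] for m in months]
        # sum() over(... sort by month) is a running total
        result[sname] = list(zip(months, totals, accumulate(totals)))
    return result


print(monthly_and_cumulative(ROWS))
```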

6. Analyze user behavior: find each user's first action in the table

Data:

uid,time,action

1,time1,read
3,time2,comment
1,time3,share
2,time4,like
1,time5,write
2,time6,like
3,time7,write
2,time8,read

Create the table:

create table if not exists user_action_log(
uid string,
time string,
action string
)
row format delimited fields terminated by ','
;

load data local inpath '/root/hivedata/user_action_log.txt' into table user_action_log;

Query:

select
t1.uid,
t1.action
from
user_action_log t1
join
(select
uid,
min(time) time
from user_action_log
group by uid
) t2
on t1.uid=t2.uid
and t1.time=t2.time
;
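The join recovers the action that happened at each user's earliest `time`. If two actions tie on that time, both rows come back. Below is a Python sketch of the same two steps (hypothetical helper `first_actions`). It compares `time` as a string, just as `min(time)` does on a `string` column. That works for the placeholders `time1` to `time8` but would sort `time10` before `time2`, so real data should use a sortable format such as `yyyy-MM-dd HH:mm:ss`.

```python
# Sample rows from user_action_log.txt: (uid, time, action)
ROWS = [("1", "time1", "read"), ("3", "time2", "comment"), ("1", "time3", "share"),
        ("2", "time4", "like"), ("1", "time5", "write"), ("2", "time6", "like"),
        ("3", "time7", "write"), ("2", "time8", "read")]


def first_actions(rows):
    # t2: min(time) per uid, using plain string comparison like Hive's min() on a string
    first_time = {}
    for uid, t, _ in rows:
        if uid not in first_time or t < first_time[uid]:
            first_time[uid] = t
    # join t1 to t2 on uid and time; ties on the earliest time would all be kept
    return sorted((uid, action) for uid, t, action in rows if t == first_time[uid])


print(first_actions(ROWS))
```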

7. Orders and order types: turn rows into columns

Data:

order_id,order_type,order_time

111	N	10:00
111	A	10:05
111	B	10:10

Create the table:

create table if not exists myorder(
order_id string,
order_type string,
order_time string
)
row format delimited fields terminated by '\t'
;

load data local inpath '/root/hivedata/order.txt' into table myorder;

Query:

select
* 
from
(select
order_id,
order_type order_type1,
lead(order_type) over(distribute by order_id sort by order_time) order_type2,
order_time order_time1,
lead(order_time) over(distribute by order_id sort by order_time) order_time2
from myorder) t1
where t1.order_type2 is not null
;
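`lead()` pairs each row with the next one in `order_time` order. The `is not null` filter then drops the last row of each order, which has no successor. In Python the same pairing is `zip(events, events[1:])`; the helper name `pair_consecutive` is hypothetical.

```python
# Sample rows from order.txt: (order_id, order_type, order_time)
ROWS = [("111", "N", "10:00"), ("111", "A", "10:05"), ("111", "B", "10:10")]


def pair_consecutive(rows):
    result = []
    for order_id in sorted({r[0] for r in rows}):
        # distribute by order_id sort by order_time ("HH:mm" strings sort correctly)
        events = sorted((r for r in rows if r[0] == order_id), key=lambda r: r[2])
        # zip() with the shifted list mimics lead(); the last event has no partner,
        # just as "order_type2 is not null" drops it in the query
        for cur, nxt in zip(events, events[1:]):
            result.append((order_id, cur[1], nxt[1], cur[2], nxt[2]))
    return result


for row in pair_consecutive(ROWS):
    print(row)
```

For the sample this yields two rows: N then A (10:00, 10:05), and A then B (10:05, 10:10).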

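8. An app stores its daily access data in the table access_log, which contains the date field ds, the user type field user_type, the user account user_id, and the visit time log_time. Use Hive HQL to answer the following: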

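PV (page views): every refresh by a user counts as one visit.
UV (unique visitors): the same client is counted only once within 00:00-24:00 of a day.
(1) Overall UV and PV per day

(2) UV and PV per day for each user type

(3) Earliest and latest visit time per day for each user type

(4) Top 10 users by visit count per day for each user type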

Data:

2019-09-01	a	u001	01:12
2019-09-01	a	u001	01:13
2019-09-01	a	u001	01:46
2019-09-01	b	u002	04:56
2019-09-01	b	u003	13:12
2019-09-02	a	u001	12:12
2019-09-02	c	u004	11:34
2019-09-02	a	u005	14:12
2019-09-02	c	u006	16:18
2019-09-02	a	u007	05:10
2019-09-02	c	u008	07:12
2019-09-02	a	u009	09:06
2019-09-02	b	u023	10:12
2019-09-02	a	u045	18:47
2019-09-03	a	u023	12:15
2019-09-04	b	u054	06:12
2019-09-04	c	u057	09:35
2019-09-04	c	u056	00:57
2019-09-05	a	u068	15:12
2019-09-06	b	u053	11:25
2019-09-08	a	u001	09:34

Create the table:

create table if not exists access_log(
ds string,
user_type string,
user_id string,
log_time string
)
row format delimited fields terminated by '\t'
;

load data local inpath '/root/hivedata/access_log.txt' into table access_log;

(1) Overall UV and PV per day

select
ds,
count(distinct user_id) uv,
count(*) pv
from access_log
group by ds
;

(2) UV and PV per day for each user type

select
ds,
user_type,
count(distinct user_id) uv,
count(*) pv
from access_log
group by ds,user_type
;
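Queries (1) and (2) differ only in the grouping key. PV is `count(*)`, one per row, and UV is `count(distinct user_id)`. Below is a Python sketch over the first six sample rows (hypothetical helper `uv_pv`). It treats `user_id` as the client identity, which is an assumption because the table has no separate device column.

```python
from collections import defaultdict

# First six sample rows of access_log: (ds, user_type, user_id, log_time)
ROWS = [("2019-09-01", "a", "u001", "01:12"),
        ("2019-09-01", "a", "u001", "01:13"),
        ("2019-09-01", "a", "u001", "01:46"),
        ("2019-09-01", "b", "u002", "04:56"),
        ("2019-09-01", "b", "u003", "13:12"),
        ("2019-09-02", "a", "u001", "12:12")]


def uv_pv(rows, by_type=False):
    """Return {group key: (uv, pv)}; the key is (ds,) or (ds, user_type)."""
    pv = defaultdict(int)
    users = defaultdict(set)
    for ds, user_type, user_id, _ in rows:
        key = (ds, user_type) if by_type else (ds,)
        pv[key] += 1             # count(*): every visit row
        users[key].add(user_id)  # count(distinct user_id)
    return {k: (len(users[k]), pv[k]) for k in sorted(pv)}


print(uv_pv(ROWS))
print(uv_pv(ROWS, by_type=True))
```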

(3) Earliest and latest visit time per day for each user type

select
ds,
user_type,
min(log_time),
max(log_time)
from access_log
group by ds,user_type
;

(4) Top 10 users by visit count per day for each user type

select
ds,
user_type,
user_id,
cnt,
rn
from
(select
ds,
user_type,
user_id,
cnt,
rank() over(distribute by ds,user_type sort by cnt desc) rn
from
(select
ds,
user_type,
user_id,
count(*) cnt
from access_log
group by ds,user_type,user_id) t1) t2
where rn <= 10
;
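Top-N per group works in two steps: count visits per `(ds, user_type, user_id)`, then rank users inside each `(ds, user_type)` partition. With `rank()`, users tied on `cnt` share a rank, so a group can return more than `n` rows. Below is a Python sketch (hypothetical helper `top_users`) that keeps every user whose rank is at most `n`.

```python
from collections import Counter

# First five sample rows of access_log: (ds, user_type, user_id, log_time)
ROWS = [("2019-09-01", "a", "u001", "01:12"),
        ("2019-09-01", "a", "u001", "01:13"),
        ("2019-09-01", "a", "u001", "01:46"),
        ("2019-09-01", "b", "u002", "04:56"),
        ("2019-09-01", "b", "u003", "13:12")]


def top_users(rows, n=10):
    # Inner query: count(*) ... group by ds, user_type, user_id
    counts = Counter((ds, utype, uid) for ds, utype, uid, _ in rows)
    groups = {}
    for (ds, utype, uid), cnt in counts.items():
        groups.setdefault((ds, utype), []).append((uid, cnt))

    result = []
    for (ds, utype), users in sorted(groups.items()):
        # sort by cnt desc; uid is a tie-breaker only to make the output stable
        users.sort(key=lambda u: (-u[1], u[0]))
        prev_cnt, rank = None, 0
        for pos, (uid, cnt) in enumerate(users, start=1):
            if cnt != prev_cnt:  # rank(): ties share a rank, gaps follow
                rank = pos
            prev_cnt = cnt
            if rank <= n:
                result.append((ds, utype, uid, cnt, rank))
    return result


for row in top_users(ROWS):
    print(row)
```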

9. The maximum number of consecutive login days for each user

Data:

login table

uid,date

1,2019-08-01
1,2019-08-02
1,2019-08-03
2,2019-08-01
2,2019-08-02
3,2019-08-01
3,2019-08-03
4,2019-07-28
4,2019-07-29
4,2019-08-01
4,2019-08-02
4,2019-08-03

Create the table:

create table if not exists login(
uid int,
udate string
)
row format delimited fields terminated by ','
;

load data local inpath '/root/hivedata/login2.txt' into table login;

Query:

select
uid,
max(cn)
from
(select
uid,
count(*) cn
from
(select
uid,
date_sub(udate,row_number() over(distribute by uid sort by udate)) udate
from login) t1
group by uid,udate) t2
group by uid
;
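This uses the same `date - row_number()` grouping as exercise 3, followed by `max()` per user instead of a threshold. Real date arithmetic matters here. User 4's run from 2019-08-01 to 2019-08-03 is correctly kept apart from 2019-07-28/29, and a run that crossed a month boundary would still count as consecutive. Below is a Python sketch (hypothetical helper `max_streaks`).

```python
from collections import Counter
from datetime import date, timedelta

# Sample rows from the login table: (uid, udate)
ROWS = [(1, "2019-08-01"), (1, "2019-08-02"), (1, "2019-08-03"),
        (2, "2019-08-01"), (2, "2019-08-02"),
        (3, "2019-08-01"), (3, "2019-08-03"),
        (4, "2019-07-28"), (4, "2019-07-29"), (4, "2019-08-01"),
        (4, "2019-08-02"), (4, "2019-08-03")]


def max_streaks(rows):
    by_user = {}
    for uid, udate in rows:
        by_user.setdefault(uid, []).append(date.fromisoformat(udate))

    result = {}
    for uid, days in by_user.items():
        # t1: date_sub(udate, row_number()); t2: count(*) per (uid, anchor date)
        runs = Counter(d - timedelta(days=rn)
                       for rn, d in enumerate(sorted(days), start=1))
        # outer query: max(cn) per uid
        result[uid] = max(runs.values())
    return result


print(max_streaks(ROWS))
```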

10. Use Hive HQL to find the top student of each gender, and related queries

id sex chinese_s math_s
0 0 70 50
1 0 90 70
2 1 80 90
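1. Top Chinese-language score for each gender (0: male, 1: female)
2. Male students whose Chinese score is above 80, and female students whose math score is above 70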

Create the table:

create table if not exists score_s(
uid int,
usex int,
chinese_s int,
math_s int
)
row format delimited fields terminated by ' '
;

load data local inpath '/root/hivedata/score_s.txt' into table score_s;

1. Top Chinese-language score for each gender (0: male, 1: female)

Query:

select
uid,
usex,
chinese_s
from
(select
uid,
usex,
chinese_s,
rank() over(distribute by usex sort by chinese_s desc) rn
from score_s) t1
where t1.rn=1
;
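Filtering on `rank() ... = 1` keeps every student tied for the highest Chinese score within each `usex` partition, not just one of them; `row_number()` would keep exactly one. Below is a Python sketch of that behavior (hypothetical helper `top_chinese_by_sex`).

```python
# Sample rows from score_s.txt: (uid, usex, chinese_s, math_s); usex 0 = male, 1 = female
ROWS = [(0, 0, 70, 50), (1, 0, 90, 70), (2, 1, 80, 90)]


def top_chinese_by_sex(rows):
    # Highest chinese_s per usex partition
    best = {}
    for _, sex, chinese, _ in rows:
        best[sex] = max(best.get(sex, chinese), chinese)
    # rank() = 1 keeps every row equal to the partition maximum
    return sorted((uid, sex, chinese) for uid, sex, chinese, _ in rows
                  if chinese == best[sex])


print(top_chinese_by_sex(ROWS))
```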

2. Male students whose Chinese score is above 80, and female students whose math score is above 70

Query

select
*
from
score_s
where (usex=0 and chinese_s>80)
or (usex=1 and math_s>70)
;

11. Use Hive HQL to find the maximum number of consecutive visit days

For each user, find the maximum number of consecutive login days within the month

Data

log_time,uid

2018-10-01 18:00:00,123
2018-10-02 18:00:00,123
2018-10-02 19:00:00,456
2018-10-04 18:00:00,123
2018-10-04 18:00:00,456
2018-10-05 18:00:00,123
2018-10-06 18:00:00,123

Create the table

create table if not exists login3(
log_time string,
uid string
)
row format delimited fields terminated by ','
;

load data local inpath '/root/hivedata/login3.txt' into table login3;

Query:

select
uid,
mon,
max(cnt) max_days
from
(select
uid,
mon,
count(distinct d) cnt
from
(select
uid,
mon,
d,
d - dense_rank() over(distribute by uid,mon sort by d) grp
from
(select
uid,
substr(log_time,1,7) mon,
day(log_time) d
from
login3) t1
) t2
group by uid,mon,grp) t3
group by uid,mon
;
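Subtracting `dense_rank()` from the day of the month gives a constant value for each run of consecutive days. `dense_rank()` (rather than `row_number()`) gives several logins on the same day the same rank, so those rows still land in one group. That is why the group size should count distinct days rather than rows. Below is a Python sketch (hypothetical helper `max_streak_per_month`) that groups by the `yyyy-MM` prefix of `log_time` and the day number.

```python
from collections import defaultdict

# Sample rows from login3.txt: (log_time, uid)
ROWS = [("2018-10-01 18:00:00", "123"), ("2018-10-02 18:00:00", "123"),
        ("2018-10-02 19:00:00", "456"), ("2018-10-04 18:00:00", "123"),
        ("2018-10-04 18:00:00", "456"), ("2018-10-05 18:00:00", "123"),
        ("2018-10-06 18:00:00", "123")]


def max_streak_per_month(rows):
    # Distinct login days per (uid, yyyy-MM)
    days = defaultdict(set)
    for log_time, uid in rows:
        days[(uid, log_time[:7])].add(int(log_time[8:10]))

    result = {}
    for key, month_days in days.items():
        runs = defaultdict(int)
        # dense_rank() over the days equals the position among the sorted distinct days
        for rank, day in enumerate(sorted(month_days), start=1):
            runs[day - rank] += 1  # consecutive days share the same day - rank value
        result[key] = max(runs.values())
    return result


print(max_streak_per_month(ROWS))
```

User 123 logged in on October 1, 2, 4, 5 and 6, so the longest run is 3 days. User 456 never logged in on two consecutive days.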