Problem: given con_table(user_id, ttime), where ttime is the time a user logged in, find the users who logged in on 3 or more consecutive days.
create table con_table (
user_id int not null,
ttime datetime not null);
insert into con_table values (1,'2019-07-07 10:00:01');
insert into con_table values (1,'2019-07-07 11:00:01');
insert into con_table values (1,'2019-07-07 12:00:01');
insert into con_table values (1,'2019-07-08 10:00:01');
insert into con_table values (1,'2019-07-08 11:00:01');
insert into con_table values (1,'2019-07-09 10:00:01');
insert into con_table values (1,'2019-07-11 10:00:01');
insert into con_table values (1,'2019-07-12 10:00:01');
insert into con_table values (1,'2019-07-20 10:00:01');
insert into con_table values (1,'2019-07-21 10:00:01');
insert into con_table values (1,'2019-07-21 11:00:01');
insert into con_table values (1,'2019-07-22 10:00:01');
insert into con_table values (1,'2019-07-23 10:00:01');
insert into con_table values (2,'2019-07-07 10:00:01');
insert into con_table values (2,'2019-07-07 11:00:01');
insert into con_table values (2,'2019-07-08 12:00:01');
insert into con_table values (2,'2019-07-09 10:00:01');
insert into con_table values (2,'2019-07-10 11:00:01');
insert into con_table values (2,'2019-07-12 10:00:01');
insert into con_table values (2,'2019-07-14 10:00:01');
insert into con_table values (3,'2019-07-10 10:00:01');
insert into con_table values (3,'2019-07-11 11:00:01');
insert into con_table values (3,'2019-07-11 12:00:01');
insert into con_table values (3,'2019-07-11 13:00:01');
insert into con_table values (3,'2019-07-11 14:00:01');
insert into con_table values (3,'2019-07-20 10:00:01');
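Step 1: the timestamps are precise to the second, so a user may log in several times on the same day. The first step is therefore to deduplicate on user_id and the date part of ttime.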
select
distinct
user_id,
date_format(ttime,'%Y-%m-%d') as days -- %Y keeps the 4-digit year; %y would give '19-07-07'
from con_table;
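In Hive, to_date(ttime) should return the date part directly, and date_format takes a pattern such as 'yyyy-MM-dd' instead of '%Y-%m-%d'. [Not run here, still to be confirmed]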
Step 2: on top of the table above, rank each user's distinct days in descending order, so the latest day gets rnk = 1.
select
user_id,
days,
-- rnk = 1 + number of later distinct days for the same user
(select count(days)
from
(
select distinct user_id,date_format(ttime,'%Y-%m-%d') as days
from con_table
) t2
where t2.user_id = t1.user_id and t2.days > t1.days
) + 1 as rnk
from
(
select
distinct
user_id,
date_format(ttime,'%Y-%m-%d') as days
from con_table
) as t1;
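Step 3: add the remaining columns we need: each user's max(days), and the grouping key index = days - (max(days) - rnk).
Step 4: group by user_id and index.
Finally: add having count(*) >= 3, and that's it.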
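Steps 3 and 4 and the final filter can be sketched end to end as below. This is a minimal sketch under stated assumptions, not the original MySQL: it runs the query through Python's sqlite3 module, so SQLite's date(ttime) stands in for date_format, and the grouping key is written as day number + rnk. Within one user that key differs from date - (max(date) - rnk) only by a constant, so it groups the same rows. The names d, ranked, find_runs, first_day, last_day and run_len are made up for illustration.

```python
import sqlite3

# Sample rows copied from the insert statements above.
ROWS = [
    (1, "2019-07-07 10:00:01"), (1, "2019-07-07 11:00:01"), (1, "2019-07-07 12:00:01"),
    (1, "2019-07-08 10:00:01"), (1, "2019-07-08 11:00:01"), (1, "2019-07-09 10:00:01"),
    (1, "2019-07-11 10:00:01"), (1, "2019-07-12 10:00:01"), (1, "2019-07-20 10:00:01"),
    (1, "2019-07-21 10:00:01"), (1, "2019-07-21 11:00:01"), (1, "2019-07-22 10:00:01"),
    (1, "2019-07-23 10:00:01"),
    (2, "2019-07-07 10:00:01"), (2, "2019-07-07 11:00:01"), (2, "2019-07-08 12:00:01"),
    (2, "2019-07-09 10:00:01"), (2, "2019-07-10 11:00:01"), (2, "2019-07-12 10:00:01"),
    (2, "2019-07-14 10:00:01"),
    (3, "2019-07-10 10:00:01"), (3, "2019-07-11 11:00:01"), (3, "2019-07-11 12:00:01"),
    (3, "2019-07-11 13:00:01"), (3, "2019-07-11 14:00:01"), (3, "2019-07-20 10:00:01"),
]

QUERY = """
WITH d AS (              -- step 1: one row per user per calendar day
    SELECT DISTINCT user_id, date(ttime) AS days FROM con_table
),
ranked AS (              -- step 2: descending rank, latest day gets rnk = 1
    SELECT t1.user_id, t1.days,
           (SELECT COUNT(*) FROM d AS t2
             WHERE t2.user_id = t1.user_id AND t2.days > t1.days) + 1 AS rnk
    FROM d AS t1
)
SELECT user_id,
       MIN(days) AS first_day,
       MAX(days) AS last_day,
       COUNT(*)  AS run_len
FROM ranked
-- steps 3 and 4: day number + rnk is constant within a consecutive run
GROUP BY user_id, CAST(julianday(days) AS INTEGER) + rnk
HAVING COUNT(*) >= ?
ORDER BY user_id, first_day
"""


def find_runs(rows, min_days=3):
    """Return (user_id, first_day, last_day, run_len) for each qualifying run."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE con_table (user_id INT NOT NULL, ttime TEXT NOT NULL)")
        conn.executemany("INSERT INTO con_table VALUES (?, ?)", rows)
        return conn.execute(QUERY, (min_days,)).fetchall()
    finally:
        conn.close()


runs = find_runs(ROWS)
for run in runs:
    print(run)

# Distinct users with at least one qualifying run.
users = sorted({user_id for user_id, *_ in runs})
print(users)
```

On the sample data this reports two runs for user 1 (07-07 to 07-09 and 07-20 to 07-23) and one for user 2 (07-07 to 07-10); user 3 never reaches three consecutive days. Grouping by run rather than by user alone means a user with several qualifying runs shows up once per run, so take the distinct user_id values if only the users are needed.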
Reference: https://zhuanlan.zhihu.com/p/49285570
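My example is slightly more involved than the one in the reference, because the login time here is a full timestamp rather than a date.
The core idea is the same either way. Rank each user's dates in descending order to get rnk, a sequence of consecutive integers. Then max(date) - rnk is the date each row would fall on if no day were missing going back from the latest one. Finally, date - (max(date) - rnk) is the offset from that expected date: rows that share the same value belong to one run of consecutive days, and a different value means the run was broken.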
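To make the arithmetic concrete, here is a small worked example in plain Python for user_id = 1, using the distinct login days from the sample data. It is only an illustration of the offset calculation; the variable names (latest, expected, offset, runs) are my own and do not come from the reference.

```python
from datetime import date, timedelta

# Distinct login days for user_id = 1, taken from the sample data above.
days = [date(2019, 7, d) for d in (7, 8, 9, 11, 12, 20, 21, 22, 23)]

latest = max(days)

# Descending rank: the latest day gets rnk = 1.
grouped = {}
for rnk, day in enumerate(sorted(days, reverse=True), start=1):
    expected = latest - timedelta(days=rnk)  # max(date) - rnk
    offset = (day - expected).days           # date - (max(date) - rnk)
    grouped.setdefault(offset, []).append(day.isoformat())

# One entry per run of consecutive days, keyed by the shared offset.
runs = {offset: sorted(run) for offset, run in grouped.items()}

for offset, run in sorted(runs.items()):
    print(offset, len(run), run)
```

The nine days fall into three offsets: one for 07-07 to 07-09, one for 07-11 to 07-12, and one for 07-20 to 07-23. Only the first and last hold at least 3 days, which is what having count(*) >= 3 keeps.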