The consecutive-days problem in SQL

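Problem: con_table(user_id, ttime), where ttime is the user's login time. We need to find the users who logged in on 3 or more consecutive days (the final step below filters with count(*) >= 3).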

create table con_table (
user_id int not null,
ttime datetime not null);
insert into con_table values (1,'2019-07-07 10:00:01');
insert into con_table values (1,'2019-07-07 11:00:01');
insert into con_table values (1,'2019-07-07 12:00:01');
insert into con_table values (1,'2019-07-08 10:00:01');
insert into con_table values (1,'2019-07-08 11:00:01');
insert into con_table values (1,'2019-07-09 10:00:01');
insert into con_table values (1,'2019-07-11 10:00:01');
insert into con_table values (1,'2019-07-12 10:00:01');
insert into con_table values (1,'2019-07-20 10:00:01');
insert into con_table values (1,'2019-07-21 10:00:01');
insert into con_table values (1,'2019-07-21 11:00:01');
insert into con_table values (1,'2019-07-22 10:00:01');
insert into con_table values (1,'2019-07-23 10:00:01');
insert into con_table values (2,'2019-07-07 10:00:01');
insert into con_table values (2,'2019-07-07 11:00:01');
insert into con_table values (2,'2019-07-08 12:00:01');
insert into con_table values (2,'2019-07-09 10:00:01');
insert into con_table values (2,'2019-07-10 11:00:01');
insert into con_table values (2,'2019-07-12 10:00:01');
insert into con_table values (2,'2019-07-14 10:00:01');
insert into con_table values (3,'2019-07-10 10:00:01');
insert into con_table values (3,'2019-07-11 11:00:01');
insert into con_table values (3,'2019-07-11 12:00:01');
insert into con_table values (3,'2019-07-11 13:00:01');
insert into con_table values (3,'2019-07-11 14:00:01');
insert into con_table values (3,'2019-07-20 10:00:01');

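Step 1: The timestamps are precise to the second, so a user may log in several times on the same day. The first step is therefore to reduce ttime to a date and deduplicate on (user_id, date).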

select
    distinct
    user_id,
    date_format(ttime,'%Y-%m-%d') as days  -- %Y is the 4-digit year; %y would give '19-07-07'
from con_table;

In Hive, can to_date be used instead? Or a yyyy-MM-dd pattern? [To be confirmed]
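On the Hive question, here is a minimal sketch. It assumes a Hive table with the same name and columns, with ttime stored as a timestamp or as a 'yyyy-MM-dd HH:mm:ss' string. Hive's built-in to_date returns the date part, and Hive's date_format takes Java-style patterns rather than MySQL's % specifiers. It has not been run against the author's environment.

```sql
-- Hive variant of step 1 (assumes con_table exists in Hive with the same columns)
select distinct
    user_id,
    to_date(ttime) as days   -- date part only
from con_table;

-- Same result using a Java-style pattern instead of MySQL's '%Y-%m-%d'
select distinct
    user_id,
    date_format(ttime, 'yyyy-MM-dd') as days
from con_table;
```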

Step 2: Using the result above, rank each user's login days. The rank is descending: the user's latest day gets rnk = 1.

select
    user_id,
    days,
    -- rnk = 1 + the number of this user's distinct login days later than this one
    (select count(days)
     from
        (
        select distinct user_id, date_format(ttime,'%Y-%m-%d') as days
        from con_table
        ) t2
     where t2.user_id = t1.user_id and t2.days > t1.days
    ) + 1 as rnk
from
(
select
    distinct
    user_id,
    date_format(ttime,'%Y-%m-%d') as days
from con_table
) as t1;
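If window functions are available, the same ranking can be written more compactly. The sketch below assumes MySQL 8.0 or later and uses date(ttime) instead of date_format. That is a substitution, not the approach used above. Because window functions are evaluated before distinct, dense_rank gives every login on the same day the same rank, and distinct then collapses the duplicates.

```sql
-- Requires MySQL 8.0+ (window functions); merges steps 1 and 2.
select distinct
    user_id,
    date(ttime) as days,
    -- descending rank per user: the latest login day gets rnk = 1
    dense_rank() over (partition by user_id order by date(ttime) desc) as rnk
from con_table;
```

For user 3 in the sample data, this gives 2019-07-20 with rnk 1, 2019-07-11 with rnk 2, and 2019-07-10 with rnk 3.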

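Step 3: Add the remaining columns we need, namely each user's max(date) and the grouping key (the "index") computed from the date and rnk.

Step 4: Group by user_id and index.

Finally: keep only the groups with having count(*) >= 3, and we are done.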
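Steps 3 and 4 and the final filter are described only in words, so here is one possible way to put them together. The sketch keeps the correlated-subquery ranking from step 2, so it does not need window functions. The names idx, first_day, last_day and run_length are illustrative and not from the original. It also uses date(ttime) rather than date_format so that date_add works on a real DATE value. The key relies on rnk being descending: when the day goes up by 1, rnk goes down by 1, so days + rnk stays constant within a run of consecutive days.

```sql
select
    user_id,
    min(days) as first_day,
    max(days) as last_day,
    count(*)  as run_length
from
(
    -- step 3: attach the grouping key to every (user_id, days) row
    select
        user_id,
        days,
        date_add(days, interval rnk day) as idx
    from
    (
        -- step 2: descending rank of each distinct login day per user
        select
            t1.user_id,
            t1.days,
            (select count(*)
             from (select distinct user_id, date(ttime) as days from con_table) t2
             where t2.user_id = t1.user_id and t2.days > t1.days
            ) + 1 as rnk
        from (select distinct user_id, date(ttime) as days from con_table) t1
    ) ranked
) keyed
group by user_id, idx        -- step 4
having count(*) >= 3         -- final filter
order by user_id, first_day;
```

On the sample data above this returns three runs: user 1 from 2019-07-07 to 2019-07-09 (3 days), user 1 from 2019-07-20 to 2019-07-23 (4 days), and user 2 from 2019-07-07 to 2019-07-10 (4 days). User 3 has no run of 3 days and is dropped.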

Reference: https://zhuanlan.zhihu.com/p/49285570

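My example is slightly more complicated than the one in the reference, because the login times here are full timestamps rather than plain dates.

In any case, the core idea is:

1. Rank each user's dates in descending order to get rnk, which is a sequence of consecutive integers.
2. Compute max(date) - rnk, which is a sequence of consecutive dates that carries meaning: it is where each row would fall if the user had logged in every day.
3. Compute date - (max(date) - rnk). Rows with the same value belong to the same run of consecutive days; rows with different values do not.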
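To make the core idea concrete, the sketch below computes each intermediate value per row so the grouping key can be inspected directly. It assumes MySQL 8.0 or later (CTEs and window functions). The names daily, ranked, max_days, expected_day and grp are illustrative.

```sql
-- Requires MySQL 8.0+ (CTEs and window functions).
with daily as (
    -- one row per user per login day
    select distinct user_id, date(ttime) as days
    from con_table
),
ranked as (
    select
        user_id,
        days,
        max(days) over (partition by user_id) as max_days,
        row_number() over (partition by user_id order by days desc) as rnk
    from daily
)
select
    user_id,
    days,
    rnk,
    date_sub(max_days, interval rnk day) as expected_day,            -- max(date) - rnk
    datediff(days, date_sub(max_days, interval rnk day)) as grp      -- date - (max(date) - rnk)
from ranked
order by user_id, days;
```

For user 3 the grp values are -7 for 2019-07-10, -7 for 2019-07-11, and 1 for 2019-07-20. The two rows sharing -7 form a run of two consecutive days, and 2019-07-20 stands alone. Grouping by (user_id, grp) and keeping groups with count(*) >= 3 then yields the qualifying users.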
