数据筹备
idlogin_date
012021-02-28
012021-03-01
012021-03-02
012021-03-04
012021-03-05
012021-03-06
012021-03-08
022021-03-01
022021-03-02
022021-03-03
022021-03-06
032021-03-06

计划一
1.先把数据依照用户id分组,依据登录日期排序
SQL:
SELECT
id,
login_date,
row_number() over(partition by id order by login_date asc) as rn
FROM data;

后果:
idlogin_datern
012021-02-281
012021-03-012
012021-03-023
012021-03-044
012021-03-055
012021-03-066
012021-03-087
022021-03-011
022021-03-022
022021-03-033
022021-03-064
032021-03-061

2.用登录日期与rn求date_sub,失去的差值日期如果是相等大数据培训的,则阐明这两天必定是间断的
SQL:
SELECT

 t1.id, t1.login_date, date_sub(t1.login_date, rn) as diff_date

FROM

(   SELECT       id,       login_date,       row_number() over(partition by id order by login_date asc) as rn    FROM data) t1;
后果:
idlogin_datediff_date
012021-02-282021-02-27
012021-03-012021-02-27
012021-03-022021-02-27
012021-03-042021-02-28
012021-03-052021-02-28
012021-03-062021-02-28
012021-03-082021-03-01
022021-03-012021-02-28
022021-03-022021-02-28
022021-03-032021-02-28
022021-03-062021-03-02
032021-03-062021-03-05

3.依据id和日期差date_diff分组,登录次数即为分组后的count(1)
SQL:
SELECT
t2.id,
count(1) as login_times,
min(t2.login_date) as start_date,
max(t2.login_date) as end_date
FROM
(

SELECT t1.id, t1.login_date, date_sub(t1.login_date,rn) as diff_dateFROM(    SELECT     id,     login_date,     row_number() over(partition by id order by login_date asc) as rn     FROM data) t1

) t2
group by t2.id, t2.diff_date
having login_times >= 3;

后果:
idlogin_timesstart_dateend_date
0132021-02-282021-03-02
0132021-03-042021-03-06
0232021-03-012021-03-03

计划二
计划二利用lag和lead函数进行解决,思路相似。
SQL:
SELECT
id,
lag_login_date,
login_date,lead_login_date
FROM

  (SELECT      id,     login_date,     lag(login_date,1,login_date) over(partition by id order by login_date) as lag_login_date,     lead(login_date,1,login_date) over(partition by id order by login_date) as lead_login_date  FROM data  ) t1
where datediff(login_date,lag_login_date) =1 and datediff(lead_login_date,login_date) =1;后果:
idlag_login_datelogin_datelead_login_date
012018-02-282018-03-012018-03-02
012018-03-042018-03-052018-03-06
022018-03-012018-03-022018-03-03

行转列和列转行
以"间断登录"中的数据为例:
select id,

   concat_ws(',',collect_list(login_date)) cw

from data

group by id;后果:
idcw
012018-02-28,2018-03-01,2018-03-02,2018-03-04,2018-03-05,2018-03-06,2018-03-08
022018-03-01,2018-03-02,2018-03-03,2018-03-06
032018-03-06

以下面SQL生成的数据为基准,执行下列SQL:
select id, login_date
from t

lateral view explode(split(cw,',')) b AS login_date;后果:
idlogin_date
012018-02-28
012018-03-01
012018-03-02
012018-03-04
012018-03-05
012018-03-06
012018-03-08
022018-03-01
022018-03-02
022018-03-03
022018-03-06
032018-03-06