开窗函数分为两类:聚合开窗函数和排序开窗函数,有时候一组数组只返回一组值是不能满足咱们的数据分析需要的,咱们须要晓得某个学科的前几名详细信息、最近一次下单的用户订单详情、下过n次单的用户等等,这个时候开窗函数就能够派上用场了。如果没有开窗函数的话,则须要通过简单的关联查问能力获取对应的详细信息。

聚合函数和聚合开窗函数的区别如下:

  • SQL 规范容许将所有聚合函数用作开窗函数,用OVER 关键字辨别开窗函数和聚合函数,例如sum()over(partition by field )
  • 聚合函数每组只返回一个值,开窗函数每组可返回多个值,这里的值指的是每个分组的行信息

    聚合开窗函数理论利用举例

  • 下单累计金额达到3000元的用户历史订单详情,并且依照用户的订单工夫降序,以便查看用户的购买状况

    select * from (select          sale_order_id,         order_code,         customer_id,         order_create_time,            actual_sales_amount,         sum(actual_sales_amount)over(partition by customer_id) as grand_total         from          cdp_order_model          where channel='cdp_order_sale_01'         ) a where a.grand_total > 3000 order by a.customer_id,a.order_create_time desc
  • 单笔最高金额达到某个规范的用户历史订单详情

    select * from (select          sale_order_id,         order_code,         customer_id,         order_create_time,         actual_sales_amount,         max(actual_sales_amount)over(partition by customer_id) as max_order_amount         from          cdp_order_model          where channel='cdp_order_sale_01'         ) a where a.max_order_amount > 10000 order by a.customer_id,a.order_create_time desc

    最常见的排序开窗函数有三种:

  • row_number() over(partition by filed2 order by filed2),对雷同的值不进行辨别,排序从1到n名次间断
  • rank() over(partition by filed2 order by filed2),相等的值排名雷同,但若有相等的值,则序号从1到n不间断,如果有两个人都排在第3名,则没有第4名
  • dense_rank() over(partition by filed2 order by filed2),对相等的值排名雷同,但序号从1到n间断,如果有两个人都排在第一名,则排在第2名(假如仅有1个第二名)的人是第3集体

    常见的排序开窗函数利用场景

  • 查问用户最近一次订单的详情

    select  distinct superid,houses_condition as tag_value from(     select      b.superid,     a.houses_condition,     row_number() over(partition by b.superid order by a.order_create_time desc) as num     from     cdp_order_model a     join superid_all b on a.customer_id = b.id and b.idtype ='jiami_mobile'     where a.channel='cdp_order_sale_01'     and trim(a.order_channel)='EOS')b where b.num = 1 limit 100000000
  • 查问小程序中用户间断登陆的最长天数,表名:login_data,表字段:user_id,login_date

    可用登陆日期减去序号(user_id+login_date去重,依据user_id分组,login_date升序排序),若失去的日期雷同,则为间断登陆,依据计算后的日期计数即可失去最长间断登陆天数
    select  user_id,max(cnt)from (select user_id,after_date,count(0) as cntfrom (select user_id,login_date,date_depart('day',login_date,-1*row_number(partition by user_id order by login_date asc)) as after_datefrom(select user_id,login_datefrom login_datagroup by user_id,login_date) a) bgroup by user_id,after_date)cgroup by user_id
  • 查问用户第二次复购的订单详情,一天有多笔订单算作一单,这个时候就要用到间断排序的dense_rank()排序函数了,而rank()因为无奈预计同一天订单的数量而无奈预计对应的排名进而进行筛选

    select  distinct superid,houses_condition as tag_value from(     select      b.superid,     a.houses_condition,     dense_rank() over(partition by b.superid order by substring(cast(a.order_create_time as varchar),1,10) desc) as num     from     cdp_order_model a     join superid_all b on a.customer_id = b.id and b.idtype ='jiami_mobile'     where a.channel='cdp_order_sale_01'     and trim(a.order_channel)='EOS')b where b.num = 2 limit 100000000