Hive: getting the max, min, and average together with the key of the matching row

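I won't dwell on computing the max, min, and average in Hive by themselves.
This time a business requirement asked for the max, min, and average and, at the same time, the key field of the rows that hold the max and the min (so that the corresponding time can be found). On top of that, the timestamp inside key is in UTC (time zone 0), so the query has to account for the time zone conversion.
The table structure is roughly:
column    content
key       id + time
d         the corresponding double value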
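
The time zone point can be made concrete with Hive's built-in `from_utc_timestamp` and `to_utc_timestamp`. This is only a sketch: the literal timestamps and the `Asia/Shanghai` zone are assumptions picked for illustration, and how the timestamp is cut out of `key` depends on the real key layout, which is not shown here.

```sql
-- Show a UTC timestamp in local time (the zone name is an assumption).
select from_utc_timestamp('2020-06-04 16:00:00', 'Asia/Shanghai') as local_ts;

-- The reverse direction, useful when turning a local time range
-- into UTC bounds for the key filter.
select to_utc_timestamp('2020-06-05 00:00:00', 'Asia/Shanghai') as utc_ts;
```

Converting the range bounds once, rather than converting every row, keeps the filter on `key` a plain range comparison.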
Method 1: join (the most common)
For example, to get the minimum and the key of the row that holds it:

select *
from
(
    -- minimum of d over both key ranges
    select min(d) as min_d
    from
    (
        select key, d
        from default.test_202006
        where key >= 50104160000000 and key < 50104240000000
        union all
        select key, d
        from default.test_202006
        where key >= 50105000000000 and key < 50105160000000
    ) u1
) t1
left join
(
    -- the same rows again, so the key of the minimum can be looked up
    select key, d
    from
    (
        select key, d
        from default.test_202006
        where key >= 50104160000000 and key < 50104240000000
        union all
        select key, d
        from default.test_202006
        where key >= 50105000000000 and key < 50105160000000
    ) u2
) t2
    on t1.min_d = t2.d

To get everything this way you need the min plus one join, the max plus one join, and the avg on top.
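
Put together, the join approach could look roughly like the sketch below. It is an illustration of the "min + join, max + join, + avg" shape, not the query that was actually run; the aliases `a`, `mn`, `mx`, `u`, `u_min`, and `u_max` are made up here. If several rows tie for the min or the max, it returns one row per combination.

```sql
select a.min_d, mn.key as minkey,
       a.max_d, mx.key as maxkey,
       a.avg_d
from
(
    -- one pass for the three aggregates
    select min(d) as min_d, max(d) as max_d, avg(d) as avg_d
    from
    (
        select key, d from default.test_202006
        where key >= 50104160000000 and key < 50104240000000
        union all
        select key, d from default.test_202006
        where key >= 50105000000000 and key < 50105160000000
    ) u
) a
left join
(
    -- second pass: look up the key of the minimum
    select key, d from default.test_202006
    where key >= 50104160000000 and key < 50104240000000
    union all
    select key, d from default.test_202006
    where key >= 50105000000000 and key < 50105160000000
) mn on a.min_d = mn.d
left join
(
    -- third pass: look up the key of the maximum
    select key, d from default.test_202006
    where key >= 50104160000000 and key < 50104240000000
    union all
    select key, d from default.test_202006
    where key >= 50105000000000 and key < 50105160000000
) mx on a.max_d = mx.d
```

The same rows are read three times, which is the cost the later methods try to avoid.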

Method 2: in with a subquery
select * from table where double in (select min(double) from table)

select *
from
(
    select *
    from default.test_202006
    where key >= 50104160000000 and key < 50104240000000
    union all
    select *
    from default.test_202006
    where key >= 50105000000000 and key < 50105160000000
) u1
where d in
(
    -- minimum of d over the same key ranges
    select min(d) as min_d
    from
    (
        select *
        from default.test_202006
        where key >= 50104160000000 and key < 50104240000000
        union all
        select *
        from default.test_202006
        where key >= 50105000000000 and key < 50105160000000
    ) u2
)

Method 3: row_number()

The statement below gets the max and the min, each with its key:

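-- s is a column of the real table that the simplified layout above omits;
-- this assumes it has a single value across the selected rows, otherwise
-- each side returns one row per distinct s.
select tt1.d as min_d, tt1.key as minkey, tt2.d as max_d, tt2.key as maxkey
from
(
    select *
    from
    (
        -- ascending order: ord = 1 is the row with the smallest d
        select row_number() over (partition by s order by d) as ord, key, d
        from
        (
            select *
            from default.test_202006
            where key >= 50104160000000 and key < 50104240000000
            union all
            select *
            from default.test_202006
            where key >= 50105000000000 and key < 50105160000000
        ) u1
    ) t1
    where t1.ord = 1
) tt1
cross join
(
    select *
    from
    (
        -- descending order: ord = 1 is the row with the largest d
        select row_number() over (partition by s order by d desc) as ord, key, d
        from
        (
            select *
            from default.test_202006
            where key >= 50104160000000 and key < 50104240000000
            union all
            select *
            from default.test_202006
            where key >= 50105000000000 and key < 50105160000000
        ) u2
    ) t2
    where t2.ord = 1
) tt2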

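Method 4: a slick trick suggested by a colleague

Run row_number twice, once ascending and once descending, then gather the two surviving rows with collect. In my tests this was the fastest of the four methods, and it is what we ended up using in production: it ran more than 20 s faster than the join-based version written by colleagues.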

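-- Note: collect_set removes duplicates and does not guarantee element order,
-- so index 0 / index 1 are only min / max when the engine happens to keep
-- the rows in ascending order of d and the two values differ.
select t3.d[0] as min_d, t3.key[0] as minkey, t3.d[1] as max_d, t3.key[1] as maxkey, t4.avg_d
from
(
    select collect_set(key) as key, collect_set(d) as d
    from
    (
        -- constant num gives a single group to collect into;
        -- both windows partition by s here (the original used i for ord2,
        -- which is assumed to be a typo)
        select '999' as num,
               row_number() over (partition by s order by d) as ord1,
               row_number() over (partition by s order by d desc) as ord2,
               key, d
        from
        (
            select *
            from default.test_202006
            where key >= 50104160000000 and key < 50104240000000
            union all
            select *
            from default.test_202006
            where key >= 50105000000000 and key < 50105160000000
        ) u1
    ) t2
    where t2.ord1 = 1 or t2.ord2 = 1
    group by num
) t3
cross join
(
    select avg(d) as avg_d
    from
    (
        select *
        from default.test_202006
        where key >= 50104160000000 and key < 50104240000000
        union all
        select *
        from default.test_202006
        where key >= 50105000000000 and key < 50105160000000
    ) u2
) t4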
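
One caveat with the collect step: `collect_set` drops duplicates and does not promise any element order, so `d[0]` and `d[1]` are not guaranteed to be the min and the max (when the two are equal, the array has only one element). A possible variant, sketched here rather than taken from the production query, keeps the two row_number passes but picks the values with conditional aggregation and computes the average in the same scan. The `key asc` tie-breaker and the global windows without `partition by` are assumptions made for this sketch.

```sql
select max(case when ord1 = 1 then d   end) as min_d,
       max(case when ord1 = 1 then key end) as minkey,
       max(case when ord2 = 1 then d   end) as max_d,
       max(case when ord2 = 1 then key end) as maxkey,
       avg(d)                               as avg_d
from
(
    select key, d,
           -- ord1 = 1 marks the smallest d, ord2 = 1 the largest;
           -- ties are broken by the smaller key
           row_number() over (order by d asc,  key asc) as ord1,
           row_number() over (order by d desc, key asc) as ord2
    from
    (
        select key, d from default.test_202006
        where key >= 50104160000000 and key < 50104240000000
        union all
        select key, d from default.test_202006
        where key >= 50105000000000 and key < 50105160000000
    ) u
) t
```

Each `case` expression is non-null on exactly one row, so wrapping it in `max` simply picks that row's value. Whether this beats the collect version on a given cluster would need to be measured.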

All of the methods above also work with grouping. Window functions combined with FIRST_VALUE() and LAST_VALUE() are another option worth trying.
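
A minimal sketch of the FIRST_VALUE() idea, under the same assumptions as the queries above (columns `key` and `d`, one global result); it was not part of the original tests. FIRST_VALUE() is used in both directions because LAST_VALUE() with the default window frame only looks as far as the current row, so it would need an explicit `rows between unbounded preceding and unbounded following` frame to be correct.

```sql
select min(d)      as min_d,
       min(minkey) as minkey,  -- minkey is identical on every row
       max(d)      as max_d,
       min(maxkey) as maxkey,  -- maxkey is identical on every row
       avg(d)      as avg_d
from
(
    select d,
           -- key of the row with the smallest d
           first_value(key) over (order by d asc,  key asc) as minkey,
           -- key of the row with the largest d
           first_value(key) over (order by d desc, key asc) as maxkey
    from
    (
        select key, d from default.test_202006
        where key >= 50104160000000 and key < 50104240000000
        union all
        select key, d from default.test_202006
        where key >= 50105000000000 and key < 50105160000000
    ) u
) t
```

For a grouped result, adding `partition by` on the grouping column inside both windows and a matching `group by` in the outer query should give one row per group.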
