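In day-to-day work we often need to build reports or compute percentages and totals over some data. The functions below solve this quickly: they take fewer SQL statements and usually perform better than assembling the same figures from several separate queries. The figure below shows the result we are going to produce.
The functions used are: case when...then...else...end, regexp_like, ratio_to_report(score) OVER(), rollup, and grouping (the examples use Oracle syntax). This article only shows how to combine them in SQL to produce this kind of summary; consult the documentation for the details of each function.
First, the data you need to summarize looks something like this:
My table's data has to be converted first, so I preprocess it with the SQL statement below. If your data is different, skip this section.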
SELECT region,
       -- Every label produced by the inner query adds 1 per row, so score is the row count per region.
       SUM(CASE region
             WHEN 'Africa' THEN 1
             WHEN 'Asia' THEN 1
             WHEN 'Europe' THEN 1
             WHEN 'NorthAmerica' THEN 1
             WHEN 'Oceania' THEN 1
             WHEN 'SouthAmerica' THEN 1
             WHEN 'unknown' THEN 1
             ELSE 0
           END) AS score
FROM (
    -- Map the first character of TRANSDAY to a region label (substr positions start at 1; Oracle treats 0 as 1).
    SELECT CASE
             WHEN regexp_like(substr(TRANSDAY, 1, 1), '[A-C]') THEN 'Africa'
             WHEN regexp_like(substr(TRANSDAY, 1, 1), '[J-R]') THEN 'Asia'
             WHEN regexp_like(substr(TRANSDAY, 1, 1), '[S-Z]') THEN 'Europe'
             WHEN regexp_like(substr(TRANSDAY, 1, 1), '[1-5]') THEN 'NorthAmerica'
             WHEN regexp_like(substr(TRANSDAY, 1, 1), '[6-7]') THEN 'Oceania'
             WHEN regexp_like(substr(TRANSDAY, 1, 1), '[8-9]') THEN 'SouthAmerica'
             ELSE 'unknown'
           END AS region
    FROM test
)
GROUP BY region
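To see the classification step in isolation, here is a minimal sketch that applies the same case when / regexp_like pattern to a few inline rows. The samples data set and its values are hypothetical and only serve as an illustration; the dual table assumes Oracle.

```sql
-- Hypothetical sample values; only the first character matters for the mapping.
WITH samples AS (
    SELECT 'A001' AS transday FROM dual UNION ALL
    SELECT 'K002' FROM dual UNION ALL
    SELECT 'T003' FROM dual UNION ALL
    SELECT '7004' FROM dual UNION ALL
    SELECT '#005' FROM dual
)
SELECT transday,
       CASE
         WHEN regexp_like(substr(transday, 1, 1), '[A-C]') THEN 'Africa'
         WHEN regexp_like(substr(transday, 1, 1), '[J-R]') THEN 'Asia'
         WHEN regexp_like(substr(transday, 1, 1), '[S-Z]') THEN 'Europe'
         WHEN regexp_like(substr(transday, 1, 1), '[6-7]') THEN 'Oceania'
         ELSE 'unknown'   -- anything that matches none of the ranges above
       END AS region
FROM samples;
```

With these rows, 'A001' maps to Africa, 'K002' to Asia, 'T003' to Europe, '7004' to Oceania, and '#005' falls through to unknown.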
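Next, the summarized data can be transformed once more into the following; this is the step that computes the percentages:
My data needs converting, so I process it with the SQL statement below. The key is ratio_to_report(aa.score) OVER(), which divides each row's score by the sum of score over all rows. If your data is different, skip this section.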
SELECT aa.region, aa.score,
       -- Share of each region in the overall total, rounded to 4 decimal places.
       round(ratio_to_report(aa.score) OVER (), 4) AS percents
FROM (
    SELECT region,
           SUM(CASE region
                 WHEN 'Africa' THEN 1
                 WHEN 'Asia' THEN 1
                 WHEN 'Europe' THEN 1
                 WHEN 'NorthAmerica' THEN 1
                 WHEN 'Oceania' THEN 1
                 WHEN 'SouthAmerica' THEN 1
                 WHEN 'unknown' THEN 1
                 ELSE 0
               END) AS score
    FROM (
        SELECT CASE
                 WHEN regexp_like(substr(TRANSDAY, 1, 1), '[A-C]') THEN 'Africa'
                 WHEN regexp_like(substr(TRANSDAY, 1, 1), '[J-R]') THEN 'Asia'
                 WHEN regexp_like(substr(TRANSDAY, 1, 1), '[S-Z]') THEN 'Europe'
                 WHEN regexp_like(substr(TRANSDAY, 1, 1), '[1-5]') THEN 'NorthAmerica'
                 WHEN regexp_like(substr(TRANSDAY, 1, 1), '[6-7]') THEN 'Oceania'
                 WHEN regexp_like(substr(TRANSDAY, 1, 1), '[8-9]') THEN 'SouthAmerica'
                 ELSE 'unknown'
               END AS region
        FROM test
    )
    GROUP BY region
) aa
ORDER BY aa.region
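As a small illustration of what ratio_to_report does, the sketch below runs it over three inline rows and puts the equivalent manual calculation next to it. The demo data set and its scores are made up for this example; the dual table assumes Oracle.

```sql
-- Hypothetical scores that add up to 10, so the shares are easy to check by hand.
WITH demo AS (
    SELECT 'Africa' AS region, 2 AS score FROM dual UNION ALL
    SELECT 'Asia', 3 FROM dual UNION ALL
    SELECT 'Europe', 5 FROM dual
)
SELECT region,
       score,
       -- Built-in: this row's score divided by the sum of score over all rows.
       round(ratio_to_report(score) OVER (), 4) AS percents,
       -- The same figure written out with a windowed SUM.
       round(score / SUM(score) OVER (), 4) AS percents_manual
FROM demo
ORDER BY region;
```

Both columns come out as 0.2, 0.3 and 0.5 for Africa, Asia and Europe. The manual form is useful on databases that do not provide ratio_to_report.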
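Next we need a grand total across the groups, and this is where rollup comes in.
With rollup applied, the output becomes:
The SQL statement below only demonstrates the effect of rollup and is not the final result, so you may skip this section: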
SELECT region, SUM(score) AS score, SUM(percents) AS percents
FROM (
    SELECT aa.region, aa.score,
           round(ratio_to_report(aa.score) OVER (), 4) AS percents
    FROM (
        SELECT region,
               SUM(CASE region
                     WHEN 'Africa' THEN 1
                     WHEN 'Asia' THEN 1
                     WHEN 'Europe' THEN 1
                     WHEN 'NorthAmerica' THEN 1
                     WHEN 'Oceania' THEN 1
                     WHEN 'SouthAmerica' THEN 1
                     WHEN 'unknown' THEN 1
                     ELSE 0
                   END) AS score
        FROM (
            SELECT CASE
                     WHEN regexp_like(substr(TRANSDAY, 1, 1), '[A-C]') THEN 'Africa'
                     WHEN regexp_like(substr(TRANSDAY, 1, 1), '[J-R]') THEN 'Asia'
                     WHEN regexp_like(substr(TRANSDAY, 1, 1), '[S-Z]') THEN 'Europe'
                     WHEN regexp_like(substr(TRANSDAY, 1, 1), '[1-5]') THEN 'NorthAmerica'
                     WHEN regexp_like(substr(TRANSDAY, 1, 1), '[6-7]') THEN 'Oceania'
                     WHEN regexp_like(substr(TRANSDAY, 1, 1), '[8-9]') THEN 'SouthAmerica'
                     ELSE 'unknown'
                   END AS region
            FROM test
        )
        GROUP BY region
    ) aa
    ORDER BY aa.region
) t_test
-- Oracle syntax, matching ratio_to_report above; MySQL writes this as GROUP BY region WITH ROLLUP.
GROUP BY ROLLUP(region);
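After applying rollup, we find that in the total row at the bottom, the grouping column region is NULL. Combining rollup with the grouping function gives us the final result we want.
The grouping function takes a column and returns 0 or 1. It returns 1 when the NULL in that column was generated by rollup or cube for a subtotal or total row, and 0 for ordinary grouped rows, including rows where the column is NULL in the underlying data. grouping is used in queries that group with rollup or cube. It is very handy when you need to display a value in place of such a NULL.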
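The difference between a NULL that comes from the data and a NULL generated by rollup can be seen in a small sketch. The demo rows below are hypothetical, one of them deliberately has a NULL region, and the dual table assumes Oracle.

```sql
-- Hypothetical rows; the third one carries a NULL region in the data itself.
WITH demo AS (
    SELECT 'Asia' AS region, 3 AS score FROM dual UNION ALL
    SELECT 'Europe', 5 FROM dual UNION ALL
    SELECT CAST(NULL AS VARCHAR2(10)), 2 FROM dual
)
SELECT region,
       grouping(region) AS is_total,   -- 1 only on the row added by rollup
       SUM(score) AS score
FROM demo
GROUP BY ROLLUP(region);
```

The result has two rows with a NULL region: the one from the data (is_total = 0, score 2) and the grand total added by rollup (is_total = 1, score 10). Testing region IS NULL could not tell them apart, which is why the label is driven by grouping(region).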
The final SQL statement is as follows:
SELECT CASE
         -- grouping(region) = 1 marks the grand-total row generated by rollup.
         WHEN grouping(region) = 1 THEN 'Total'
         ELSE region
       END AS region,
       SUM(score) AS score,
       -- percents was rounded per region, so the total can differ slightly from 1.
       SUM(percents) AS percents
FROM (
    SELECT aa.region, aa.score,
           round(ratio_to_report(aa.score) OVER (), 4) AS percents
    FROM (
        SELECT region,
               SUM(CASE region
                     WHEN 'Africa' THEN 1
                     WHEN 'Asia' THEN 1
                     WHEN 'Europe' THEN 1
                     WHEN 'NorthAmerica' THEN 1
                     WHEN 'Oceania' THEN 1
                     WHEN 'SouthAmerica' THEN 1
                     WHEN 'unknown' THEN 1
                     ELSE 0
                   END) AS score
        FROM (
            SELECT CASE
                     WHEN regexp_like(substr(TRANSDAY, 1, 1), '[A-C]') THEN 'Africa'
                     WHEN regexp_like(substr(TRANSDAY, 1, 1), '[J-R]') THEN 'Asia'
                     WHEN regexp_like(substr(TRANSDAY, 1, 1), '[S-Z]') THEN 'Europe'
                     WHEN regexp_like(substr(TRANSDAY, 1, 1), '[1-5]') THEN 'NorthAmerica'
                     WHEN regexp_like(substr(TRANSDAY, 1, 1), '[6-7]') THEN 'Oceania'
                     WHEN regexp_like(substr(TRANSDAY, 1, 1), '[8-9]') THEN 'SouthAmerica'
                     ELSE 'unknown'
                   END AS region
            FROM test
        )
        GROUP BY region
    ) aa
    ORDER BY aa.region
) t_test
-- Oracle syntax, matching ratio_to_report above; MySQL writes this as GROUP BY region WITH ROLLUP.
GROUP BY ROLLUP(region);
That is the final presentation of our data. The SQL can be optimized further; I will come back to it when time permits.
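One possible way to tidy up the statement, offered only as a sketch and not as the author's own optimization, is to name each step with a WITH clause so the nesting reads top to bottom. The test_demo rows are hypothetical stand-ins for the test table, COUNT(*) replaces the SUM(CASE ...) on the assumption that every row gets one of the labels, and the dual table assumes Oracle.

```sql
WITH test_demo AS (                      -- hypothetical stand-in for the real test table
    SELECT 'A001' AS transday FROM dual UNION ALL
    SELECT 'B002' FROM dual UNION ALL
    SELECT 'K003' FROM dual UNION ALL
    SELECT '1004' FROM dual UNION ALL
    SELECT '#005' FROM dual
),
mapped AS (                              -- step 1: first character -> region label
    SELECT CASE
             WHEN regexp_like(substr(transday, 1, 1), '[A-C]') THEN 'Africa'
             WHEN regexp_like(substr(transday, 1, 1), '[J-R]') THEN 'Asia'
             WHEN regexp_like(substr(transday, 1, 1), '[S-Z]') THEN 'Europe'
             WHEN regexp_like(substr(transday, 1, 1), '[1-5]') THEN 'NorthAmerica'
             WHEN regexp_like(substr(transday, 1, 1), '[6-7]') THEN 'Oceania'
             WHEN regexp_like(substr(transday, 1, 1), '[8-9]') THEN 'SouthAmerica'
             ELSE 'unknown'
           END AS region
    FROM test_demo
),
counted AS (                             -- step 2: rows per region
    SELECT region, COUNT(*) AS score
    FROM mapped
    GROUP BY region
),
shared AS (                              -- step 3: share of the overall total
    SELECT region, score,
           round(ratio_to_report(score) OVER (), 4) AS percents
    FROM counted
)
SELECT CASE WHEN grouping(region) = 1 THEN 'Total' ELSE region END AS region,
       SUM(score) AS score,
       SUM(percents) AS percents
FROM shared
GROUP BY ROLLUP(region);                 -- step 4: append the grand-total row
```

With the five sample rows this yields Africa 2 (0.4), Asia 1 (0.2), NorthAmerica 1 (0.2), unknown 1 (0.2) and a Total row of 5 (1); the row order is not guaranteed without an ORDER BY. To run it against real data, drop the test_demo step and select from test in mapped.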