Consider the following data:
A,2015-01,5
A,2015-01,15
B,2015-01,5
A,2015-01,8
B,2015-01,25
A,2015-01,5
A,2015-02,4
A,2015-02,6
B,2015-02,10
B,2015-02,5
A,2015-03,16
A,2015-03,22
B,2015-03,23
B,2015-03,10
B,2015-03,11
The fields are defined as:
name,month,pv
The fields mean:
user, month, visit count
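To try the queries in this walkthrough, the sample rows need to be in a Hive table. The following is a minimal sketch, assuming the table is named exercise01 (the name the queries use) and the rows are saved as comma-delimited text. The column types and the file path are assumptions made for illustration; the original does not specify them.

```sql
-- Assumed schema: the column types are inferred from the sample rows.
create table exercise01 (
  name  string,
  month string,   -- kept as a 'YYYY-MM' string
  pv    int
)
row format delimited fields terminated by ',';

-- Hypothetical path: point it at wherever the sample rows are saved.
load data local inpath '/tmp/exercise01.txt' into table exercise01;
```

Keeping month as a string is enough here, because values in YYYY-MM form sort in chronological order when compared as text.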
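Now for the requirement:
For each user and each month, find the largest single-month visit count up to and including that month, and the cumulative visit count up to that month.
The expected result:
user month monthly_visits max_visits total_visits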
A 2015-01 33 33 33
A 2015-02 10 33 43
A 2015-03 38 38 81
B 2015-01 30 30 30
B 2015-02 15 30 45
B 2015-03 44 44 89
The final SQL:
select a.name as aname, a.month as amonth, a.pv as apv,
max(b.pv) as maxpv, sum(b.pv) as sumpv
from
(select a.name, a.month, sum(a.pv) as pv from exercise01 a group by a.name, a.month) a
join
(select a.name, a.month, sum(a.pv) as pv from exercise01 a group by a.name, a.month) b
on a.name = b.name
where a.month >= b.month
group by a.name, a.month, a.pv;
Approach:
Step 1: Each user has several visit records per month, so we first total each user's visits for each month.
SQL:
select a.name, a.month, sum(a.pv) as pv from exercise01 a group by a.name, a.month;
Result:
A 2015-01 33
A 2015-02 10
A 2015-03 38
B 2015-01 30
B 2015-02 15
B 2015-03 44
Step 2: The target output is
user month monthly_visits max_visits total_visits
A 2015-01 33 33 33
A 2015-02 10 33 43
A 2015-03 38 38 81
B 2015-01 30 30 30
B 2015-02 15 30 45
B 2015-03 44 44 89
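To compute output in this form, we need data in the following shape, where each monthly row (left three columns) is paired with every monthly row of the same user for that month or an earlier one (right three columns):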
A 2015-01 33 A 2015-01 33
A 2015-02 10 A 2015-01 33
A 2015-02 10 A 2015-02 10
A 2015-03 38 A 2015-01 33
A 2015-03 38 A 2015-02 10
A 2015-03 38 A 2015-03 38
B 2015-01 30 B 2015-01 30
B 2015-02 15 B 2015-01 30
B 2015-02 15 B 2015-02 15
B 2015-03 44 B 2015-01 30
B 2015-03 44 B 2015-02 15
B 2015-03 44 B 2015-03 44
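How do we get data in this shape? Run the following SQL, which self-joins the monthly totals on name and keeps only the pairs where b's month is not later than a's (months in YYYY-MM form compare correctly as strings):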
select a.name as aname, a.month as amonth, a.pv as apv,
b.name as bname, b.month as bmonth, b.pv as bpv
from
(select a.name, a.month, sum(a.pv) as pv from exercise01 a group by a.name, a.month) a
join
(select a.name, a.month, sum(a.pv) as pv from exercise01 a group by a.name, a.month) b
on a.name = b.name
where a.month >= b.month;
Step 3: With that data in hand, aggregate it directly.
SQL:
select a.aname, a.amonth, a.apv, max(a.bpv) as maxpv, sum(a.bpv) as sumpv
from
(
select a.name as aname, a.month as amonth, a.pv as apv,
b.name as bname, b.month as bmonth, b.pv as bpv
from
(select a.name, a.month, sum(a.pv) as pv from exercise01 a group by a.name, a.month) a
join
(select a.name, a.month, sum(a.pv) as pv from exercise01 a group by a.name, a.month) b
on a.name = b.name
where a.month >= b.month
) a
group by a.aname, a.amonth, a.apv;
Step 4: Execution result:
A 2015-01 33 33 33
A 2015-02 10 33 43
A 2015-03 38 38 81
B 2015-01 30 30 30
B 2015-02 15 30 45
B 2015-03 44 44 89
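Step 5: Simplify the statement by aggregating directly over the join instead of wrapping it in an outer subquery, which gives the final SQL: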
select a.name as aname, a.month as amonth, a.pv as apv,
max(b.pv) as maxpv, sum(b.pv) as sumpv
from
(select a.name, a.month, sum(a.pv) as pv from exercise01 a group by a.name, a.month) a
join
(select a.name, a.month, sum(a.pv) as pv from exercise01 a group by a.name, a.month) b
on a.name = b.name
where a.month >= b.month
group by a.name, a.month, a.pv;
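Summary:
To meet a requirement, we work backward from the result data: given the result we want, we work out what base data is needed and what SQL would turn that base data into the result. This is the most basic way to guide how we write SQL.
In this problem, we started from the result data, derived the base data it requires, and built that base data with a self-join.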
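The self-join is not the only way to build running aggregates. As an alternative sketch (not the approach used in this walkthrough), Hive versions that support windowing functions can compute the running maximum and running sum over the monthly totals without joining the table to itself. The alias t is introduced here for illustration only.

```sql
-- Alternative sketch: running max and running sum with window functions.
select t.name, t.month, t.pv,
       max(t.pv) over (partition by t.name order by t.month
                       rows between unbounded preceding and current row) as maxpv,
       sum(t.pv) over (partition by t.name order by t.month
                       rows between unbounded preceding and current row) as sumpv
from (
  -- Same monthly totals as Step 1.
  select name, month, sum(pv) as pv
  from exercise01
  group by name, month
) t;
```

On the sample data this should return the same six rows as the self-join version. The difference in design is that each user's monthly totals are walked once in month order, rather than each month being paired with all of the months before it.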
Here is a similar problem you can try:
Data:
a,01,150
a,01,200
b,01,1000
b,01,800
c,01,250
c,01,220
b,01,6000
a,02,2000
a,02,3000
b,02,1000
b,02,1500
c,02,350
c,02,280
a,03,350
a,03,250
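The fields are defined as:
shop,month,money
The fields mean:
shop, month, revenue
Requirement:
Write a Hive HQL statement that returns each shop's sales for each month and its cumulative sales up to that month.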
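For readers who want to check their answer, here is one possible sketch that reuses the same self-join pattern. It assumes the rows are loaded into a table named exercise02 with columns (shop, month, money). The table name and the output aliases month_money and total_money are hypothetical and do not come from the original.

```sql
-- exercise02 is a hypothetical table name with columns (shop, month, money).
select a.shop, a.month, a.money as month_money, sum(b.money) as total_money
from
(select shop, month, sum(money) as money from exercise02 group by shop, month) a
join
(select shop, month, sum(money) as money from exercise02 group by shop, month) b
on a.shop = b.shop
-- Pair each month with itself and all earlier months of the same shop.
where a.month >= b.month
group by a.shop, a.month, a.money;
```

Adding up the sample rows by hand, shop a should come out as 350 / 350, 5000 / 5350, and 600 / 5950 for months 01 to 03. Shop b should be 7800 / 7800 and 2500 / 10300, and shop c should be 470 / 470 and 630 / 1100.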