Suppose we have the following data:
1,huangxiaoming,45,a-c-d-f
2,huangzitao,36,b-c-d-e
3,huanglei,41,c-d-e
4,liushishi,22,a-d-e
5,liudehua,39,e-f-d
6,liuyifei,35,a-d-e
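Field meanings:
id,name,age,favors
(ID, name, age, hobbies)
Note that the hobbies field in each record holds multiple values separated by "-".
Requirement:
For each hobby, find the two oldest people (hobby, age, name).
A question to think about: what should happen if several people in a hobby share the second-highest age?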
Solution:
Table setup:
create database if not exists exercise;
use exercise;
drop table if exists exercise5;
create table exercise5(id int, name string, age int, favors string) row format delimited fields terminated by ",";
load data local inpath "/home/hadoop/exercise5.txt" into table exercise5;
select * from exercise5;
desc exercise5;
Approach
We need to turn a record like this:
6,liuyifei,35,a-d-e
into:
6,liuyifei,35,a
6,liuyifei,35,d
6,liuyifei,35,e
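Trying this in SQL:
select explode(split("a-d-e", "-"));   -- works
select id, name, age, explode(split(favors, "-")) from exercise5;   -- fails: explode() is a UDTF and cannot be selected alongside other columns
This requires a virtual view, which Hive provides as:
LATERAL VIEW
Rewritten: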
select a.id as id, a.name as name, a.age as age, favor_view.favor
from exercise5 a
LATERAL VIEW explode(split(a.favors, "-")) favor_view as favor;
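One edge case worth knowing about: a plain LATERAL VIEW drops any source row for which explode() produces no output, for example when favors is NULL. The sample data has no such rows, so the sketch below is only an illustration of how Hive's OUTER keyword would keep them (with favor set to NULL); it returns the same rows as the query above on this dataset.

```sql
-- Sketch only: LATERAL VIEW OUTER keeps rows whose favors column is NULL.
-- Without OUTER those rows disappear, because explode() emits nothing for them.
select a.id as id, a.name as name, a.age as age, favor_view.favor
from exercise5 a
LATERAL VIEW OUTER explode(split(a.favors, "-")) favor_view as favor;
```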
The SQL that finds the maximum age for each hobby:
select aa.favor, max(aa.age) as maxage
from
(
select a.id as id, a.name as name, a.age as age, favor_view.favor
from exercise5 a
LATERAL VIEW explode(split(a.favors, "-")) favor_view as favor
) aa
group by aa.favor;
Result:
a 45
b 36
c 45
d 45
e 41
f 45
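However, suppose the requirements grow:
Two requirements:
1. How do we also retrieve the name of the person who has that age?
2. What if we want the top 2 in each hobby?
One idea: if everyone within each hobby group could be sorted by age in descending order and given a sequence number within the group, later queries could simply filter on that number.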
select aa.favor, aa.age
from
(
select a.id as id, a.name as name, a.age as age, favor_view.favor
from exercise5 a
LATERAL VIEW explode(split(a.favors, "-")) favor_view as favor
) aa order by aa.favor, aa.age desc;
// The format below is what we need, but the SQL above cannot produce the rank column
favor age rank
a 45 1
a 35 2
a 22 3
b 36 1
c 45 1
c 41 2
c 36 3
d 45 1
d 41 2
d 39 3
d 36 4
d 35 5
d 22 6
e 41 1
e 39 2
e 36 3
e 35 4
e 22 5
f 45 1
f 39 2
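With data in that shape (call it ranked, with the name column included), selecting the two oldest people in each hobby becomes easy.
select favor, name, age from ranked where rank <= 2;
// Use a window function to add the sequence number: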
select aa.id, aa.name, aa.age, aa.favor,
row_number() over (distribute by aa.favor sort by aa.age desc) as index
from
(
select a.id as id, a.name as name, a.age as age, favor_view.favor
from exercise5 a
LATERAL VIEW explode(split(a.favors, "-")) favor_view as favor
) aa ;
Result:
1 huangxiaoming 45 a 1
6 liuyifei 35 a 2
4 liushishi 22 a 3
2 huangzitao 36 b 1
1 huangxiaoming 45 c 1
3 huanglei 41 c 2
2 huangzitao 36 c 3
1 huangxiaoming 45 d 1
3 huanglei 41 d 2
5 liudehua 39 d 3
2 huangzitao 36 d 4
6 liuyifei 35 d 5
4 liushishi 22 d 6
3 huanglei 41 e 1
5 liudehua 39 e 2
2 huangzitao 36 e 3
6 liuyifei 35 e 4
4 liushishi 22 e 5
1 huangxiaoming 45 f 1
5 liudehua 39 f 2
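The numbering query uses Hive's distribute by / sort by spelling inside over(). As a sketch, the same numbering can also be written with the more common partition by / order by form, with the exploded rows pulled into a CTE so the inner query is easier to read. The names exploded and rn are illustrative choices, not part of the original.

```sql
-- Sketch: same per-hobby numbering, written with a CTE and PARTITION BY / ORDER BY.
-- "exploded" and "rn" are illustrative names chosen for this example.
with exploded as (
  select a.id as id, a.name as name, a.age as age, favor_view.favor
  from exercise5 a
  LATERAL VIEW explode(split(a.favors, "-")) favor_view as favor
)
select e.id, e.name, e.age, e.favor,
       row_number() over (partition by e.favor order by e.age desc) as rn
from exploded e;
```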
// The final SQL, implemented with a window function:
select c.favor, c.name, c.age from
(
select aa.id, aa.name, aa.age, aa.favor,
row_number() over (distribute by aa.favor sort by aa.age desc) as index
from
(
select a.id as id, a.name as name, a.age as age, favor_view.favor
from exercise5 a
LATERAL VIEW explode(split(a.favors, "-")) favor_view as favor
) aa
) c
where c.index <= 2;
The two oldest people in each hobby:
a huangxiaoming 45
a liuyifei 35
b huangzitao 36
c huangxiaoming 45
c huanglei 41
d huangxiaoming 45
d huanglei 41
e huanglei 41
e liudehua 39
f huangxiaoming 45
f liudehua 39
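As for the question raised with the requirement, about several people sharing the second-highest age: row_number() always hands out distinct numbers, so with a tie it keeps just one of the tied people, and which one is not defined. One possible way to handle this, sketched below, is to use rank() instead, which gives tied ages the same number so that everyone tied at second place is returned. The alias rk is an illustrative name. All ages in the sample data are distinct, so on this dataset the output is the same as above.

```sql
-- Sketch: rank() assigns equal numbers to equal ages within a hobby,
-- so all people tied for second place pass the filter.
select c.favor, c.name, c.age from
(
  select aa.name, aa.age, aa.favor,
         rank() over (partition by aa.favor order by aa.age desc) as rk
  from
  (
    select a.name as name, a.age as age, favor_view.favor
    from exercise5 a
    LATERAL VIEW explode(split(a.favors, "-")) favor_view as favor
  ) aa
) c
where c.rk <= 2;
```

If the intent is instead "everyone whose age is one of the two highest distinct ages", dense_rank() in place of rank() would express that.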
That's all. I hope you found this useful.