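I often need to fetch, for each group, the row holding the maximum (or minimum) value. I used to do this with RANK and PARTITION BY. Thinking it over, a different approach also works, although I hadn't tested how it performs.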
create table test
(
col1 number,
col2 varchar2(20),
col3 number
);
insert into test
select 1,'content',2 from dual;
insert into test
select 1,'content2',3 from dual;
insert into test
select 1,'content2',4 from dual;
insert into test
select 2,'content',1 from dual;
insert into test
select 2,'content2',30 from dual;
insert into test
select 2,'content2',4 from dual;
insert into test
select 3,'content',10 from dual;
insert into test
select 3,'content2',3 from dual;
insert into test
select 3,'content2',4 from dual;
commit ;
Using RANK with PARTITION BY:
select * from
(
select rank() over (partition by col1 order by col3 desc) rn,
a.*
from test a
)X
where rn=1;
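The sketch below shows what the inner query assigns before the `where rn=1` filter is applied. It replays the sample rows in an in-memory SQLite database through Python's standard `sqlite3` module. SQLite stands in for Oracle here, which is an assumption, and it needs SQLite 3.25 or later for window functions. For that reason the column types are swapped for SQLite ones and `from dual` is dropped.

```python
import sqlite3

# Sample rows from the post: (col1, col2, col3)
ROWS = [
    (1, "content", 2), (1, "content2", 3), (1, "content2", 4),
    (2, "content", 1), (2, "content2", 30), (2, "content2", 4),
    (3, "content", 10), (3, "content2", 3), (3, "content2", 4),
]

conn = sqlite3.connect(":memory:")
# integer/text stand in for Oracle's NUMBER/VARCHAR2(20)
conn.execute("create table test (col1 integer, col2 text, col3 integer)")
conn.executemany("insert into test values (?, ?, ?)", ROWS)

# Inner query only: RANK restarts at 1 for every col1 value,
# numbering rows by col3 from largest to smallest
ranked = conn.execute("""
    select col1, col3, rank() over (partition by col1 order by col3 desc) rn
    from test
    order by col1, rn
""").fetchall()

# Keeping rn = 1 leaves the row with the largest col3 in each group
top = [row for row in ranked if row[2] == 1]
print(top)  # [(1, 4, 1), (2, 30, 1), (3, 10, 1)]
```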
Using the MAX function works as well:
select * from test a
where col3 in
(
select max(col3) from test b
where a.col1=b.col1
);
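The correlated subquery compares each row with the maximum of its own group. It should therefore return the same rows as the RANK version, including ties, because RANK also gives tied rows the same rank of 1. The sketch below checks this in SQLite via Python's `sqlite3`, again as an assumed stand-in for Oracle. The extra `(3, 'tie', 10)` row is hypothetical and exists only to create a tie.

```python
import sqlite3

# Sample rows from the post: (col1, col2, col3)
ROWS = [
    (1, "content", 2), (1, "content2", 3), (1, "content2", 4),
    (2, "content", 1), (2, "content2", 30), (2, "content2", 4),
    (3, "content", 10), (3, "content2", 3), (3, "content2", 4),
]

RANK_SQL = """
    select col1, col2, col3 from (
        select rank() over (partition by col1 order by col3 desc) rn, a.*
        from test a
    ) x
    where rn = 1
    order by col1, col2
"""
MAX_SQL = """
    select col1, col2, col3 from test a
    where col3 in (select max(col3) from test b where a.col1 = b.col1)
    order by col1, col2
"""

def run(rows):
    # Load rows into a fresh in-memory table, return (RANK result, MAX result)
    conn = sqlite3.connect(":memory:")
    conn.execute("create table test (col1 integer, col2 text, col3 integer)")
    conn.executemany("insert into test values (?, ?, ?)", rows)
    return conn.execute(RANK_SQL).fetchall(), conn.execute(MAX_SQL).fetchall()

rank_rows, max_rows = run(ROWS)
# Hypothetical extra row tying with the group-3 maximum (col3 = 10)
tie_rank, tie_max = run(ROWS + [(3, "tie", 10)])
print(max_rows)  # [(1, 'content2', 4), (2, 'content2', 30), (3, 'content', 10)]
print(tie_max)   # [(1, 'content2', 4), (2, 'content2', 30), (3, 'content', 10), (3, 'tie', 10)]
```

Both forms keep every row that ties for the maximum. This is consistent with the RANK and MAX runs below returning the same COUNT(*).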
I wasn't sure how these would hold up on a large data set, so I tested them against one of our tables:
select count(*) from form_action_log;
-- COUNT(*): 9903874
Using RANK:
select count(*) from
(
select rank() over (partition by a.form_id order by a.action_time desc) rn,
a.*
from form_action_log a
)X
where rn=1;
-- 70.125 s, COUNT(*): 4248095
Using MAX, version 1 (MAX1, with IN):
select count(*) from form_action_log a
where a.action_time in
(
select max(b.action_time) from form_action_log b
where a.form_id=b.form_id
);
-- 326.000 s, COUNT(*): 4248095
Using MAX, version 2 (MAX2, with >=):
select count(*) from form_action_log a
where a.action_time>=
(
select max(b.action_time) from form_action_log b
where a.form_id=b.form_id
);
-- under 60 s, COUNT(*): 4248095
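MAX2 returns the same count for a simple reason. The subquery yields the largest value in the row's group, and no row in that group can exceed it, so `>=` can only be true when the value equals the maximum. Rows with a NULL value compare as unknown and drop out under every form. The sketch below checks this on the sample table, again assuming SQLite through Python's `sqlite3` as a stand-in for Oracle. `rows_with` is a hypothetical helper.

```python
import sqlite3

# Sample rows from the post: (col1, col2, col3)
ROWS = [
    (1, "content", 2), (1, "content2", 3), (1, "content2", 4),
    (2, "content", 1), (2, "content2", 30), (2, "content2", 4),
    (3, "content", 10), (3, "content2", 3), (3, "content2", 4),
]

conn = sqlite3.connect(":memory:")
conn.execute("create table test (col1 integer, col2 text, col3 integer)")
conn.executemany("insert into test values (?, ?, ?)", ROWS)

def rows_with(op):
    # op is one of the fixed comparison forms below, never user input
    sql = f"""
        select * from test a
        where col3 {op} (select max(col3) from test b where a.col1 = b.col1)
        order by col1, col2
    """
    return conn.execute(sql).fetchall()

# MAX1 uses IN, MAX2 uses >=; = is included for comparison
results = {op: rows_with(op) for op in ("in", "=", ">=")}
print(results[">="])  # [(1, 'content2', 4), (2, 'content2', 30), (3, 'content', 10)]
```

The rewrite changes only the form of the predicate, not the rows it keeps. Any speed difference therefore comes from how the optimizer handles each form.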
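Oddly, the execution plan shows a larger Cost / IO Cost and a longer estimated Time for the RANK approach than for MAX1. The actual elapsed times, however, come out the other way round (tested in PL/SQL Developer 7.0).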
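The Cost figures in a plan are optimizer estimates, not measurements, so one way to compare the three forms is to time them directly. The sketch below is a hypothetical harness. It assumes SQLite via Python's `sqlite3` and a synthetic two-column `form_action_log` with random integer `form_id` and `action_time` values and an assumed index. Its timings say nothing about Oracle, but it shows the method and checks that all three queries return the same count, as in the results above.

```python
import random
import sqlite3
import time

def build(n_rows, n_groups, seed=0):
    # Hypothetical stand-in for form_action_log: only form_id and action_time
    rng = random.Random(seed)
    conn = sqlite3.connect(":memory:")
    conn.execute("create table form_action_log (form_id integer, action_time integer)")
    conn.executemany(
        "insert into form_action_log values (?, ?)",
        [(rng.randrange(n_groups), rng.randrange(1_000_000)) for _ in range(n_rows)],
    )
    # Assumed index so the correlated MAX lookups need not scan the table
    conn.execute("create index ix_fal on form_action_log (form_id, action_time)")
    return conn

QUERIES = {
    "rank": """select count(*) from (
                   select rank() over (partition by a.form_id
                                       order by a.action_time desc) rn, a.*
                   from form_action_log a) x
               where rn = 1""",
    "max1_in": """select count(*) from form_action_log a
                  where a.action_time in (select max(b.action_time)
                                          from form_action_log b
                                          where a.form_id = b.form_id)""",
    "max2_ge": """select count(*) from form_action_log a
                  where a.action_time >= (select max(b.action_time)
                                          from form_action_log b
                                          where a.form_id = b.form_id)""",
}

def run_all(conn):
    # Returns {name: (count, elapsed seconds)} using a wall-clock timer
    results = {}
    for name, sql in QUERIES.items():
        start = time.perf_counter()
        (count,) = conn.execute(sql).fetchone()
        results[name] = (count, time.perf_counter() - start)
    return results

conn = build(n_rows=20_000, n_groups=5_000)
timings = run_all(conn)
for name, (count, secs) in timings.items():
    print(f"{name}: COUNT(*)={count}, {secs:.3f}s")
```

In a real comparison it is worth running each query more than once. The first run can pay for reading data into cache that later runs then reuse.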
After rewriting the SQL, MAX2 comes out a little faster.
I'll think about other ways to rewrite it later.
That's all for now.