A performance problem caused by a column alias, with some thoughts on product design (2010-11-15)

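The database is the same logical standby described in 《從一次臨時表空間不足錯誤的處理說起》 ("Starting from the handling of an insufficient temporary tablespace error").
After fixing that problem I kept an eye on this database to see whether it would report the TEMP tablespace error again.
Later I again found I/O stalls, with the TEMP tablespace being the most heavily read and written. Looking for the SQL statements with the most direct writes in that time window turned up the following two: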

sql1:
select b.*, getproducturls(productid) as producturls
  from (select rownum as rowno, a.*
          from (select p.id productid,
                       nvl(p.image1, '') image1,
                       nvl(p.image2, '') image2,
                       nvl(p.image3, '') image3,
                       nvl(p.image4, '') image4,
                       nvl(p.refprice, 0) refprice,
                       nvl(p.catalogname, '') catalogname,
                       nvl(p.brandid, 0) brandid,
                       nvl(brandname, '') brandname,
                       nvl(p.catalogid, 0) catalogid,
                       nvl(p.seriesname, '') seriesname,
                       nvl(p.name, '') name,
                       nvl(p.productname, '') productname,
                       nvl(p.name1, '') name1,
                       nvl(p.name2, '') name2,
                       nvl(dealernum, 0) dealernum,
                       nvl(articlenum, 0) articlenum,
                       nvl(Replace(Replace(p.maininfo, '<li>', ''),
                                   '</li>' || chr(10),
                                   '/'),
                           '') maininfo
                  from product p
                 where 1 = 1
                   and p.publishstatus = 3
                 order by p.priority desc, nvl(pageview, 0) desc) a
         where rownum <= :1) b
 where rowno >= :2;

sql2:
select b.*, getproducturls(productid) as producturls
  from (select rownum as rowno, a.*
          from (select p.id productid,
                       nvl(p.image1, '') image1,
                       nvl(p.image2, '') image2,
                       nvl(p.image3, '') image3,
                       nvl(p.image4, '') image4,
                       nvl(p.refprice, 0) refprice,
                       nvl(p.catalogname, '') catalogname,
                       nvl(p.brandid, 0) brandid,
                       nvl(brandname, '') brandname,
                       nvl(p.catalogid, 0) catalogid,
                       nvl(p.seriesname, '') seriesname,
                       nvl(p.name, '') name,
                       nvl(p.productname, '') productname,
                       nvl(p.name1, '') name1,
                       nvl(p.name2, '') name2,
                       nvl(dealernum, 0) dealernum,
                       nvl(articlenum, 0) articlenum,
                       nvl(Replace(Replace(p.maininfo, '<li>', ''),
                                   '</li>' || chr(10),
                                   '/'),
                           '') maininfo
                  from product p
                 where 1 = 1
                   and p.publishstatus = 3
                 order by p.priority desc, nvl(refprice,0) asc) a
         where rownum <= :1) b
 where rowno >= :2;

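I will not dwell on the fact that nvl(p.image1, '') image1 is meaningless: Oracle treats the empty string '' as NULL, so it is exactly the same thing as p.image1 image1.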

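I looked for the current execution plan and could not find it; the statement was not in the shared pool. I did not think much about that at the time.
The execution plans from the problem period showed that both statements did a full table scan followed by a sort.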
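For reference, one way to look at a statement's current and historical plans is through DBMS_XPLAN. This is only a sketch and not necessarily what was used here: the sql_id value is a placeholder, and DISPLAY_AWR reads AWR data, which assumes the Diagnostics Pack is licensed.

```sql
-- Current plan, if the cursor is still in the shared pool.
-- 'abcd1234efgh5' is a placeholder sql_id, not one taken from this system.
select * from table(dbms_xplan.display_cursor('abcd1234efgh5', null, 'TYPICAL'));

-- Plans recorded in AWR snapshots for the same statement; useful when
-- the cursor has already aged out of the shared pool.
select * from table(dbms_xplan.display_awr('abcd1234efgh5'));
```

If the first query reports that the cursor cannot be found while the second still returns plans, the statement has been aged out since it last ran.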
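Checking the table structure confirmed that suitable indexes were indeed missing.
Because the application of the primary's logs is delayed by 7 days, I did not add the indexes on the primary (the primary does not run these statements anyway) and instead created them directly on the logical standby: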
sys> alter session set workarea_size_policy=manual;
sys> alter session set sort_area_size=409600000;
sys> create index productuser.ind60_product on productuser.product(PUBLISHSTATUS,PRIORITY,NVL(PAGEVIEW,0)) online;
sys> create index productuser.ind61_product on productuser.product(publishstatus,priority desc,nvl(refprice,0)) online;

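sql1 could now use index_desc(p ind60_product) and avoid the sort, but sql2 could not: by default it still did a full table scan plus sort, and even with the hint index(p ind61_product) a sort operation was still required.
It was already late, so I went home.
At work the next day I found that I/O had again been high overnight, again with TEMP reads and writes at the top, and the SQL captured was still these two statements.
sql2 is set aside for now.
sql1 was puzzling. The day before it had used the index and avoided the sort, yet the plan from the problem period was a full table scan followed by a sort. And the current plan? It used the index and avoided the sort.
The bind variable values from the problem period were as follows: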

NAME       POSITION VALUE_STRING
---------- -------- --------------------
:1                1 291720
:2                2 291701
:1                1 300520
:2                2 300501
:1                1 294500
:2                2 294481
:1                1 20
:2                2 1
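A listing like the one above can be obtained from the bind capture views. The queries below are a sketch under assumptions: the sql_id is a placeholder, and DBA_HIST_SQLBIND is an AWR view (Diagnostics Pack). Note that bind capture is periodic, so the captured values are samples and not necessarily the exact values peeked at hard parse time.

```sql
-- Bind values captured for cursors that are still in the shared pool.
-- 'abcd1234efgh5' is a placeholder sql_id.
select name, position, value_string, last_captured
  from v$sql_bind_capture
 where sql_id = 'abcd1234efgh5'
 order by last_captured, position;

-- Bind values kept in AWR snapshots, for the period when the problem occurred.
select snap_id, name, position, value_string, last_captured
  from dba_hist_sqlbind
 where sql_id = 'abcd1234efgh5'
 order by snap_id, position;
```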

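When I then looked for the current execution plan again, sql1 was gone; it was no longer in the shared pool.
So sql1 is being hard parsed repeatedly. Just before the problem a hard parse took place and the peeked bind value happened to be something like 291720, a row very deep in the result set, so the optimizer chose a full table scan followed by a sort. For rows that deep, with the statement written this way, not using the index is in fact the right choice.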

The questions now are: why the repeated hard parses, and why are the bind values so large?
Look at the current shared pool:
select plan_hash_value,count(1) cnt from v$sql group by plan_hash_value having count(1)>=100;
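Looking at the actual SQL statements that share one of these plans shows that they all use literals, and they differ not only in the literal values but in other places too. These statements are clearly assembled dynamically.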
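Another way to spot statements that differ only in their literals is to group by FORCE_MATCHING_SIGNATURE, a column of v$sql available from 10gR2 on. This is a sketch, assuming that version or later; the threshold of 100 simply mirrors the query above. Statements that also differ in structure, as the assembled ones here do, will not collapse into a single group.

```sql
-- Cursors that would become one statement if literals were replaced by binds
-- share the same force_matching_signature.
select force_matching_signature,
       count(*)      as cnt,
       min(sql_text) as sample_sql
  from v$sql
 where force_matching_signature <> 0
 group by force_matching_signature
having count(*) >= 100
 order by cnt desc;
```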
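The application page makes this obvious. It lets the user pick filter conditions and then run a query, so the application builds the SQL text from the chosen conditions; apart from the page bounds, nothing uses bind variables.
In the AWR report for the problem period the hard parse rate is not very high, because the database was stalled and serving almost nothing; during normal periods there are dozens of hard parses per second.
That pins the problem down. Because of the assembled SQL, the shared pool fills up with statements that do not use bind variables, so sql1, which is not executed very often overall (though it is frequent in certain periods), is aged out quickly. On its next execution it is reloaded and hard parsed again, the peeked value happens to be a very late page, and the optimizer picks a full table scan plus sort.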
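The reload behaviour can be checked from the library cache statistics. A minimal sketch, with a placeholder sql_id; a LOADS value that keeps growing between checks, and a LAST_LOAD_TIME that keeps moving, point to a cursor that is repeatedly aged out and hard parsed again.

```sql
-- Per-cursor load history while the statement is in the shared pool.
-- 'abcd1234efgh5' is a placeholder sql_id.
select sql_id, child_number, loads, invalidations,
       parse_calls, executions, first_load_time, last_load_time
  from v$sql
 where sql_id = 'abcd1234efgh5';

-- Instance-wide view: reloads in the SQL AREA namespace.
select namespace, gets, pins, reloads, invalidations
  from v$librarycache
 where namespace = 'SQL AREA';
```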

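So why are such large values bound? In other words, why would anyone request pages that far back?
Again the application page makes it obvious: it offers "last page" and "jump to page N" functions, as well as an exact count of the search results.
Personally I feel that when a search returns a very large result set, "last page" and "jump to page" should not be offered at all. Look at the baidu and google result pages: at most they let you move about 10 pages at a time, with no "last page" and no jumping to an arbitrary page. When a search returns too many pages, a user who jumps to some page, or to the last one, is just trying their luck, and the chance of finding what they want that way is negligible. If there are too many pages, the right move is to set more precise search conditions and search again. If I want the lowest price, sorting by price descending and then paging backwards from the last page is simply foolish.
As for the result count, baidu does not show one at all. google does, but it is clearly not exact, and an exact count would be pointless: the data changes so quickly that the count may be different by the time you reach the second page. Computing an exact count can also cost as much as fetching a page of results. So if a result count is to be shown, I think it should come from a cheap estimate; how to do the estimate I have not thought through.
Of course I am looking at this mostly from a performance point of view, so these opinions are open to debate.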

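Talking with the developers, I learned that it is not only a matter of assembled SQL, which is hard enough to fix. Worst of all, calls to this application are embedded in the page code of hundreds or even thousands of other web sites, most likely hard-coded at the time (and quite possibly pointing at very late pages). Which sites these are and what exactly was embedded is even harder to determine. Thinking about it, the original design of this application had far too many problems.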

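That is all on this problem for now.
Now for sql2, which could not avoid the sort no matter what. At first I suspected the nvl function in the index, because I had run into cases before where an index containing a function could not be used to avoid a sort. But on another database I loaded a test table from dba_objects and tested an almost identical nvl plus order by requirement, and the index avoided the sort without trouble; the same test on this database also worked. That was strange, so the only thing left was a 10053 trace.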
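For readers who have not used it, a 10053 trace is typically produced as sketched below. The statement is a cut-down stand-in for sql2 rather than the original text, and the comment tag is only there to make the statement text unique; EXPLAIN PLAN is used because it always goes through the optimizer, which is what writes the trace.

```sql
-- Turn on the optimizer trace for this session.
alter session set events '10053 trace name context forever, level 1';

-- Cut-down stand-in for sql2 (assumption: same table, alias and ORDER BY shape).
explain plan for
select /* trace_10053_demo */ p.id productid, nvl(p.refprice, 0) refprice
  from product p
 where p.publishstatus = 3
 order by p.priority desc, nvl(refprice, 0) asc;

-- Turn the trace off again.
alter session set events '10053 trace name context off';

-- On 10g the trace file is written under user_dump_dest.
select value from v$parameter where name = 'user_dump_dest';
```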
The 10053 trace file makes it quite clear:
******* UNPARSED QUERY IS *******
 ORDER BY "P"."PRIORITY" DESC,NVL(NVL("P"."REFPRICE",0),0)

while the original statement has:
order by p.priority desc, nvl(refprice,0) asc

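and the select list has: nvl(p.refprice, 0) refprice
So the refprice in the ORDER BY of the original statement actually refers to the alias refprice defined here, that is nvl(p.refprice, 0), and not to the refprice column of product p. After substitution it is therefore parsed as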
order by p.priority desc,nvl(nvl(p.refprice,0),0)
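which no longer matches nvl(refprice,0) in the index definition, that is, the definition of the hidden column shown in user_tab_cols. (The two are equivalent in substance, but the optimizer is evidently not smart enough to realize that.) So this index cannot be used to avoid the sort.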
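The expressions the optimizer has to match can be read from the dictionary. A short sketch using the owner, table and index names from the CREATE INDEX statements earlier; the DBA_ views are assumed to be accessible (the USER_ equivalents work when connected as the owner).

```sql
-- Expressions stored for the function-based (and DESC) index columns.
select index_name, column_position, column_expression
  from dba_ind_expressions
 where table_owner = 'PRODUCTUSER'
   and table_name  = 'PRODUCT'
   and index_name in ('IND60_PRODUCT', 'IND61_PRODUCT')
 order by index_name, column_position;

-- The hidden virtual columns that back those expressions.
select column_name, data_default
  from dba_tab_cols
 where owner         = 'PRODUCTUSER'
   and table_name    = 'PRODUCT'
   and hidden_column = 'YES';
```

An ORDER BY expression has to match one of these stored expressions textually (after normalization) for the index to be considered as a way to skip the sort, which nvl(nvl(refprice,0),0) does not.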

Once that is understood the fix is simple. Rewrite the ORDER BY as:
order by p.priority desc, nvl(p.refprice,0) asc
or
order by p.priority desc, refprice asc
after which ind61_product can be used to avoid the sort.
So when writing SQL, you need to be clear about what a name you reference actually resolves to.
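The effect can be reproduced on a scratch table. The sketch below uses hypothetical names (alias_demo, alias_demo_ix) that are not part of the system discussed here, and it makes no claim about the exact plans on any given version; the thing to compare is whether a SORT ORDER BY step appears in each plan.

```sql
-- Hypothetical scratch objects, mirroring the shape of product and ind61_product.
create table alias_demo (
  id            number,
  publishstatus number,
  priority      number,
  refprice      number
);
create index alias_demo_ix
  on alias_demo (publishstatus, priority desc, nvl(refprice, 0));

-- Form 1: unqualified refprice inside nvl() resolves to the select-list alias,
-- so the sort key becomes nvl(nvl(p.refprice,0),0).
explain plan for
select /*+ index(p alias_demo_ix) */ p.id, nvl(p.refprice, 0) refprice
  from alias_demo p
 where p.publishstatus = 3
 order by p.priority desc, nvl(refprice, 0) asc;
select * from table(dbms_xplan.display);

-- Form 2: qualified column, so the sort key matches the indexed expression.
explain plan for
select /*+ index(p alias_demo_ix) */ p.id, nvl(p.refprice, 0) refprice
  from alias_demo p
 where p.publishstatus = 3
 order by p.priority desc, nvl(p.refprice, 0) asc;
select * from table(dbms_xplan.display);

drop table alias_demo purge;
```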

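That settles the problem of the index not avoiding the sort. Of course, if sql2 were also asked for very late pages as sql1 is, it would still fall back to a full table scan plus sort; that cannot be avoided. In practice, though, monitoring has not shown this happening.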

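The real fix for this application is tied to its design and needs changes on the application side.
Leaving application design aside and looking only at what can be done by rewriting the SQL: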
933>select /*+ ordered use_nl(a p)*/
       rowno,
       p.id productid,
       nvl(p.image1, '') image1,
       nvl(p.image2, '') image2,
       nvl(p.image3, '') image3,
       nvl(p.image4, '') image4,
       nvl(p.refprice, 0) refprice,
       nvl(p.catalogname, '') catalogname,
       nvl(p.brandid, 0) brandid,
       nvl(brandname, '') brandname,
       nvl(p.catalogid, 0) catalogid,
       nvl(p.seriesname, '') seriesname,
       nvl(p.name, '') name,
       nvl(p.productname, '') productname,
       nvl(p.name1, '') name1,
       nvl(p.name2, '') name2,
       nvl(dealernum, 0) dealernum,
       nvl(articlenum, 0) articlenum,
       getproducturls(p.id) as producturls
from
(
select rid,rowno
  from (select rownum as rowno, rid
          from (select  /*+ full(p) */rowid rid
                  from product p
                 where 1 = 1
                   and p.publishstatus = 3
                 order by p.priority desc, nvl(pageview, 0) desc) a
         where rownum <= 300000) b
 where rowno >= 299981
) a,product p
where a.rid=p.rowid
order by rowno;

Execution Plan
----------------------------------------------------------
Plan hash value: 3274237678

------------------------------------------------------------------------------------------------
| Id  | Operation                    | Name    | Rows  | Bytes |TempSpc| Cost (%CPU)| Time     |
------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT             |         |   103K|    47M|       |   121K  (1)| 00:24:18 |
|   1 |  SORT ORDER BY               |         |   103K|    47M|   107M|   121K  (1)| 00:24:18 |
|   2 |   NESTED LOOPS               |         |   103K|    47M|       |   110K  (1)| 00:22:11 |
|*  3 |    VIEW                      |         |   103K|  2517K|       |  7706   (1)| 00:01:33 |
|*  4 |     COUNT STOPKEY            |         |       |       |       |            |          |
|   5 |      VIEW                    |         |   103K|  1208K|       |  7706   (1)| 00:01:33 |
|*  6 |       SORT ORDER BY STOPKEY  |         |   103K|  2114K|  7320K|  7706   (1)| 00:01:33 |
|*  7 |        TABLE ACCESS FULL     | PRODUCT |   103K|  2114K|       |  7035   (1)| 00:01:25 |
|   8 |    TABLE ACCESS BY USER ROWID| PRODUCT |     1 |   460 |       |     1   (0)| 00:00:01 |
------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   3 - filter("ROWNO">=299981)
   4 - filter(ROWNUM<=300000)
   6 - filter(ROWNUM<=300000)
   7 - filter("P"."PUBLISHSTATUS"=3)


Statistics
----------------------------------------------------------
         43  recursive calls
          0  db block gets
      40311  consistent gets
          0  physical reads
          0  redo size
      23002  bytes sent via SQL*Net to client
        499  bytes received via SQL*Net from client
          3  SQL*Net roundtrips to/from client
          2  sorts (memory)
          0  sorts (disk)
         20  rows processed

996>select b.*, getproducturls(productid) as producturls
  from (select rownum as rowno, a.*
          from (select p.id productid,
                       nvl(p.image1, '') image1,
                       nvl(p.image2, '') image2,
                       nvl(p.image3, '') image3,
                       nvl(p.image4, '') image4,
                       nvl(p.refprice, 0) refprice,
                       nvl(p.catalogname, '') catalogname,
                       nvl(p.brandid, 0) brandid,
                       nvl(brandname, '') brandname,
                       nvl(p.catalogid, 0) catalogid,
                       nvl(p.seriesname, '') seriesname,
                       nvl(p.name, '') name,
                       nvl(p.productname, '') productname,
                       nvl(p.name1, '') name1,
                       nvl(p.name2, '') name2,
                       nvl(dealernum, 0) dealernum,
                       nvl(articlenum, 0) articlenum
                  from product p
                 where 1 = 1
                   and p.publishstatus = 3
                 order by p.priority desc, nvl(pageview, 0) desc) a
         where rownum <= 300000) b
 where rowno >= 299981;

Execution Plan
----------------------------------------------------------
Plan hash value: 4064044132

--------------------------------------------------------------------------------------------
| Id  | Operation                | Name    | Rows  | Bytes |TempSpc| Cost (%CPU)| Time     |
--------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT         |         |   103K|    92M|       | 17038   (1)| 00:03:25 |
|*  1 |  VIEW                    |         |   103K|    92M|       | 17038   (1)| 00:03:25 |
|*  2 |   COUNT STOPKEY          |         |       |       |       |            |          |
|   3 |    VIEW                  |         |   103K|    90M|       | 17038   (1)| 00:03:25 |
|*  4 |     SORT ORDER BY STOPKEY|         |   103K|    44M|   100M| 17038   (1)| 00:03:25 |
|*  5 |      TABLE ACCESS FULL   | PRODUCT |   103K|    44M|       |  7035   (1)| 00:01:25 |
--------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - filter("ROWNO">=299981)
   2 - filter(ROWNUM<=300000)
   4 - filter(ROWNUM<=300000)
   5 - filter("P"."PUBLISHSTATUS"=3)


Statistics
----------------------------------------------------------
         43  recursive calls
          0  db block gets
      40291  consistent gets
          0  physical reads
          0  redo size
      23002  bytes sent via SQL*Net to client
        499  bytes received via SQL*Net from client
          3  SQL*Net roundtrips to/from client
          1  sorts (memory)
          0  sorts (disk)
         20  rows processed

Let us look at the PGA usage of these two sessions:
select s.sid,p.PGA_USED_MEM,p.PGA_ALLOC_MEM,p.PGA_FREEABLE_MEM,p.PGA_MAX_MEM
from v$process p,v$session s
where p.ADDR=s.PADDR and s.sid in(933,996);

       SID PGA_USED_MEM PGA_ALLOC_MEM PGA_FREEABLE_MEM PGA_MAX_MEM
---------- ------------ ------------- ---------------- -----------
       996      5520085      10150485                0   154346741
       933      5322477       9512181            65536    20915445

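In session 933, to allow a comparison with session 996, I forced a full table scan and kept the same sort keys as in session 996, but selected only the rowid and then joined back to product to fetch the other columns.
The 20 extra logical reads in session 933 compared with session 996 (40311 against 40291) come from fetching the other columns of product by rowid: 20 rows fetched, exactly 20 logical reads.
The 1 extra in-memory sort in session 933 is the final sort of the 20 rows by rowno, which involves practically no sort volume.
But the maximum PGA of session 933 is far lower than that of session 996: only about 20M against roughly 150M. The sort keys are the same, but session 933 carries far fewer columns through the sort (only the rowid), so it needs much less PGA memory. For the same business call, the rowid-join form therefore needs much higher concurrency before sorts start spilling to the temporary tablespace. The point is that not only the ORDER BY columns but also the columns in the select list affect work area usage.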
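Besides the process-level PGA figures, the sort work area of each statement can be inspected directly. A sketch with a placeholder sql_id; sizes are in bytes, and LAST_EXECUTION shows whether the last run completed in memory (OPTIMAL) or spilled to temp (ONE PASS or MULTI-PASS).

```sql
-- Work areas (sorts, hash joins, ...) of one cursor in the shared pool.
-- 'abcd1234efgh5' is a placeholder sql_id.
select sql_id, operation_id, operation_type, policy,
       estimated_optimal_size, last_memory_used,
       last_execution, last_tempseg_size, max_tempseg_size
  from v$sql_workarea
 where sql_id = 'abcd1234efgh5'
 order by operation_id;
```

Running it for both forms of the statement should show a much smaller ESTIMATED_OPTIMAL_SIZE for the sort that carries only rowids.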

Below is the rowid-join form without forcing the full table scan, so that the index is used to avoid the sort (this is the default choice):
965>select /*+ ordered use_nl(a p)*/
       rowno,
       p.id productid,
       nvl(p.image1, '') image1,
       nvl(p.image2, '') image2,
       nvl(p.image3, '') image3,
       nvl(p.image4, '') image4,
       nvl(p.refprice, 0) refprice,
       nvl(p.catalogname, '') catalogname,
       nvl(p.brandid, 0) brandid,
       nvl(brandname, '') brandname,
       nvl(p.catalogid, 0) catalogid,
       nvl(p.seriesname, '') seriesname,
       nvl(p.name, '') name,
       nvl(p.productname, '') productname,
       nvl(p.name1, '') name1,
       nvl(p.name2, '') name2,
       nvl(dealernum, 0) dealernum,
       nvl(articlenum, 0) articlenum,
       getproducturls(p.id) as producturls
from
(
select rid,rowno
  from (select rownum as rowno, rid
          from (select rowid rid
                  from product p
                 where 1 = 1
                   and p.publishstatus = 3
                 order by p.priority desc, nvl(pageview, 0) desc) a
         where rownum <= 300000) b
 where rowno >= 299981
) a,product p
where a.rid=p.rowid
order by rowno;

Execution Plan
----------------------------------------------------------
Plan hash value: 1874480058

----------------------------------------------------------------------------------------------------------
| Id  | Operation                        | Name          | Rows  | Bytes |TempSpc| Cost (%CPU)| Time     |
----------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                 |               |   103K|    47M|       |   113K  (1)| 00:22:48 |
|   1 |  SORT ORDER BY                   |               |   103K|    47M|   107M|   113K  (1)| 00:22:48 |
|   2 |   NESTED LOOPS                   |               |   103K|    47M|       |   103K  (1)| 00:20:41 |
|*  3 |    VIEW                          |               |   103K|  2517K|       |   187   (0)| 00:00:03 |
|*  4 |     COUNT STOPKEY                |               |       |       |       |            |          |
|   5 |      VIEW                        |               |   103K|  1208K|       |   187   (0)| 00:00:03 |
|*  6 |       INDEX RANGE SCAN DESCENDING| IND60_PRODUCT |   103K|  2114K|       |   187   (0)| 00:00:03 |
|   7 |    TABLE ACCESS BY USER ROWID    | PRODUCT       |     1 |   460 |       |     1   (0)| 00:00:01 |
----------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   3 - filter("ROWNO">=299981)
   4 - filter(ROWNUM<=300000)
   6 - access("P"."PUBLISHSTATUS"=3)


Statistics
----------------------------------------------------------
         66  recursive calls
          0  db block gets
       1252  consistent gets
          0  physical reads
          0  redo size
      25089  bytes sent via SQL*Net to client
        499  bytes received via SQL*Net from client
          3  SQL*Net roundtrips to/from client
          1  sorts (memory)
          0  sorts (disk)
         20  rows processed

The sort here is caused by the final order by rowno; order by p.priority desc, nvl(pageview, 0) desc involves no sort operation and is satisfied by the index access.

The PGA usage of this session:
select s.sid,p.PGA_USED_MEM,p.PGA_ALLOC_MEM,p.PGA_FREEABLE_MEM,p.PGA_MAX_MEM
from v$process p,v$session s
where p.ADDR=s.PADDR and s.sid in(965);

       SID PGA_USED_MEM PGA_ALLOC_MEM PGA_FREEABLE_MEM PGA_MAX_MEM
---------- ------------ ------------- ---------------- -----------
       965      5728773      10495221                0    10495221

The PGA usage here is essentially just the UGA, which holds session information; this usage is unavoidable.

1048>select b.*, getproducturls(productid) as producturls
  from (select rownum as rowno, a.*
          from (select /*+ index_desc(p ind60_product) */p.id productid,
                       nvl(p.image1, '') image1,
                       nvl(p.image2, '') image2,
                       nvl(p.image3, '') image3,
                       nvl(p.image4, '') image4,
                       nvl(p.refprice, 0) refprice,
                       nvl(p.catalogname, '') catalogname,
                       nvl(p.brandid, 0) brandid,
                       nvl(brandname, '') brandname,
                       nvl(p.catalogid, 0) catalogid,
                       nvl(p.seriesname, '') seriesname,
                       nvl(p.name, '') name,
                       nvl(p.productname, '') productname,
                       nvl(p.name1, '') name1,
                       nvl(p.name2, '') name2,
                       nvl(dealernum, 0) dealernum,
                       nvl(articlenum, 0) articlenum
                  from product p
                 where 1 = 1
                   and p.publishstatus = 3
                 order by p.priority desc, nvl(pageview, 0) desc) a
         where rownum <= 300000) b
 where rowno >= 299981;

Execution Plan
----------------------------------------------------------
Plan hash value: 2767923102

-------------------------------------------------------------------------------------------------
| Id  | Operation                       | Name          | Rows  | Bytes | Cost (%CPU)| Time     |
-------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                |               |   103K|    92M| 24050   (1)| 00:04:49 |
|*  1 |  VIEW                           |               |   103K|    92M| 24050   (1)| 00:04:49 |
|*  2 |   COUNT STOPKEY                 |               |       |       |            |          |
|   3 |    VIEW                         |               |   103K|    90M| 24050   (1)| 00:04:49 |
|   4 |     TABLE ACCESS BY INDEX ROWID | PRODUCT       |   103K|    44M| 24050   (1)| 00:04:49 |
|*  5 |      INDEX RANGE SCAN DESCENDING| IND60_PRODUCT |   103K|       |   187   (0)| 00:00:03 |
-------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - filter("ROWNO">=299981)
   2 - filter(ROWNUM<=300000)
   5 - access("P"."PUBLISHSTATUS"=3)


Statistics
----------------------------------------------------------
         43  recursive calls
          0  db block gets
     143772  consistent gets
          0  physical reads
          0  redo size
      25089  bytes sent via SQL*Net to client
        499  bytes received via SQL*Net from client
          3  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
         20  rows processed

select s.sid,p.PGA_USED_MEM,p.PGA_ALLOC_MEM,p.PGA_FREEABLE_MEM,p.PGA_MAX_MEM
from v$process p,v$session s
where p.ADDR=s.PADDR and s.sid in(1048);

       SID PGA_USED_MEM PGA_ALLOC_MEM PGA_FREEABLE_MEM PGA_MAX_MEM
---------- ------------ ------------- ---------------- -----------
      1048      5134213       9905397           262144     9970933

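When the original statement is forced to use the index to avoid the sort, logical I/O is very high (143772 consistent gets), mainly because of unnecessary table access by rowid.
There is indeed no sort, but the logical I/O is too high and still burns a lot of CPU. Under heavy concurrency this inevitably leads to hash latch contention, with repeated spinning consuming CPU; if the statement were changed to this form, the inevitable result under concurrency would be no idle CPU.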
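Whether that contention is actually building up can be watched in the latch statistics. The query below is a sketch that assumes the "hash latch" meant here is the cache buffers chains latch, which protects the buffer cache hash chains; the counters are cumulative since instance startup, so two samples taken some time apart are needed to see a rate.

```sql
-- Gets, misses, sleeps and spin gets for the buffer cache hash chain latches.
select name, gets, misses, sleeps, spin_gets
  from v$latch
 where name = 'cache buffers chains';
```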

Overall comparison:
select /*+ ordered use_nl(a p)*/
       rowno,
       p.id productid,
       nvl(p.image1, '') image1,
       nvl(p.image2, '') image2,
       nvl(p.image3, '') image3,
       nvl(p.image4, '') image4,
       nvl(p.refprice, 0) refprice,
       nvl(p.catalogname, '') catalogname,
       nvl(p.brandid, 0) brandid,
       nvl(brandname, '') brandname,
       nvl(p.catalogid, 0) catalogid,
       nvl(p.seriesname, '') seriesname,
       nvl(p.name, '') name,
       nvl(p.productname, '') productname,
       nvl(p.name1, '') name1,
       nvl(p.name2, '') name2,
       nvl(dealernum, 0) dealernum,
       nvl(articlenum, 0) articlenum,
       getproducturls(p.id) as producturls
from
(
select rid,rowno
  from (select rownum as rowno, rid
          from (select rowid rid
                  from product p
                 where 1 = 1
                   and p.publishstatus = 3
                 order by p.priority desc, nvl(pageview, 0) desc) a
         where rownum <= 300000) b
 where rowno >= 299981
) a,product p
where a.rid=p.rowid
order by rowno;

This form is the best: no sort apart from the trivial final order by rowno, no unnecessary table access, and modest logical I/O.

For fetching early rows:
select /*+ ordered use_nl(a p)*/
       rowno,
       p.id productid,
       nvl(p.image1, '') image1,
       nvl(p.image2, '') image2,
       nvl(p.image3, '') image3,
       nvl(p.image4, '') image4,
       nvl(p.refprice, 0) refprice,
       nvl(p.catalogname, '') catalogname,
       nvl(p.brandid, 0) brandid,
       nvl(brandname, '') brandname,
       nvl(p.catalogid, 0) catalogid,
       nvl(p.seriesname, '') seriesname,
       nvl(p.name, '') name,
       nvl(p.productname, '') productname,
       nvl(p.name1, '') name1,
       nvl(p.name2, '') name2,
       nvl(dealernum, 0) dealernum,
       nvl(articlenum, 0) articlenum,
       getproducturls(p.id) as producturls
from
(
select rid,rowno
  from (select rownum as rowno, rid
          from (select rowid rid
                  from product p
                 where 1 = 1
                   and p.publishstatus = 3
                 order by p.priority desc, nvl(pageview, 0) desc) a
         where rownum <= 100) b
 where rowno >= 81
) a,product p
where a.rid=p.rowid
order by rowno;

 

select b.*, getproducturls(productid) as producturls
  from (select rownum as rowno, a.*
          from (select p.id productid,
                       nvl(p.image1, '') image1,
                       nvl(p.image2, '') image2,
                       nvl(p.image3, '') image3,
                       nvl(p.image4, '') image4,
                       nvl(p.refprice, 0) refprice,
                       nvl(p.catalogname, '') catalogname,
                       nvl(p.brandid, 0) brandid,
                       nvl(brandname, '') brandname,
                       nvl(p.catalogid, 0) catalogid,
                       nvl(p.seriesname, '') seriesname,
                       nvl(p.name, '') name,
                       nvl(p.productname, '') productname,
                       nvl(p.name1, '') name1,
                       nvl(p.name2, '') name2,
                       nvl(dealernum, 0) dealernum,
                       nvl(articlenum, 0) articlenum
                  from product p
                 where 1 = 1
                   and p.publishstatus = 3
                 order by p.priority desc, nvl(pageview, 0) desc) a
         where rownum <= 100) b
 where rowno >= 81;

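Both use index_desc(p ind60_product) to avoid the sort. The former needs 164 logical reads and the latter 245, because the latter still visits the table unnecessarily for the first 80 rows.
With bounds 1 and 20 the two need the same number of logical reads, 164.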

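In other words, for early rows the /*+ ordered use_nl(a p)*/ form is no worse, and is even slightly better.
So if the business design is not changed for now and very late rows still have to be fetched, the SQL should be rewritten like this: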
select /*+ ordered use_nl(a p)*/
       rowno,
       p.id productid,
       nvl(p.image1, '') image1,
       nvl(p.image2, '') image2,
       nvl(p.image3, '') image3,
       nvl(p.image4, '') image4,
       nvl(p.refprice, 0) refprice,
       nvl(p.catalogname, '') catalogname,
       nvl(p.brandid, 0) brandid,
       nvl(brandname, '') brandname,
       nvl(p.catalogid, 0) catalogid,
       nvl(p.seriesname, '') seriesname,
       nvl(p.name, '') name,
       nvl(p.productname, '') productname,
       nvl(p.name1, '') name1,
       nvl(p.name2, '') name2,
       nvl(dealernum, 0) dealernum,
       nvl(articlenum, 0) articlenum,
       getproducturls(p.id) as producturls
from
(
select rid,rowno
  from (select rownum as rowno, rid
          from (select rowid rid
                  from product p
                 where 1 = 1
                   and p.publishstatus = 3
                 order by p.priority desc, nvl(pageview, 0) desc) a
         where rownum <= :1) b
 where rowno >= :2
) a,product p
where a.rid=p.rowid
order by rowno;

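Given the present situation, until the application is changed, and looking at current memory usage:
sga: 4300M. Currently db_cache_size=1600m and shared_pool_size=2500m; these are the current values, and no lower bounds are set for them on the database side.
pga: 2000m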

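The following adjustments were made:
db_cache is left alone for now, since the AWR reports show no problem with this setting. The shared pool, given that the assembled SQL has little chance of being reused, is set to 1500m (1000M less), i.e. db_cache_size=1600m, shared_pool_size=1500m, sga_target=sga_max_size=3300m. (With these lower bounds in place there is little room left for the shared pool to grow, so it will stay at 1500M or slightly more.) The 1000M saved goes to the PGA: pga_aggregate_target=3000m.
Shrinking shared_pool_size also shrinks the LCR cache by default, and if that becomes too small, ORA-04031 errors are likely and the lsp process terminates.
So the size of the LCR cache must be guaranteed: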
EXECUTE DBMS_LOGSTDBY.APPLY_SET('MAX_SGA',350);
which fixes the LCR cache at 350M.
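Put together, the changes described above might be applied roughly as follows. This is a sketch under assumptions, not a record of the commands actually run: it assumes the instance uses an spfile, writes the SGA parameters with scope=spfile so that they take effect at the next restart (sga_max_size is static in any case), and assumes SQL Apply has to be stopped while MAX_SGA is changed, as is the case on 10g.

```sql
-- SGA: lower the shared pool floor and the overall target (effective after restart).
alter system set sga_max_size     = 3300m scope = spfile;
alter system set sga_target       = 3300m scope = spfile;
alter system set db_cache_size    = 1600m scope = spfile;
alter system set shared_pool_size = 1500m scope = spfile;

-- PGA: hand the saved 1000M to the work areas.
alter system set pga_aggregate_target = 3000m scope = both;

-- Logical standby: pin the LCR cache at 350M with SQL Apply stopped.
alter database stop logical standby apply;
begin
  dbms_logstdby.apply_set('MAX_SGA', 350);
end;
/
alter database start logical standby apply;

-- Check the value SQL Apply will use.
select name, value from dba_logstdby_parameters where name = 'MAX_SGA';
```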
