SQL for RAC Tuning Checks_20200512

Part 1: Measuring Cluster Overhead
Script: top_level_waits.sql
rem **
rem
rem File: top_level_waits.sql
rem Description: Break down of top level WAITCLASS waits
rem
rem Note: top-level wait events across the cluster
rem
rem
rem
rem

col time_cat format a20 heading "Time category"
col time_secs format 999,999.99 heading "Time (s)"
col pct format 99.99 heading "Time|pct"
set pagesize 10000
set lines 80
set echo on

SELECT wait_class time_cat, ROUND((time_secs), 2) time_secs,
ROUND((time_secs) * 100 / SUM(time_secs) OVER (), 2) pct
FROM (SELECT wait_class wait_class,
SUM(time_waited_micro) / 1000000 time_secs
FROM gv$system_event
WHERE wait_class <> 'Idle' AND time_waited > 0
GROUP BY wait_class
UNION
SELECT 'CPU', ROUND((SUM(VALUE) / 1000000), 2) time_secs
FROM gv$sys_time_model
WHERE stat_name IN ('background cpu time', 'DB CPU'))
ORDER BY time_secs DESC;

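Script notes:
If cluster wait time accounts for more than 10%~20% of total database time, the DBA should investigate.
In the output, that means a pct value above 10%~20% on the Cluster row.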
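The percentage rule above can be sketched in a few lines of Python. This is only an illustration of the arithmetic the query performs: the function names, the 10% default threshold (the lower end of the range quoted above), and the sample figures are assumptions made for this example, not output from a real system.

```python
def wait_class_pct(time_secs_by_class):
    """Return each category's share of the total time, as a percentage."""
    total = sum(time_secs_by_class.values())
    if total == 0:
        # Nothing recorded yet: report every share as zero.
        return {name: 0.0 for name in time_secs_by_class}
    return {name: round(secs * 100 / total, 2)
            for name, secs in time_secs_by_class.items()}


def cluster_needs_review(time_secs_by_class, threshold_pct=10.0):
    """True when the 'Cluster' category reaches the threshold share."""
    pct = wait_class_pct(time_secs_by_class)
    return pct.get("Cluster", 0.0) >= threshold_pct


# Hypothetical figures (seconds per category), for illustration only.
sample = {"CPU": 600.0, "User I/O": 250.0, "Cluster": 150.0}
print(wait_class_pct(sample))
print(cluster_needs_review(sample))
```

The dictionary keys stand in for the "Time category" column of the report, so rows copied from the script output can be passed in directly.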

Script: cluster_waits.sql

rem **
rem
rem File: cluster_waits.sql
rem Description: Break out of cluster waits compared to other categories
rem
rem
rem
rem

column wait_type format a35 heading "Wait Type"
column lock_name format a12
column waits_1000 format 99,999,999 heading "Waits|\1000"
column time_waited_hours format 99,999.99 heading "Time|Hours"
column pct_time format 99.99 Heading "Pct of|Time"
column avg_wait_ms format 9,999.99 heading "Avg Wait|Ms"
set pagesize 10000
set lines 100
set echo on

WITH system_event AS
(SELECT CASE
WHEN wait_class = 'Cluster' THEN event
ELSE wait_class
END wait_type, e.*
FROM gv$system_event e)
SELECT wait_type, ROUND(total_waits/1000,2) waits_1000 ,
ROUND(time_waited_micro/1000000/3600,2) time_waited_hours,
ROUND(time_waited_micro/1000/total_waits,2) avg_wait_ms ,
ROUND(time_waited_micro * 100
/ SUM(time_waited_micro) OVER (), 2) pct_time
FROM (SELECT wait_type, SUM(total_waits) total_waits,
SUM(time_waited_micro) time_waited_micro
FROM system_event e
GROUP BY wait_type
UNION
SELECT 'CPU', NULL, SUM(VALUE)
FROM gv$sys_time_model
WHERE stat_name IN ('background cpu time', 'DB CPU'))
WHERE wait_type <> 'Idle'
ORDER BY time_waited_micro DESC;

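Note: cluster wait time normally consists mostly of plain global cache request waits, but it is not unusual to see more serious global cache waits as well, such as lost blocks, congestion, and gc buffer busy waits. The script above breaks out what is typically seen.
If the query shows gc wait events such as gc cr/current block 2-way, the DBA should investigate. In the output captured for this check, no gc wait events appeared.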

Part 2: Reducing Global Cache Latency
Script: gc_waits.sql
col event format a30 heading "Wait event"
col total_waits format 999,999,999 heading "Total|Waits"
col time_waited_secs format 999,999,999 heading "Time|(secs)"
col avg_ms format 9,999.99 heading "Avg Wait|(ms)"
set pagesize 1000
set lines 80
set echo on

SELECT event, SUM(total_waits) total_waits,
ROUND(SUM(time_waited_micro) / 1000000, 2)
time_waited_secs,
ROUND(SUM(time_waited_micro)/1000 /
SUM(total_waits), 2) avg_ms
FROM gv$system_event
WHERE wait_class <> 'Idle'
AND( event LIKE 'gc%block%way'
OR event LIKE 'gc%multi%'
or event like 'gc%grant%'
OR event = 'db file sequential read')
GROUP BY event
HAVING SUM(total_waits) > 0
ORDER BY event;

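Note: global cache latency deserves attention when consistent-read requests (gc cr block 2-way and similar) average more than 1 ms and also take more than 1/10 of the average db file sequential read time.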
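A minimal Python sketch of that two-part rule of thumb follows. The function name and the sample averages are hypothetical; the inputs are meant to be the "Avg Wait (ms)" values reported by gc_waits.sql for a gc event and for db file sequential read.

```python
def gc_latency_is_high(gc_avg_ms, disk_read_avg_ms):
    """Apply the rule of thumb: above 1 ms AND above 1/10 of a disk read."""
    return gc_avg_ms > 1.0 and gc_avg_ms > disk_read_avg_ms / 10.0


# Hypothetical averages in milliseconds, for illustration only.
print(gc_latency_is_high(1.4, 6.0))   # both conditions hold
print(gc_latency_is_high(0.4, 6.0))   # under 1 ms, so not flagged
```

Both conditions are required, so a 1.2 ms global cache wait is not flagged on a system whose single-block disk reads average 20 ms.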

Part 3: Optimizing the Interconnect

Script: ksxpia.sql
rem **
rem
rem File: ksxpia.sql
rem Description: Private interconnect IP address
rem
rem
rem

col instance_number format 999 heading "Inst|#"
col host_name format a25 heading "Host|Name"
col network_interface format a5 heading "Net|IFace"
col private_ip format a12 heading "Private|IP"
set pages 1000
set echo on

SELECT instance_number, host_name, instance_name,
name_ksxpia network_interface, ip_ksxpia private_ip
FROM x$ksxpia
CROSS JOIN
v$instance
WHERE pub_ksxpia = 'N';

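Query result (run as the sys user):

Use ping to check the interconnect between the RAC nodes. In the captured output, a ping from odsdb2 to odsdb1 took 0.303 ms.

Note: if latency is too high on a newly deployed database, the most likely cause is that the public network has been configured as the private interconnect.

Signs of interconnect problems
The following script reports lost blocks alongside the numbers of blocks served and received, so that the loss can be judged against total traffic.
Script: gc_miss_rate.sql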

col value format 999,999,999,999
col name format a30
set echo on

SELECT name, SUM(VALUE) value
FROM gv$sysstat
WHERE name LIKE 'gc%lost'
OR name LIKE 'gc%received'
OR name LIKE 'gc%served'
GROUP BY name
ORDER BY name;

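Note:
Time spent waiting for lost blocks to be retransmitted is recorded under gc cr request retry, gc cr block lost and gc current block lost. The time for these wait events should be very low, and the number of lost blocks, compared with the total block counts in the gc cr/current blocks received/served statistics,
should normally be less than 1% of the total.
If block loss is very high, or the time associated with lost blocks is significant relative to total database time, the most likely cause is hardware: a badly seated network card, a damaged cable, or faulty network equipment.
Moderate block loss may mean the interconnect is overloaded.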
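The 1% comparison can be worked through as follows. This is a sketch under assumptions: the function name is hypothetical, the statistic names in the sample are assumed to be the ones the script's LIKE patterns match, and the counts are invented. It compares lost blocks with the sum of blocks received and served.

```python
def lost_block_pct(stats):
    """Lost blocks as a percentage of blocks received plus blocks served."""
    lost = sum(v for name, v in stats.items() if name.endswith("lost"))
    moved = sum(v for name, v in stats.items()
                if name.endswith("received") or name.endswith("served"))
    if moved == 0:
        # No traffic recorded, so there is nothing to compare against.
        return 0.0
    return round(lost * 100 / moved, 2)


# Hypothetical counts keyed by statistic name, for illustration only.
sample = {
    "gc blocks lost": 120,
    "gc cr blocks received": 40_000,
    "gc current blocks received": 30_000,
    "gc cr blocks served": 20_000,
    "gc current blocks served": 10_000,
}
pct = lost_block_pct(sample)
print(pct, "percent lost; above the 1% guideline:", pct > 1.0)
```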

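Part 4: LMS Waits

The interconnect is at the core of global cache latency, but high global cache latency is often the result of delays in the Oracle software layer. The LMS service on the remote instance, which builds and returns the requested blocks, accounts for most of the non-network latency of a global cache request. The following query shows the LMS latency for current and consistent-read requests on each instance.
Script: lms_latency.sql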
rem **
rem
rem File: lms_latency.sql
rem Description: LMS latency breakdown
rem
rem
rem
rem

col instance_name format a12 heading "Instance"
col current_blocks_served format 999,999,999 heading "Current Blks|Served"
col avg_current_ms format 99.99 heading "Avg|CU ms"
col cr_blocks_served format 999,999,999 heading "CR Blks|Served"
col avg_cr_ms format 99.99 heading "Avg|Cr ms"
set pages 1000
set lines 80
set echo on

WITH sysstats AS (
SELECT instance_name,
SUM(CASE WHEN name LIKE 'gc cr%time'
THEN VALUE END) cr_time,
SUM(CASE WHEN name LIKE 'gc current%time'
THEN VALUE END) current_time,
SUM(CASE WHEN name LIKE 'gc current blocks served'
THEN VALUE END) current_blocks_served,
SUM(CASE WHEN name LIKE 'gc cr blocks served'
THEN VALUE END) cr_blocks_served
FROM gv$sysstat JOIN gv$instance
USING (inst_id)
WHERE name IN
('gc cr block build time',
'gc cr block flush time',
'gc cr block send time',
'gc current block pin time',
'gc current block flush time',
'gc current block send time',
'gc cr blocks served',
'gc current blocks served')
GROUP BY instance_name)
SELECT instance_name , current_blocks_served,
ROUND(current_time * 10 / current_blocks_served, 2) avg_current_ms,
cr_blocks_served,
ROUND(cr_time * 10 / cr_blocks_served, 2) avg_cr_ms
FROM sysstats;
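The averaging step in this query can be illustrated in Python. The sketch assumes the gc time statistics are recorded in centiseconds, which is why the total is multiplied by 10 to obtain milliseconds; the function name and the sample numbers are hypothetical.

```python
def avg_ms_per_block(time_centiseconds, blocks_served):
    """Average LMS time per served block, converted to milliseconds.

    Assumes the time total is in centiseconds (1 cs = 10 ms).
    Returns None when no blocks were served, to avoid dividing by zero.
    """
    if not blocks_served:
        return None
    return round(time_centiseconds * 10 / blocks_served, 2)


# Hypothetical totals for one instance, for illustration only.
print(avg_ms_per_block(2_500, 100_000))
print(avg_ms_per_block(2_500, 0))
```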

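Note: if the network is responsive and fast but LMS latency is high, the likely causes are:

1. An overloaded instance cannot respond quickly to global cache requests. In particular, the LMS processes may be swamped with requests or starved of CPU.

2. An IO bottleneck, particularly in redo IO, is slowing the response to global cache requests.

High global cache latency can occur when one or more instances in the cluster are overloaded, which may indicate that load balancing across the cluster needs attention. (Load balancing may already be configured here; we do not want load-balancing mode.)

Another common cause of high latency is that LMS must flush uncommitted changes to the redo log before it sends a block to the requesting instance.
This script calculates the proportion of block transfers that required a redo log flush and the proportion of LMS time spent performing those flushes: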

set echo on
WITH sysstat AS (
SELECT SUM(CASE WHEN name LIKE '%time'
THEN VALUE END) total_time,
SUM(CASE WHEN name LIKE '%flush time'
THEN VALUE END) flush_time,
SUM(CASE WHEN name LIKE '%served'
THEN VALUE END) blocks_served
FROM gv$sysstat
WHERE name IN
('gc cr block build time',
'gc cr block flush time',
'gc cr block send time',
'gc current block pin time',
'gc current block flush time',
'gc current block send time',
'gc cr blocks served',
'gc current blocks served')),
cr_block_server as (
SELECT SUM(flushes) flushes, SUM(data_requests) data_requests
FROM gv$cr_block_server )
SELECT ROUND(flushes * 100 / blocks_served, 2) pct_blocks_flushed,
ROUND(flush_time * 100 / total_time, 2) pct_lms_flush_time
FROM sysstat CROSS JOIN cr_block_server;

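Note: when a small proportion of flushed blocks (for example 1%) accounts for a large proportion of total LMS time (for example 36%), the redo log IO layout needs retuning. The output captured for this check showed 19 against 100, so no tuning is needed for now.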
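The two percentages can be reproduced with a short Python sketch. The function name is hypothetical and the inputs are invented so that they land on the 1% and 36% figures used as the example in the note.

```python
def flush_profile(flushes, blocks_served, flush_time, total_time):
    """Return (pct_blocks_flushed, pct_lms_flush_time) as percentages."""
    pct_blocks_flushed = round(flushes * 100 / blocks_served, 2)
    pct_lms_flush_time = round(flush_time * 100 / total_time, 2)
    return pct_blocks_flushed, pct_lms_flush_time


# Hypothetical totals chosen to match the 1% / 36% example in the note.
pct_blocks, pct_time = flush_profile(
    flushes=1_000, blocks_served=100_000, flush_time=360, total_time=1_000)
print(pct_blocks, pct_time)
```

A large gap between the two values is the signal the note describes: few transfers need a flush, yet those flushes dominate LMS time.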

Part 5: Cluster Load Balance

Statistics on CPU, database time, and logical reads for each instance in the cluster since startup.
Script: balance.sql
rem **
rem
rem File: balance.sql
rem Description: Cluster balance report
rem
rem
rem
rem

col instance_name format a8 heading "Instance|Name"
col db_time_pct format 99.99 heading "Pct of|DB Time"
col cpu_time_pct format 99.99 heading "Pct of|CPU Time"
col db_time_secs format 9,999,999.99 heading "DB Time|(secs)"
col cpu_time_secs format 9,999,999.99 heading "CPU Time|(secs)"

set lines 80
set pages 1000
set echo on

WITH sys_time AS (
SELECT inst_id, SUM(CASE stat_name WHEN 'DB time'
THEN VALUE END) db_time,
SUM(CASE WHEN stat_name IN ('DB CPU', 'background cpu time')
THEN VALUE END) cpu_time
FROM gv$sys_time_model
GROUP BY inst_id )
SELECT instance_name,
ROUND(db_time/1000000,2) db_time_secs,
ROUND(db_time * 100 / SUM(db_time) OVER (), 2) db_time_pct,
ROUND(cpu_time/1000000,2) cpu_time_secs,
ROUND(cpu_time * 100 / SUM(cpu_time) OVER (), 2) cpu_time_pct
FROM sys_time
JOIN gv$instance USING (inst_id);

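Note: the captured output showed that node 1 carries the higher load. To use load balancing, tnsnames.ora has to be modified.

Querying service load shows a range of workload statistics.
The query below breaks down CPU consumption across the cluster nodes, showing each service's share of the CPU used on its instance and how each service's workload is distributed across the nodes.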

rem **
rem
rem File: service_stats.sql
rem Description: Report on service workload by instance
rem
rem
rem

col instance_name format a8 heading "Instance|Name"
col service_name format a15 heading "Service|Name"
col cpu_time format 99,999,999 heading "Cpu|secs"
col pct_instance format 999.99 heading "Pct Of|Instance"
col pct_service format 999.99 heading "Pct of|Service"
set lines 80
set pages 1000
set echo on

BREAK ON instance_name skip 1
COMPUTE SUM OF cpu_time ON instance_name

WITH service_cpu AS (SELECT instance_name, service_name,
round(SUM(VALUE)/1000000,2) cpu_time
FROM gv$service_stats
JOIN
gv$instance
USING (inst_id)
WHERE stat_name IN ('DB CPU', 'background cpu time')
GROUP BY instance_name, service_name )
SELECT instance_name, service_name, cpu_time,
ROUND(cpu_time * 100 / SUM(cpu_time)
OVER (PARTITION BY instance_name), 2) pct_instance,
ROUND(cpu_time * 100
/ SUM(cpu_time) OVER (PARTITION BY service_name), 2)
pct_service
FROM service_cpu
WHERE cpu_time > 0
ORDER BY instance_name, service_name;

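Note: this output shows clearly how much CPU each service consumes.

Part 6: Measuring the Global Cache Request Rate

The following script calculates the ratio of physical reads to logical reads, which relates to the buffer cache hit ratio, together with the ratio of global cache blocks received to logical reads.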
rem **
rem
rem File: gc_miss_rate.sql
rem Description: "Global cache ""miss rate"" by instance "
rem
rem
rem
rem

col instance_name format a10 heading "Instance|name"
col logical_reads format 999,999,999 heading "Logical|Reads"
col gc_blocks_received format 999,999,999 heading "GC Blocks|Received"
col physical_reads format 999,999,999 heading "Physical|Reads"
col phys_to_logical_pct format 99.99 heading "Phys/Logical|Pct"
col gc_to_logical_pct format 99.99 heading "GC/Logical|Pct"
set pagesize 10000
set lines 80
set echo on
WITH sysstats AS (
SELECT inst_id,
SUM(CASE WHEN name LIKE 'gc%received'
THEN VALUE END) gc_blocks_received,
SUM(CASE WHEN name = 'session logical reads'
THEN VALUE END) logical_reads,
SUM(CASE WHEN name = 'physical reads'
THEN VALUE END) physical_reads
FROM gv$sysstat
GROUP BY inst_id)
SELECT instance_name, logical_reads, gc_blocks_received, physical_reads,
ROUND(physical_reads * 100 / logical_reads, 2) phys_to_logical_pct,
ROUND(gc_blocks_received * 100 / logical_reads, 2) gc_to_logical_pct
FROM sysstats JOIN gv$instance
USING (inst_id);

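Analysis: the instance with the highest global cache to logical read ratio is the least busy instance. The less busy an instance is, the more likely it is that the blocks it needs are held in the caches of the busier instances.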
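To make the comparison concrete, here is a small Python sketch that computes the GC/logical percentage per instance and picks out the highest. The function name, instance names, and counts are all hypothetical.

```python
def gc_to_logical_pct(gc_blocks_received, logical_reads):
    """Global cache blocks received as a percentage of logical reads."""
    return round(gc_blocks_received * 100 / logical_reads, 2)


# Hypothetical (gc_blocks_received, logical_reads) per instance.
instances = {
    "inst1": (50_000, 10_000_000),   # the busier instance
    "inst2": (40_000, 1_000_000),    # the quieter instance
}
ratios = {name: gc_to_logical_pct(gc, reads)
          for name, (gc, reads) in instances.items()}
highest = max(ratios, key=ratios.get)
print(ratios)
print("highest GC/logical ratio:", highest)
```

In this invented data the quieter instance receives fewer blocks in absolute terms but has the higher ratio, which is the pattern the analysis describes.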

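To determine which segments account for the largest share of global cache activity, the following query lists the segments that received the most global cache blocks.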

rem **
rem
rem File: top_gc_segments.sql
rem Description: Segments with the highest Global Cache activity
rem
rem
rem

col segment_name format a40
col gc_blocks_received format 999,999,999
col pct format 99.99
set pages 1000
set lines 80
set echo on

WITH segment_misses AS
(SELECT owner || '.' || object_name segment_name,
SUM(VALUE) gc_blocks_received,
ROUND( SUM(VALUE)* 100
/ SUM(SUM(VALUE)) OVER (), 2) pct
FROM gv$segment_statistics
WHERE statistic_name LIKE 'gc%received' AND VALUE > 0
GROUP BY owner || '.' || object_name)
SELECT segment_name,gc_blocks_received,pct
FROM segment_misses
WHERE pct > 1
ORDER BY pct DESC;

The output of this query lists the segments that received the most global cache blocks.
