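Continued from the previous post:
3. Analyzing the trace data
After tracing for a while, the trace file contains the collected data (counters such as IO, Duration, CPU, Reads, Writes, and RowCounts). The next step is to load this data into a table and analyze it. You can open and inspect the trace in Profiler, but that has limitations: you cannot manipulate the data much, the same SQL statements appear over and over, and there is no aggregation.
3.1 Load the data into a table (the function fn_trace_gettable returns the trace data in tabular form; as an example, only the T-SQL code and the Duration column, the query run time, are analyzed)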
select CAST(textdata as nvarchar(max)) as tsql_code,duration into Workload from sys.fn_trace_gettable('C:\test\performancetrace_20100802.trc',NULL) as TT
3.2 Aggregate identical SQL statements
select tsql_code,SUM(duration) as total_duration from workload group by tsql_code
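(Because I ran this trace on a production system, I can't show the SQL code I analyzed, for security reasons; my apologies. If you're interested, try it in your own test environment and feel free to discuss the results.)
Problem: after grouping, queries that are logically identical but differ in their parameters end up in different groups, because different values were used in their filters. These statements are really the same query (and, when parameterized, can share one execution plan), so they should be aggregated together to analyze accurately how much total run time each kind of query takes.
3.3 Solution 1 (group by a rough prefix)
SQL statements usually begin with SELECT plus a column list, so a large part of the text on the left is identical. Depending on the statement length, you can truncate each statement to a prefix, for example the first 50, 100, or 150 characters, and aggregate on that. The method is simple, easy to apply, and merges part of the data, but the right length is hard to choose; you can only try different prefix lengths and compare.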
select left(tsql_code,50) as t_sql,SUM(duration) as total_duration from workload group by left(tsql_code,50)
--or
select left(tsql_code,100) as t_sql,SUM(duration) as total_duration from workload group by left(tsql_code,100)
--or
select left(tsql_code,150) as t_sql,SUM(duration) as total_duration from workload group by left(tsql_code,150)
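To see why the prefix length is hard to pick, here is a small Python sketch of GROUP BY LEFT(tsql_code, n) run over a made-up three-statement workload. The statements, durations, and the helper name total_by_prefix are hypothetical, for illustration only. A prefix that is too short merges different queries; one that is too long fails to merge queries that differ only in a parameter.

```python
from collections import defaultdict


def total_by_prefix(workload, length):
    """Sum durations per statement prefix, like
    GROUP BY LEFT(tsql_code, length). length=None groups by the full text."""
    totals = defaultdict(int)
    for tsql_code, duration in workload:
        key = tsql_code if length is None else tsql_code[:length]
        totals[key] += duration
    return dict(totals)


# Hypothetical workload rows: (tsql_code, duration)
workload = [
    ("select * from Sales.SalesOrderHeader where SalesOrderID=43659", 120),
    ("select * from Sales.SalesOrderHeader where SalesOrderID=43660", 80),
    ("select * from Sales.SalesOrderDetail where SalesOrderID=43659", 50),
]

for n in (30, 56, None):
    print(n, len(total_by_prefix(workload, n)), "group(s)")
```

With these sample statements, a 30-character prefix lumps the Header and Detail queries together, 56 characters (up to the "=") separates them correctly, and the full text, as in 3.2, leaves all three apart.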
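3.4 Solution 2 (more complex but precise: logically identical SQL statements are matched by replacing their parameters with a placeholder). This method is described in Inside Microsoft SQL Server 2005: T-SQL Querying; for a more detailed explanation, read the book, which has much more to offer.
(1) Build a query signature, which is identical for all queries that follow the same pattern.
- T-SQL implementation
Create the function: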
CREATE FUNCTION [dbo].[fn_SQLSigTSQL]
  (@p1 NTEXT, @parselength INT = 4000)
RETURNS NVARCHAR(4000)
-- This function will replace the parameters with '#'
-- This function is provided "AS IS" with no warranties,
-- and confers no rights.
-- Use of included script samples are subject to the terms specified at
-- http://www.microsoft.com/info/cpyright.htm
--
-- Strips query strings
AS
BEGIN
  DECLARE @pos AS INT;
  DECLARE @mode AS CHAR(10);
  DECLARE @maxlength AS INT;
  DECLARE @p2 AS NCHAR(4000);
  DECLARE @currchar AS CHAR(1), @nextchar AS CHAR(1);
  DECLARE @p2len AS INT;

  SET @maxlength = LEN(RTRIM(SUBSTRING(@p1,1,4000)));
  SET @maxlength = CASE WHEN @maxlength > @parselength
                     THEN @parselength ELSE @maxlength END;
  SET @pos = 1;
  SET @p2 = '';
  SET @p2len = 0;
  SET @currchar = '';
  SET @nextchar = '';
  SET @mode = 'command';

  WHILE (@pos <= @maxlength)
  BEGIN
    SET @currchar = SUBSTRING(@p1,@pos,1);
    SET @nextchar = SUBSTRING(@p1,@pos+1,1);
    IF @mode = 'command'
    BEGIN
      SET @p2 = LEFT(@p2,@p2len) + @currchar;
      SET @p2len = @p2len + 1;
      IF @currchar IN (',','(',' ','=','<','>','!')
        AND @nextchar BETWEEN '0' AND '9'
      BEGIN
        SET @mode = 'number';
        SET @p2 = LEFT(@p2,@p2len) + '#';
        SET @p2len = @p2len + 1;
      END
      IF @currchar = ''''
      BEGIN
        SET @mode = 'literal';
        SET @p2 = LEFT(@p2,@p2len) + '#''';
        SET @p2len = @p2len + 2;
      END
    END
    ELSE IF @mode = 'number' AND @nextchar IN (',',')',' ','=','<','>','!')
      SET @mode = 'command';
    ELSE IF @mode = 'literal' AND @currchar = ''''
      SET @mode = 'command';
    SET @pos = @pos + 1;
  END
  RETURN @p2;
END
GO
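The function takes a query string and the length of code to parse, and returns the query signature, with every parameter replaced by a pound sign (#). Test it as follows: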
select dbo.fn_SQLSigTSQL('select * from Sales.SalesOrderHeader where SalesOrderID=''43659'' and Status=''5'' ',500)
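The state machine inside fn_SQLSigTSQL may be easier to follow outside SQL Server. The following is a rough Python port written for illustration (the name sql_sig_tsql is hypothetical). It keeps the three modes, command, number, and literal, but ignores T-SQL type details such as the CHAR(1) variables and the space-padded NCHAR(4000) result.

```python
# Characters after which a digit starts a numeric parameter (same set as the T-SQL code).
NUMBER_START = {",", "(", " ", "=", "<", ">", "!"}
# A numeric parameter ends when the NEXT character is one of these.
NUMBER_END = {",", ")", " ", "=", "<", ">", "!"}


def sql_sig_tsql(p1, parselength=4000):
    """Illustrative Python port of the dbo.fn_SQLSigTSQL state machine."""
    # LEN(RTRIM(SUBSTRING(@p1, 1, 4000))), capped at @parselength
    maxlength = min(len(p1[:4000].rstrip(" ")), parselength)
    out = []
    mode = "command"
    for pos in range(maxlength):
        curr = p1[pos]
        nxt = p1[pos + 1] if pos + 1 < len(p1) else ""
        if mode == "command":
            out.append(curr)  # statement text is copied unchanged
            if curr in NUMBER_START and "0" <= nxt <= "9":
                mode = "number"
                out.append("#")  # one placeholder for the whole number
            if curr == "'":
                mode = "literal"
                out.append("#'")  # opening quote was copied above; add #'
        elif mode == "number" and nxt in NUMBER_END:
            mode = "command"  # number ends; the delimiter is copied next round
        elif mode == "literal" and curr == "'":
            mode = "command"  # closing quote skipped; the #' emitted earlier closes it
    return "".join(out)


print(sql_sig_tsql(
    "select * from Sales.SalesOrderHeader where SalesOrderID='43659' and Status='5' ", 500))
# select * from Sales.SalesOrderHeader where SalesOrderID='#' and Status='#'
```

The port also exposes a quirk of the original: a number ends only when the next character is one of , ) space = < > !, so a character such as ; right after a number is swallowed with it ("where id=5;" becomes "where id=#").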
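- CLR implementation
CLR code is generally more efficient than T-SQL at iterative/procedural logic and string manipulation, so the query signature can also be implemented with CLR, as follows.
a. Create a C# Class Library project with the following code: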
using System.Data.SqlTypes;
using System.Text.RegularExpressions;
using Microsoft.SqlServer.Server;

public partial class SQLSignature
{
    // fn_SQLSigCLR: replaces string, binary, and numeric literals with '#'.
    // NULL input never reaches this method: the T-SQL wrapper is declared
    // WITH RETURNS NULL ON NULL INPUT.
    [SqlFunction(IsDeterministic = true, DataAccess = DataAccessKind.None)]
    public static SqlString fn_SQLSigCLR(SqlString querystring)
    {
        return (SqlString)Regex.Replace(
            querystring.Value,
            @"([\s,(=<>!](?![^\]]+[\]]))(?:(?:(?:(?# expression coming )(?:([N])?(')(?:[^']|'')*('))(?# character )|(?:0x[\da-fA-F]*)(?# binary )|(?:[-+]?(?:(?:[\d]*\.[\d]*|[\d]+)(?# precise number )(?:[eE]?[\d]*)))(?# imprecise number )|(?:[~]?[-+]?(?:[\d]+))(?# integer ))(?:[\s]?[\+\-\*\/\%\&\|\^][\s]?)?)+(?# operators ))",
            @"$1$2$3#$4");
    }

    // fn_RegexReplace - for generic use of RegEx-based replace
    [SqlFunction(IsDeterministic = true, DataAccess = DataAccessKind.None)]
    public static SqlString fn_RegexReplace(
        SqlString input, SqlString pattern, SqlString replacement)
    {
        return (SqlString)Regex.Replace(
            input.Value, pattern.Value, replacement.Value);
    }
}
b. Load the compiled .dll (its IL code) into the database
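-- CLR integration is disabled by default; enable it once per instance
EXEC sp_configure 'clr enabled', 1;
RECONFIGURE;
GO
-- Create the assembly in the database where the functions will be registered (master here);
-- the queries below also assume dbo.Workload lives in that database
USE master;
CREATE ASSEMBLY SQLSignature
FROM 'C:\SQLSignature\SQLSignature\bin\Debug\SQLSignature.dll';
GO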
c. Register the functions fn_SQLSigCLR and fn_RegexReplace
CREATE FUNCTION dbo.fn_SQLSigCLR(@querystring AS NVARCHAR(MAX))
RETURNS NVARCHAR(MAX)
WITH RETURNS NULL ON NULL INPUT
EXTERNAL NAME SQLSignature.SQLSignature.fn_SQLSigCLR;
GO
CREATE FUNCTION dbo.fn_RegexReplace(
  @input       AS NVARCHAR(MAX),
  @pattern     AS NVARCHAR(MAX),
  @replacement AS NVARCHAR(MAX))
RETURNS NVARCHAR(MAX)
WITH RETURNS NULL ON NULL INPUT
EXTERNAL NAME SQLSignature.SQLSignature.fn_RegexReplace;
GO
d. After registering the functions, test them with the following code:
SELECT dbo.fn_SQLSigCLR(tsql_code) AS sig_sql, duration FROM dbo.Workload;
All statements in the result are normalized, with a pound sign (#) in place of every parameter.
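The fn_SQLSigCLR pattern can also be tried out with Python's re module, whose syntax is close enough to .NET's that this particular expression carries over unchanged; only the (?# ...) comments are turned into Python comments, and $1 becomes \g<1>. This is a sketch for experimenting with the pattern, not a replacement for the CLR function, and sql_sig_clr is a hypothetical name.

```python
import re

# The fn_SQLSigCLR pattern split into pieces; the (?# ...) comments became Python comments.
SIG_PATTERN = re.compile(
    r"([\s,(=<>!](?![^\]]+[\]]))"            # group 1: delimiter, rejected if a ']' follows later
    r"(?:(?:(?:"
    r"(?:([N])?(')(?:[^']|'')*('))"          # character literal: group 2 = N, groups 3, 4 = quotes
    r"|(?:0x[\da-fA-F]*)"                     # binary
    r"|(?:[-+]?(?:(?:[\d]*\.[\d]*|[\d]+)"     # precise number
    r"(?:[eE]?[\d]*)))"                       # imprecise number (optional exponent)
    r"|(?:[~]?[-+]?(?:[\d]+))"                # integer
    r")(?:[\s]?[\+\-\*\/\%\&\|\^][\s]?)?)+"   # optional operator, then repeat
    r")"
)


def sql_sig_clr(querystring):
    """Python equivalent of Regex.Replace(querystring, pattern, "$1$2$3#$4")."""
    # Since Python 3.5, groups that did not participate are replaced by "".
    return SIG_PATTERN.sub(r"\g<1>\g<2>\g<3>#\g<4>", querystring)


print(sql_sig_clr("select * from T where id = 42 and x in (1, 2)"))
# select * from T where id = # and x in (#, #)
```

One side effect of the lookahead (?![^\]]+[\]]), which is meant to skip bracketed identifiers: it rejects any delimiter that has a ] somewhere after it, so "where id = 5 and [x] = 1" becomes "where id = 5 and [x] = #", leaving the 5 in place.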
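(2) Use the functions created above to normalize the traced T-SQL statements, then aggregate them by pattern.
a. From each query signature, compute an integer checksum with CHECKSUM; grouping on this integer instead of the long signature string makes the later aggregation simpler and more efficient: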
ALTER TABLE dbo.Workload ADD cs INT NOT NULL DEFAULT (0);
GO
-- Note: different signatures can occasionally produce the same CHECKSUM value
UPDATE dbo.Workload SET cs = CHECKSUM(dbo.fn_SQLSigCLR(tsql_code));
CREATE CLUSTERED INDEX idx_cl_cs ON dbo.Workload(cs);
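b. Fill the temporary table #AggQueries with the total run time per signature checksum, its percentage of the overall run time, and a row number ordered by run time descending.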
IF OBJECT_ID('tempdb..#AggQueries') IS NOT NULL DROP TABLE #AggQueries;
GO
SELECT cs,
  SUM(duration) AS total_duration,
  100. * SUM(duration) / SUM(SUM(duration)) OVER() AS pct,
  ROW_NUMBER() OVER(ORDER BY SUM(duration) DESC) AS rn
INTO #AggQueries
FROM dbo.Workload
GROUP BY cs;
CREATE CLUSTERED INDEX idx_cl_cs ON #AggQueries(cs);
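Querying the aggregated temporary table shows far fewer rows, each holding the signature checksum, the total run time, its percentage of the overall run time, and the rank.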
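Steps a and b together amount to a hash-then-group aggregation. The Python sketch below mirrors what ends up in #AggQueries: one row per checksum with the total duration, its percentage of the grand total, and a rank by total duration. Here zlib.crc32 stands in for CHECKSUM purely as an illustrative assumption; it is a different hash and yields different values. The function name aggregate and the sample signatures are hypothetical.

```python
import zlib
from collections import defaultdict


def aggregate(workload):
    """Group (signature, duration) pairs by an integer checksum of the signature.

    Returns (cs, total_duration, pct, rn) tuples ordered by total_duration
    descending, roughly like the rows of #AggQueries.
    """
    totals = defaultdict(int)
    for signature, duration in workload:
        # ASSUMPTION: zlib.crc32 is only a stand-in for T-SQL CHECKSUM();
        # the hash values differ, and collisions are possible with either.
        cs = zlib.crc32(signature.encode("utf-8"))
        totals[cs] += duration
    grand_total = sum(totals.values()) or 1  # avoid dividing by zero
    ranked = sorted(totals.items(), key=lambda item: item[1], reverse=True)
    return [
        (cs, total, 100.0 * total / grand_total, rn)
        for rn, (cs, total) in enumerate(ranked, start=1)
    ]


sample = [
    ("select * from T where id = #", 300),  # hypothetical signatures
    ("select * from T where id = #", 100),
    ("update T set x = #", 100),
]
for row in aggregate(sample):
    print(row)
```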
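c. Filter and match: keep the top signatures that together account for about 90% of the total run time, and use the APPLY operator to return each query signature along with a sample query.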
WITH RunningTotals AS
(
  SELECT AQ1.cs,
    -- Duration read from a trace file is in microseconds (SQL Server 2005 and later),
    -- so divide by 1,000,000 to get seconds
    CAST(AQ1.total_duration / 1000000. AS DECIMAL(12, 2)) AS total_s,
    CAST(SUM(AQ2.total_duration) / 1000000. AS DECIMAL(12, 2)) AS running_total_s,
    CAST(AQ1.pct AS DECIMAL(12, 2)) AS pct,
    CAST(SUM(AQ2.pct) AS DECIMAL(12, 2)) AS run_pct,
    AQ1.rn
  FROM #AggQueries AS AQ1
    JOIN #AggQueries AS AQ2
      ON AQ2.rn <= AQ1.rn
  GROUP BY AQ1.cs, AQ1.total_duration, AQ1.pct, AQ1.rn
  HAVING SUM(AQ2.pct) - AQ1.pct <= 90 -- percentage threshold
)
SELECT RT.rn, RT.pct, S.sig, S.tsql_code AS sample_query
FROM RunningTotals AS RT
  CROSS APPLY
    (SELECT TOP(1) tsql_code, dbo.fn_SQLSigCLR(tsql_code) AS sig
     FROM dbo.Workload AS W
     WHERE W.cs = RT.cs) AS S
ORDER BY RT.rn;
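The self-join in RunningTotals computes, for each rank, the running percentage up to and including that row, and the HAVING clause keeps a row as long as the running percentage of the rows before it is at most 90. A Python sketch of the same filter over hypothetical #AggQueries-style rows (top_patterns is an illustrative name):

```python
def top_patterns(agg_rows, threshold=90.0):
    """Keep rows, in rank order, while the running percentage of the rows
    before them is <= threshold (HAVING SUM(AQ2.pct) - AQ1.pct <= 90).

    agg_rows: (cs, total_duration, pct, rn) tuples.
    Returns (rn, cs, pct, running_pct) tuples.
    """
    kept = []
    running = 0.0
    for cs, _total, pct, rn in sorted(agg_rows, key=lambda row: row[3]):
        if running > threshold:
            break  # the running total only grows, so later rows fail too
        running += pct
        kept.append((rn, cs, pct, running))
    return kept


# Hypothetical #AggQueries-style rows: (cs, total_duration, pct, rn)
rows = [(101, 500, 50.0, 1), (102, 300, 30.0, 2), (103, 150, 15.0, 3), (104, 50, 5.0, 4)]
for rn, cs, pct, running in top_patterns(rows):
    print(rn, cs, pct, running)
```

Walking the rows in rank order makes this a single pass. In T-SQL on SQL Server 2012 and later, the same running total can be written as SUM(pct) OVER(ORDER BY rn ROWS UNBOUNDED PRECEDING) instead of the self-join, which SQL Server 2005 needs because it lacks window frames.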
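4. With the query signatures, sample queries, each pattern's share of the total run time, and its rank in hand, you can start optimizing, beginning at the top of the list. In a similar way, by loading and aggregating Reads, Writes, or RowCounts instead of Duration, you can find the query patterns that produce large result sets or cause most of the I/O.
IV: Summary
Profiler is a very useful tool for tracing system performance and workload. Combined with the aggregation above, it lets you pinpoint the SQL that is worth optimizing, which improves efficiency and greatly reduces the amount of work.
Attachment download: Server性能計數器.rar (server performance counters)
Reference: Itzik Ben-Gan et al., Inside Microsoft SQL Server 2005: T-SQL Querying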