SQL Server Performance Scripts
SQL Server performance scripts
1. Signal Waits PercentageThis metric has provided the most value to me over the years. I started using it over six years ago when troubleshooting CPU issues. The instance
itself was showing signs of “slowness”, but the overall CPU utilization numbers were nominal. I came across this article by Tom Davidson and
found out that I could calculate the percentage of signal waits to the overall waits in order to understand internal CPU pressure exists. [A
"signal" wait means that a session is in the "runnable" queue, waiting for a signal from an available processor.]
I filter out a handful background tasks for this query as well. The value I like to see returned here is for no more than 20-25% (see above for
my comments about the numbers I have chosen) of the total waits to be signal waits. Numbers greater than 20% can indicate internal CPU pressure.
I can remedy the situation by reducing the number of sessions (not always likely), increasing the number of available CPUs (also not likely), or
reducing the amount of time the queries need to execute (often very likely, and sometimes easily done).
SELECT (100.0 * SUM(signal_wait_time_ms)/SUM (wait_time_ms)) AS [SignalWaitPct]
FROM sys.dm_os_wait_stats
WHERE wait_type NOT IN (
'CLR_SEMAPHORE', 'LAZYWRITER_SLEEP', 'RESOURCE_QUEUE', 'SLEEP_TASK',
'SLEEP_SYSTEMTASK', 'SQLTRACE_BUFFER_FLUSH', 'WAITFOR', 'LOGMGR_QUEUE',
'CHECKPOINT_QUEUE', 'REQUEST_FOR_DEADLOCK_SEARCH', 'XE_TIMER_EVENT',
'BROKER_TO_FLUSH', 'BROKER_TASK_STOP', 'CLR_MANUAL_EVENT', 'CLR_AUTO_EVENT',
'DISPATCHER_QUEUE_SEMAPHORE', 'FT_IFTS_SCHEDULER_IDLE_WAIT',
'XE_DISPATCHER_WAIT', 'XE_DISPATCHER_JOIN', 'BROKER_EVENTHANDLER',
'TRACEWRITE', 'FT_IFTSHC_MUTEX', 'SQLTRACE_INCREMENTAL_FLUSH_SLEEP',
'BROKER_RECEIVE_WAITFOR', 'ONDEMAND_TASK_QUEUE', 'DBMIRROR_EVENTS_QUEUE',
'DBMIRRORING_CMD', 'BROKER_TRANSMITTER', 'SQLTRACE_WAIT_ENTRIES',
'SLEEP_BPOOL_FLUSH', 'SP_SERVER_DIAGNOSTICS_SLEEP', 'HADR_FILESTREAM_IOMGR_IOCOMPLETION')
AND wait_time_ms <> 0
=================================================================================================================================================
2. SQL Compilation Percentage
Another sneaky performance issue is having a high number of statements being compiled. There happens to be a performance counter for compiles but
the counter itself does not have much meaning unless you also know the overall number of batch requests. Rather than look at either of those
counters separately what I like to do is look at their ratio. In fact, you’ll find a handful of items here are measured as a percentage of batch
requests.
The percentage of compilations I like to see is roughly 10% of the total number of batch requests (see above for my comments about the numbers I
have chosen). The following code will show you the current percentage of compilations to the number of batch requests:
SELECT 1.0*cntr_value /
(SELECT 1.0*cntr_value
FROM sys.dm_os_performance_counters
WHERE counter_name = 'Batch Requests/sec')
AS [SQLCompilationPct]
FROM sys.dm_os_performance_counters
WHERE counter_name = 'SQL Compilations/sec'
At this point someone usually asks “what about re-compilations”? Well, re-compilations should only be roughly 1% (see above for my comments about
the numbers I have chosen) of the total number of batch requests, so that code would be like this:
SELECT 1.0*cntr_value /
(SELECT 1.0*cntr_value
FROM sys.dm_os_performance_counters
WHERE counter_name = 'Batch Requests/sec')
AS [SQLReCompilationPct]
FROM sys.dm_os_performance_counters
WHERE counter_name = 'SQL Re-Compilations/sec'
=================================================================================================================================================
3. Page Lookups Percentage
What I like most about this metric is that it often gives me a hint regarding plan cache bloating without me having to dig through the plan cache
itself. It’s possible I have a handful of queries that have lots of logical I/O (not necessarily a problem, mind you), or I may have lots of
little queries (probably ad-hoc) which could lead to plan cache bloat.
This metric also gives me insight on the queries hitting the instance, to see if they could use some tuning/indexing as there are likely some
inefficient query plans being utilized.
What I want to see here is a value that is less than 100 (see above for my comments about the numbers I have chosen) on average, but it really
depends upon the nature of your instance:
SELECT 1.0*cntr_value /
(SELECT 1.0*cntr_value
FROM sys.dm_os_performance_counters
WHERE counter_name = 'Batch Requests/sec')
AS [PageLookupPct]
FROM sys.dm_os_performance_counters
WHERE counter_name = 'Page lookups/sec'
=================================================================================================================================================
4. Page Splits Percentage
By themselves page splits are not necessarily bad. But a lot of them happening at once could indicate trouble. Monitoring for just the number of
page splits by itself isn’t very reliable, as the counter includes any new page allocations as well as page splits due to fragmentation. So I
like to compare the number of page splits to the number of batch requests. The number I look for here is roughly 20 page splits/sec for every 100
batch requests (see above for my comments about the numbers I have chosen).
This code gives me the page splits percentage:
SELECT 1.0*cntr_value /
(SELECT 1.0*cntr_value
FROM sys.dm_os_performance_counters
WHERE counter_name = 'Batch Requests/sec')
AS [PageSplitPct]
FROM sys.dm_os_performance_counters
WHERE counter_name = 'Page splits/sec'
=================================================================================================================================================
5. Average Task Counts
I got this metric from Glenn Berry (blog | @GlennAlanBerry), and I use this to determine what my system has been busy doing. I like how Glenn
always has a brief description for what you want (or don’t want) to see. In this case it is as follows:
High Avg Task Counts (>10) are often caused by blocking or other resource contention
High Avg Runnable Task Counts (>1) are a good sign of CPU pressure
High Avg Pending DiskIO Counts (>1) are a sign of disk pressure
I can use this one in conjunction with others to get some corroborating evidence as to what is happening under the covers:
SELECT AVG(current_tasks_count) AS [Avg Task Count],
AVG(runnable_tasks_count) AS [Avg Runnable Task Count],
AVG(pending_disk_io_count) AS [Avg Pending DiskIO Count]
FROM sys.dm_os_schedulers WITH (NOLOCK)
WHERE scheduler_id < 255 OPTION (RECOMPILE)
=================================================================================================================================================
6. Buffer Pool I/O Rate
The most misunderstood counter I see is Page Life Expectancy (PLE). I know I used to pay attention as if it meant something. Over time I have
come to realize that it is a throwback to days of yore when having 4GB of memory was a really big deal. In other words, its intended purpose no
longer has the same value as it once did. It is not totally worthless, however. You just need to know what to do with it.
What I want to see is the rate at which data pages are being cycled through the buffer pool. The following code returns the “Buffer Pool Rate” in
MB/sec. I usually look for rates around 20MB/sec as a baseline. Why that number? Because if I have a “typical” server with 56GB of RAM available
for the buffer pool and I want to keep my pages around for an hour or so (3600 seconds), then I come up with 56000MB/3600sec, or about 15.5
MB/sec. You can do your own math here to arrive at your own recommended throughput, but that’s why I look for a sustained rate of about 20 on
average. If I have a spike upward from there then I know I am having memory pressure (pressure that might otherwise fail to be seen if I only
examine the PLE counter).
Here is the code I use:
SELECT (1.0*cntr_value/128) /
(SELECT 1.0*cntr_value
FROM sys.dm_os_performance_counters
WHERE object_name like '%Buffer Manager%'
AND lower(counter_name) = 'Page life expectancy')
AS [BufferPoolRate]
FROM sys.dm_os_performance_counters
WHERE object_name like '%Buffer Manager%'
AND counter_name = 'total pages'
You should also be aware that if your server is NUMA aware then you will want to take that into consideration whenever you try to use PLE as a
performance metric.
=================================================================================================================================================
7. Memory grants
This counter helps me understand if I am seeing internal memory pressure. Ideally this value should be as close to 0 as possible. Sustained
periods of non-zero values are worth investigating:
SELECT cntr_value
FROM sys.dm_os_performance_counters
WHERE counter_name = 'Memory Grants Pending'
If my queries are waiting for memory grants I would look at the query plans first before adding more RAM as a knee-jerk reaction. It could be the
case that the new Cardinality Estimator in SQL 2014 has over-estimated the amount of memory needed for this particular query. You may want to try
running the query using the trace flag option 9481 and see if the legacy CE works better.
Comments