Performance Monitor - setting up to diagnose issues

Performance Monitor - setting up monitoring and configuring

We can use the windows performance monitor to help diagnose sql server performance issues.  This guide is based on windows server 2012 and sql server 2014 but should be applicable from windows server 2008 onwards.
Run perfmon.exe (or performance monitor from adminitrative tools)
Right click on "Data Collector Sets" - "User Defined"
Left click on "new" - "data collector set"
Name the data collector set and select "create manually (Advanced)" then "Next"
Select "Create Data Logs"  and tick "Performance Counter" then "Next"
Which Performance Counters would you like to log? screen appears. Click "add"
Then select performance counters to be monitored.  I have included a list of useful counters with information to help on debugging and diagnosis.
Useful Performance Counters to Add
Once you have added all the counters you wish to monitor hit finish and your named data collection set should appear under Data Collector Sets - User Defined.  We have now set up the information we wish to retrieve but not yet set up any rules on how to schedule the data collection.
Right click on your data collector set and left click properties.  On the schedule tab you can add a schedule for the collection report to be run.  On the "Stop Condition" tab you can set how long you want the collection to be running for.  Please remember that this will use up disk space and could effect performance to please set a sensible time for the report to run and do not leave it running indefinitely.
To manually run the collection right click on your data collector set and left click on "start".
A report should then appear under "Reports" - User Defined - <data collector name> which you can click on to open and view the stats.
Notes
I've tried setting up templates and reloading them onto another server to save time. Unfortunately the server/instance name is coded into the template so they are not easily transferable between instances and will have to be set up manually on each server or the xml template file would need to be edited with the correct instance/server names.
MSSQL$<instance>:Access Methods Full Scans / sec (Index Searches/sec)/(Full Scans/sec) > 1000 This counter monitors the number of full scans on base tables or indexes. Values greater than 1 or 2 indicate that we are having table / Index page scans. If we see high CPU then we need to investigate this counter, otherwise if the full scans are on small tables we can ignore this counter.  A few of the main causes of high Full Scans/sec are
• Missing indexes
• Too many rows requested
Queries with missing indexes or too many rows requested will have a large number of logical reads and an increased CPU time.
MSSQL$<instance>:Access Methods Index Searches / sec (Index Searches/sec)/(Full Scans/sec) > 1000 Number of index searches. Index searches are used to start range scans, single index record fetches, and to reposition within an index. Index searches are preferable to index and table scans.  For OLTP applications, optimize for more index searches and less scans (preferably, 1 full scan for every 1000 index searches). Index and table scans are expensive I/O operations.
MSSQL$<instance>:Access Methods Page Splits / sec < 20 per 100 Batch Requests/Sec Number of page splits per second that occur as the result of overflowing index pages. Interesting counter that can lead us to our table / index design. This value needs to be low as possible. If you find out that the number of page splits is high, consider increasing the fillfactor of your indexes. An increased fillfactor helps to reduce page splits because there is more room in data pages before it fills up and a page split has to occur.
Note that this counter also includes the new page allocations as well and doesn’t necessarily pose a problem.  The other place we can confirm the page splits that involve data or index rows moves are the fragmented indexes on page splits.
MSSQL$<instance>:Buffer Manager Buffer Cache hit ratio > 90% This counter indicates how often SQL Server goes to the buffer, not the hard disk, to get data. The higher this ratio, the less often SQL Server has to go to the hard disk to fetch data, and performance overall is boosted. Unlike many of the other counters available for monitoring SQL Server, this counter averages the Buffer Cache Hit Ratio from the time the last instance of SQL Server was restarted. In other words, this counter is not a real-time measurement, but an average of all the days since SQL Server was last restarted. In OLTP applications, this ratio should exceed 90-95%. If it doesn't, then you need to add more RAM to your server to increase performance. In OLAP applications, the ratio could be much less because of the nature of how OLAP works. In any case, more RAM should increase the performance of SQL Server OLAP activity.
MSSQL$<instance>:Buffer Manager Page Life Expectancy > 300 This performance monitor counter tells you, on average, how long data pages are staying in the buffer. If this value gets below 300 seconds, this is a potential indication that your SQL Server could use more memory in order to boost performance.
MSSQL$<instance>:Databases Transactions/sec
MSSQL$<instance>:General Statistics Logins/sec < 2 > 2 per second indicates that the application is not correctly using  connection pooling
MSSQL$<instance>:General Statistics Logouts/sec < 2 > 2 per second indicates that the application is not correctly using connection pooling
MSSQL$<instance>:General Statistics User Connections See Description The number of users currently connected to the SQL Server.

Note: It is recommended to review this counter along with “Batch Requests/Sec”.   A surge in “user connections” may result in a surge of “Batch Requests/Sec”.  So if there is a disparity (one going up and the other staying flat or going down), then that may be a cause for concern. With a blocking problem, for example, you might see user connections, lock waits and lock wait time all increase while batch requests/sec decreases. 
MSSQL$<instance>:Latches Latches Waits/sec (Total Latch Wait Time) / (Latch Waits/Sec) < 10 This is the number of latch requests that could not be granted immediately. In other words, these are the amount of latches, in a one second period that had to wait.
MSSQL$<instance>:Latches Total Latch Wait Time (ms) (Total Latch Wait Time) / (Latch Waits/Sec) < 10 This is the total latch wait time (in milliseconds) for latch requests in the last second
MSSQL$<instance>:Memory Manager Total Server Memory (KB) See Description The Total Server Memory is the current amount of memory that SQL Server is using.  If this counter is still growing the server has not yet reached its steady-state, and it is still trying to populate the cache and get pages loaded into memory. 
MSSQL$<instance>:Locks Lock Waits/sec
0
This counter reports how many times users waited to acquire a lock over the past second.  Note that while you are actually waiting on the lock that this is not reflected in this counter---it gets incremented only when you “wake up” after waiting on the lock. If this value is nonzero then it is an indication that there is at least some level of blocking occurring.  If you combine this with the Lock Wait Time counter, you can get some idea of how long the blocking lasted.  A zero value for this counter can definitively prove out blocking as a potential cause; a nonzero value will require looking at other information to determine whether it is significant. 
MSSQL$<instance>:SQL Statistics Batch Requests/sec See Description This counter measures the number of batch requests that SQL Server receives per second, and generally follows in step to how busy your server's CPUs are. Generally speaking, over 1000 batch requests per second indicates a very busy SQL Server, and could mean that if you are not already experiencing a CPU bottleneck, that you may very well soon. Of course, this is a relative number, and the bigger your hardware, the more batch requests per second SQL Server can handle. From a network bottleneck approach, a typical 100Mbs network card is only able to handle about 3000 batch requests per second. If you have a server that is this busy, you may need to have two or more network cards, or go to a 1Gbs network card.

Note: Sometimes low batch requests/sec can be misleading.  If there were a SQL statements/sec counter, this would be a more accurate measure of the amount of SQL Server activity.  For example, an application may call only a few stored procedures yet each stored procedure does lot of work.  In that case, we will see a low number for batch requests/sec but each stored procedure (one batch) will execute many SQL statements that drive CPU and other resources.  As a result, many counter thresholds based on the number of batch requests/sec will seem to identify issues because the batch requests on such a server are unusually low for the level of activity on the server.  

We cannot conclude that a SQL Server is not active simply by looking at only batch requests/sec.  Rather, you have to do more investigation before deciding there is no load on the server.  If the average number of batch requests/sec is below 5 and other counters (such as SQL Server processor utilization) confirm the absence of significant activity, then there is not enough of a load to make any recommendations or identify issues regarding scalability.
MSSQL$<instance>:SQL Statistics SQL Compilations/sec < 10% of the number of Batch Requests/Sec The number of times per second that SQL Server compilations have occurred. This value needs to be as low as possible. If you see a high value such as over 100, then it’s an indication that there are lots of adhoc queries that are running, might cause CPU usage, solution is to re-write these adhoc as stored procedure or use sp_executeSQL.
MSSQL$<instance>:SQL Statistics SQL Re-compilations/sec < 10% of the number of SQL Compilations/sec This needs to be nil in our system as much as possible. A recompile can cause deadlocks and compile locks that are not compatible with any locking type.

Comments

Popular posts from this blog

SQL SERVER – Event ID 107- Report Server Windows Service (MSSQLSERVER) cannot connect to the report server database.

SQL Server Builds Information

Using DBCA silent install and disabling automatic memory management