Datawarehouse Concepts: Understanding SQL Server Execution Plan

The aim of this chapter is to help you understand the SQL Server Execution plan and how to use them to optimize the queries.

Let's first start with What's an Execution Plan is:

An execution plan, simply put, is the result of the query optimizer's attempt to calculate the most efficient way to implement the request represented by the T-SQL query you submitted.

Execution plans can tell you how a query will be executed, or how a query was executed. They are, therefore, the DBA's primary means of troubleshooting a poorly performing query. Rather than guess at why a given query is performing thousands of scans, putting your I/O through the roof, you can use the execution plan to identify the exact piece of SQL code that is causing the problem. For example, it may be scanning an entire table-worth of data when, with the proper index, it could simply backpack out only the rows you need. All this and more is displayed in the execution plan.

Guidelines for reading a SQL Server query execution plan

· The graphical query execution plan has a shape of a tree. The tree structure depends on the statements in the query

Each query in a batch is shown in the plan
Each icon presents an operator
Icons are read from left to right, top to bottom
The tree shows parent and children-level nodes
Every child node is connected to a parent node by arrows
The cost of each query is presented as a percentage of the total batch cost. The cost is the time needed to execute a statement/query/batch. Total cost of every batch, i.e. the sum of individual query costs should be 100%. However, it is possible that even in the actual query execution plans for costly queries, these costs are miscalculated and the sum is over 100%

Children of the same parents are placed in the same column. This doesn’t mean that all operators in the same column have the same parent

The arrow width depends on the actual/estimated number of rows. The arrows that represent a large number of rows are thicker

·If a query has multiple queries, query execution plans are drawn one under another. The relative cost for each query is presented as a percentage of total cost of all statements. The sum of all query costs is 100%

Operators show how queries and statements are executed. They can be physical and logical. Logical operators present an operation on a conceptual level, while physical operators implement the operation defined by the logical operator using a specific method. In other words, a physical operator is an object or routine that executes an operation. Physical operators access columns and rows in tables and views, calculate expressions, create joins, etc.

Optimizing query by reading plan

Let's take the below query as an example:

DECLARE @LocationName AS NVARCHAR(50);

SET @LocationName = 'Paint';

SELECT p.Name AS ProductName ,

pi.Shelf ,

l.Name AS LocationName

FROM Production.Product AS p

JOIN Production.ProductInventory AS pi ON pi.ProductID = p.ProductID

JOIN Production.Location AS l ON l.LocationID = pi.LocationID

WHERE LTRIM(RTRIM(l.Name)) = @LocationName;

When this query is executed, you'll see the execution plan as below:

There are several issues that needs to be addressed in this plan:

1. The Index Scan on the Location.Name column – there's an index on that column it should have used the seek

2. The fat pipe coming out of the Clustered Index Scan on the ProductInventory table – in this example, it's processing over 1000 rows at this stage, and eventually only returning 9 rows.

3. The Hash Match join – for the number of rows returned where we are expecting a Nested Loops join.

4. The Clustered Index Seek on the Product table looks very expensive relative to the other operators.

When we look into the code, it becomes clear that its not using the SEEK because of the function used on the Location.Name column, LTRIM(RTRIM(l.Name)). We can then remove that function and rerun the query and the plan would change to:

Now that the Location rows are accesses with a seek instead of a scan, SQL Server has a better idea how many rows to expect and so switches to a more appropriate join type. This is why it's pretty common to see Hash Matches where there are bad or missing indexes.

To remove the fat pipe connected to the Clustered Index Scan on ProductInventory, we can create a new index on ProductInventory table

CREATE INDEX ProductionLocation ON Production.ProductInventory(LocationID)

INCLUDE (Shelf);

New plan look like below

The only thing remaining is to look into the high-cost Clustered Index Seek. However, once we understand the nature of the Nested Loops join, it becomes apparent that this is not actually a problem in this case. During a Nested Loops operation, for each row returned by the inner data set, it compares it to the outer data set, looking for matches. In this case, for each row returned fromLocation, SQL Server performs a seek on the Product table. That explains why the cost is higher.

Operator tooltips

Besides the information presented by the icons and nodes, SQL Server query execution plans provide much more details in the icon tooltips. To see the tooltip:

· Place the mouse over the icon in the plan, or

· Right-click the icon and in the context menu select Properties

The information shown in tooltips is various and depends on the operator type. The items shown in the tooltips are different for actual and estimated plans. The tooltip on the left is for the actual and on the right for the estimated plan for the same operator

The estimated and actual values for the same parameters may not be equal in some scenarios

The information available in a tooltip is:

Physical Operation – the object or routine executed, the operator name presented in the plan

Estimated Operator Cost – the cost of the operation, presented as a percentage of total batch cost

Estimated I/O Cost, Estimated CPU Cost – Query Analyzer creates several execution plans, calculates the cost of each (how much I/O and processor resources are needed to complete the operation), and determines the most efficient one

Estimated Row Size – the number of rows that is believed to be affected by the operation, useful for determining how much memory is needed

Actual and Estimate Number of Rows – even if there is significant discrepancy between these two values, it doesn’t indicate a problem. It’s usually caused by out of date statistics

Actual and Estimate Execution mode – row or batch, shows whether the rows are processed one at a time, or in batches

Datawarehouse Concepts

Thursday, April 28, 2016

Understanding SQL Server Execution Plan

No comments:

Post a Comment