In this article, we are going to discuss the SQL Server’s query execution flow architecture. When a query gets submitted to SQL Server, it goes through multiple steps before final output. We will discuss all these steps of “Query Execution Flow” starting from query submission to getting output.
As we all know that SQL Server has various components to perform various tasks on the given query. Each component performs some operation and pushes its output to the next component in the pipeline.
Lets understand the query execution flow architecture of DML statements. DDL and DCL statements do not follow optimization steps and gets submitted to storage engine directly after parsing step.
SQL Server Engine Architecture:
Below are the major components of SQL Server;
- Relation Engine
- Query Optimizer
- Storage Engine
Just have a look on the below image and then read the below details carefully;
As we can see in above image, Relation engine has parser, alzebrizer, and query optimizer as its component. Each component has its own role and responsibility. Below are the details:
It is a combination of parser, alzebrizer and query optimizer.
It parses the submitted t-sql code and checks for the syntax. If submitted t-sql is correct, it generates the parse tree and submit its to alzebrizer if given t-sql is a DML statement. In case of DDL and DCL, it directly sends to storage engine.
Input for alzebrizer is a parse tree, generated from parser and it also responsible for binding. It resolves all the names and determines aggregates. It generates alzebrized tree or binary tree and submits it to query optimizer.
Query optimizer is responsible for creating a “Good enough plan” instead of “maximum performance plan” to execute the given t-sql code. SQL Server optimizer is a “cost based optimizer“. It generates multiple plan (if no plan is found in cache) for the given t-sql and choose the lowest cost plan from the generated plans. There are some thresholds to recognize the limits of number of plans generated and type of plans like with single processor or using parallelism(multiple processors).
It uses the query processor tree and statistics it has about the data, and applying the model, it works out what it thinks will be the optimal way to execute the query.
It decided indexes to be used, types of joins to be performed and much more. It decides the plan based on the cost and the cost can be defined as a combination of CPU processing time and I/O cost.
If your query is very simple, it will generate a trivial plan eg a single table with no index. In such type of cases instead of spending time to generate an optimal plan, it simply generates a trivial plan. But if the query is non-trivial, optimizer performs cost-based calculation to select a plan.
And once the optimizer arrives at an execution plan, this plan is stored in memory known as plan cache. So that when next time the same t-sql batch will execute, this plan can be reused.
Once the plan execution plan is selected, query optimizer transfers the control to “Storage Engine”.
Here we get the “estimated execution plan“.
Up to this component all are sub components of relation engine.
Storage engine is responsible for actual execution of the query. During actual execution in certain cases the execution plan can be changed slightly. eg. in case of deferred plans (DML + DDL). Here we get “actual execution plan“. In case the plan is being changed during actual execution, we can get some slight differences in actual and estimated plans. The output of the executed query is returned to the caller.
The same can be displayed as a flow chart as below;
In this article, i have tried to cover the different components and their responsibilities during tsql query execution and also attached images to make it easy to understand for you.
Hope you have enjoyed this article. If you like this article, please share your comments here and also share this article on your social media channels.