In this article, i will introduce the internal join techniques, which sql server uses to perform various joins internally also known as Nested Loop, Merge and Hash Joins. These types are not directly exposed to us but we can use them as a query hint. These are also known as physical joins in SQL Server. So lets explore this topic together “Nested Loop, Merge and Hash Joins in SQL Server”.
In our queries, simply we write as below;
From AdventureWorks2012 Database:
USE AdventureWorks2012 GO SELECT e.[BusinessEntityID] ,p.[Title] ,p.[FirstName] ,p.[MiddleName] ,p.[LastName] ,p.[Suffix] ,e.[JobTitle] FROM [HumanResources].[Employee] e INNER JOIN [Person].[Person] p ON p.[BusinessEntityID] = e.[BusinessEntityID]
Now its a job of sql server to create an appropriate plan for the query, execute it and return the result set to the caller. SQL Server has multiple components to perform this series of tasks including query parsing, creating query tree, creating binary plan and after all executing the query and returning the result set.
The actual execution of the query is handled by storage engine component of the SQL Server.
SQL server uses cost based optimization technique for query plan creation. And it selects the most efficient “Good enough plan” for the given query.
To execute the above query SQL can select any one from Nested Loop, merge and hash joins, based on some criteria like.;
- Size of table
- Order of index used in join
- Indexes available
- Type of index – Clustered or nonclustered
SQL Server chooses Loop join when one input set is small and other one is fairly large and indexed on its join column, nested loop join is the fastest join operation because they require least I/O and the fewest comparison. It’s also called as nested iteration.
It is particulary effective if the outer input is small and the inner input is preindexed and large. For many small transactions with small data set nested loop type of join is superior than merge and hash joins.
How it works?
The nested loops join, uses one join input as the outer input table and one as the inner input table. The outer loop consumes the outer input table row by row. The inner loop, executed for each outer row, searches for matching rows in the inner input table.
If there is an index being used in the query this join is called as index nested loop join and if there is no index to be used in the query this join is called as naive nested loop join. And if index is built as the part of the query plan and destroyed upon completion of the query; it is called as temporary index nested loop join.
In the above picture; type of join which is Nested loop is pointed with a red circle, outer input table is with blue circle and inner input table is with green circle.
If two inputs are not small but sorted on the join columns; a merger join is the fastest operation. It requires both tables to be sorted on the merge column.
How it works:-
As we know, both inputs should be sorted, this operator gets a row from each input and compares them. The rows are returned if they are equal (inner join). If they are not equal, the lower-value row is discarded and another row is obtained from that input. This process repeats until all rows have been processed by the engine.
The hash join can efficiently handle large, unsorted and nonindexed inputs.
How it works:
The hash join has two inputs: the build input and probe input with smaller input as build input. This type of join can be used for many types of set based operation. Example; inner join; left, right, and full outer join; left and right semi-join; intersection; union; and difference.
Types of hash joins:
It has below types;
In-Memory Hash Join
This type of hash join first scans or computes the entire build input (small input) and then builds a hash table in memory. Each row is inserted into a hash bucket depending on the hash value computed for the hash key. If the entire build input is smaller than the available memory, all rows can be inserted into the hash table. This build phase is followed by the probe phase. The entire probe input is scanned or computed one row at a time, and for each probe row, the hash key’s value is computed, the corresponding hash bucket is scanned, and the matches are produced.
Grace Hash Join
In this type if the build input does not fit in memory, a hash join proceeds in several steps. This is known as a grace hash join. Each step has a build phase and probe phase. Initially, the entire build and probe inputs are consumed and partitioned (using a hash function on the hash keys) into multiple files. Using the hash function on the hash keys guarantees that any two joining records must be in the same pair of files. Therefore, the task of joining two large inputs has been reduced to multiple, but smaller, instances of the same tasks. The hash join is then applied to each pair of partitioned files.
Recursive Hash Join
In this type if the build input is so large that inputs for a standard external merge would require multiple merge levels, multiple partitioning steps and multiple partitioning levels are required. If only some of the partitions are large, additional partitioning steps are used for only those specific partitions. In order to make all partitioning steps as fast as possible, large, asynchronous I/O operations are used so that a single thread can keep multiple disk drives busy.
The term hash bailout is sometimes used to describe grace hash joins or recursive hash joins.
So, these are the internal joins known as physical joins used by SQL Server to perform any join.
Thanks for your reading and please share this post and rate it.