What is Apply in SQL Server?
In this topic “Apply vs Join”, we will discuss about the APPLY operator and its usage scenarios. Lot of people got confused “where they can use Apply and what is it”. They afraid about the performance and usage scenario also. In this article we are going to remove all these scary images of Apply from your mind.
So let’s start with ABCD of Apply and try to cover some advanced topics.
The primary use of Apply operator in sql server is to invoke a table valued function (TVF) but it can also be used in some other important scenarios. We will discuss the importance of Apply operator in brief in this article and would also compare this with Join operator. Invoking table valued function (TVF) is for what Microsoft introduced Cross Apply and Outer Apply primarily, as per MSDN.
Apply Operator executes for each row of left input and finds the matches in right input and then returns the result set as per it’s type; Cross or Outer Apply.
The output of the Apply operator is all the columns from Left input followed by all the columns from right input for matching rows.
Apply operator has two types:
Similar to Inner join it returns the row from both the input (Left and right) if WHERE condition satisfied.
Similar to Left Join it returns all the rows from left input and matching rows from right input if WHERE condition matches. If no match found in right input, NULL values will be returned for all the columns.
Note : The database compatibility level to use this operator must be 90 at least.
When to choose “Apply” and when to “Join”?
Join operator in sql server is one of the most frequent operator we use. Using this operator we start making relationships with other tables based on some conditions in RDBMS systems. Have a look on this link to know more about types and definitions. Click Here.
Here our motto is to find some situation where Apply is best fit rather than a Join and some situations where Join is better than Apply.
When to choose “Apply”
Scenario 1(To execute a Table valued function)
Microsoft primarily thrown this operator to be used to join with table valued functions. Have a look on the below query and error too;
SELECT * FROM SYS.dm_exec_cached_plans CP INNER JOIN SYS.dm_exec_sql_text(CP.plan_handle) CA
Msg 102, Level 15, State 1, Line 2
Incorrect syntax near ‘CA’.
SYS.dm_exec_sql_text is a table valued system DMF (Dynamic management function) and as i discussed at very beginning of this article that, to handle a row by row join condition, Microsoft introduced Apply operator.
In above query, i am trying to put a JOIN operator between two result sets which are dependent (right input is dependent on left input’s plan_handle column value). A join can only be performed between two independent SETS and so now this is a condition where we can not perform the JOIN.
Just try this;
SELECT * FROM SYS.dm_exec_cached_plans CP CROSS APPLY SYS.dm_exec_sql_text(CP.plan_handle) CA
And you will get the result set, query runs successfully.
Now again have a look on this;
SELECT * FROM SYS.INDEXES SI INNER JOIN SYS.DM_DB_INDEX_PHYSICAL_STATS(DB_ID('ADVENTUREWORKS2012'), OBJECT_ID('SALES.SALESORDERDETAIL'), NULL, NULL, 'DETAILED') SPS ON SPS.index_id = SI.index_id AND SI.OBJECT_ID = OBJECT_ID('SALES.SALESORDERDETAIL')
In this example, we are again performing a join with a TVF and this time query runs successfully. So can you guess what could be the reason behind that?
Here is the reason
In first query the right input in for JOIN operator was dependent on a column value of left input whereas in this query all the values are as constant and thus, both the inputs are independent and JOIN can be performed.
Scenario 2 (To join with only Top N records)
Sometimes we need to have a JOIN with only Top N records from another table with ORDER BY and may be with some conditions too. Have a look on this demo;
Create demo Tables:
CREATE TABLE Employee ( ID INT IDENTITY(1, 1), Name VARCHAR(100) ) GO CREATE TABLE LoginEntry ( ID INT IDENTITY(1, 1), LoginTime DATETIME, EmpID INT, GateNumber VARCHAR(50) ) GO --Insert demo data as below; INSERT INTO Employee(Name) VALUES('Employee 1'), ('Employee 2'), ('Employee 3'), ('Employee 4'), ('Employee 5'), ('Employee 6') GO INSERT INTO LoginEntry(LoginTime, EmpID, GateNumber) VALUES('20141024 08:00', 1, 'Gate 1'), ('20141024 09:00', 1, 'Gate 1'), ('20141024 10:00', 1, 'Gate 2'), ('20141024 08:00', 2, 'Gate 1'), ('20141024 09:00', 2, 'Gate 1'), ('20141024 10:00', 2, 'Gate 2')
Have a look on SELECT output;
SELECT * FROM Employee
SELECT * FROM LoginEntry
Now from above table, i want to get Employee ID, Employee Name with first log-in time and as you can see, in LoginEntry table,we have multiple entries for each employee. To get the desired output, we can use an Apply (Cross or Outer), or also a subquery too. But in case of a subquery, we can extract only one column at a time. That is in each subquery, we can extract only one column. So for N number of columns we have to use N number of subqueries.
SELECT Employee.ID, Employee.Name, OA.LoginTime FROM Employee CROSS APPLY (SELECT TOP 1 LoginEntry.ID, LoginEntry.LoginTime FROM LoginEntry WHERE LoginEntry.EmpID = Employee.ID ORDER BY LoginTime ASC) OA
Scenario 3 (To UNPIVOT)
And this condition LoginEntry.LoginTime <> ISNULL(OA1.LoginTime, ”) will avoid the repetition of the same login time in both columns. So we can also access the Outer apply columns of previous APPLY in another apply coming to next.
Here we are doing like UNPIVOTing the records and also comparing with previous column. Apply is a best fit in such types of situations.
Scenario 4 (Instead of subquery for multiple columns)
When to choose JOIN
If both the inputs are independent and / or there is no need to have a join criteria with only TOP N records, JOIN should be preferred in such types of scenarios. Except above mentioned scenarios JOIN can be a good choice instead of APPLY for it’s simplicity of syntax and to match all rows at once instead of invoking right input every time for each row of left input , as APPLY invokes.
Performance of APPLY vs JOIN
In different scenarios, performance of JOIN and APPLY are different. In some scenarios like above (Scenario 1, 2, 3 and 4), APPLY performs better than JOIN. For example, if we have to join with TOP N records, it’s too easy to implement with the help of APPLY and will perform much better than JOIN. Implementing the same with JOIN is a tedious task.
And in case, both the inputs are independent JOIN can perform better than APPLY, as APPLY invokes right input every time for each row of left input.
In some scenarios, both JOIN and APPLY performs similar and produce the same execution plan also. You have to check this with your query in your environment before replacing JOIN and APPLY with each other.
As we have discussed that Microsoft primarily thrown this function to be used with TVF joins but except this, it also offers some out of box functionality, some of them we have discussed above. There could be some more usage of this operator also. Above we have mentioned some of important usage of this operator.
Don’t forget to put your comments and suggestions below in comment box. Thanks to you for your patients to read this post.