What is Bookmark Lookup?
If all the columns required by the query are not available in the nonclustered index itself, a lookup is required to base table to pick those columns which are not part of the nonclustered index being used to retrieve the data. Lookup fetches the corresponding data row from the table by following the row locator value from the index row. It requires extra logical read on data page excluding the logical read on the index page.
Types of lookup
This lookup may be a RID lookup against a clustered index or a key lookup against a heap.
If base table does not have any clustered index created on it, a Row ID of heap, will be used as a row locator and if any query which will require a lookup using this heap row id row locator will perform a lookup, the lookup will be called as a key lookup.
Create a table as below:
CREATE TABLE LookupExample ( EmpID INT, EmpName VARCHAR(100), EmpAddress VARCHAR(500), EmpContact VARCHAR(15), EmpPinCode VARCHAR(15) )
Fill demo data as below:
INSERT INTO LookupExample (EmpID, EmpName, EmpAddress, EmpContact, EmpPinCode) SELECT ROW_NUMBER() OVER(ORDER BY(SELECT 0)), 'Employee Name ' + CAST(column_id AS VARCHAR(10)), 'Demo address of employee : ' + name, LEFT(CAST(ABS(CHECKSUM(NEWID())) AS VARCHAR(10)) + '0000000000', 10), LEFT(CAST(ABS(CHECKSUM(NEWID())) AS VARCHAR(10)) + '000000', 6) FROM SYS.COLUMNS
Create a nonclustered index on this heap table (A table which does not have a clustered index is known as a heap table)
CREATE NONCLUSTERED INDEX noncidx_LookupExample_EmpID ON LookupExample(EmpID)
Now write a query to extract small data set from the table using this created nonclustered index as below and have a look on the execution plan:
(Only one row will be extracted from below query)
SELECT * FROM LookupExample WHERE EmpID = 1
And now have a look on this;
In above execution plan, we can see that the RID lookup is taking 50% of total cost of query execution cost to fetch the data using a RID lookup and then joining it to the main data extracted from index page. Because of this additional cost, we should avoid lookups from our plans.
The above RID lookup has taken place to extract below columns required in the final output of the query.
See the red circled Output list of the column from RID lookup. These columns are EmpName, EmpAddress, EmpContact and EmpPinCode.
If base table has a clustered index created on it, all the nonclustered index created on the same table will have clustered index key as a row locator and if any query which will require a lookup using this clustered index key row locator will perform a lookup, the lookup will be called as a key lookup.
Just create a clustered index on this table to get the key lookup instead of RID lookup as below;
CREATE CLUSTERED INDEX cidx_LookupExample_EmpName ON LookupExample(EmpName)
And now run the above query again and have a look on the execution plan.
SELECT * FROM LookupExample WHERE EmpID = 1
Now we are getting a key lookup instead of RID lookup. I did not change any thing except creating a clustered index on a column of the same table. We can see that there the cost is 50% here too in fetching data from data page using a key lookup and then joining the fetched data into the main result extracted from index page.
See the list of columns again to be fetched from key lookup;
Time taken by this key lookup may vary as per the size of clustered index key, if the key is wide in comparison of Row ID of heap, it will increase the logical read otherwise it will decrease it a bit. To see the exact logical read and difference at more granular level use below commands;
SET STATISTICS IO ON SET STATISTICS TIME ON
The logical read may vary as per the size of index in comparison of heap row id.
How can we remove these lookups?
To remove these lookups, we should keep in mind these things:
- Using a Clustered Index – To know when to use clustered index and when to use nonclustered index, Click Here
- Using a Covering Index
- Using an Index Join – To Know more Click Here
As per the performance point of view, we should have a clustered index created on each table to avoid this heap row id as a row locator for all nonclustered indexes created on the same table. And we should avoid both the lookups from our queries completely to avoid these extra logical reads on the base table. Please keep in mind that before going with any alternative to remove lookup, cross check the performance of the query with existing lookup plan.
Please share your comments below about this article.