In this topic we are going to compare clustered indexes with nonclustered indexes with their usage and performance and will also discuss some points regarding the topic “when to choose which one”. Clustered Vs NonClustered indexes is an old topic and you can get many article written so far. But in this post I will cover the definitions, structure of indexes, syntax of clustered and nonclustered indexes and the most interesting part is “when to choose which one”. So lets explore clustered and nonclustered indexes with “When to use Clustered vs NonClustered indexes” in deep.
As we know that a Clustered indexes sorts and stores the data rows in the table or view based on their key values. These are the columns included in the index definition. And the most important thing is we can have only one clustered index per table, because the data rows themselves can be sorted in only one order.
When a clustered index is created on a table, the data rows in a table are stored in sorted order. When a table has a clustered index, the table is called a clustered table and it usage a balanced binary tree to store it’s row data. And if a table has no clustered index, its data rows are stored in an disordered structure called a heap table.
The leaf node of a clustered index is actual data row of the table.
CREATE CLUSTERED INDEX cidx_MyTable_MyColumns ON MyTable (MyColumns)
For example, I am taking AdventureWorks2012 as my database and to create a clustered index over Person.Person table on column FirstName;
USE AdventureWorks2012 CREATE CLUSTERED INDEX noncidx_Person_FirstName ON Person.Person(FirstName)
Nonclustered indexes are just like a “table of content” of a book which have a separate structure from the actual data rows of the table. A nonclustered index contains the nonclustered index key values and each key value entry has a pointer to the data row that contains the key value. If the table on which we are creating nonclustered index has a clustered index already, the pointer to the data row will be the key value of clustered index other wise the row id of the heap table will be used as a pointer to the row data. The pointer from an index row in a nonclustered index to a data row is called a row locator.
We can add non key columns to the leaf level of the nonclustered index to by-pass existing index key limits (900 bytes and 16 key columns) and execute fully covered, indexed, queries using INCLUDE keyword during creation of nonclustered index.
CREATE NONCLUSTERED INDEX cidx_MyTable_MyColumns ON MyTable (MyColumns)
In case you want to include some non key columns;
CREATE NONCLUSTERED INDEX cidx_MyTable_MyColumns ON MyTable (MyColumns) INCLUDE (MyColunNonKey1, MyColunNonKey2, ...)
CREATE NONCLUSTERED INDEX noncidx_Person_FirstName ON Person.Person(FirstName)
And in case of non key;
CREATE NONCLUSTERED INDEX noncidx_Person_FirstName ON Person.Person(FirstName) INCLUDE(MiddleName, LastName)
So as of now, we have learned, what is a clustered index and nonclustered index and how we can create them.
Now just have a look on the structure of indexes
Indexes store their information in a balanced tree, referred to as a B-tree, structure, so the number of reads
required to find a particular row is minimized.
So in above picture we can see that, to read any data row we have to traverse equal depth of B-tree. In case of nonclustered index the leaf nodes will point to the key of either clustered index or row id of the heap table instead of actual data.
Now we are going to focus on when to choose which type of index.
When to use Clustered vs NonClustered indexes:
Before deciding the type of index, have a look on below points which will help to decide the type of index to be selected;
- Indexes should be balanced between SELECT operation vs Insert + Update + Delete operations in most of the OLTP systems. But it also depends what favors your environment. For example in case of report oriented system, we don’t much worry about this.
- Keep the clustering key small. This might help in facilitating smaller nonclustered indexes as well.
- Indexes and Inserts – As much nonclustered indexes you have it will decrease the performance of insert operation
- Indexes and Updates – All nonclustered indexes have to be updated with update command which affects the clustered key and may decrease the performance of update operation. In case of update on included columns, all nonclustered indexes will also be updated if they contain the updated column.
- Indexes and Deletes – And during delete operation all associated nonclustered indexes also have to be updated accordingly.
- Index Overhead – Having a lot of nonclustered indexes can increase index overhead for insert, update and delete operations. As clustered index stores actual table data and not needed to be created separately as like nonclustered indexes having their own structure and memory area.
- Number of rows to be retrieved – In case of large number of rows a clustered index or covering index (nonclustered with INCLUDE non key columns) are good. Because they do not need any Key Lookup or RID Lookup.
- Data-ordering requirement – If we have a lot of queries which performs order by clause on a column having highselectivity, we can proceed with clustered index. Here selectivity is (cardinality / number of total rows), andcardinality is number of distinct rows.
- Index key width – Each nonclustered index keeps clustered index key as pointer, the length of index column matters. If our index column width is wide and we have a lot of nonclustered indexes, we should create nonclustered indexes with nonkey columns.
- Column update frequency – As per point 4, an update affecting clustered key column or included column will impact nonclustered indexes accordingly created on that table.
- Lookup cost – Key lookup and RID lookup, these lookups are costly operations for sql server database engine. Key lookup is a lookup to the base table if a nonclustered index is not fulfilling the requirement of all columns to be returned as a result set and the base table has a clustered index over it. And in case the base table does not have a clustered index over it and is a heap table, DB engine will perform RID lookup (Row ID lookup).
- Examine the WHERE clause and JOIN criteria columns for frequently used queries.
- Use narrow indexes – It’s always better to have small indexes instead of one covering or wide index. Because the data is sorted as per the index columns from left to right, so, if a table has one index including 3 columns in it; lets say, col1, col2, col3 in same order, any query having joins or where clause on col2,col3 (excluded left most col1), will have much more probability of a table scan instead of using this index. In case of three indexes on col1, col2 and col3, database engine can perform index join and index intersection methods and use these indexes instead of going to scan entire table. Remember, index join may or may not be picked by optimizer.
- Examine column uniqueness – Any index will help only if you have high selectivity. If selectivity is very low, query optimizer will choose to scan the entire table instead of scanning the index in case of nonclustered indexes.
- Examine the column data type – Small data types can perform better.
- Consider column order – In point 13, i have demonstarted an example, where order of a column matters.
- Rebuild the Clustered Index in a Single Step (DROP_EXISTING instead of Drop and Create)
Create Nonclustered Index instead of a Clustered Index when:-
- When the index key size is large.
- To avoid the overhead cost associated with a clustered index, since rebuilding the clustered index rebuilds all the nonclustered indexes of the table in some situations. For example – Drop and Create clustered index independently in two commands.
- To resolve blocking by having a database reader work on the pages of a nonclustered index, while a database writer modifies other columns (not included in the nonclustered index) in the data page; in this case, the writer working on the data page won’t block a reader that can get all the required column values from the nonclustered index without hitting the base table.
- When all the columns (from a table) referred to by a query can be safely accommodated in the nonclustered index itself.
We should also keep in mind the below points related to “Advanced Indexing Techniques” as a final notes:-
- Covering indexes (A Pseudo clustered Index).
- Index intersections or Index joins (a variation of index intersection).
- Filtered indexes – In case of lower selectivity on index column.
- Indexed views – Can be a better choice if insert, update and delete operations are very low.
- Index compression – So more data can be kept in one index page which will reduce the scan page count.
- Columnstore indexes -If aggregates are primary operations to be performed.
So, in this article, I tried to cover the points to be taken care of during the selection of index type. Please give your feedback as comments below.