At Stack Overflow the other day, I once again found myself trying to debunk a lot of the “revealed wisdom” in the SQL Server community. You can find the post here: Indexing a PK GUID in SQL Server 2012 to read the discussion. However, this post is not about GUID or sequential keys, which I have written about elsewhere, it is about cluster indexes and the love affair that SQL Server DBAs seem to have with them.
It is interesting to note that both SQL Server and MySQL by default will create clustered primary keys on a table. On the other hand, you will find Oracle DBAs shake their heads in amazement as to why one would do such a thing. As with all “best practices” that are so widespread in the SQL Server community – the problem is that things typically a little more nuanced than the whitepapers would have you believe. And “it depends” is the worst cop-out ever.
Good Reasons to Cluster
As you may have guessed, this post is about the advantages of heaps. But before we get there, let us first review the good reasons for using a clustered index on a table.
Tables with a Single Index on an Ascending Key
If your table has a single index on a primary key, you probably want a cluster index. This is the default behaviour of SQL Server. But WHY is this so?
Consider the case of INSERT statements in these two tables:
CREATE TABLE FooCluster
ID INT NOT NULL PRIMARY KEY CLUSTERED
, Payload CHAR(150) NOT NULL
CREATE TABLE FooHeap
ID INT NOT NULL PRIMARY KEY NONCLUSTERED
, Payload CHAR(150) NOT NULL
In the clustered case, an insert will only touch the leaf row of the last page. In the heap case, both the index and the heap has to be updated. This results in the heap needing one more logical read and one more write than the cluster. Clearly, the cluster index is to be preferred.
For now, I just want to plant the idea in your mind that scalable tables are NOT sequentially laid out. I have blogged and presented extensively about this and Microsoft (Diagnosing and Resolving Latch Contention on SQL Server) even has it documented if you are the type of person who only take your wisdom from the corporation.
We will get back to tables with one key very soon.
Large Tables that get a lot of Ranged Scans on a Key
Obviously, if you plan to range scan a table a lot on a key – this favours the cluster. Not only does this reduce I/O – it also reduces the amount of page lookups you have to do in memory.
I have sometimes heard the revealed wisdom that clusters should also be used to support queries that have ORDER BY on the clustered key. There ARE cases where this is true and a the cluster index can help. But in my experience, these cases are very rare. It only takes a join to another tables and ordering on columns from that to completely break any ORDER BY benefits you gain from a cluster index.
This then leads us to ask: What tables are likely to be range scanned in a database?
Let’s break the question down into the two major types of workload
Data Warehouses: Range scans on date columns in fact tables are common. Joining from a dimension table to the fact will also cause range scan of the surrogate key in the fact. Do note that of you need to range scan on date efficiently, it is often wise to use partitioning to optimise this. You may not have Enterprise Edition, in which case clustering on the date column is often a great idea. An argument can be made for clustering on the fact table surrogate key that is most frequently joined on.
OLTP: Range scan are most common in two scenarios:
1) Hash build of small tables
2) Joins from the referenced table to the primary key in an FK relationship
Ad 1) We can safely ignore any benefit from cluster indexes here. Small tables are, as the name implies, small. This means that scanning them is cheap no matter how silly our indexing is. For extremely frequently accessed tables, you should most likely be creating covering indexes to squeeze out the last performance. Cluster indexes don’t have any particular advantage here.
Ad 2) When you join like this, you are typically range scanning. If you expect large result sets as an effect of this, clustering can be a great help. However, if you are generally expecting large result sets, you are either running ad-hoc reports, have poorly tuned access patterns or you are running a data warehouse. There is however an important exception to this rule: Tables where the typical query is nearly always seek on the primary key followed by a range scan of a few records in a table referencing that key where those records get created at the same time as the primary table. A good example of this is the Order/OrderLines paired table that exist in most sales systems. It makes good sense to cluster the OrderLines on OrderKey (and NOT on some IDENTITY column added to OrderLines) – because the join is frequent and nearly always results in a range scan of OrderLines that benefits from the order rows being close together. Also note that in this example, fragmentation isn’t massively affected by the key on Orders not being sequential. There is some inherent “packing” of OrderLines at INSERT time simply because the rows tend to arrive together in a single transaction (which means that any page splitting is minimal).
Tables that see a Lot of DELETE and INSERT Activity
Heaps have the great disadvantage of needing forward pointers when rows are deleted. Cluster indexes don’t suffer from this. For tables that see a lot of mixed deletes and inserts, forward pointers quickly waste space. They also doubles the cost of doing seeks: With a fragmented heap, you will now have to follow the forward pointer to find the final record, which adds an additional I/O.
Tables that are in flux like this are great candidates for cluster indexes because you only pay the price to reach the bottom of the B-tree, not the price of chasing a forward pointer.
Consider for a moment which tables these are. They see a lot of both deletes AND inserts. Are they therefore likely to be big? Perhaps, but tables typically get big because they see mostly INSERTS (transaction tables and fact tables being two classic examples). Lets keep this in mind.
Good Reasons to use a Heap
It should be clear that there are some good reasons to use the cluster index. But how common are the cluster scenarios really? Should the general rule be to use a clustered index? I don’t think there should be a general rule, because there are a lot of equally good reasons to use heaps.
Fragmentation Prone tables with lots of INSERT activity
Consider what happens if the keys in the the single index in the cluster example are NOT sequential. In this case, a cluster index is going to get fragmented.
I have already said that tables with a lot of DELETE activity are not good candidates for heaps. But what about those tables who see mostly INSERT? If you think about them, they are very common and often the tables you really need to run well. Examples of such tables:
- Tables containing OLTP transactions in POS and banking systems
- Most Data Warehouse fact tables
- Log tables of all kinds
- Healthcare records for procedures and medication
- Telco Call Detail Records (CDR)
- Investment banking Risk Vectors
- Event from Web Servers and social media
if you put a clustered IDENTITY INT/BIGINT column on those tables and declare that the key, you have just shot yourself in the foot!
…Read that again…
The IDENTITY column will make the INSERT activity un-scalable. And since this is one of your largest tables, you probably care quite a bit about that.
What about GUID keys then, the thing everyone hates? Let’s say you did that, but kept the cluster index. Your table would now be massively fragmented. Not a problem in itself and it gets you around the scale bottlenecks. But you are wasting a lot of space.
Consider the alternative: an indexing strategy where you pick a sufficiently “random” key (like a GUID) and instead of blindly declaring that key a cluster index, you instead leave the table as a heap and just put a good old fashioned unique, non-clustered key index on the GUID. The majority of the table growth from INSERTs now goes to the heap which will nicely fill up and stay un-fragmented. And that is a heck of lot more scalable than the last page insert problem (though not infinitely scalable, you need an additional trick for that). In the meantime, the index you care about remains small (it only contains the GUID, not the entire row at the leaf level) and is therefor MUCH more likely to fit in memory. Your seeks are now faster, and any fragmentation you get on the key is not going to consume as much space as if you had to fit the entire row into those leaf pages.
What’s not to like?
Tables with Lots of Indexes
Recall that I said that tables with only one index will often benefit from a cluster index. However, consider the scenario where you have multiple indexes that are frequently accessed.
As an example, let us use the ORDERS table in TPC-H. This table is often accessed by customer, so it makes sense to add an index on O_CUSTKEY. The table also has a primary key index on O_ORDERKEY.
Consider the execution of this simple statement:
WHERE O_CUSTKEY =
We can try out two different strategies:
- Unique cluster index on O_ORDERKEY and index O_CUSTKEY
- Leave table as heap. Key on O_ORDERKEY and index on O_CUSTKEY
These are the plans we get:
On my little scale factor to (10GB) TPC-H database, these are the stats:
- Cluster: 27 logical reads
- Heap: 11 logical reads
- So the heap is better, even on I/O. Why?
- Think about what needs to happen when you do the key lookup into the cluster:
- Once the correct key has been located in the secondary index, you have to traverse the entire B-tree for the clustered index just to find the rest of the row.
- Compare with the heap strategy:
Because the index contains the RID of the row (which is the exact location in the data files) there is no need to climb any B-tree to find the rest of the row. Climbing B-trees is NOT free, there is a locking and memory access cost associated with it. By clustering the table, you are forcing every index except for the cluster to pay double the B-tree climb cost. Heap are better for this scenario.
Tables that need to be Fully Scanned
The final example in defence of the heap is tables that you typically want to scan (and that don’t fall into the frequent DELETE/INSERT category).
In nearly all of the examples I have seen, these tables are fact tables in data warehouse or log files. With the exception of column store indexed tables (and soon, clustered column stores), you generally want these tables to be scanned and not seeked.
In my data warehousing course, I describe the challenges of indexing fact tables. The short version is that when you get a seek of a fact table, even with a good index supporting it, it spells trouble. Seeks destroy sequential I/O on perfectly laid out table. They are also expensive because B-tree walking costs a lot of CPU. As if that wasn’t be enough, seek heavy plans on data warehouses can cause the execution engine to consume a lot of memory when index intersection plans are generated.
Unfortunately, the SQL Server optimiser will often pick a seek over a scan – because it under estimates the speed that a modern I/O system can achieve when properly tuned for sequential access. Fast track is a good example of this; it is not uncommon to see sequential scan speed exceeding 10GB/sec. Sometimes, bad statistics cause the seek too, and bad statistics are difficult to avoid in a large data warehouse.
Because of these issue with the optimiser, it is often desirable to completely take away the option of picking a seek. This means leaving the table as an un-indexed heap or a column store index. One way to think about a column store is that it is just a heap that is optimised for fast scans. You can consider partitioning the heap or column store to give the table a coarse grained “index” that reduces the amount of data that must be scanned – this is often better than clustering.