A few weeks ago I was asked a great question by Sam, who follows my blog. With Sam’s permission, I am reprinting his question about indexes here.
Sam asks (my highlights):
“We are told that we can have only one Clustered Index since we can actually sort the data in only one way. However, a clustered index orders data in the leaf pages in sequential order, but does not order the data within that page itself to be in sequential order. Since the data on the page itself is not sorted sequentially (and thus implying more than one way of possible order), I am confused by the “Since we can sort data only in one way we can’t have more than 1 clustered index” reasoning.”
Does physically ordering rows inside database pages make sense?
For example, let us say there is a clustered index on the names of World Cities and let us say that names starting with A and B fall on the first page; names starting with C and D fall on second page. And let us say that on the first page, we have the cities Atlanta, Bombay and Boston and on the second page we have the cities Cleveland, Delhi and Detroit.
In the above scenario, the data could have been stored in the following manner in the Clustered Index:

Page 1 (A to B): Atlanta, Bombay, Boston
Page 2 (C to D): Cleveland, Delhi, Detroit

Since the data on the page itself is not sorted sequentially, it is also possible to store the data in another order, as follows:

Page 1 (A to B): Boston, Atlanta, Bombay
Page 2 (C to D): Delhi, Detroit, Cleveland
So, while it is logically possible, in reality we cannot create another clustered index where the order of data across pages is still maintained but the order within each page is different.
Can you give your reasons/opinions here?
Another aspect of this question: is there any reason SQL Server does not order the data within an individual page of a clustered index? Is it because it may be more expensive to re-arrange data within the page on every insert, but cheaper to insert into the page in whatever order and rely on the Row Offset information?
These are excellent questions that touch on some great areas of computer science. I am always happy to give my opinion.
Illustrating the Issue
Before I move on to the opinion bit, I would like to show you a repro that illustrates Sam’s point that rows are not physically ordered within the page.
Consider this simple table:
CREATE TABLE Foo (
    K CHAR(10) NOT NULL PRIMARY KEY
    , Payload CHAR(23) NOT NULL DEFAULT (REPLICATE('_', 23))
)
TRUNCATE TABLE Foo
DECLARE @Row INT = 0
DECLARE @Val INT = 42
SET NOCOUNT ON
WHILE @Row < 1000 BEGIN
    /* Create semi random numbers, correcting for the horrible SQL Server hash function */
    SET @Val = ABS(BINARY_CHECKSUM(@Row, @Val, 'A')) % POWER(2,30)
    PRINT (sys.fn_varbintohexstr(CONVERT(BINARY(4), @Val)))
    INSERT INTO Foo (K) VALUES (sys.fn_varbintohexstr(CONVERT(BINARY(4), @Val)))
    SET @Row = @Row + 1
END
This will create a table with 1,000 rows, inserted in a random order. Now, let us have a quick look at the pages in the table:
DBCC IND (2, 'Foo', 1)
Pick any PagePID with PageType = 1 and inspect the page with:
DBCC PAGE (2, 1, <PagePID>, 2)
Here is what I get on my machine from the DATA dump:
As you can see, the data in the page is not physically ordered on the page. However, let us just see how the slots map on the page:
DBCC PAGE (2, 1, <PagePID>, 1)
From my machine:
Slot 19, Offset 0xd8, Length 40, DumpStyle BYTE
Slot 80, Offset 0x128, Length 40, DumpStyle BYTE
As you can see, the slots on the page are not in the same order as the physical layout of the data. In fact, the slots are ordered as per the index.
Physical vs. Logical Order
By using an indirect mapping of row slots to physical locations on the page, we can use a simple binary search to quickly locate a specific row within the 8K page. Unlike on disk, there is no such thing as “defragmenting” a page to make the physical order match the mapping order.
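The mechanism above can be sketched in a few lines. The model below is a deliberately simplified, hypothetical picture of a page – rows are appended in arrival order while a slot array, kept sorted by key, provides the logical order and supports the binary search; none of the names reflect SQL Server’s actual on-page structures.

```python
from bisect import bisect_left

class Page:
    """Toy page: row data in arrival order, slot array in key order."""

    def __init__(self):
        self.rows = []    # physical storage, in insert (arrival) order
        self.slots = []   # indices into self.rows, kept sorted by key

    def insert(self, key, payload):
        self.rows.append((key, payload))              # row data is appended, never moved
        keys_in_slot_order = [self.rows[s][0] for s in self.slots]
        pos = bisect_left(keys_in_slot_order, key)
        self.slots.insert(pos, len(self.rows) - 1)    # only the small slot array shifts

    def find(self, key):
        # Binary search through the slot indirection, not the raw row data.
        lo, hi = 0, len(self.slots)
        while lo < hi:
            mid = (lo + hi) // 2
            k = self.rows[self.slots[mid]][0]
            if k < key:
                lo = mid + 1
            elif k > key:
                hi = mid
            else:
                return self.rows[self.slots[mid]]
        return None

page = Page()
for k in ["Boston", "Atlanta", "Bombay"]:             # inserted out of key order
    page.insert(k, "_" * 23)
print([page.rows[s][0] for s in page.slots])          # logical order: Atlanta, Bombay, Boston
print(page.rows[0][0])                                # physical order: Boston was stored first
print(page.find("Bombay"))
```

Note how `find` never needs the rows themselves to be physically sorted; the sorted slot array alone is enough to keep lookups logarithmic.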
Would it be worth paying the cost of keeping rows physically ordered (maybe even in different ways, depending on the access pattern)?
First, consider why we worry about logical and physical order in the first place. We do this to get more sequential instead of random I/O on spinning media. This optimisation drives down cost when running on hard drives – but it is largely irrelevant on SSD. And once a page is in memory, sequential I/O is no longer a problem at all; we can fall back on old-fashioned binary searching. It would therefore seem to make no sense to pay the extra cost of keeping the rows physically sorted on the page, because once they are in memory, it is fast to find them.
Second, consider the additional cost of logging the physical reordering of pages during insert and update operations. We would have to log not only the changes to the slot map, but also which rows were shifted around when a new row was inserted. This makes inserts more expensive and would make page splits, already expensive, even more complex.
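A back-of-the-envelope sketch makes the asymmetry concrete. The byte counts below are illustrative assumptions (using the 40-byte rows from the dump above and a nominal 2-byte slot entry), not real log-record sizes:

```python
# Why physically sorted rows make inserts bigger: inserting into the middle
# of a sorted array shifts every later row (and those moves must be logged),
# whereas the slot-array scheme appends the row and only shifts slot entries.
ROW_SIZE, SLOT_SIZE = 40, 2   # bytes; illustrative assumptions

def bytes_moved_sorted(rows_after_insert_point: int) -> int:
    # Every row after the insert point must be shifted to keep physical order.
    return rows_after_insert_point * ROW_SIZE

def bytes_moved_slot_array(rows_after_insert_point: int) -> int:
    # Row data is appended in place; only the small slot entries shift.
    return rows_after_insert_point * SLOT_SIZE

# Inserting in front of 100 existing rows on the page:
print(bytes_moved_sorted(100))      # 4000 bytes shuffled around
print(bytes_moved_slot_array(100))  # 200 bytes shuffled around
```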
So far, everything seems to indicate that SQL Server has chosen the right balance with regards to the page map logical order vs. the data physically ordered on the page.
Random and Sequential Memory
The cost of locating a row within 8K of data using a binary search might seem trivial compared to the cost of bringing data from disk to memory in the first place. However, once you consider modern systems that largely run in memory or have very fast disk – the cost you pay to find stuff, even if you know it is memory resident, becomes interesting.
It turns out that it matters quite a bit how you access memory. It takes around 100 CPU cycles to get 64B or 128B of memory from DRAM to the CPU. This quickly adds up, and the CPU reports itself as busy while waiting for that memory (there is no such thing as a “CPU waiting for memory” wait stat). I have blogged about sequential memory access here:
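You can get a directional feel for this from even a high-level language. The sketch below walks the same array twice – once in sequential order, once in a shuffled order. In a low-level language the gap comes from cache lines and prefetching; Python adds its own interpreter overhead, so treat any timing as indicative only:

```python
import random
import time

N = 1_000_000
data = list(range(N))
seq_order = list(range(N))            # sequential access pattern
rnd_order = seq_order[:]
random.shuffle(rnd_order)             # random access pattern, same work

def walk(order):
    total = 0
    for i in order:
        total += data[i]
    return total

t0 = time.perf_counter(); s1 = walk(seq_order); t_seq = time.perf_counter() - t0
t0 = time.perf_counter(); s2 = walk(rnd_order); t_rnd = time.perf_counter() - t0
assert s1 == s2                       # identical work, different access order
print(f"sequential: {t_seq:.3f}s, random: {t_rnd:.3f}s")
```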
Joe Chang also made the interesting observation that decoding exactly where the data you need to look at next is located is a very CPU-intensive operation:
All this indicates that there might be a better way to store data inside pages in SQL Server. For example, if you use a column store, physically reordering the rows, as suggested by Sam, inside the pages can lead to better compression. I have blogged about this here:
Sam’s question hints at another interesting option: not bothering to maintain a sorted row map at all. Because visiting memory sequentially is quite a bit faster than visiting it randomly, it may simply be better to visit all the rows in a page than to try to be smart about finding the one you are looking for. Such a memory-efficient access pattern has been suggested (and proven) to work in this article:
If the rows were stored sorted, it might even be possible to write a “forward only” search that scans the page sequentially and stops as soon as it has moved past the key it is looking for – sequential access and an early exit in one.
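Such a forward-only search could look like the sketch below. This assumes a hypothetical page layout where rows are physically sorted by key – which, as shown earlier, is not how SQL Server actually stores them:

```python
# "Forward only" search over physically sorted rows: walk the page strictly
# sequentially (prefetch-friendly) and bail out as soon as we have passed
# the spot where the key would have been.
def forward_only_find(sorted_rows, key):
    for k, payload in sorted_rows:    # strictly sequential memory access
        if k == key:
            return payload
        if k > key:                   # passed where key would be stored
            return None               # early exit, no random probing
    return None

rows = [("Atlanta", 1), ("Bombay", 2), ("Boston", 3)]
print(forward_only_find(rows, "Bombay"))   # 2
print(forward_only_find(rows, "Berlin"))   # None (stops early at "Bombay")
```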
In this blog, inspired by Sam’s question, I have been musing over the trade-offs you have to make when storing data inside the pages of a database. There is quite a bit of performance to be found here by optimising for different workloads and CPU architectures. It is the subject of a lot of research, and by following the links above, you can start to dig deeper.