
When Database Statistics are not Enough – Search Patterns

Co-author: Lasse Nedergaard

Yesterday, Lasse ran into an issue with a query pattern in the large database he is responsible for. Based on our conversation, we wrote up this blog post and created a repro.

The troublesome query we were debugging executed like this:

  1. Find a list of key values to search for
  2. Insert these keys into a temp table – let's call this the SearchFor table
  3. Join the temp table to a large table (let's call it BigTable) and retrieve the full rows from the large table

Why not use a correlated subquery in step 2? In this case, the customer in question had multiple code paths (including one accepting XML queries) that all needed to pass thousands of keys to a final search procedure. They wanted a generic way to pass these key filters to the final access of BigTable.

The schema will make this clearer.

We created a repro, and it turns out to be a great look into the strange world of database statistics. Here is the schema:
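(The original schema listing did not survive the page; below is a minimal sketch reconstructed from the text. Only BigTable, the RowType column, the STAT_RowType statistic, and a non-clustered index implying a heap with RID lookups are named in the post – the NearKey and Payload column names, data types, row count, and modulo-based data generation are assumptions.)

```sql
-- Hypothetical reconstruction of the repro schema (names and sizes assumed)
CREATE TABLE dbo.BigTable
(
    NearKey INT       NOT NULL,            -- the key #SearchFor joins on
    RowType SMALLINT  NOT NULL,            -- skewed metadata column
    Payload CHAR(200) NOT NULL DEFAULT 'x' -- padding for realistic row width
);

-- A heap with a non-clustered index (the post mentions a RID lookup)
CREATE NONCLUSTERED INDEX IX_BigTable_NearKey
    ON dbo.BigTable (NearKey);

CREATE STATISTICS STAT_RowType ON dbo.BigTable (RowType);

-- Test data via a numbers CTE and modulo arithmetic; tweak the CASE to
-- produce the skew in RowType described below.
;WITH Numbers AS
(
    SELECT TOP (10000000)
           ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) AS n
    FROM sys.all_objects a CROSS JOIN sys.all_objects b CROSS JOIN sys.all_objects c
)
INSERT INTO dbo.BigTable (NearKey, RowType)
SELECT n,
       CASE WHEN n % 50 = 0 THEN n % 200   -- rarer values
            ELSE n % 100                   -- common, over-represented values
       END
FROM Numbers;
```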

In Production, BigTable is a very large table with billions of rows. Each row in the table has RowType that describes metadata about the row. The metadata is used to populate #SearchFor before accessing BigTable, as this saves significant I/O.

Things to note about BigTable:

  • All statistics have been fully updated for values other than RowType = 199. As long as we stay below this value, we have done ALL we can.
  • There is a big skew in the RowType column.
  • There are 100K values that are not in the statistics (run DBCC SHOW_STATISTICS (BigTable, STAT_RowType) WITH HISTOGRAM to convince yourself of this).

Query Problems

When a user queries this system, the code will first populate #SearchFor with a stored procedure like this:
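(The procedure listing is missing from the page; this is a hedged sketch of the population step. The temp table column and the @Keys table-valued parameter are invented names – the real system fed keys in from several code paths, including XML.)

```sql
-- Hypothetical sketch: the caller creates the temp table, and the procedure
-- fills it from whatever code path produced the key filters.
CREATE TABLE #SearchFor (NearKey INT NOT NULL PRIMARY KEY);

INSERT INTO #SearchFor (NearKey)
SELECT KeyValue
FROM @Keys;   -- @Keys: assumed table-valued parameter of key values
```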

ReturnFromBigTable contains the troublesome statement that is the subject of this blog:
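(The statement itself is missing from the page; reconstructed here as a sketch from the surrounding description, with assumed column and parameter names.)

```sql
-- The troublesome statement: join the key list to the big table and
-- return the full rows, filtered on the metadata column.
SELECT B.*
FROM dbo.BigTable AS B
JOIN #SearchFor  AS S ON S.NearKey = B.NearKey
WHERE B.RowType = @RowType;
```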

Looks quite innocent, doesn’t it?

To simulate the customer’s workload, let us say we are searching for 1% of the rows in BigTable. We do this by populating #SearchFor.
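(The population script is not shown on the page; one simple way to grab roughly 1% of the keys, sketched here with modulo arithmetic and the assumed names from above:)

```sql
-- Sketch: select ~1% of the key space for the RowType we are about to search
TRUNCATE TABLE #SearchFor;

INSERT INTO #SearchFor (NearKey)
SELECT NearKey
FROM dbo.BigTable
WHERE NearKey % 100 = 0   -- roughly 1% of the keys
  AND RowType = 99;       -- metadata-driven selection, as described above
```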

And now, the fun begins! Let’s first use our NearKey search with a non-skewed RowType:

Plan (only join shown)


Looking at the Predicate in the properties of the BigTable scan, we unsurprisingly see that the predicate is being pushed down: [dbo].[BigTable].[RowType]=(99)… Nothing interesting so far. The plan is parallel. Again, not surprising considering that we are scanning 1M rows in BigTable.

… but as Lasse and I observed on his workload, something isn’t quite right here.

Let us run the same query, but with the filter RowType = 99 replaced by RowType = 177. The new plan is:


It is at this point your design intuition should wake you up with that good ol’ WTF feeling. “Wait a minute,” you should say… before, when there were fewer rows coming out of BigTable, we were parallel and hashing the output of BigTable. Now, when there are more rows (50x more), we are reversing the hash order: probing into BigTable and hashing #SearchFor. Note that all estimates are relatively accurate (within 10%) and the statistics in the space we are searching are fully up to date… If we take a fully naïve approach, trusting relational algebra and query optimizers, there is nothing more we can do here.

But hey, it gets worse: what happens when we go outside the boundaries of the statistics? This, for example, could happen when the stats are not fully updated – a completely normal situation in a database. Searching with RowType = 199 (which is not in the histogram) and estimating the plan, we get:


The estimate coming out of BigTable is 1 row, the actual is 100K; this is probably going to hurt.

Running the query gives us the deadly:


Congratulations to us, we just committed performance suicide!

Let’s list the execution times of these query variants:

  Join Strategy                      RowType   CPU time   Elapsed   Logical I/O Reads
  Hash BigTable, Probe #SearchFor    99        421ms      549ms     31429
  Hash #SearchFor, Probe BigTable    177       765ms      1293ms    31429+62
  Merge Join (sort both)             199       1575ms     2261ms    31429+62

What is the problem here? From the user’s perspective, the query response time will often vary unpredictably – which is a poor experience. But worse, from a system and DBA perspective: the memory consumption will vary a lot depending on which query plan we end up with. This means it becomes hard to extrapolate total system throughput and resource requirements from the queries. If you are capitalistically inclined, let me translate this: at the end of the day, this costs you money!

Why is this happening? It is happening because there are hidden assumptions here that you have not brought into the light. We are trusting technology (in this case the database statistics and optimizer) with solving something that is a data modeling problem.

This is a very good example of something Thomas has been seeing a lot lately: an unwillingness to be specific about your requirements costs you money – a LOT of money. In other words: there is a price tag associated with ignorance. Let us make it clear what we are NOT saying: there is nothing wrong with ignorance; it is hard to know what you don’t know. Sometimes, you simply don’t know a requirement exists. The trouble arrives when denial and a lack of curiosity and analysis trump the observed data.

Fortunately, Lasse is the curious kind and he asked all the right questions and sought knowledge to remedy the situation.

Let’s get pragmatic, and talk about solutions.

The solution

With all the query plans above, which one do we want? In this case, the answer is: “None of them”

Here is an observation about the workload we discovered as we were analysing the problem: it is reasonable to ask the user to accept that response time is proportional to the number of results returned. From a system perspective, we also want the memory and CPU consumption to be proportional to the returned result size. These are questions you have to ask yourself as a system designer, and discussions you can have with users.

This is where computer science comes into the equation. We know:

  1. That we can search a B-tree of n rows in lg(n) time (which is as good as constant)
  2. That the returned result is capped by the number of rows (let us call this R) in #SearchFor
  3. That we are willing to accept that the runtime and resource use are proportional to the number of rows returned
  4. That we can’t live with unpredictable behavior

The optimizer KNOWS about the search time and the cap (1+2). But the optimizer does NOT know that we are willing to declare certain properties that narrow the search space (3) and that we want predictability (4). Our “ignorance” has a price, and the optimizer makes a choice that might be optimal, but which unfortunately is also unpredictable.

This is a great case for hinting. And we can do it like this:
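(The hinted statement is missing from the page; a sketch with the same assumed names as above. Note that #SearchFor is written first – the join hint forces the written order.)

```sql
-- Force both join order and join strategy: #SearchFor drives a nested loop,
-- seeking into BigTable once per key.
SELECT B.*
FROM #SearchFor AS S
INNER LOOP JOIN dbo.BigTable AS B ON B.NearKey = S.NearKey
WHERE B.RowType = @RowType;
```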

Note something important here: the order of the joined tables matters when you use the LOOP hint (which is why we flipped them above). The hint BOTH forces the join order and the join strategy.

An alternative, which may be more generally applicable, is this rewrite:
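(Again, the listing did not survive the page; one plausible shape of the rewrite, sketched with the assumed names from above:)

```sql
-- APPLY rewrite with FORCESEEK: join order is not guaranteed, but the
-- seek-per-key plan we want almost always results.
SELECT B.*
FROM #SearchFor AS S
CROSS APPLY
(
    SELECT *
    FROM dbo.BigTable WITH (FORCESEEK)
    WHERE NearKey = S.NearKey
      AND RowType = @RowType
) AS B;
```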

The above rewrite does not guarantee join ordering. However, with the FORCESEEK hint, the plan we want is almost guaranteed.

Let us add the new measurements to the results table:

  Join Strategy                      RowType   CPU time   Elapsed   Logical I/O Reads
  Hash BigTable, Probe #SearchFor    99        421ms      549ms     31429
  Hash #SearchFor, Probe BigTable    177       765ms      1293ms    31429+62
  Merge Join (sort both)             199       1575ms     2261ms    31429+62
  Hinted LOOP                        99        406ms      718ms     81259
  Hinted LOOP                        177       499ms      964ms     81259
  Hinted LOOP                        199       306ms      718ms     81259

Above, we can see why the optimizer didn’t want to help us! The number of I/O operations required for our hinted plan is much higher than for the optimizer-generated plans.

The hinted plan shape is:


Obviously, we could replace the non-clustered index with a clustered one (which would get rid of the RID lookup) to make this query faster. But in this case, this was not a viable solution for other reasons.


In this blog, I have shown you an example of a database optimizer failing to make the “right choice”. There are several reasons this happens:

Lack of information: The database needs declarative information about the user’s intent and what we consider the Right Choice™. In the example, the missing information is: “the speed of the returned result should always be proportional to the result size”. We used a hint to control the optimizer’s behavior, using our knowledge of the physical execution of queries. Other people might have taken more extreme steps and argued for NoSQL – perhaps too aggressive a step in this case.

Leaky abstractions: We saw the line between the “logical” and “physical” model break down. We believed that by building the right model, we had supplied enough information to the database. The distinction between physical and logical model has, as Thomas previously argued, always been a pretty useless paradigm. We have to consider the components of the system in a holistic way, but not at the expense of maintaining the overview of individual building blocks.

Optimality vs. Good enough: The optimizer’s default behavior is to look for “optimal plans”. In the plan search space, even a very small space like this example, plans are sometimes found that look optimal from the statistics. Yet these plans may not be optimal or have the properties we seek – or we may fail to recognize the good plans as we search the space. In our case, we just wanted a plan that was “good enough” and had certain predictable attributes.

False hardware assumptions: The optimizer makes some assumptions about the cost of I/O operations which unfortunately do not correlate with the reality of a modern machine. It assumes that optimizing for a low number of logical I/Os is a Good Thing™. In this light, the plans generated are the right plans. However, the optimizer does not take into account that the I/O system might be fast, and that there is enough RAM for a good likelihood of finding some of the needed data already residing in the buffer pool. To prove this: recall that we had fully updated statistics available for queries going after RowType = 177. Yet, running the query on a puny IBM laptop, the loop-hinted plan, even on an empty buffer pool, is still faster than the plan generated by the optimizer.

The simple query we studied here raises some interesting questions about the design tradeoffs that the data modeler and architect are forced to make. In this case, we were fortunate that Lasse was on the alert for these tradeoffs. With this example, we have seen what the potential consequences of “behavior and model ignorance” are and why you need to be alert to it.



Todd Everett:

    Great post Thomas – really informative! I liked the use of the table of numbers and modulo arithmetic to rapidly generate the test data. Thanks for sharing this.

mbourgon (in reply):

    Ditto. Just saw Grant’s course on this a week or two ago, and it’s very cool to see it in actual practice.
