The problem I would like to talk about is well examined and well understood, yet it continues to frustrate customers. Unfortunately, some of these frustrations come from “best practices” that people apply consistently, and which in turn consistently get them into trouble.
When you are done reading this post, I hope you will have a better understanding of how to avoid these pitfalls. I also hope that some of your expectations about query optimisation will have been adjusted.
Myth: You can rely on the optimiser to always get queries right
It is commonly believed by customers that as long as you do some or all of the following, you are fine:
- Keep statistics updated
- Defrag indexes
- Watch out for parameterised queries
- Keep the database patched up with hotfixes and service packs
- Index the database well
While most of the above are good practices (perhaps with the exception of defragging indexes), they are not enough. Even if you do ALL of the above, you are not on safe ground. Finding the optimal query plan for any given query is an NP-complete problem. That is computer science terms for: very hard, likely impossible. If you are interested in knowing more about this fascinating subject, follow this link:
What it means to you: Even in the most well designed database, queries will sometimes run wild or give you inconsistent execution times. When things go wrong, they often go very wrong. Queries that run for seconds can run for hours if the plan they get is bad. Expect this, and defend yourself against it.
What you can do about it: First of all, you need to properly monitor the database to spot queries that run wild. SQL Server is very well equipped for this: the details you need can be found either in Profiler/Extended Events traces or by querying sys.dm_exec_requests. If you find a runaway query, it is often a good idea to force its plan to be better or add indexes/statistics to help the optimiser. In order to understand what a “better” plan looks like, you need to be able to spot the difference between “the plan you want” and “the plan you are getting”. More about this later.
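As a minimal sketch of that monitoring, the query below lists currently executing requests that have been running for more than a minute, joined to their SQL text. The 60-second threshold is my own assumption; pick whatever defines “runaway” in your workload.

```sql
-- Currently executing requests running longer than 60 seconds
SELECT r.session_id,
       r.start_time,
       r.total_elapsed_time / 1000 AS elapsed_seconds,
       r.cpu_time,
       r.logical_reads,
       t.text                      AS query_text
FROM sys.dm_exec_requests AS r
CROSS APPLY sys.dm_exec_sql_text(r.sql_handle) AS t
WHERE r.session_id <> @@SPID              -- exclude this monitoring query itself
  AND r.total_elapsed_time > 60000        -- threshold in milliseconds (assumption)
ORDER BY r.total_elapsed_time DESC;
```

Run it from a scheduled job or an admin session; r.plan_handle on the same row will get you the actual plan for any offender you find.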
Myth: You should update statistics after upgrading SQL Server
This is probably the most dangerous “best practice” out there. Often, people will run an UPDATE STATISTICS WITH FULLSCAN after upgrading to a new version of SQL Server. And the results are often dismal: by updating the statistics, all queries will get new plans, and a carefully tuned workload can suddenly behave very differently if you happen to get plans that favour skew over equal distributions. Sometimes this helps the workload, sometimes it makes it dramatically worse. It really is a gamble when you do this.
What it means to you: You should carefully weigh the pros and cons of updating statistics after an upgrade. If you DO decide to follow the “best practice” of updating everything, make sure you take the proper action, which is:
What you can do about it: Before an upgrade, always get a trace of the query plans the server executes on a typical day. If you decide to update all statistics and run the risks, compare the new query plans with the old ones (you can use the column query_hash for this) and look for outliers. You may get lucky and only have better performance (one of the reasons you upgraded in the first place) or you may have some outliers that need tuning after the upgrade.
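A simple way to capture that baseline is to snapshot the plan cache statistics keyed by query_hash before the upgrade, and run the same query afterwards for comparison. The table name dbo.PlanBaseline below is a hypothetical name of my choosing:

```sql
-- Snapshot plan/runtime statistics per query_hash (run before the upgrade,
-- then again afterwards into a second table, and diff the two)
SELECT qs.query_hash,
       COUNT(DISTINCT qs.query_plan_hash) AS plan_count,
       SUM(qs.execution_count)            AS executions,
       SUM(qs.total_elapsed_time) / 1000  AS total_elapsed_ms
INTO dbo.PlanBaseline                     -- hypothetical baseline table
FROM sys.dm_exec_query_stats AS qs
GROUP BY qs.query_hash;
```

A query_hash whose query_plan_hash changed and whose elapsed time per execution jumped after the upgrade is exactly the kind of outlier to tune first.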
Myth: Updating statistics regularly is always a good idea
This myth has a lot of truth to it. Unfortunately, it also causes a lot of trouble for customers who run large databases. UPDATE STATISTICS is an expensive operation in SQL Server and it can run for a very long time; for some customers, several days. By the time you have finished updating the statistics, you will not only have burned a lot of CPU cycles on the server (which, remember, you are paying licenses for) – you may also end up with the result described above.
What this means to you: You may have tables in the database that never have good statistics, because the update takes so long that they are always out of date.
What you can do about it: Carefully plan statistics updates on large tables. Consider using the tricks described in my blog about the ascending key problem. This also means that having auto update statistics turned on for all tables is not always a good idea.
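To make that concrete, here is a hedged sketch of targeted statistics maintenance on a large table. The table and statistic names (dbo.FactSales, Stat_FactSales_DateKey) are placeholders for illustration:

```sql
-- Sample instead of scanning the whole table: far cheaper on very large tables
UPDATE STATISTICS dbo.FactSales WITH SAMPLE 5 PERCENT;

-- Opt one specific statistic out of auto update, so you control when
-- the expensive full scan happens (e.g. in a maintenance window)
UPDATE STATISTICS dbo.FactSales (Stat_FactSales_DateKey)
WITH FULLSCAN, NORECOMPUTE;
```

Note that NORECOMPUTE means YOU now own the refresh schedule for that statistic; forget it, and the statistic quietly goes stale.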
Myth: Auto create statistics will create the statistics I need
This is only true for single column statistics. If you have two columns that are somehow correlated, the optimiser will not be able to understand this unless you manually create multi-column statistics.
What this means to you: With no other information available, the optimiser will assume that filters on two columns are independent, and multiply their selectivities. For example, imagine a database of customers and the food they eat and cook. If 10% of all your customers buy fish and chips and 10% of them are English, then the optimiser assumes that a filter on Food = ‘Fish and Chips’ AND Nation = ‘English’ will reduce the table to 1% of the rows. If you add the filter Cooking Skill = ‘Very Poor’, it will again assume independence and estimate 0.1%. We of course know that in every case the real answer is around 10%. Statistics will in this case mislead the optimiser.
What you can do about it: You can use queries or my TableStat.exe tool to find out which columns are correlated. Once you know, you can then add multi-column statistics to them. This is especially important in data warehouse fact tables, where high correlation of columns is particularly common (a fact exploited by column stores).
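Once you know two columns are correlated, the fix is a single statement. Using the hypothetical customer table and columns from the example above:

```sql
-- Multi-column statistic so the optimiser sees the Food/Nation correlation
-- (table and column names are from the illustrative example, not a real schema)
CREATE STATISTICS Stat_Food_Nation
ON dbo.Customer (Food, Nation)
WITH FULLSCAN;
```

The density information in this statistic lets the optimiser estimate the combined filter at roughly 10% instead of the multiplied 1%.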
Myth: Patching the SQL engine makes the optimiser do the right thing
As I said in the introduction, the optimiser is not perfect. While it constantly evolves, there simply is no such thing as an optimiser that always does the right thing. Furthermore, a lot of the changes to the optimiser are disabled by default.
What this means to you: Patching the SQL Engine most likely has no effect on the query plans you get. Unless…
What you can do about it: Trace flag 4199 will enable the latest fixes to the optimiser. Before using it, make sure you do a baseline and treat enabling this flag exactly like you would treat upgrading to a new version of SQL Server.
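The flag can be enabled server-wide or scoped to a single statement while you test. The query below is a throwaway illustration (the table is hypothetical); note that QUERYTRACEON typically requires elevated permissions:

```sql
-- Server-wide: affects every query, so baseline first
DBCC TRACEON (4199, -1);

-- Or per statement, to limit the blast radius while you evaluate the fixes
SELECT c.CustomerId
FROM dbo.Customer AS c           -- hypothetical table
WHERE c.Nation = 'English'
OPTION (QUERYTRACEON 4199);
```

Scoping the flag to individual queries is a good way to find out whether the optimiser fixes help or hurt before you flip the switch globally.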
Myth: Buying a larger server will make my queries run faster
This is probably one of the most common myths out there. Query tuning is a tough discipline that requires a deep understanding of database internals. It is often tempting, and even economically sensible, to simply throw hardware at the problem. With a properly configured I/O system and cheap cores, it is now easy to build a server with no hardware bottlenecks.
Unfortunately, upgrading to a larger server can have negative side effects on your query workload. Queries that have bad plans may now run faster, but they will also gobble up more resources. This can make the entire workload run worse than it did before you upgraded the server. Add to this that large servers often have lower clock rates per core, which means that single threaded query plans, or plans on skewed data, will run slower. For example, a two-socket Intel Xeon can clock over 3.5 GHz, while a four-socket will rarely clock over 3 GHz.
What this means to you: Buying a larger server for your database may make it run SLOWER, not faster. If you have just decided to invest a lot of money in hardware, this won’t make you look good.
What you can do about it: Perform a pre-analysis of the workload before upgrading to understand whether you will benefit from more hardware. If you discover you will, then make sure you baseline the query workload before upgrading so you can quickly locate queries that consume more of the new resources. Comparing a simple query trace in the before/after situation is generally enough to spot the worst offenders.
Query optimisation is a hard problem (NP-hard, in fact). The code SQL Server uses to heuristically find good plans is very complex. In general, the optimiser does a great job at finding good plans. The database is generally smarter than humans, even the most experienced SQL Server veterans. After 15 years of database tuning, I am still occasionally surprised when the optimiser finds a really clever plan I never thought about forcing.
Things DO go wrong. Expect that “poor plans happen” and have a contingency plan for what to do about it. Powerful diagnostics tools inside SQL Server allow you to easily track down poor query plans – but it is up to you to use them. Query hinting, combined with a solid understanding of join types and indexing strategies, will help you fix the problem when it inevitably appears. Coming up with a good query plan when the optimiser won’t give you one is not an easy task. A few hours with a consultant will often be money well spent. If you wish to learn how to find good query plans yourself, I highly recommend the book “SQL Tuning” by Dan Tow.
The very best thing you can do to avoid poor query performance and bad plans is to use proper data modeling techniques. Good data modeling has a powerful effect on all relational databases, both column and row stores. SQL Server’s optimiser is one of the smartest optimisers out there, but even that piece of code needs a hand from the data modeller.
Use dimensional models to build data warehouses. This greatly improves your chances of getting good query plans and great performance. Make sure you manually create statistics that reflect relationships between columns in the data.
When building OLTP systems, avoid Entity-Attribute-Value models. The many self joins and skewed distributions in these models mislead the optimiser. If you absolutely must use them, be prepared to force a lot of the query plans with hints (often OPTION (LOOP JOIN)) and to run without table lock escalation and auto update statistics.
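To illustrate what that looks like in practice, here is a sketch against a hypothetical EAV table (one row per entity/attribute pair; all names are mine, not from a real schema):

```sql
-- Typical EAV self join: find English customers who buy fish and chips.
-- The skewed Attribute distribution misleads the optimiser, so we force
-- nested loop joins, which EAV lookups via the clustered key usually favour.
SELECT e1.EntityId
FROM dbo.EntityAttributeValue AS e1
JOIN dbo.EntityAttributeValue AS e2
  ON e2.EntityId = e1.EntityId
WHERE e1.Attribute = 'Food'   AND e1.Value = 'Fish and Chips'
  AND e2.Attribute = 'Nation' AND e2.Value = 'English'
OPTION (LOOP JOIN);

-- Run the table without lock escalation and without auto-updated statistics
ALTER TABLE dbo.EntityAttributeValue SET (LOCK_ESCALATION = DISABLE);
UPDATE STATISTICS dbo.EntityAttributeValue WITH NORECOMPUTE;
```

As with NORECOMPUTE anywhere else, disabling auto update means you must schedule the statistics refresh yourself.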