In this next instalment of the big box blog series I will provide some guidance on how you configure a big box for best performance.
I have previously blogged about my standard guidance for SQL Server configuration. But for a big box, you need more than this.
Think for Yourself!
From the introductory blog, you may recall that I mentioned big boxes are rare. I estimate that 95% of all boxes out there are not big. If you bought a big box, most the “best practices” and standard guidance you find don’t apply you. You will have to deviate from the standards, sometimes in ways that there is no “official” guidance for. In other words, you can no longer rely on Microsoft to think for you, you have to think for yourself. The corollary is that you can’t rely on me to think for you either. You should take everything I say here with a pinch of salt – it may not apply to your specific case.
This doesn’t mean you you should start guessing random configuration changes. I am afraid you just need to bite the bullet and do the hard work of thinking and testing your theories. A solid grounding in scientific methodology will come in handy.
Irrespective of the workload you run, there are some general things you need to configure.
Tempdb: One file per core, no exceptions.
Trace Flag 8048: Documented here, this trace flag enables an important fix.
Trace Flag 1236: Documented here, as above.
Latest Windows Version and service pack: Your IT department may not have “validated” this. Refer to the “think for yourself” principle. The Windows kernel is constantly evolving. One of the places that are seeing improvement is in networking and NUMA awareness, I-O and memory management. Because you are a big box, those improvements are there for you.
Latest SQL Server Service Pack: Same argument as above. You are a big box, learn to live life dangerously.
Automate collection of Spin Lock and Latch Stats: If you are not already collecting the contents of sys.dm_os_spinlock_stats (could someone please ask Microsoft to document this one?) and sys.dm_os_latch_stats, you should be. These DMV become almost as important as sys.dm_os_wait_stats.
You need to have the best network drivers you can find. With good drivers, the CPU load of the network can be spread evenly over all cores. However, it is often a good idea to affinitise network (and I/O) to specific cores and set up SQL Servers affinity mask to avoid these cores. Getting that configuration optimal is trial and error, but XPERF/WPA helps a lot here as it allows you to pinpoint where the kernel time goes. I have generally had the best experience with Intel NIC and prefer them because I am familiar with the driver stack. I hear that Broadcom NIC are now very good too. You may need to turn of TCP/IP Chimney if you see sporadic disconnect errors from SQL Server (The infamous “Communicating Link Failure”).
You should also make yourself familiar with this paper and the method it describes.
What you are going to experience from here is likely going to depend on the type of workload you run.
This workload is characterised by a high amount of user concurrency with a lot of short running queries. Typically, you will be in the tens (or hundreds) of thousands of round trips per second to the server.
The great news is that OLTP workloads are most likely to benefit from a big box, at least if well written. The main issue you will have with OLTP system on a big box is thread coordination. If your main wait stat is locking (LCK) moving to a bigger box is unlikely to help your workload. In fact, it will most likely slow it down because locks take longer to acquire on a big box. A great example of this effect can be seen in TPC-C on the WAREHOUSE table. This table is highly contended with update locks. Moving a small workload TPC-C from a 2 socket to a 4 socket machine will typically slow down the workload.
Once you have eliminated LCK waits, the next thing you have to worry about is latches and spinlocks. Of these, the latches tend to be the most common.
The PAGELATCH wait tends to appear in three situations:
Too Few Data files: This typically results in waits for PAGELATCH_UP. They are easy to diagnose and the solution is to add more data files to the filegroups in the database. However, even when your files equal the number of cores, you may still see a little of this wait. Here is an OLTP workload at 48 cores with a varying number of data files:
Last page INSERT: Typically shows itself as PAGELATCH_EX. This happens when you use IDENTITY or SEQUENCE to generate keys. I have written elsewhere about this scale problem. The solution is to use GUID or reversed SEQUENCERS for keys instead. fixing this can be a major change in the database design.
Small tables with contested UPDATE: Typically shows itself as PAGELATCH_EX and PAGELATCH_SH. This happens when a small table with narrow rows has a lot of UPDATE statements running on it. The solution is to “pad” the rows so each row is one full page. Adding a CHAR(4000) NOT NULL with a default will typically fix this. On SQL 2014, a default value is not enough, you have to actually change the padding column with an UPDATE statement.
The transaction log is the final wall for DML heavy workloads like OLTP. Your best bet is to place the log on 2-4 dedicated 15K spindles with a good write caching controller. Don’t put anything else (like data files) on the log drive if you do this. Alternatively, place the log on PCIe based SSD with low latency. You should aim to have the log writes take less than 100 microseconds (0.1ms).
Unfortunately, even if you throw the best disk system at the log, you can still write transactions faster on a 2 socket than a 4 socket. This happens because the log has a central spinlock protecting it (it has to). This spinlock can be locked and unlocked faster on a small system than on a big system. If you see the spin: LOGCACHE_ACCESS grow very fast as you add more work to the system, you know you are bumping into this bottleneck. If this is your bottleneck, you are likely better off on a 2 socket.
Once you really push an OLTP system, you may see spinlocks increase. It is difficult to say something general about what to do about these. You simply have to diagnose each as you see it and come up with a good guess on how to work around it, or ask PSS for a patch.
Some examples of spinlock bottlenecks I have come across:
MUTEX: I ran into this one while doing dynamic SQL with sp_executesql. The solution was to do more work inside the sp_execute SQL than in the loop surrounding it.
SOS_OBJECT_STORE: I saw this spin in two cases. One of them I blogged about here. The other was related to spins protecting the security data structures. It was fixed by making all users sa – not a nice workaround (There was a SQL Bits session about this one). I think this has now been fixed.
LOCK_HASH: This spin used to be quite common on OLTP workloads but has not largely been patched. From the name, you can guess that it is related to locking. However, you have to drill deeper to understand what lock is causing it. In my particular case, I saw it when a workload used custom schemas (i.e. not dbo). The workload was improved by moving all objects to dbo! I believe this has now been patched as I have not been able to reproduce it on the latest service packs.
Some of the spinlocks of the same name are used in several areas of the engine. Even if my MUTEX spin had a certain solution, it may not mean yours has the same one. You just have to diagnose it using the guidance in the paper above.
The only trace flag I am aware of that need to set to enable the latest fixes is T9024 which is relevant for AlwaysOn systems. If you know any more of these, please comment.
Also known as DSS, Analytics, BI, OLAP and Big Data, and other names expensive consulting agencies come up with to sound like they are inventing, instead of renaming things. This workload is characterised by a low amount of query concurrency, but a lot of long running queries that use parallel execution.
First of all, very few parallel queries scale well beyond DOP 16 in SQL Server; index builds being the notable exception. Joe Chang has done several experiments illustrating this and so has Chris Adkin – their experience matches mine. Because of this limitation, it is tempting to configure max degree of parallelism to 16. However, it is worth making yourself familiar with Adam Machanic’s “Magic CROSS APPLY” and his make_parallel procedure.
In fact, if you are planning to run a data warehouse on a big box, you need to read all of Adam’s articles before you do anything else. When you have read them, ask yourself: “Why do I need to do all this to make it work?”. Then, think carefully about what features you REALLY want in the next version (hint: Single table restore or XML indexes are probably no longer that important).
This latch is generally an indication that something is bottlenecking in the query plan. One trick that often works is to hash partition the tables involved. Sometimes, you have to dig into the specific operator in the query plan by looking at the resource_description of the CXPACKET wait.
On a data warehouse, there are two settings you nearly always want to run:
Network Packet Size: Set to 32767. It improves throughput a lot
Jumbo Frames: If possible on your network, enable it.
You are likely going to need at least one or two 10Gbit NIC on a big box. Alternatively, get several 1Gbit NIC.
SELECT INTO and INSERT…SELECT
Until SQL Server 2014, the insert part of these common data warehouse queries is single threaded. You are likely to need some manhandling of the ETL flow to speed up insert speeds on a big box. Refer to this whitepaper:
If you are not able to parallelise these types of queries, you are again likely to see a regression as you move to a big box. The reason is that small boxes have much lower latency on the transaction log and can run a single thread faster than a big box (because smaller boxes have higher clock rates and lower memory access latency).
I/O and Filegroups
Unless your data warehouse largely fits in DRAM, you are going to need a monster I/O system to feed the database engine. If anyone tells you that SQL Server or Windows cannot do I/O, do this:
SQL Server is a greedy beast, it can eat tens of GB/sec of bandwidth and Windows will happily deliver. Likely, you are going to need some serious SSD (prefer bandwidth over latency for a data warehouse) or a Fast Track design. Test the design carefully before deploying it. When using traditional, spinning media, don’t be afraid to leave a lot of space empty in order to get the speed you need.
If you are tuning for sequential I/O on a data warehouse with spinning disks, it is wise to follow the Fast Track guidance. Even though you may not own a Fast Track reference configuration, the ideas here still apply to you. This means having 1-2 files in per 2 or 4 drives and splitting those drives into a lot of LUN. This can be a real nightmare to manage (you MUST script it), but the performance is often worth it.
NAND flash is now getting very cheap, in the 1 USD/GB range. This means that you can now buy a good “enterprise” flash drive like the Intel 3600 for less than what you pay for SAN storage. The cheaper drives can’t take as many write cycles as the expensive ones, but you would be surprised at just how many write cycles they CAN take. They are likely to outlive the server. Additionally, if you are running a data warehouse, chances are you will be reading a lot more than you are writing. Consider if it is worth the trouble to tune for sequential I-O on spindles, or if you just want to bite the bullet and get a pile of cheap SSD in some big SATA disk shelves.
On a big box data warehouse, you are likely to need a dedicated network just to handle backups. Again, SQL Server is capable of spamming the backup system with tens of GB/sec. This type of bandwidth is likely to flood your existing backup infrastructure with too much speed. Consider using a dedicated backup system for the data warehouse.
If the database is larger than a few TB, you should be thinking about a filegroup based backup strategy. Remember to back up your PRIMARY filegroup every time (and don’t place any tables or indexes in it).
In this blog, I have described some tricks you need to apply if you want scale on big boxes. I will be updating this with more details when I have time to write again. However, I wanted you to have early access to this guidance.