Computer Bugs

Race Condition when Creating Unique Values

During my tuning sessions, there is a race condition I run across so often that I think it is worth blogging about it. It concerns the handling of a common requirement: How do I check if the value I want to insert is in the table already? And if it is, don’t insert the value.

Here is how the typical, non DBA person, often does this:

The problem with this pattern is that it is wrong. Under transaction isolation levels less than SERIALIZABLE, the above code does not guarantee that you won’t insert a duplicate.

Here is why:

Consider two queries A and B both wanting to insert the value 42. If this sequence happens:

  • A executes the SELECT with @X = 42. Concludes that 42 is not in Foo
  • B executes the SELECT also with @X = 42. Concludes that 42 is not in Foo
  • A executes INSERT
  • B executes INSERT

You now have duplicate values of X in the table. More times than I can remember, I have had programmers ask me “Why this is allowed when there is a transaction around the two statements?”. “Why does the database get us into an inconsistent state here?”. The answer is that the transaction isolation level of the database determines just how strong the consistency should be. By default, SQL Server will use the READ COMMITTED isolation level which releases locks fast (good for scale). MySQL will use REPEATABLE READ as the default isolation level, but this has other problems (long held locks for bad queries). Both of these won’t guarantee that the above only inserts the value when it is not already there.

Solution 1: Declare a Unique Key

The obvious solution is of course to declare a unique key on X. However, it is not obvious that this is the best solution. There are typically two problems with using a key to enforce the uniqueness:

1) X may only be unique under certain conditions (if you are in this situation, it may indicate that you may have a normalisation problem)
2) Exceptions are expensive if the violation is common

Ad 2) When the key violation happens, you typically send an exception back to the calling application. If your database support exception handling, you can handle it before the exception travels to the client. But in both cases, throwing an exception is pretty expensive. Exceptions are meant to be… Exceptional. If you expect the violation of X uniqueness to be common, this solution is not a good way to address the problem.

Solution 2: Change the Isolation Level to SERIALIZABLE

Another option is to force the isolation level of the transaction to be SERIALIZABLE. However, this has very dangerous consequences that can introduce subtle bugs. Having a large transaction in SERIALIZABLE mode also limits scalability of the database.

To illustrate this, try the following demo:

Execute the below transaction on two connections, taking turns between each statement (and inspecting locks as you go along).

Not quite what you expected perhaps? Here are the results on my SQL Server:

Isolation Level Effect
READ UNCOMMITED Duplicates
READ COMMITTEED Duplicates
READ REPEATABLE READ Duplicates
SERIALIZABLE Deadlock (but consistent)
SNAPSHOT Duplicates
READ UNCOMMITED Duplicates

Solution 3: Force a Lock

It is perhaps not surprising to my regular readers that the manhandling of the database with query hints is my preferred solution to this problem.

This works beautifully:

This solution has the added advantage of not causing deadlock situations. Unfortunately, it is database specific. In MySQL, the hint to solve this is SELECT FOR UPDATE.

Why am I forcing both exclusive (XLOCK) and row (ROWLOCK) locks above? Consider the case where the optimiser decides to scan the table instead of accessing the index to find the row (this can be illustrated by hinting INDEX = 0 above). The HOLDLOCK hint is also required to make sure the range lock gets held during the transaction. If the index was unique, this can be omitted (but no harm in leaving it in)

Solution 4: Do it in One Statement

Depending on the exact semantics of your database platform, you may be able to do this in a really wizardly way.

Most databases will treat a single statement as a self contained, fully consistent unit of work. This means that if you can check both for the existence of the row AND do the INSERT if found. If you use a single statement, you wont have to worry about race conditions. In fact, you wont even have to manually create a transaction.

As Brad points out in the comments, there is an issue with this approach when the key is not unique. The range lock is taken too late and this creates a race condition which allows for duplicates even in a single statement situation. This means that even if you do it in a single statement, you will still have to hint it. I have been pondering this quite a bit and I am concluding that based on Brad excellent comments, I think this solution (even with the hints) is dangerous. Solution 3 is much cleaner as it is clear which order the locks get taken in

This will works in SQL Server:

And for generating and returning a unique ID from a table that contains the last ID generated, you can do this:

Summary

In this short blog, I have shown you ways to work around a common race condition. I would encourage you to read the comments section as there are some very important subtleties here that Brad has kindly help address.

Happy bug slaying.