
Table Pattern: Database Message Queues

At the last SQL Bits X, I held the FusionIO fireside chat during the launch party. During this presentation, I demonstrated how it is possible to build a table structure inside a relational engine that will act as a message queue and deliver nearly 100K messages/second.

My design was inspired by the LMAX Disruptor Pattern and used sequencer objects to greatly boost throughput of the queue structure.

In this blog entry, I will show you how I built this queue structure and give you enough detail to build one yourself.

But first, let me walk you through the problem.

Note: In the following, I will use the terms push and pop for adding and removing messages from the queue, respectively. You may have learned the terms enqueue and dequeue instead – same thing.

The Problem Statement

Every now and again, relational database designers find themselves creating a table that ends up acting as a message queue. Such tables tend to fluctuate a lot in size, from zero rows to several million. Furthermore, the tables typically need to be ordered to ensure fairness of message pushing and popping. The data flow looks something like this:

[Figure: data flow of messages through the queue table]

In the following examples, I will assume a message size of 300 bytes – but the argument applies to larger and smaller message sizes too.

Relational Message Queues – Naïve

When faced with a table that needs to be ordered, and where rows are frequently added and then deleted shortly afterwards, the typical design looks something like this:

CREATE TABLE dbo.MyQ
(
[message_id] INT NOT NULL
,[time] [datetime] NULL
,[message] [char](300) NULL
)

CREATE UNIQUE CLUSTERED INDEX CIX
ON dbo.MyQ (message_id)

CREATE SEQUENCE dbo.MessageSequence AS INT
START WITH 1 INCREMENT BY 1

 

The rows in the table are implicitly ordered by the index. Push and pop can now be implemented like this:

/* Push Naïve */

INSERT MyQ (message_id, time, message)
SELECT NEXT VALUE FOR dbo.MessageSequence, GETDATE(), 'Hello World'

/* Pop Naïve */

DELETE Q
FROM (SELECT TOP 1 * FROM MyQ ORDER BY message_id) AS Q
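
As written, the naïve pop deletes the oldest row but never returns the message to the caller. In SQL Server, one common variant uses the OUTPUT clause to delete and return the row in a single statement; the sketch below is my addition (the locking hints and the table variable are not part of the original pattern):

/* Naïve pop variant: delete the oldest row and return it in one statement */
DECLARE @Popped TABLE (message_id INT, [message] CHAR(300))

DELETE Q
OUTPUT deleted.message_id, deleted.[message] INTO @Popped
FROM (SELECT TOP (1) *
      FROM dbo.MyQ WITH (ROWLOCK, UPDLOCK, READPAST)
      ORDER BY message_id) AS Q

SELECT message_id, [message] FROM @Popped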

This approach has a lot of problems that will be very familiar to anyone who has been the DBA for a table like this. Among these problems are:

  • The statistics are NEVER up to date on such a table. This typically requires all plans to be forced/hinted or statistics to be hand-managed.
  • In databases that auto-update statistics, the updates constantly kick in, creating jagged CPU and throughput patterns.
  • The B-tree structure implementing the ordering of the rows is constantly being split at the root as it grows and shrinks. This causes mutex convoys (in SQL Server, these convoys are reported as latches on ACCESS_METHODS_HOBT_VIRTUAL_ROOT).
  • New pages are constantly allocated and deallocated in the buffer pool, which stresses the internal allocation structures of the database.
  • There is contention on the memory structures that hold the pages at the start and end of the B-tree (reported as PAGELATCH_EX waits in SQL Server; see the diagnostic sketch after this list).
  • The writes are extremely sensitive to the latency of the transaction log drive, and this is exacerbated by the convoy effects of the higher levels of the B-tree constantly being split.
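
If you want to check whether a queue table of your own suffers from these symptoms, the standard SQL Server DMVs expose the latch and wait statistics mentioned above. A minimal diagnostic sketch (instance-wide, cumulative counters):

/* Root split convoys */
SELECT latch_class, waiting_requests_count, wait_time_ms
FROM sys.dm_os_latch_stats
WHERE latch_class = 'ACCESS_METHODS_HOBT_VIRTUAL_ROOT'

/* Page latch contention and transaction log waits */
SELECT wait_type, waiting_tasks_count, wait_time_ms
FROM sys.dm_os_wait_stats
WHERE wait_type IN ('PAGELATCH_EX', 'WRITELOG')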

The contention is perhaps best illustrated with this diagram of the hot pages in the B-tree:

[Figure: the hot pages at the root and the two ends of the B-tree]

I built this implementation on an HP DL380 G7 server with a FusionIO ioDrive Duo (transaction log latency under 50 microseconds) running SQL Server 2012 RTM. I experimented with different thread counts for both push and pop to find the sweet spot. The highest throughput I could push (pun intended) through the message queue with the above implementation was around 6,000 messages/sec. Very, very far from impressive, and fuel for the NoSQL fires.

    The obvious (but wrong) conclusion is that relational databases are not fit for purpose when it comes to building message queues. Of course, there IS a grain of truth to that, but can we do better than a meager 6,000 messages/sec?

Relational Message Queues – LMAX’ed

It is in fact possible to do a lot better than the naïve message queue implementation. Inspired by the bright people over at LMAX and their Disruptor pattern, I decided to build a relational equivalent.

The first realisation is that INSERT/DELETE is just not going to work. People who code high scale systems might intuit this: constant memory allocation and deallocation is expensive, and INSERT/DELETE is a form of memory alloc/dealloc. The solution is to preallocate the queue at a fixed size and UPDATE a reference count on each message slot instead of adding and removing a row for every message sent through the queue.

The queue table now looks like this:

CREATE TABLE dbo.MyQ
(
[Slot] BIGINT NOT NULL
, message_id BIGINT NULL
,[time] [datetime] NOT NULL
,[message] [char](300) NOT NULL
,reference_count TINYINT NOT NULL
)

/* Prefill messages */
DECLARE @QueueSize BIGINT = 1000000 /* example size – pick one that fits your workload */
DECLARE @i BIGINT = 0
WHILE @i < @QueueSize BEGIN
INSERT MyQ (Slot, time, message, reference_count)
VALUES (@i, '2050-01-01', 'dummy', 0)
SET @i = @i + 1
END

/* Create index and fill it up */
CREATE UNIQUE CLUSTERED INDEX CIX ON dbo.MyQ (Slot)
WITH (FILLFACTOR = 100)
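
As an aside (my addition, not part of the original recipe): for queues with millions of slots, the row-by-row loop above can take a while. A set-based prefill sketch, assuming sys.all_objects cross-joined with itself yields enough rows for your chosen queue size:

/* Set-based prefill of slots 0 .. @QueueSize - 1 */
DECLARE @QueueSize BIGINT = 1000000 /* example size */

INSERT MyQ (Slot, time, message, reference_count)
SELECT TOP (@QueueSize)
ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) - 1
, '2050-01-01'
, 'dummy'
, 0
FROM sys.all_objects AS a
CROSS JOIN sys.all_objects AS b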

Side note: If you are still wondering why UPDATE is faster than INSERT in this case, see my previous blog entry.

The above preallocates @QueueSize message slots. Pushing a message now consists of these operations:

  • Generate a new message_id
  • Find the next available slot (with reference_count = 0)
  • Update the message column
  • Add one to reference_count

And pop is:

  • Find the smallest message_id that has been “inserted”
  • Read and return the message column
  • Decrement reference_count, marking the slot as available again (variants are possible if you have multiple subscribers)

That, at least, is the pseudo code. For this to work, we have to find a way to quickly locate the next available slot for push, and to find the smallest message_id that has not yet been popped.

First, how do we find the next available slot for push? This turns out to be surprisingly easy: we use a sequencer object. The sequencer, for the non-database people out there, is a highly scalable data structure that generates “the next number” – think of it as a singleton counter. The sequencer can be used BOTH to generate the message_id AND to find the next available slot. Here is how:

/* Sequence to keep track of next message number and slot */

CREATE SEQUENCE dbo.PushSequence AS BIGINT
START WITH 1 INCREMENT BY 1
CACHE 100000;

/* Push LMAX’ed Begin */

DECLARE @QueueSize BIGINT = 1000000 /* must match the preallocated queue size */
DECLARE @PushSeq BIGINT, @Slot BIGINT

SET @PushSeq = NEXT VALUE FOR dbo.PushSequence

/* Find slot */
SET @Slot = @PushSeq % @QueueSize

Push_Retry:

UPDATE dbo.MyQ WITH (ROWLOCK) /* the ROWLOCK hint is SQL Server specific */
SET [time] = GETDATE()
, [message] = 'Hello World'
, [message_id] = @PushSeq
, reference_count = reference_count + 1
WHERE Slot = @Slot
AND reference_count = 0 /* Don't overwrite a slot that is still occupied! */

IF @@ROWCOUNT = 0 BEGIN
/* The slot was not available -> the queue is full. Wait, then try the UPDATE again */
WAITFOR DELAY '00:00:00.100'
GOTO Push_Retry
END

/* Push LMAX’ed end */

What we have now achieved is essentially a ring buffer for insert operations. To illustrate, here is an example queue table with @QueueSize = 100.

[Figure: the queue table acting as a ring buffer with @QueueSize = 100]
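
To make the wrap-around concrete, here is a small throw-away query (my illustration, not part of the queue code) showing how a few message_id values map to slots when @QueueSize = 100:

DECLARE @QueueSize BIGINT = 100

SELECT n AS message_id, n % @QueueSize AS Slot
FROM (VALUES (1), (2), (99), (100), (101), (205)) AS v(n)

/* Returns slots 1, 2, 99, 0, 1, 5 – message_id 100 wraps back around to slot 0 */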

 

We can now quickly locate new slots for push operations. All that remains is to implement pop. We could choose to keep track of the smallest message_id in a variable that each pop updates. However, it is always better to remove coordination when possible. Instead of letting pop keep track of the messages, we can create ANOTHER sequencer object that trails behind PushSequence. In fact, we don’t even need to know the value of PushSequence to pop correctly. Here is how to implement pop:

CREATE SEQUENCE dbo.PopSequence AS BIGINT
START WITH 1 INCREMENT BY 1
CACHE 100000;

/* Pop LMAX’ed Begin */

DECLARE @QueueSize BIGINT = 1000000 /* must match the preallocated queue size */
DECLARE @PopSeq BIGINT, @Slot BIGINT, @OutPutMessage CHAR(300)

SET @PopSeq = NEXT VALUE FOR dbo.PopSequence

/* Find slot */
SET @Slot = @PopSeq % @QueueSize

Pop_Retry:

UPDATE dbo.MyQ WITH (ROWLOCK) /* the ROWLOCK hint is SQL Server specific */
SET [time] = GETDATE()
, @OutPutMessage = [message]
, [message_id] = NULL
, reference_count = reference_count - 1
WHERE Slot = @Slot
AND message_id = @PopSeq /* Make sure we don't pop an empty slot */

IF @@ROWCOUNT = 0 BEGIN
/* No message found to pop, or we are ahead of push. Wait, then try the UPDATE again */
WAITFOR DELAY '00:00:01'
GOTO Pop_Retry
END

/* Pop LMAX’ed end */

Very similar to push, with a few tricks added. You have to be a bit careful with the boundary conditions. First of all, because there is no coordination between PopSequence and PushSequence, it CAN happen (for example due to thread scheduling) that the current pop message_id gets ahead of the push message_id. When that happens, we simply wait for the pushers to fill in more rows. We can now complete the earlier illustration:

[Figure: the ring buffer with the push and pop sequences chasing each other around the preallocated slots]

To avoid pop constantly blocking on an empty queue (if messages are popped faster than they are pushed), it is generally a good idea to let push get a “head start” so there is something in the queue to pop. In my example, I have chosen to let pop wait for one second before retrying whenever the last attempt found an empty queue.
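
In case you are wondering how a consumer actually receives @OutPutMessage: one option is to wrap the pop fragment in a stored procedure with an OUTPUT parameter and call it from each consumer session. The sketch below is mine, not from the original pattern – the procedure name, the queue size and the WHILE/BREAK retry loop are all assumptions:

CREATE PROCEDURE dbo.usp_Pop
@OutPutMessage CHAR(300) OUTPUT
AS
BEGIN
SET NOCOUNT ON

DECLARE @QueueSize BIGINT = 1000000 /* must match the preallocated queue size */
DECLARE @PopSeq BIGINT, @Slot BIGINT

SET @PopSeq = NEXT VALUE FOR dbo.PopSequence
SET @Slot = @PopSeq % @QueueSize

WHILE 1 = 1
BEGIN
UPDATE dbo.MyQ WITH (ROWLOCK)
SET [time] = GETDATE()
, @OutPutMessage = [message]
, [message_id] = NULL
, reference_count = reference_count - 1
WHERE Slot = @Slot
AND message_id = @PopSeq

IF @@ROWCOUNT > 0 BREAK /* got a message */

WAITFOR DELAY '00:00:01' /* nothing to pop yet – wait for push to catch up */
END
END
GO

/* Example consumer call */
DECLARE @msg CHAR(300)
EXEC dbo.usp_Pop @OutPutMessage = @msg OUTPUT
SELECT @msg AS popped_message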

I took the above implementation and ran it on the same hardware as the naïve approach. The results were VERY different: I can now drive between 90K and 100K messages/sec through the queue. A nice little 15x improvement.

Summary

It can be argued that old-school relational databases and tables are not the best structures for implementing durable storage for message queues. The many code paths a relational engine needs to implement ACID properties, generic concurrency, serialization and block I/O can get in the way of a fast queue implementation, especially for the naïve implementations favoured by relational purists.

However, if we combine our knowledge of programming with database design skills, high throughput can be achieved even in the relational model and there are large, often overlooked improvements to be found.
