One Million IOPS on a 2-Socket Server

Today, using Fusion ioMemory technology, I worked with our team of experts to hit 1M 4K random read IOPS on Windows. We did this on a 2-socket Sandy Bridge server.

Below is the screenshot to prove it:

[Screenshot: benchmark showing one million 4K random read IOPS]

Think for a moment about what it will take to actually make use of all those IOPS. If this is the kind of speed you have at your disposal, maybe it is time to rethink what is possible. Check out our SDK at http://developer.fusionio.com for the leading-edge work Fusion-io is doing in this space.
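To put the number in perspective with some quick back-of-the-envelope arithmetic: one million 4K reads per second is 1,000,000 × 4 KB ≈ 4 GB of data served every second, all from random locations.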

As I am sure my regular readers can guess, I am loving my new job!

What is the Best Sort Order for a Column Store?

As hinted at in my post about how column stores work, the compression you achieve depends on how many repetitions (or “runs”) exist in the data for the sort order used when the index is built. In this post, I will provide more background on the effect of sort order on compression and give you some heuristics you can use to save space and increase the speed of column stores. This theory is one I am still developing, but the early results are promising.
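To make the effect concrete, here is a small sketch of my own (the table, columns and data are all made up for the illustration) that counts the runs RLE would see in one column under a given sort order; fewer runs means better compression:

```sql
-- Illustrative only: count the RLE runs in the Region column for a
-- given sort order. Table, columns and data are invented for the example.
CREATE TABLE #Sales (Region VARCHAR(10), Product VARCHAR(10));
INSERT INTO #Sales VALUES
  ('East','Bike'), ('West','Bike'), ('East','Car'),
  ('West','Car'),  ('East','Bike'), ('West','Car');

-- A new run starts whenever the value differs from the previous row.
WITH Ordered AS (
  SELECT Region,
         LAG(Region) OVER (ORDER BY Region, Product) AS PrevRegion
  FROM #Sales
)
SELECT COUNT(*) AS RegionRuns   -- 2 runs when sorted by (Region, Product)
FROM Ordered
WHERE PrevRegion IS NULL OR PrevRegion <> Region;

-- Changing the window to ORDER BY Product, Region yields 4 runs instead:
-- the column you sort on first always collapses into the fewest runs.
```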

Continue reading…

How Vertical Partitioning and Deep Joins Kill Parallelism

Query plans with deep join trees are often the result of high levels of normalisation in the data model. There are large advantages to normalising data, as it minimises the amount of data that must be written when a change happens. In traditional OLTP systems, this can be a boon.
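As a hypothetical illustration (the table names here are mine, not from the post), consider one logical customer entity vertically partitioned into four tables. Reading it back already needs a join tree three joins deep, and every further partition deepens it:

```sql
-- Hypothetical example of vertical partitioning: one "customer" entity
-- split across four tables, so even a simple read needs three joins.
SELECT n.CustomerKey, n.Name, a.City, d.BirthDate, s.Segment
FROM CustomerName AS n
JOIN CustomerAddress      AS a ON a.CustomerKey = n.CustomerKey
JOIN CustomerDemographics AS d ON d.CustomerKey = n.CustomerKey
JOIN CustomerSegment      AS s ON s.CustomerKey = n.CustomerKey;
```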

However, normalisation is not without its costs – especially in read-intensive workloads like data warehouses.

Continue reading…

Modeling Dimensions with History Tracked, Generic Attributes

Sometimes, you need to model a database in such a way that you can dynamically extend the model without altering any tables. Perhaps the attributes change faster than you can add new columns, or the data you store has a “ragged” structure that does not lend itself well to being described as a traditional table structure.

The typical approach taken in these cases is to “pivot” the required “flexible columns” of a table into rows instead and dynamically reconstruct the schema at runtime. When you model like this, adding new attributes to the model is simply a question of inserting rows into the database.
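Here is a minimal sketch of the idea; the table and attribute names are made up for the illustration:

```sql
-- Generic attribute table: each "flexible column" becomes a row.
-- All names here are invented for the example.
CREATE TABLE DimCustomerAttribute (
  CustomerKey    INT          NOT NULL,
  AttributeName  VARCHAR(50)  NOT NULL,
  AttributeValue VARCHAR(100) NULL,
  PRIMARY KEY (CustomerKey, AttributeName)
);

-- Adding a brand-new attribute is a plain INSERT; no ALTER TABLE needed.
INSERT INTO DimCustomerAttribute VALUES (42, 'EyeColour', 'Blue');

-- Reconstructing the traditional row shape at runtime:
SELECT CustomerKey,
  MAX(CASE WHEN AttributeName = 'EyeColour' THEN AttributeValue END) AS EyeColour,
  MAX(CASE WHEN AttributeName = 'ShoeSize'  THEN AttributeValue END) AS ShoeSize
FROM DimCustomerAttribute
GROUP BY CustomerKey;
```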

As I am sure you are aware, there are issues with this approach. Turning columns into rows can cause interesting problems for database engines, and the flexibility gained has to be balanced carefully against the performance lost. In this blog, I will walk you through an example of how to handle this generic case in a star schema.

Continue reading…

How do Column Stores Work?

In this blog, I will provide you with some basic information about column stores. Nothing I am writing here is vendor-specific IP; it is all taken from papers published over the years. One of the best papers to serve as an introduction is by Stonebraker. The idea is older than that though, with the first papers published in the 1970s.

Shamefully standing on the shoulders of giants, I will walk you through a simple example that illustrates one of the key principles of column stores: Run Length Encoding (RLE).
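As a small taste of the principle, here is a toy sketch of my own (not from any paper) that run-length encodes a column in SQL using the classic gaps-and-islands trick:

```sql
-- Toy illustration of Run Length Encoding: six stored values collapse
-- into three (value, run length) pairs. Data is made up for the example.
CREATE TABLE #Colours (RowId INT, Colour VARCHAR(10));
INSERT INTO #Colours VALUES
  (1,'Red'), (2,'Red'), (3,'Red'), (4,'Blue'), (5,'Blue'), (6,'Red');

WITH Grouped AS (
  SELECT RowId, Colour,
         RowId - ROW_NUMBER() OVER (PARTITION BY Colour ORDER BY RowId) AS Grp
  FROM #Colours
)
SELECT Colour, COUNT(*) AS RunLength   -- (Red,3), (Blue,2), (Red,1)
FROM Grouped
GROUP BY Colour, Grp
ORDER BY MIN(RowId);
```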

Continue reading…

The Information Staircase

With the Big Data wave rolling over us these days, it seems everyone is trying to wrap their heads around how these new components fit into the overall information architecture of the enterprise.

Not only that, there are also organisational challenges in staffing the systems drinking from the big data stream. We are hearing about new job roles such as “Data Scientist” being coined (the banks have had them for a long time; they call them Quants) and old names being brought back, like “Data Steward”.

Continue reading…

Why “Date BETWEEN FromDate AND ToDate” is a Dangerous Join Criterion

I have been meaning to write this blog post for some time and the discussion about Data Vault finally prompted me to do it.

Sometimes, you find yourself in situations where you have to join against a table that has a structure like this (a sketch; the names are illustrative):
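```sql
-- A sketch of a history-tracked table keyed by a date interval.
-- Table and column names are illustrative, not a prescribed schema.
CREATE TABLE CustomerHistory (
  CustomerKey  INT          NOT NULL,
  CustomerName VARCHAR(100) NOT NULL,
  FromDate     DATE         NOT NULL,  -- start of the validity interval
  ToDate       DATE         NOT NULL,  -- end of the validity interval
  PRIMARY KEY (CustomerKey, FromDate)
);
```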

The join criterion is expressed along these lines:
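```sql
-- Closed interval: both endpoints included. FactSales is an assumed
-- fact table, added only to make the join concrete.
SELECT f.SalesAmount, c.CustomerName
FROM FactSales AS f
JOIN CustomerHistory AS c
  ON  f.CustomerKey = c.CustomerKey
  AND f.SaleDate BETWEEN c.FromDate AND c.ToDate;
```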

Or, more commonly, with this variant using a semi-open interval:
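```sql
-- Semi-open interval: FromDate is included, ToDate is excluded.
SELECT f.SalesAmount, c.CustomerName
FROM FactSales AS f
JOIN CustomerHistory AS c
  ON  f.CustomerKey = c.CustomerKey
  AND f.SaleDate >= c.FromDate
  AND f.SaleDate <  c.ToDate;
```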

Data models that promote these types of joins are very dangerous to relational optimizers and you have to step carefully when executing queries with many of these joins. Let us have a look at why this is so.

Continue reading…

The Data Vault vs. Kimball

Here we go again: the discussion about the claimed benefits of the Data Vault. Thomas Christensen has written some great blog posts about his take on the Vault method, and Dan Linstedt has been commenting.

The discussions are a good read to follow.

Apologies in advance for my spelling errors in the comments; the posts were written while travelling.

As with most complex subjects, it is often hard to have people state with clarity what EXACTLY their claim is and what their supporting arguments are. It seems that the Vault is no exception – I hope the discussions lead somewhere this time.

For your reference, here are some of the posts I have previously written that address the postulated problems with the Kimball model.

Notice that in the comments on Thomas Christensen’s blog, Sanjay makes some very interesting claims about normalisation creating more load and query parallelism. I personally look forward to hearing the argument for that.

How to Properly Wipe and Return a Laptop

This week, I had the chance to experience something that is a rare occurrence: returning a laptop to my employer. Microsoft has been good to me, and I wanted to make sure I returned the laptop properly, saving both them and me from hassle.

In this blog, I will describe a process for properly wiping a laptop of data and making it ready for a new owner. You may want to use this advice if you plan to sell an old machine to a friend too.

Continue reading…