Good keys, what are they like?

A central value-add of data warehouses is their ability to restore the sanity that comes from using good keys. Taking a model-agnostic view, a key refers to “something” that is uniquely identifiable. Depending on what you are modeling, those “somethings” have different names, for example: entities, rows, tuples, cells, members, documents, attributes, object instances, and relations between any of the above. Because this article is about relational databases, and because I don’t want to move up to the “logical” level with you (as I promised before), I will use the term “row” for the “something” that a key refers to, and the term “column” for the data structure that implements the key.

Moving on, a “good key” is a column that has the following properties (a small T-SQL sketch follows the list):

  1. It is forced to be unique
  2. It is small
  3. It is an integer
  4. Once assigned to a row, it never changes
  5. Even if deleted, it will never be re-used to refer to a new row
  6. It is a single column
  7. It is stupid
  8. It is not intended as being remembered by users
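
To make the list concrete, here is a minimal sketch of my own (not taken from the post) of a dimension table whose surrogate key satisfies these properties; the table and column names are made up purely for illustration:

    -- Hypothetical example: a dimension keyed by a small, stupid, immutable integer
    CREATE TABLE dbo.DimCustomer
    (
        customer_key    INT IDENTITY(1,1) NOT NULL,  -- properties 2, 3, 6, 7: small, integer, single column, carries no meaning
        customer_number VARCHAR(20)       NOT NULL,  -- the source system's "natural" key, kept only as an attribute
        customer_name   NVARCHAR(100)     NOT NULL,
        CONSTRAINT PK_DimCustomer PRIMARY KEY CLUSTERED (customer_key)  -- property 1: forced to be unique
    );
    -- Properties 4, 5 and 8 are conventions the load process must uphold: the key is
    -- never updated, IDENTITY values are not handed out twice unless you reseed,
    -- and users join on the key but are never asked to remember it.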

Continue reading…

Grade of the Steel: Introduction

There are a lot of things I want to say on this blog, so I do apologize in advance for jumping a bit between subjects. I will shortly write the next installment of my data modeling series – but before that, I want to let you know about some exciting new tests that I am working on in my copious amount of spare time 🙂.

Continue reading…

Why Integer Keys are the Right Choice!

This subject has come up a lot during my years of consulting. Source systems, traditionally composed of shoddy code and poor data modeling, will often use “natural keys”. Such keys can be of any data type: strings (CHAR/VARCHAR), DECIMAL, or other inefficient types. It is my claim that keys in a warehouse should always be integers, and that it is not that difficult to implement proper keys. In this post, I will show you the price of using non-integer keys and let you be the judge of how much hardware you are willing to waste on lazy key choices.
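
As a quick illustration of where that price comes from (my own sketch, not the measurements from the post; the table names are hypothetical), compare the per-row cost of an integer surrogate key with a character-based natural key:

    -- The same fact table keyed by a 4-byte integer vs. a character natural key
    CREATE TABLE dbo.FactSales_IntKey
    (
        customer_key INT   NOT NULL,  -- 4 bytes per row, cheap to join and hash
        sales_amount MONEY NOT NULL
    );

    CREATE TABLE dbo.FactSales_NaturalKey
    (
        customer_number VARCHAR(20) NOT NULL,  -- up to 22 bytes per row (2 bytes of variable-length overhead), plus costlier comparisons
        sales_amount    MONEY       NOT NULL
    );
    -- Multiply the size difference by billions of fact rows, and by every index
    -- that carries the key, to estimate the hardware the lazy choice costs you.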

Continue reading…


Utility functions: fn_convert_to_base and fn_nums

I will often use code to illustrate my points on this blog. Because data generation is a big part of these examples, I will take the chance to introduce a few functions that I find useful for it. I will use these functions in my examples, so refer to the Utilities category on this blog to find the source. Most of these functions are adaptations of work by other SQL Server users, and I will do my utmost to give credit where credit is due. If you feel you are the owner of the original idea, please send me an email so I can give you proper credit.
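
To give a taste of what such utilities look like, here is a minimal sketch of a numbers function in the spirit of fn_nums. It is my own illustration, assuming the function simply returns the integers 1 through @n; the version published in the Utilities category may well be implemented differently.

    -- Hypothetical sketch: return the integers 1..@n by cross joining catalog views
    CREATE FUNCTION dbo.fn_nums (@n BIGINT)
    RETURNS TABLE
    AS
    RETURN
        SELECT TOP (@n)
               ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) AS n
        FROM sys.all_objects AS a
        CROSS JOIN sys.all_objects AS b;
    GO

    -- Example: generate one million rows of test data
    SELECT n FROM dbo.fn_nums(1000000);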

Continue reading…

Defining the Good Data Model

Designing data models is fun – at least if you are a geek like me. But as much as I like the academic thrill of building something that is complex – I am aware that it is often humans that eventually must see and maintain my beautiful (data) model. How can we design something that humans can understand?

Humans are buggy! In general, they don’t deal well with complexity. You can blame modern education, you can scream and shout, or lament the fact that the IT industry is riddled with incompetence; you may even throw Kimball or Inmon books at the wall in anger. But the empirical tests all show the same thing: the wetware is the final test of the model.

Continue reading…

Data Warehouse Introduction

In this blog, I will argue for and against key architectural choices that face the data modeler who wants to build a big data warehouse. I will do my utmost to avoid hand waving and references to “Logical Models”. Indeed, I find that talking about a “Logical Model”, whatever that is nowadays, distracts from the real issues that must be solved by the designer, and that such models provide very little value in real-life implementations. I will therefore move directly back and forth between conceptual (business user understandable) design and the actual implementation.

Continue reading…


Automating build of Integration Services projects

After our experiences with the build process in Analysis Services, expectations were low when we turned our gaze towards Integration Services (SSIS).

Our overall experience with SSIS has generally been positive – it’s an extremely powerful tool with some interesting performance characteristics. There are some annoying flaws, especially in the Visual Studio integration: when I work with SSIS projects, three or four daily crashes of Visual Studio are not uncommon. Never fear – the power provided by the tool more than makes up for this when it comes to choosing the ETL platform. And bugginess is what service packs are for … isn’t it? 🙂

The first thing you notice when you choose “build” in Visual Studio on an SSIS project is that Visual Studio doesn’t do anything – except, of course, copy files to the bin directory. Why? Well, I guess you could say that the *.dtsx files contain all the information needed to run the packages – there is “nothing to be done”.

Continue reading…


Automating build of Analysis Services Projects

As you may be aware, an Analysis Services project consists of the following files:

  • A project file (*.dwproj)
  • One or more data sources (*.ds)
  • One or more data source views (*.dsv)
  • A database file (*.database)
  • One or more cube files (*.cube) with their partitions (*.partitions)
  • One or more dimension files (*.dim)

You can right-click the project in Visual Studio and choose “build”. What happens then appears to be a lot of calls to DLL files, executed by the integration between Visual Studio and Analysis Services. The output of this build is the bin folder in your project directory. This folder contains a database file (*.asdatabase). The .asdatabase file is the one you want – you will see why later.

Continue reading…