Ancient Cube

Creating custom aggregations in Analysis Services 2005

Most of you are probably using the aggregation wizard to create aggregations for your cubes. If you done your homework and set attribute relationships, counts and properties right (please refer to my previous post) – the aggregation wizard will generally do a good job creating the right aggregations.

The exact workings of the storage design wizard are probably best left to a separate blog entry. But there are a few things you should be aware of: To determine which aggregations to build, the storage design wizard will apply the "30% rule". Roughly speaking this means is that an aggregation is only considered for building if it represents the sum of at least 30% of level below it in a natural hierarchy.

As an example, take this hierarchy:

The top level is considered for aggregation: The 3 rows each aggregate an average of 20 / 3 – around: 6.6 rows – this is more than 30% of the 20 rows just below the top level.

The second level is not considered for aggregation: It is almost as fast to scan the 30 rows as it is to scan the almost equal sized 20 rows above them.

There are rare cases when the aggregations designed by the wizard using the 30% rule are not good enough. A real life customer example had a hierarchy with these characteristics:

The middle level, with 1 million members, is not considered for aggregation by the storage design wizard:  it does not meet the criteria of the 30% rule. However, in this particular case – a large fraction of the customers queries will access the middle level. Furthermore, the distribution between the middle and the bottom level is skewed: there might only be an average of 1.2 members the lowest level for every members at the middle level – but some members have thousands of children – whereas some only have one.

For this particular customer – adding a custom aggregation to the middle level with 1 million members boosted query performance significantly. An optimization not achievable with the storage design wizard.

A word of warning: Creating the right custom aggregations is a task for cube experts. Creating every possible aggregation in your cube is not a good way to boost query performance. In the worst case your cube processing will never finish. Lets just, theoretically, assume you could process your cube with all aggregations design. Locating the right aggregation to use in the universe sized space of aggregations will still take eternity for Analysis Services. The words "eternity" and "universe" are fitting in this context – cube space can be BIG – very very BIG – you really don’t want to aggregate all of it.

Assuming you have heeded my words of warning – how do you go about manually adding aggregations to your cube? Though the API is there Microsoft does not supply a tool for inspecting and creating custom aggregations. One way is of course to edit the cube XML files or use XML/A. A daunting task.

But guess what? The developers at Microsoft probably needed a tool for their own internal use – so they build one. This tool is called AggDesign – and I have had the pleasure of using it. According to Eric Jacobsen from Microsoft, we may see this tool shipped with SQL Server 2005 SP2. AggDesign allows you to visually inspect your cube aggregations and add new ones a you see fit. The tools allows you to assign one or more "AggDesigns" to each partition in your cube. These AggDesign add extra aggregations to the ones already defined by the storage wizard. If you are working with a large cube and feel very confident that you need custom aggregations (and you really should be sure) you might be able to get a copy of this tool from your local Microsoft product support.