# Arbitrary Shapes in AS 2005

I have recently returned from a performance tuning assignment at one of the largest AS-cube installations in the world. Working on such large installations always teach you a thing or two about the internal workings of the server products in use. Learning this new stuff is what I love the most about performance tuning. My colleague and me had a conference call with the people at Microsoft – and they have been really helpful about sharing their knowledge. I wanted to pass on some of it here.

First of all – let me highly recommend the new ability to trace a query in SQL-Profiler. MDX coders: you will love this feature, it will really boost your efficiency while tuning MDX. One of the events you want to watch out for is "Query Subcube Verbose". This event tells you a lot about the internal working of the AS engine. My knowledge is still pretty sparse – but this is what I have gathered:

MDX queries are evaluated by the **Formula Engine **which interprets the dimensions axes and combine aggregations as necessary to answer the query. The aggregations are stored in the **aggregation data registry **which can consult the **Storage Engine** if it needs new aggregations to populate the cache**. **

Aggregations are returned from the Aggregation data cache using **subcube queries** – these are the ones you can trace in profiler. To determine which subcubes to request the Formula Engine uses and a mechanism called "SONAR" (I need to see if any literature is written about this).

Now, the amount of subcubes requested from the **aggregation data registry **has a large impact on query performance. Subcube queries can either be cached in memory or need retrieval from disk. If cached, a subquery is pretty fast – typically a few millisecond. Of course, these can add up and take time if you have many of them. On the other hand, if your subcube queries need to access the Storage Engine for aggregations they can be quite slow – since this requires I/O requests. The more subcubes you need to retrieve from disk – the larger your I/O load will become – eventually resulting in bad performance. You can use SQL-Profiler (look at the event subclass) to see if a subcube is retrieved from disk or memory.

In summary, it is important that you avoid too many subcubes and make sure you have good cache hits on the ones you have. Fortunately, you are partially in control of this process.

How does this tie into my previous post about multi-select? Well, try to execute and trace the subcubes in the following MDX statement against AdventureWorks:

SELECT <br /> {[Reseller Order Count]} <br />ON 0, <br /> NON EMPTY CROSSJOIN( <br /> Product.[Product Categories].members <br /> , DESCENDANTS([Sales Territory].[Sales Territory], [Sales Territory].[Sales Territory].levels(3), SELF_AND_BEFORE ) <br /> ) <br />ON 1 <br />FROM <br /> (SELECT <br /> {Product.[Product Categories].levels(1).item(0) <br /> ,Product.[Product Categories].levels(2).item(1)} <br />ON 0 <br />FROM [Channel Sales]) <br />

Notice the high amount of subcubes needed? Now try to execute it again…

What’s wrong here? A lot of the subcubes are *still* being fetched from disk. Even though the query has already run before.

Before we can understand what is happening here I must define what the people at Microsoft call "An Arbitrary Shape". To understand what an arbitrary shape is, it is perhaps easier to define what a "non-arbitrary shape is"

A **Non-Arbitrary Shape **is a set, S, of dimension members that match these criteria:

- All members of S must be in the same dimension hierarchy
- Let L
_{min}be the lowest level of the members in S. - Let L
_{max}be the highest level of the members in S. - For all members in the, M, in level L
_{max}the following must be true:- All ascendants of the M, up to and including the level L
_{min}must be in S

- All ascendants of the M, up to and including the level L

Put another way: The value of every member in S can be expressed as the aggregation over the members only from L_{min}

I have tried to create a taxonomy of all possible non-arbitrary shapes. These are illustrated below:

Notice that NA1, …, NA3 can all be expressed using the
DESCENDANTS function in MDX. Case NA4 is an entire hierarchy (which is of course non-arbitrary). Case NA5, NA6 is the special variant where L_{Max} = L_{Min}. There is an entire family of MDX functions that allow you to specify NA6 variants.

An **arbitrary shape** is anything that is NOT a "non-arbitrary shape"

Now, the Formula Engine can evaluate non-arbitrary shapes a lot more efficiently than arbitrary shapes. The work required to evaluate an arbitrary shape in a sub select generally involves a lot more subcube queries than when evaluating a non-arbitrary shape. Furthermore, the arbitrary shapes cause a lot of trouble with the caching of aggregations.

I do not know the exact algorithm used by the formula engine. But one could suspect that non-arbitrary shapes allows the formula engine to build the higher aggregations directly from the lower ones using information about the hierarchies in the cube. Arbitrary shapes clearly prevent this optimization.

There are many arbitrary shapes, but here are a few of them:

So what can you learn from this?

First, since arbitrary shapes are difficult to optimize you should seek to eliminate them your subcube queries. This is often an easy goal to accomplish; try this simple rewrite of the query above:

SELECT <br /> {[Reseller Order Count]} <br />ON 0, <br /> NON EMPTY CROSSJOIN( <br /> Product.[Product Categories].members <br /> , DESCENDANTS([Sales Territory].[Sales Territory], [Sales Territory].[Sales Territory].levels(3), SELF_AND_BEFORE ) <br /> ) <br />ON 1 <br />FROM <br /> (SELECT <br /> {Product.[Product Categories].levels(1).item(0).children <br /> ,Product.[Product Categories].levels(2).item(1)} <br />ON 0 <br />FROM [Channel Sales]) <br />

Trace it, and notice the difference.

Secondly, by experimenting with arbitrary shapes we can learn a lot about he internals of the query engine. There is not much documentation to be found, I invite the reader to share his experiences. Even though we would expect all non-arbitrary shape to allow "smart caching" this is not the case. Try the following:

SELECT <br /> {[Reseller Order Count]} <br />ON 0, <br /> NON EMPTY CROSSJOIN( <br /> Product.[Product Categories].members <br /> , DESCENDANTS([Sales Territory].[Sales Territory], [Sales Territory].[Sales Territory].levels(3), SELF_AND_BEFORE ) <br /> ) <br />ON 1 <br />FROM <br /> (SELECT Product.[Product Categories].members <br />ON 0 <br />FROM [Channel Sales]) <br />

Though the sub select quite easily qualifies as a non-arbitrary shapes (perhaps the most simple non-arbitrary shape I can think of) the non-cached disk access still happens – even on a warm cache.

Having done some experimentation I have found that only non-arbitrary shapes that contain members from a single level only are optimized by the query engine. Even though any non-arbitrary shape in theory should trigger a smart caching.

Pingback: Subselects And Formula Engine Caching In SSAS 2012 | Chris Webb's BI Blog

Excellent document Thomas !.

I really discovered the existence of arbitrary shapes by designing diferent partition scenarios for very large cubes. Because of the need to find a partition criteria that results in smaller partition size, I tried different type of slices.

According to the following BOL article:

http://msdn2.microsoft.com/en-us/library/microsoft.analysisservices.partition.slice(d=ide).aspx

Any tuple or set that runs with strtoset function with constrained option, can be used as a slice expression.

In fact, it’s not true at all, because if tuple or set (used in slice property) define an arbitrary shape, SSAS cannot collect consistent data inside a unique partition and it will throw errors during processing.

So When designing partition criteria we also must be aware that slice scope defines non-arbitrary shapes.

Thanks

Leandro