Power BI allows you to create reports easily by bringing columns from multiple tables or from a single table and by putting those columns on either rows or columns you can produce a report without considering performance implications.
A very common requirement that I have seen is of adding 0 at the end of DAX code to return 0 for cells that return a blank value.
In most of the cases you can add 0 and simply move on but in cases where you bring multiple columns then this could lead to worse perfromance of even a very simple code, in this blog I am going to show exactly why that happens.
The model I am going to use in the ContosoRetailDW provided by Microsoft and it contains 12.5 Million rows with a Snow Flake Scheme and I have created another Sales table in which I have de-normalized all the relevant columns from the dimension tables.
The model showing the Snow Flake Schema plus the SalesFat table that contains all the important column from Dates, Customer, Products, ProductSubcategory and ProductCategory de-normalized into one fact table:
We start by building 2 reports by dragging some random columns from both models:
Let's use DAX Studio to identify the time required in computing 2 simple calculations like the following ones and then lets measure the time spent in the computations
Here are the performance numbers:
I have removed ROLLUPADDISSUBTOTAL and ROLLUPGROUP which are used to compute the subtotals and grand totals, only to speed up the code.
The Snowflake Schema consumes somewhere around approximately 376ms in computation
De-normalized model is slightly faster which is expected as in case of a de-normalized model we don't pay the price of the relationships.
The Physical query plans generated by both are similar, on the other hand Logical Query Plan show slight difference and that is expected because a Star or Snowflake schema don't trigger the Auto Exist optimization but on the other hand a de-normalized model will rely heavily on Auto Exist for its results
Physical and Logical Query Plans of the Snow Flake Model:
Physical and Logical Query Plans of the De-normalized Model:
Both of the queries iterate 47,156 existing combinations and provides the results, so far everything is good but what happens if I add a 0 at the end of the [Total Sales] or [Total Sales Fat] measures? The results are surprising.
The performance has worsen by a 40 ms in the De-normalized model as now the DAX engine has decided to extract 2 data caches from the Storage Engine to compute the result:
And instead of computing the result using a single query like show earlier the previous Pyhsical Query Plan now Formula Engine does a Lookup between the 2 data caches:
Performance is extremely horrible in case of a Snow Flake Schema so I had to stop the computation after 90 seconds:
If you remember earlier the Snow Flake Schema model computed everything in a single query which is the query on the line 12 in the above image and before that query there are several other queries extracting the different columns from the different tables which are later used in a CrossApply operator.
Physical query plan is also pretty complex after adding 0:
If you zoom into the above image you will se a CrossApply operator which translates to CROSSJOIN in DAX language, so it looks like the reason for slow performance is this CrossApply because if you multiply 8 * 44 * 111 * 84 * 9 you will get 29,538,432 combinations and then there is a Lookup between these values and 47,156 values and to iterate these number of values the dax engine has to use a lot of CPU power which in turn means slow performance.
So how is that the Fat De-normalized table is much faster than the Snow Flake schema? The answer lies in the Auto Exists Optimization.
Auto Exists Optimization
Auto Exists is an optimization available in DAX that avoids useless calculations, so instead of computing result for non-existing combination of values it uses only the existing combinations and that greatly improves the performance.
So If I write a simple query like the following one for both models what do you think the result would be?
and
In these 2 cases the SUMMARIZECOLUMNS has 2 ways of answering the query, either it gets the distinct values from each column and then produce a CROSSJOIN of the columns or simply retrieving only the existing combinations.
For the Snow Flake Schema the DAX engine decides to compute the CROSSJOIN of both Color and Gender which is visible in the Pyhsical Query Plan with the CrossApply operator and the total number of rows returned are 48 ( 16 * 3 )
For the De-Normalized model the DAX engine computes only the existing combinations which are only 46
The point of Auto Exists is that even before your measure kicks in, the SUMMARIZECOLUMNS, which is the query measure returns only the existing combinations and on top of these combinations your measure has to compute the values, but when columns from different tables are used then Auto Exists optimization doesn't kicks in and because we are adding 0 it adds 0 to the non-existing combinations too which results in a huge number of rows iterated by the Formula Engine.
By adding 0 you force the CrossApply behaviour of SUMMARIZECOLUMNS, while this behaviour is not a problem in small reports but as soon as you start bringing 3 or more columns the total number of values iterated by Formula Engine can be huge.
Hence always pay attention to the number of rows that the Formula Engine has to iterate to answer a query.