Aggregations are speeding up the model. However, the aggregated table is not just one table, It can be multiple layers of aggregations. Aggregation by Date, aggregation by Date and Product, aggregation by Date and Product and Customer. Having multiple layers ensures that you always have the best performance result possible, and you only query the DirectQuery data source for the most atomic requests. Let’s see how this process is possible and helpful. If you like to learn more about Power BI, read Power BI book from Rookie to Rock Star.
To learn more about aggregations read the article series here:
Our sample data model (which we built in the previous blog post), has already an aggregated table; Sales Agg, and one DirectQuery fact table; FactInternetSales.
In this example, we are going to add more aggregations to the model.
Second Aggregation Layer
Create the aggregation table
The second aggregation layer that I am going to add in this example is including Promotion also into the grouping options. Here is the group by settings that I have for my second aggregation table in Power Query. If you are interested to learn in details how the aggregated table can be created, read my other blog post about creating the aggregated table.
Set up relationships
Then load the table into Power BI, and create relationships to DimDate, DimCustomer, DimProductSubcategory, and DimPromotion.
Dual Storage Mode
Set the Storage mode of DimPromotion to Dual. It is very important to use the Dual storage mode. With this setup, Power BI for aggregation-based analysis will use the in-memory copy of the DimPromition, and for the atomic transaction levels, will use the DirectQuery version of it. Read my blog post here to learn all about the Dual storage mode.
After setting up relationships and storage modes, you need to Manage Aggregations on the Sales and Promotion Agg table (this is our second aggregation table);
Everything is very similar to the Aggregation setup we had before for the Sales Agg table. For this one, the only addition is the Group By action on PromotionKey.
When you have multiple layers of aggregation, you have to setup their precedence. The aggregated table that you want to take the highest priority, should have the highest precedence. In this case, for Sales Agg can be 1, and for Sales and Promotion Agg to be 0.
Testing the result
With the above setup, you can have visualizations that use promotions as well as Customer, Date, and ProductSubcategory and still be sourced from the aggregation. Here is an example;
This kind of visualization will not send any query to the DirectQuery source;
How about Precedence of Execution?
When you have multiple aggregation layers, it is important which one runs first. As an example, let’s assume some numbers. These are not real numbers, these are just numbers to help you understand the scenario. Let’s say the big fact table (FactInternetSales) which uses DirectQuery mode has 250 million rows. And the Sales Agg table has only 10,000 rows. But the Sales and Promotion Agg has 1,000,000 rows. In such scenario, when you are querying something that can be answered with the table with 10K rows (Sales Agg), you have to do so, because it would be definitely much faster than the table with 1M rows (Sales and Promotion Agg). Therefore, we need to set up the precedence of execution. Smaller aggregation tables should be the source of analysis first.
Query hits the First Aggregated Table in the Memory
Here is a visual in the Power BI Page using Gender (from DimCustomer), and SalesAmount (from FactInternetSales);
A visual like above will be querying not the FactInternetSales table, but the aggregated table behind the scene. In this case, it will query the Sales Agg table. here is behind the scene query sent to the Vertipaq engine (result generated from SQL Profiler);
In this example; query hits the first aggregated table in the memory;
Query hits the second aggregated table in the memory
Here is another visual that uses EnglishPromotionType (from DimPromotion), and SalesAmount (from FactInternetSales);
And the result this time cannot be fetched from the Sales Agg table (because DimPromotion is not there as the group by function), so it will be queried from the second aggregated table: Sales and Promotion Agg.
In this instance, query hits the second aggregated table in the memory;
Query hits the DirectQuery table in the source
As the last visual in this example, Visual below uses EnglishProductName (from DimProduct), and SalesAmount (from FactInternetSales);
and the result this time cannot be fetched from any of the aggregated tables, so it comes directly from the DirectQuery source table: FactInternetSales
In the last example; query hits the DirectQuery table in the data source:
Power BI is switching nicely between layers of aggregated tables without you noticing it. All above operations happening behind the scene. The user will just have the feeling that one table (FactInternetSales) is serving all queries and the query response time would be super fast (with the help of aggregations).
Aggregation can be implemented in multiple levels to even speed up the performance faster and faster. If there is a specific combination of dimension fields that used a lot in visualizations by users, that combination is a good candidate for an aggregation. Depends on the size of the aggregation table, the precedence of that should be followed. Please let me know in the comments below if you have any questions about aggregations.