Multiple Layers of Aggregations in Power BI; Model Responds Even Faster

Facebooktwittergoogle_plusredditpinterestlinkedintumblrmailFacebooktwittergoogle_plusredditpinterestlinkedintumblrmail

Aggregations are speeding up the model. However, the aggregated table is not just one table, It can be multiple layers of aggregations. Aggregation by Date, aggregation by Date and Product, aggregation by Date and Product and Customer. Having multiple layers ensures that you always have the best performance result possible, and you only query the DirectQuery data source for the most atomic requests. Let’s see how this process is possible and helpful. If you like to learn more about Power BI, read Power BI book from Rookie to Rock Star.

To learn more about aggregations read the article series here:

What is aggregation in Power BI?

Step 1: Create the aggregated table

Step 2: Dual Storage Mode

Step 3: Configure Aggregation Functions

Sample model

Our sample data model (which we built in the previous blog post), has already an aggregated table; Sales Agg, and one DirectQuery fact table; FactInternetSales.

In this example, we are going to add more aggregations to the model.

Second Aggregation Layer

Create the aggregation table

The second aggregation layer that I am going to add in this example is including Promotion also into the grouping options. Here is the group by settings that I have for my second aggregation table in Power Query. If you are interested to learn in details how the aggregated table can be created, read my other blog post about creating the aggregated table.

Set up relationships

Then load the table into Power BI, and create relationships to DimDate, DimCustomer, DimProductSubcategory, and DimPromotion.

Dual Storage Mode

Set the Storage mode of DimPromotion to Dual. It is very important to use the Dual storage mode. With this setup, Power BI for aggregation-based analysis will use the in-memory copy of the DimPromition, and for the atomic transaction levels, will use the DirectQuery version of it. Read my blog post here to learn all about the Dual storage mode.

Manage Aggregations

After setting up relationships and storage modes, you need to Manage Aggregations on the Sales and Promotion Agg table (this is our second aggregation table);

Everything is very similar to the Aggregation setup we had before for the Sales Agg table. For this one, the only addition is the Group By action on PromotionKey.

Precedence Setup

When you have multiple layers of aggregation, you have to setup their precedence. The aggregated table that you want to take the highest priority, should have the highest precedence. In this case, for Sales Agg can be 1, and for Sales and Promotion Agg to be 0.

Testing the result

With the above setup, you can have visualizations that use promotions as well as Customer, Date, and ProductSubcategory and still be sourced from the aggregation. Here is an example;

This kind of visualization will not send any query to the DirectQuery source;

How about Precedence of Execution?

When you have multiple aggregation layers, it is important which one runs first. As an example, let’s assume some numbers. These are not real numbers, these are just numbers to help you understand the scenario. Let’s say the big fact table (FactInternetSales) which uses DirectQuery mode has 250 million rows. And the Sales Agg table has only 10,000 rows. But the Sales and Promotion Agg has 1,000,000 rows. In such scenario, when you are querying something that can be answered with the table with 10K rows (Sales Agg), you have to do so, because it would be definitely much faster than the table with 1M rows (Sales and Promotion Agg). Therefore, we need to set up the precedence of execution. Smaller aggregation tables should be the source of analysis first.

Query hits the First Aggregated Table in the Memory

Here is a visual in the Power BI Page using Gender (from DimCustomer), and SalesAmount (from FactInternetSales);

A visual like above will be querying not the FactInternetSales table, but the aggregated table behind the scene. In this case, it will query the Sales Agg table. here is behind the scene query sent to the Vertipaq engine (result generated from SQL Profiler);

In this example; query hits the first aggregated table in the memory;

Query hits the second aggregated table in the memory

Here is another visual that uses EnglishPromotionType (from DimPromotion), and SalesAmount (from FactInternetSales);

And the result this time cannot be fetched from the Sales Agg table (because DimPromotion is not there as the group by function), so it will be queried from the second aggregated table: Sales and Promotion Agg.

In this instance, query hits the second aggregated table in the memory;

Query hits the DirectQuery table in the source

As the last visual in this example, Visual below uses EnglishProductName (from DimProduct), and SalesAmount (from FactInternetSales);

and the result this time cannot be fetched from any of the aggregated tables, so it comes directly from the DirectQuery source table: FactInternetSales

In the last example; query hits the DirectQuery table in the data source:

Power BI is switching nicely between layers of aggregated tables without you noticing it. All the above operations happening behind the scene. The user will just have the feeling that one table (FactInternetSales) is serving all queries and the query response time would be super fast (with the help of aggregations).

Summary

Aggregation can be implemented in multiple levels to even speed up the performance faster and faster. If there is a specific combination of dimension fields that used a lot in visualizations by users, that combination is a good candidate for aggregation. Depends on the size of the aggregation table, the precedence of that should be followed. Please let me know in the comments below if you have any questions about aggregations.

Previous Steps

Facebooktwittergoogle_plusredditpinterestlinkedintumblrmailFacebooktwittergoogle_plusredditpinterestlinkedintumblrmail
rssyoutuberssyoutube
Reza Rad on FacebookReza Rad on LinkedinReza Rad on TwitterReza Rad on Youtube
Reza Rad
Trainer, Consultant, Mentor
Reza Rad is a Microsoft Regional Director, an Author, Trainer, Speaker and Consultant. He has a BSc in Computer engineering; he has more than 20 years’ experience in data analysis, BI, databases, programming, and development mostly on Microsoft technologies. He is a Microsoft Data Platform MVP for eight continuous years (from 2011 till now) for his dedication in Microsoft BI. Reza is an active blogger and co-founder of RADACAD. Reza is also co-founder and co-organizer of Difinity conference in New Zealand.
His articles on different aspects of technologies, especially on MS BI, can be found on his blog: https://radacad.com/blog.
He wrote some books on MS SQL BI and also is writing some others, He was also an active member on online technical forums such as MSDN and Experts-Exchange, and was a moderator of MSDN SQL Server forums, and is an MCP, MCSE, and MCITP of BI. He is the leader of the New Zealand Business Intelligence users group. He is also the author of very popular book Power BI from Rookie to Rock Star, which is free with more than 1700 pages of content and the Power BI Pro Architecture published by Apress.
He is an International Speaker in Microsoft Ignite, Microsoft Business Applications Summit, Data Insight Summit, PASS Summit, SQL Saturday and SQL user groups. And He is a Microsoft Certified Trainer.
Reza’s passion is to help you find the best data solution, he is Data enthusiast.

Leave a Reply

Your email address will not be published. Required fields are marked *