Power BI Aggregations Step 3: Configure Aggregation Functions and Test Aggregations in Action

Facebooktwittergoogle_plusredditpinterestlinkedintumblrmailFacebooktwittergoogle_plusredditpinterestlinkedintumblrmail

Aggregation can speed up the performance of DirectQuery sourced tables significantly. To use it, firstly you need to create an aggregation table, secondly, you need to set up proper storage modes for tables in the model. Finally, you have to configure the aggregation functions, which is what I’m going to explain in this post. You can learn about aggregation in other posts of this series, and continue the example here. If you like to learn more about Power BI, read Power BI book from Rookie to Rock Star.

What is aggregation in Power BI?

Step 1: Create the aggregated table

Step 2: Dual Storage Mode

Sample Model

The sample data model that we have includes an aggregation table (Sales Agg), and a DirectQuery sourced fact table (FactInternetSales), and relationship to some other tables with Dual, or DirectQuery storage modes as below;

Manage Aggregations

The aggregated table is created and has proper relationships and storage mode. However, Power BI still is not aware of such aggregation (remember; because you can even create the aggregation outside of Power BI with the group by statement in T-SQL). You have to let Power BI know about such aggregation. That is why we need to set up and configure it.

Right click on the Sales Agg table, and select Manage Aggregations.

In the Manage Aggregations window, you will have the option to choose aggregation functions and fields that the function applied to it.

In this blog post, I won’t talk about the Precedence configuration. That is for scenarios when you have multiple layers of aggregation and the precedence of the aggregation usage. In this sample scenario, we just have one aggregated table.

The Aggregation configuration here should exactly match the aggregation configuration on the aggregated table. What does that mean? In the step of creating the aggregated table, we created a grouped by the result with below setup;

As you can see, we have three fields that are our Group by fields; OrderDateKey, CustomerKey, and ProductSubcategoryKey. These fields should be marked as Group By on their respective related field in the FactInternetSales table (the original fact table);

Then you should set up aggregation functions respective to what the original grouping configuration is. For SalesAmount_Sum, and UnitPrice_Sum, the aggregation should be set as Sum to their respective fields in the FactInternetSales table.

Rule #1: Detail Column Must be the Same Data Type of Grouped Column to use Sum Function

If you recall from my other blog post earlier, at the time of creating the aggregated table, I mentioned that if you are using SUM as aggregation, then the data type of the column after aggregation (SalesAmount_Sum), should be exactly the same data type as the original column (SalesAmount).

If you don’t follow this rule, then in the list of columns, the column will be greyed out, and you cannot select it. Change the data type first, and then come back to set it again.

For the other two columns UnitPrice_Count and FactInternetSales_Count, the aggregation is Count, and the table is FactInternetSales.

Rule #2: Data type of the column must be Whole Number (Integer) if you used Count Function

Again back to my other blog post earlier. If you created any aggregated result which uses Count, then the data type of that column should be the whole number. Otherwise, the Count function in the manage aggregations will be greyed out.

After configuring everything, the manage aggregations window should look like the below screenshot;

Hide Aggregated Table

The final step is to hide the aggregated table! You may think why?! Well, we created the aggregated table just for Power BI to automatically switch over from DirectQuery big fact table to the Import data small aggregated table. However, this is just for behind the scene. The user should know nothing about it. To make this experience seamless from the user point of view, just hide the Sales Agg table.

The user will see nothing about this table. The list of tables in the fields section contains only one Fact table; FactInternetSales.

Testing the result

After all the hard work from a few posts ago, now is the time to test how the aggregation works in action. Below report has all kind of visualizations all from FactInternetSales, SalesAmount sliced and diced by Education, and Occupation (from DimCustomer), CalendarYear (from DimDate), ProductSubcategory and ProductCategory;

I have SQL Profiler running all the time, and here is the result;

This is how the aggregated table is working behind the scene. Since all of these visuals are slicing and dicing the data by fields that we have in the aggregated table (CustomerKey, OrderDateKey, and ProductSubcategoryKey), the result will be always automatically fetched from the aggregated table. Even though the Sales Agg table is not actually used in these visualizations.

If, however, I have a visualization that uses a different field which is not in the aggregated table, then is the time to query from the data source. Visual below is looking at every single promotion;

And SQL Profiler tracks the query here:

So here you go; we have an aggregation that speeds up the performance of the DirectQuery sourced table. Nothing stops you from creating multiple levels of aggregation. In fact, it is recommended to do so, especially in scenarios that different combination of fields will be used by different users or visuals. I will write another blog post about adding more layers of aggregations to this.

Here is our model diagram for a reference;

Summary

In the article series of Aggregations, you learned what aggregations are and how it can make your composite model fast, then you learned how to create the aggregated table in Power BI. Dual storage mode was the next important setup that you learned about aggregations, and finally, in this step, you configured the aggregations’ functions and seen the aggregation in action. In real-world scenarios of using aggregations, you will have more than one aggregated table; called as layers of aggregations. Stay tuned for my next article about layers of aggregation.

Previous and Next Steps

Facebooktwittergoogle_plusredditpinterestlinkedintumblrmailFacebooktwittergoogle_plusredditpinterestlinkedintumblrmail
rssyoutuberssyoutube
Reza Rad on FacebookReza Rad on LinkedinReza Rad on TwitterReza Rad on Youtube
Reza Rad
Trainer, Consultant, Mentor
Reza Rad is a Microsoft Regional Director, an Author, Trainer, Speaker and Consultant. He has a BSc in Computer engineering; he has more than 20 years’ experience in data analysis, BI, databases, programming, and development mostly on Microsoft technologies. He is a Microsoft Data Platform MVP for eight continuous years (from 2011 till now) for his dedication in Microsoft BI. Reza is an active blogger and co-founder of RADACAD. Reza is also co-founder and co-organizer of Difinity conference in New Zealand.
His articles on different aspects of technologies, especially on MS BI, can be found on his blog: https://radacad.com/blog.
He wrote some books on MS SQL BI and also is writing some others, He was also an active member on online technical forums such as MSDN and Experts-Exchange, and was a moderator of MSDN SQL Server forums, and is an MCP, MCSE, and MCITP of BI. He is the leader of the New Zealand Business Intelligence users group. He is also the author of very popular book Power BI from Rookie to Rock Star, which is free with more than 1700 pages of content and the Power BI Pro Architecture published by Apress.
He is an International Speaker in Microsoft Ignite, Microsoft Business Applications Summit, Data Insight Summit, PASS Summit, SQL Saturday and SQL user groups. And He is a Microsoft Certified Trainer.
Reza’s passion is to help you find the best data solution, he is Data enthusiast.

10 Comments

  • Your posts are great! They are simple to understand, well written, focused, quick to read! Excellent as always! Many thanks!

  • Thanks Reza !

    I followed the all three articles on aggregation and you have done a great job in putting it simple and easy while converting all important aspects of it.

    Thanks again

  • Awesome blog series about aggregations Reza! I really enjoyed reading those detail articles. Looking forward to reading the next blog post about aggregation layers. Hopefully it will be published very soon 🙂
    Cheers
    Pavel

  • Hi Reza! Thanks for the great info about aggregations and how to leverage the Dual storage mode.

    I have a question about the data when it’s set up this way. How up-to-date would it be? I have a report that gets new data every 10 minutes, so I built it on direct query. It’s important that this report is always up-to-date as it’s used as a status tracker throughout the day. Would aggregations then limit me to the refresh schedules on the Power Bi service? Or does the refresh button rebuild the aggregations?

    • Hi.
      Yes, aggregation tables are going to be Import Data, and that means they will be filled with a scheduled refresh. and that schedule I don’t think can be defined at every 10 minutes. So there will be the problem of not having the most up-to-date data with aggregation tables (like any other import model scenarios)
      Cheers
      Reza

  • Hi Reza,
    Great series of articles.
    I have this working for a number of my dimensions but cannot get it to work with the date dimension.
    Unfortunately I dont have timekey (int) field on my dim_date, instead i am joining on the date field.
    Are you aware of issues of joining on date fields for this?
    David.

    • Hi David
      Thanks
      Unfortunately, the data type supported for Group By operations in Aggregation function is the whole number, you cannot use date there. However, you can easily create it from your date field. from the Add column tab, add a column for Year, Month, and Day, then add a prefix of “0” to month and day (to cover for one digit characters to have a leading “o”), and then take the Right two characters from the month and day column. in the end, Merge columns for Year, Month and Day. then convert it to the Whole number. that is then your datekey.
      Cheers
      Reza

Leave a Reply

Your email address will not be published. Required fields are marked *