Aggregation can speed up the performance of DirectQuery sourced tables significantly. To use it, firstly, you need to create an aggregation table. Secondly, you must set up proper storage modes for tables in the model. Finally, you have to configure the aggregation functions, which I will explain in this post. You can learn about aggregation in other posts of this series and continue the example here. If you like to learn more about Power BI, read the Power BI book from Rookie to Rock Star.

Video

Sample Model

The sample data model that we have includes an aggregation table (Sales Agg), a DirectQuery sourced fact table (FactInternetSales), and a relationship to some other tables with Dual or DirectQuery storage modes as below;

Manage Aggregations

The aggregated table is created and has proper relationships and storage modes. However, Power BI is unaware of such aggregation (remember, because you can even create the aggregation outside of Power BI with the group by clause in T-SQL). You have to let Power BI know about such aggregation. That is why we need to set up and configure it.

Right-click on the Sales Agg table and select Manage Aggregations.

In the Manage Aggregations window, you can choose aggregation functions and fields that the function applied to it.

In this blog post, I won’t talk about the Precedence configuration. That is for scenarios when you have multiple layers of aggregation and the precedence of the aggregation usage. In this sample scenario, we just have one aggregated table.

The Aggregation configuration should match the aggregation configuration on the aggregated table. What does that mean? In the step of creating the aggregated table, we created a group by the result with the below setup;

As you can see, we have three fields that are our Group by fields; OrderDateKey, CustomerKey, and ProductSubcategoryKey. These fields should be marked as Group By on their respective related field in the FactInternetSales table (the original fact table);

Then you should set up aggregation functions according to the original grouping configuration. For SalesAmount_Sum, and UnitPrice_Sum, the aggregation should be set as Sum to their respective fields in the FactInternetSales table.

Rule #1: Detail Column Must be the Same Data Type of Grouped Column to use Sum Function

If you recall from my other blog post earlier, at the time of creating the aggregated table, I mentioned that if you are using SUM as aggregation, then the data type of the column after aggregation (SalesAmount_Sum) should be exactly the same data type as the original column (SalesAmount).

If you don’t follow this rule, the column will be greyed out in the list of columns, and you cannot select it. Change the data type first, then come back to set it again.

For the other two columns, UnitPrice_Count and FactInternetSales_Count, the aggregation is Count, and the table is FactInternetSales.

Rule #2: The data type of the column must be Whole Number (Integer) if you used Count Function

Again back to my other blog post earlier. If you created any aggregated result using Count, then that column’s data type should be the whole number. Otherwise, the Count function in the manage aggregations will be greyed out.

After configuring everything, the manage aggregations window should look like the below screenshot;

Hide Aggregated Table

The final step is to hide the aggregated table! You may think, why?! We created the aggregated table for Power BI to automatically switch from the DirectQuery big fact table to the Import data small aggregated table. However, this is just behind the scene. The user should know nothing about it. Just hide the Sales Agg table to make this experience seamless from the user’s point of view. Power BI will automatically do this process in the recent version of Power BI Desktop.

The user will see nothing about this table. The list of tables in the fields section contains only one Fact table; FactInternetSales.

Testing the result

After all the hard work from a few posts ago, now is the time to test how the aggregation works in action. Below report has all kinds of visualizations all from FactInternetSales, SalesAmount sliced and diced by Education, and Occupation (from DimCustomer), CalendarYear (from DimDate), ProductSubcategory and ProductCategory;

I have SQL Profiler running all the time, and here is the result;

This is how the aggregated table works behind the scene. Since all of these visuals are slicing and dicing the data by fields in the aggregated table (CustomerKey, OrderDateKey, and ProductSubcategoryKey), the result will always be automatically fetched from the aggregated table. Even though the Sales Agg table is not used in these visualizations.

If, however, I have a visualization that uses a different field, not in the aggregated table, then it is time to query from the data source. The visual below is looking at every single promotion;

And SQL Profiler tracks the query here:

So here you go; we have an aggregation that speeds up the performance of the DirectQuery sourced table. Nothing stops you from creating multiple levels of aggregation. It is recommended to do so, especially in scenarios where a different combinations of fields will be used by different users or visuals. I will write another blog post about adding more layers of aggregations to this.

Here is our model diagram for reference;

Summary

This article taught you what aggregations are and how they can make your composite model fast. Then you learned how to create the aggregated table in Power BI. Dual storage mode was the next important setup you learned about aggregations. Finally, in this step, you configure the aggregations’ functions and see the aggregation in action. In real-world scenarios of using aggregations, you will have more than one aggregated table, called layers of aggregations. Stay tuned for my next article about layers of aggregation.

Previous and Next Steps

Reza Rad

Trainer, Consultant, Mentor

Reza Rad is a Microsoft Regional Director, an Author, Trainer, Speaker and Consultant. He has a BSc in Computer engineering; he has more than 20 years’ experience in data analysis, BI, databases, programming, and development mostly on Microsoft technologies. He is a Microsoft Data Platform MVP for 12 continuous years (from 2011 till now) for his dedication in Microsoft BI. Reza is an active blogger and co-founder of RADACAD. Reza is also co-founder and co-organizer of Difinity conference in New Zealand, Power BI Summit, and Data Insight Summit.
Reza is author of more than 14 books on Microsoft Business Intelligence, most of these books are published under Power BI category. Among these are books such as Power BI DAX Simplified, Pro Power BI Architecture, Power BI from Rookie to Rock Star, Power Query books series, Row-Level Security in Power BI and etc.
He is an International Speaker in Microsoft Ignite, Microsoft Business Applications Summit, Data Insight Summit, PASS Summit, SQL Saturday and SQL user groups. And He is a Microsoft Certified Trainer.
Reza’s passion is to help you find the best data solution, he is Data enthusiast.
His articles on different aspects of technologies, especially on MS BI, can be found on his blog: https://radacad.com/blog.

12 thoughts on “Power BI Aggregations Step 3: Configure Aggregation Functions and Test Aggregations in Action”

Your posts are great! They are simple to understand, well written, focused, quick to read! Excellent as always! Many thanks!

Reza Rad says:

December 11, 2018 at 10:55 am

Thanks Daniel for your kind words!

Loading...

Reply

Thanks Reza !

I followed the all three articles on aggregation and you have done a great job in putting it simple and easy while converting all important aspects of it.

Thanks again

Reza Rad says:

December 11, 2018 at 9:52 am

Thanks for your kind words 🙂

Loading...

Reply

Awesome blog series about aggregations Reza! I really enjoyed reading those detail articles. Looking forward to reading the next blog post about aggregation layers. Hopefully it will be published very soon 🙂
Cheers
Pavel

Reza Rad says:

December 28, 2018 at 11:23 am

Thanks Pavel 🙂 Next posts already published 🙂

Loading...

Reply

Hi Reza! Thanks for the great info about aggregations and how to leverage the Dual storage mode.

I have a question about the data when it’s set up this way. How up-to-date would it be? I have a report that gets new data every 10 minutes, so I built it on direct query. It’s important that this report is always up-to-date as it’s used as a status tracker throughout the day. Would aggregations then limit me to the refresh schedules on the Power Bi service? Or does the refresh button rebuild the aggregations?

Reza Rad says:

January 21, 2019 at 7:32 am

Hi.
Yes, aggregation tables are going to be Import Data, and that means they will be filled with a scheduled refresh. and that schedule I don’t think can be defined at every 10 minutes. So there will be the problem of not having the most up-to-date data with aggregation tables (like any other import model scenarios)
Cheers
Reza

Loading...

Reply

Hi Reza,
Great series of articles.
I have this working for a number of my dimensions but cannot get it to work with the date dimension.
Unfortunately I dont have timekey (int) field on my dim_date, instead i am joining on the date field.
Are you aware of issues of joining on date fields for this?
David.

Reza Rad says:

March 7, 2019 at 11:13 am

Hi David
Thanks
Unfortunately, the data type supported for Group By operations in Aggregation function is the whole number, you cannot use date there. However, you can easily create it from your date field. from the Add column tab, add a column for Year, Month, and Day, then add a prefix of “0” to month and day (to cover for one digit characters to have a leading “o”), and then take the Right two characters from the month and day column. in the end, Merge columns for Year, Month and Day. then convert it to the Whole number. that is then your datekey.
Cheers
Reza

Loading...

Reply

thanks for sharing but I have a question.
If we create multiple aggregations do we have a size limit in powerbi? Because each aggregation tend to have more data and it can also become important
i have a set of 500 000 000 rows my final agg (with many keys) can contain 40 000 000 rows
I am doing correct?
thanks

Reza Rad says:

September 18, 2019 at 6:59 pm

There is no limitation on the number of rows in your agg tables, or the number of agg tables. but there is a limitation on the size of PBIX file (1GB for pro, 10GB for premium, 2GB for RS). as long as you don’t hit that limitation, you are good to go.
Cheers
Reza

Loading...

Reply

Daniel says:

December 7, 2018 at 8:46 am

Your posts are great! They are simple to understand, well written, focused, quick to read! Excellent as always! Many thanks!

Loading...

- Reza Rad says:
  
  December 11, 2018 at 10:55 am
  
  Thanks Daniel for your kind words!
  
  Loading...
  
Fowmy says:

December 8, 2018 at 12:43 am

Thanks Reza !

I followed the all three articles on aggregation and you have done a great job in putting it simple and easy while converting all important aspects of it.

Thanks again

Loading...

- Reza Rad says:
  
  December 11, 2018 at 9:52 am
  
  Thanks for your kind words 🙂
  
  Loading...
  
Pavel says:

December 22, 2018 at 6:28 am

Awesome blog series about aggregations Reza! I really enjoyed reading those detail articles. Looking forward to reading the next blog post about aggregation layers. Hopefully it will be published very soon 🙂
Cheers
Pavel

Loading...

- Reza Rad says:
  
  December 28, 2018 at 11:23 am
  
  Thanks Pavel 🙂 Next posts already published 🙂
  
  Loading...
  
Shredd says:

January 19, 2019 at 9:23 am

Hi Reza! Thanks for the great info about aggregations and how to leverage the Dual storage mode.

I have a question about the data when it’s set up this way. How up-to-date would it be? I have a report that gets new data every 10 minutes, so I built it on direct query. It’s important that this report is always up-to-date as it’s used as a status tracker throughout the day. Would aggregations then limit me to the refresh schedules on the Power Bi service? Or does the refresh button rebuild the aggregations?

Loading...

- Reza Rad says:
  
  January 21, 2019 at 7:32 am
  
  Hi.
  Yes, aggregation tables are going to be Import Data, and that means they will be filled with a scheduled refresh. and that schedule I don’t think can be defined at every 10 minutes. So there will be the problem of not having the most up-to-date data with aggregation tables (like any other import model scenarios)
  Cheers
  Reza
  
  Loading...
  
David says:

March 7, 2019 at 3:40 am

Hi Reza,
Great series of articles.
I have this working for a number of my dimensions but cannot get it to work with the date dimension.
Unfortunately I dont have timekey (int) field on my dim_date, instead i am joining on the date field.
Are you aware of issues of joining on date fields for this?
David.

Loading...

- Reza Rad says:
  
  March 7, 2019 at 11:13 am
  
  Hi David
  Thanks
  Unfortunately, the data type supported for Group By operations in Aggregation function is the whole number, you cannot use date there. However, you can easily create it from your date field. from the Add column tab, add a column for Year, Month, and Day, then add a prefix of “0” to month and day (to cover for one digit characters to have a leading “o”), and then take the Right two characters from the month and day column. in the end, Merge columns for Year, Month and Day. then convert it to the Whole number. that is then your datekey.
  Cheers
  Reza
  
  Loading...
  
E.Leite says:

September 8, 2019 at 6:12 pm

thanks for sharing but I have a question.
If we create multiple aggregations do we have a size limit in powerbi? Because each aggregation tend to have more data and it can also become important
i have a set of 500 000 000 rows my final agg (with many keys) can contain 40 000 000 rows
I am doing correct?
thanks

Loading...

- Reza Rad says:
  
  September 18, 2019 at 6:59 pm
  
  There is no limitation on the number of rows in your agg tables, or the number of agg tables. but there is a limitation on the size of PBIX file (1GB for pro, 10GB for premium, 2GB for RS). as long as you don’t hit that limitation, you are good to go.
  Cheers
  Reza
  
  Loading...