Aggregation can speed up the performance of DirectQuery sourced tables significantly. To use it, firstly, you need to create an aggregation table. Secondly, you must set up proper storage modes for tables in the model. Finally, you have to configure the aggregation functions, which I will explain in this post. You can learn about aggregation in other posts of this series and continue the example here. If you like to learn more about Power BI, read the Power BI book from Rookie to Rock Star.
The sample data model that we have includes an aggregation table (Sales Agg), a DirectQuery sourced fact table (FactInternetSales), and a relationship to some other tables with Dual or DirectQuery storage modes as below;
The aggregated table is created and has proper relationships and storage modes. However, Power BI is unaware of such aggregation (remember, because you can even create the aggregation outside of Power BI with the group by clause in T-SQL). You have to let Power BI know about such aggregation. That is why we need to set up and configure it.
Right-click on the Sales Agg table and select Manage Aggregations.
In the Manage Aggregations window, you can choose aggregation functions and fields that the function applied to it.
In this blog post, I won’t talk about the Precedence configuration. That is for scenarios when you have multiple layers of aggregation and the precedence of the aggregation usage. In this sample scenario, we just have one aggregated table.
The Aggregation configuration should match the aggregation configuration on the aggregated table. What does that mean? In the step of creating the aggregated table, we created a group by the result with the below setup;
As you can see, we have three fields that are our Group by fields; OrderDateKey, CustomerKey, and ProductSubcategoryKey. These fields should be marked as Group By on their respective related field in the FactInternetSales table (the original fact table);
Then you should set up aggregation functions according to the original grouping configuration. For SalesAmount_Sum, and UnitPrice_Sum, the aggregation should be set as Sum to their respective fields in the FactInternetSales table.
Rule #1: Detail Column Must be the Same Data Type of Grouped Column to use Sum Function
If you recall from my other blog post earlier, at the time of creating the aggregated table, I mentioned that if you are using SUM as aggregation, then the data type of the column after aggregation (SalesAmount_Sum) should be exactly the same data type as the original column (SalesAmount).
If you don’t follow this rule, the column will be greyed out in the list of columns, and you cannot select it. Change the data type first, then come back to set it again.
For the other two columns, UnitPrice_Count and FactInternetSales_Count, the aggregation is Count, and the table is FactInternetSales.
Rule #2: The data type of the column must be Whole Number (Integer) if you used Count Function
Again back to my other blog post earlier. If you created any aggregated result using Count, then that column’s data type should be the whole number. Otherwise, the Count function in the manage aggregations will be greyed out.
After configuring everything, the manage aggregations window should look like the below screenshot;
Hide Aggregated Table
The final step is to hide the aggregated table! You may think, why?! We created the aggregated table for Power BI to automatically switch from the DirectQuery big fact table to the Import data small aggregated table. However, this is just behind the scene. The user should know nothing about it. Just hide the Sales Agg table to make this experience seamless from the user’s point of view. Power BI will automatically do this process in the recent version of Power BI Desktop.
The user will see nothing about this table. The list of tables in the fields section contains only one Fact table; FactInternetSales.
Testing the result
After all the hard work from a few posts ago, now is the time to test how the aggregation works in action. Below report has all kinds of visualizations all from FactInternetSales, SalesAmount sliced and diced by Education, and Occupation (from DimCustomer), CalendarYear (from DimDate), ProductSubcategory and ProductCategory;
I have SQL Profiler running all the time, and here is the result;
This is how the aggregated table works behind the scene. Since all of these visuals are slicing and dicing the data by fields in the aggregated table (CustomerKey, OrderDateKey, and ProductSubcategoryKey), the result will always be automatically fetched from the aggregated table. Even though the Sales Agg table is not used in these visualizations.
If, however, I have a visualization that uses a different field, not in the aggregated table, then it is time to query from the data source. The visual below is looking at every single promotion;
And SQL Profiler tracks the query here:
So here you go; we have an aggregation that speeds up the performance of the DirectQuery sourced table. Nothing stops you from creating multiple levels of aggregation. It is recommended to do so, especially in scenarios where a different combinations of fields will be used by different users or visuals. I will write another blog post about adding more layers of aggregations to this.
Here is our model diagram for reference;
This article taught you what aggregations are and how they can make your composite model fast. Then you learned how to create the aggregated table in Power BI. Dual storage mode was the next important setup you learned about aggregations. Finally, in this step, you configure the aggregations’ functions and see the aggregation in action. In real-world scenarios of using aggregations, you will have more than one aggregated table, called layers of aggregations. Stay tuned for my next article about layers of aggregation.
Previous and Next Steps
- Power BI Fast and Furious with Aggregations
- Power BI Aggregation: Step 1 Create the Aggregated Table
- Dual Storage Mode; The Most Important Configuration for Aggregations! Step 2 Power BI Aggregations
- Power BI Aggregations Step 3: Configure Aggregation Functions and Test Aggregations in Action (The current article)
- Multiple Layers of Aggregations in Power BI; Model Responds Even Faster