Aggregation can speed up the performance of DirectQuery sourced tables significantly. To use it, firstly you need to create an aggregation table, secondly, you need to set up proper storage modes for tables in the model. Finally, you have to configure the aggregation functions, which is what I’m going to explain in this post. You can learn about aggregation in other posts of this series, and continue the example here. If you like to learn more about Power BI, read Power BI book from Rookie to Rock Star.
The sample data model that we have includes an aggregation table (Sales Agg), and a DirectQuery sourced fact table (FactInternetSales), and relationship to some other tables with Dual, or DirectQuery storage modes as below;
The aggregated table is created and has proper relationships and storage mode. However, Power BI still is not aware of such aggregation (remember; because you can even create the aggregation outside of Power BI with the group by statement in T-SQL). You have to let Power BI know about such aggregation. That is why we need to set up and configure it.
Right click on the Sales Agg table, and select Manage Aggregations.
In the Manage Aggregations window, you will have the option to choose aggregation functions and fields that the function applied to it.
In this blog post, I won’t talk about the Precedence configuration. That is for scenarios when you have multiple layers of aggregation and the precedence of the aggregation usage. In this sample scenario, we just have one aggregated table.
The Aggregation configuration here should exactly match the aggregation configuration on the aggregated table. What does that mean? In the step of creating the aggregated table, we created a grouped by the result with below setup;
As you can see, we have three fields that are our Group by fields; OrderDateKey, CustomerKey, and ProductSubcategoryKey. These fields should be marked as Group By on their respective related field in the FactInternetSales table (the original fact table);
Then you should set up aggregation functions respective to what the original grouping configuration is. For SalesAmount_Sum, and UnitPrice_Sum, the aggregation should be set as Sum to their respective fields in the FactInternetSales table.
Rule #1: Detail Column Must be the Same Data Type of Grouped Column to use Sum Function
If you recall from my other blog post earlier, at the time of creating the aggregated table, I mentioned that if you are using SUM as aggregation, then the data type of the column after aggregation (SalesAmount_Sum), should be exactly the same data type as the original column (SalesAmount).
If you don’t follow this rule, then in the list of columns, the column will be greyed out, and you cannot select it. Change the data type first, and then come back to set it again.
For the other two columns UnitPrice_Count and FactInternetSales_Count, the aggregation is Count, and the table is FactInternetSales.
Rule #2: Data type of the column must be Whole Number (Integer) if you used Count Function
Again back to my other blog post earlier. If you created any aggregated result which uses Count, then the data type of that column should be the whole number. Otherwise, the Count function in the manage aggregations will be greyed out.
After configuring everything, the manage aggregations window should look like the below screenshot;
Hide Aggregated Table
The final step is to hide the aggregated table! You may think why?! Well, we created the aggregated table just for Power BI to automatically switch over from DirectQuery big fact table to the Import data small aggregated table. However, this is just for behind the scene. The user should know nothing about it. To make this experience seamless from the user point of view, just hide the Sales Agg table.
The user will see nothing about this table. The list of tables in the fields section contains only one Fact table; FactInternetSales.
Testing the result
After all the hard work from a few posts ago, now is the time to test how the aggregation works in action. Below report has all kind of visualizations all from FactInternetSales, SalesAmount sliced and diced by Education, and Occupation (from DimCustomer), CalendarYear (from DimDate), ProductSubcategory and ProductCategory;
I have SQL Profiler running all the time, and here is the result;
This is how the aggregated table is working behind the scene. Since all of these visuals are slicing and dicing the data by fields that we have in the aggregated table (CustomerKey, OrderDateKey, and ProductSubcategoryKey), the result will be always automatically fetched from the aggregated table. Even though the Sales Agg table is not actually used in these visualizations.
If, however, I have a visualization that uses a different field which is not in the aggregated table, then is the time to query from the data source. Visual below is looking at every single promotion;
And SQL Profiler tracks the query here:
So here you go; we have an aggregation that speeds up the performance of the DirectQuery sourced table. Nothing stops you from creating multiple levels of aggregation. In fact, it is recommended to do so, especially in scenarios that different combination of fields will be used by different users or visuals. I will write another blog post about adding more layers of aggregations to this.
Here is our model diagram for a reference;
In the article series of Aggregations, you learned what aggregations are and how it can make your composite model fast, then you learned how to create the aggregated table in Power BI. Dual storage mode was the next important setup that you learned about aggregations, and finally, in this step, you configured the aggregations’ functions and seen the aggregation in action. In real-world scenarios of using aggregations, you will have more than one aggregated table; called as layers of aggregations. Stay tuned for my next article about layers of aggregation.