Aggregation tables are the fast performing solution for huge DirectQuery tables in Power BI. In the previous blog post, I explained what is an aggregation, and why it is an important part of a Power BI implementation. Aggregations are part of the Composite model in the Power BI. For the aggregation set up, your first step is to create an aggregated table. In this blog post, I’ll explain how that step can be done. If you want to learn more about Power BI, read the Power BI book, from Rookie to Rock Star.
If you want to follow the example scenario here, you would need a SQL Server database named AdventureWorksDW. I made some changes in my dataset to be a bit bigger for the size of its fact table, so I can show you the functionality of aggregations. You can download the database link from here:
In this sample model, we will be analyzing the data of FactInternetSales (which we assume is our big fact table). Create a Power BI report with the DirectQuery connection to the SQL Server;
With the DirectQuery option, select these tables:
FactInternetSales, DimCustomer, DimDate, DimProduct, DimProductCategory, DimProductSubCategory, DimPromotion, and DimGeography.
Tables loaded into Power BI, will have relationships together, I just limited the relationship between DimDate and FactInternetSales table to one active relationship based on DateKey (in DimDate), and OrderDateKey (in the FactInternetSales). This is the diagram of relationships.
What is Aggregated Table
The aggregated table is a table in Power BI aggregated by one or more fields from the DirectQuery source table. In our case, Aggregated table, would be a grouped table by specific fields from the FactInternetSales table. The aggregated table can be created by all possible methods. You can create the aggregated table with T-SQL statements from SQL Server. Or you can create it in Power Query. You can create it in all other data transformation tools and query languages possible. Since we are doing everything with Power BI here, I’ll create the aggregated table with Power Query.
The aggregated table can be created in the data source with T-SQL queries, or in Power Query, or anywhere else that you can create a grouped table.
Creating the Aggregated Table
Go to Power Query Editor, and select the FactInternetSales table, the aggregated table that we are going to create in this example, is going to be on three fields; OrderDateKey, CustomerKey, and ProductSubCategoryKey. The first two fields Exist in the FactInternetSales, but not the third one. However, using relationship columns, we can retrieve that.
Scroll to the right in the FactInternetSales table columns to find Product, then click on Expand. In the Expand options, just Select ProductSubCategoryKey.
Now that you got the ProductSubCategoryKey in the table, you can apply the Group By. But if we use Group By in the existing query, we are going to change it to the aggregated table. We do, however, need to keep the existing intact. So create a REFERENCE from the existing FactInternetSales table (To learn more about Reference and the difference of that with Duplicate, read my article here). Name it as Sales Agg, and let’s do the group by on that. In the Transformation Tab, click on Group By;
When the Group By dialog box pops up, choose Advanced
Choose the three fields of OrderDateKey, CustomerKey, and ProductSubcategoryKey in the Group By fields
Then add four aggregations as below;
After building this table, let’s call it Sales Agg table. Here is a screenshot of the Sales Agg table.
Key Consideration for upcoming steps
Aggregation columns in the aggregated table should follow specific rules.
Rule #1: Exact Match for Data Types of Aggregations by Sum, Min, Max, Average
Columns that you apply an operation such as Sum, Average, Min, or Max, should have exactly same data type as the original source column after the aggregation. If they don’t, make sure you change the data types to have both the same.
SalesAmount field in the FactInternetSales has Decimal data type (if not, change it to that)
SalesAmount_Sum column in the Sales Agg table also should have the same data type. In this case Decimal.
Note that having the Decimal data type is not an important part of the rule. It is important to have both data types exactly matching.
This process, in our example, should be completed for the two columns of SalesAmount_Sum and UnitPrice_Sum.
Rule #2: Whole Number data type is mandatory for aggregations by Count
Any aggregations that use Count as the aggregation function, should have the data type of the whole number or let’s say Integer. In our sample table, we have two columns with Count function:
You should make sure that these two columns have a Whole Number data type after Group by transformation;
The two rules mentioned above will be important in the next steps of aggregation blog posts when we are doing the configuration of aggregation in Power BI.
Aggregated Table is an Import Table
The aggregated table that we have created is named Sales Agg. Because this table is much smaller than the FactInternetSales table, it can be stored in the memory. This way, we get the best performance of query response time, when we query something in the aggregated level.
Your Data model in Power BI should have tables below in it now after completing this step:
Congratulations, You have built the aggregated table, and now you can create relationships between this table and the three dimension tables: DimCustomer, DimDate, and DimProductSubcategory. Here is the full relationship Diagram;
In the next step, I will explain what is the storage mode, and how important is the Dual storage model configuration when we use the aggregations.
If you like to learn what is the Aggregation, and how it can be useful, read the previous post here.