Power BI Aggregation: Step 1 Create the Aggregated Table

Facebooktwittergoogle_plusredditpinterestlinkedintumblrmailFacebooktwittergoogle_plusredditpinterestlinkedintumblrmail

Aggregation tables are the fast performing solution for huge DirectQuery tables in Power BI. In the previous blog post, I explained what is an aggregation, and why it is an important part of a Power BI implementation. Aggregations are part of the Composite model in the Power BI. For the aggregation set up, your first step is to create an aggregated table. In this blog post, I’ll explain how that step can be done. If you want to learn more about Power BI, read the Power BI book, from Rookie to Rock Star.

Sample dataset

If you want to follow the example scenario here, you would need a SQL Server database named AdventureWorksDW. I made some changes in my dataset to be a bit bigger for the size of its fact table, so I can show you the functionality of aggregations. You can download the database link from here:

radacad.com/files/adventureworksdw2012.bak

Sample model

In this sample model, we will be analyzing the data of FactInternetSales (which we assume is our big fact table). Create a Power BI report with the DirectQuery connection to the SQL Server;

With the DirectQuery option, select these tables:

FactInternetSales, DimCustomer, DimDate, DimProduct, DimProductCategory, DimProductSubCategory, DimPromotion, and DimGeography.

Tables loaded into Power BI, will have relationships together, I just limited the relationship between DimDate and FactInternetSales table to one active relationship based on DateKey (in DimDate), and OrderDateKey (in the FactInternetSales). This is the diagram of relationships.

What is Aggregated Table

The aggregated table is a table in Power BI aggregated by one or more fields from the DirectQuery source table. In our case, Aggregated table, would be a grouped table by specific fields from the FactInternetSales table. The aggregated table can be created by all possible methods. You can create the aggregated table with T-SQL statements from SQL Server. Or you can create it in Power Query. You can create it in all other data transformation tools and query languages possible. Since we are doing everything with Power BI here, I’ll create the aggregated table with Power Query.

The aggregated table can be created in the data source with T-SQL queries, or in Power Query, or anywhere else that you can create a grouped table.

Creating the Aggregated Table

Go to Power Query Editor, and select the FactInternetSales table, the aggregated table that we are going to create in this example, is going to be on three fields; OrderDateKey, CustomerKey, and ProductSubCategoryKey. The first two fields Exist in the FactInternetSales, but not the third one. However, using relationship columns, we can retrieve that.

Scroll to the right in the FactInternetSales table columns to find Product, then click on Expand. In the Expand options, just Select ProductSubCategoryKey.

Now that you got the ProductSubCategoryKey in the table, you can apply the Group By. But if we use Group By in the existing query, we are going to change it to the aggregated table. We do, however, need to keep the existing intact. So create a REFERENCE from the existing FactInternetSales table (To learn more about Reference and the difference of that with Duplicate, read my article here). Name it as Sales Agg, and let’s do the group by on that. In the Transformation Tab, click on Group By;

When the Group By dialog box pops up, choose Advanced

 

Choose the three fields of OrderDateKey, CustomerKey, and ProductSubcategoryKey in the Group By fields

Then add four aggregations as below;

After building this table, let’s call it Sales Agg table. Here is a screenshot of the Sales Agg table.

Key Consideration for upcoming steps

Aggregation columns in the aggregated table should follow specific rules.

Rule #1: Exact Match for Data Types of Aggregations by Sum, Min, Max, Average

Columns that you apply an operation such as Sum, Average, Min, or Max, should have exactly same data type as the original source column after the aggregation. If they don’t, make sure you change the data types to have both the same.

SalesAmount field in the FactInternetSales has Decimal data type (if not, change it to that)

SalesAmount_Sum column in the Sales Agg table also should have the same data type. In this case Decimal.

Note that having the Decimal data type is not an important part of the rule. It is important to have both data types exactly matching.

This process, in our example, should be completed for the two columns of SalesAmount_Sum and UnitPrice_Sum.

Rule #2: Whole Number data type is mandatory for aggregations by Count

Any aggregations that use Count as the aggregation function, should have the data type of the whole number or let’s say Integer. In our sample table, we have two columns with Count function:

You should make sure that these two columns have a Whole Number data type after Group by transformation;

The two rules mentioned above will be important in the next steps of aggregation blog posts when we are doing the configuration of aggregation in Power BI.

Aggregated Table is an Import Table

The aggregated table that we have created is named Sales Agg. Because this table is much smaller than the FactInternetSales table, it can be stored in the memory. This way, we get the best performance of query response time, when we query something in the aggregated level.

Your Data model in Power BI should have tables below in it now after completing this step:

Congratulations, You have built the aggregated table, and now you can create relationships between this table and the three dimension tables: DimCustomer, DimDate, and DimProductSubcategory. Here is the full relationship Diagram;

In the next step, I will explain what is the storage mode, and how important is the Dual storage model configuration when we use the aggregations.

If you like to learn what is the Aggregation, and how it can be useful, read the previous post here.

Facebooktwittergoogle_plusredditpinterestlinkedintumblrmailFacebooktwittergoogle_plusredditpinterestlinkedintumblrmail
rssyoutuberssyoutube
Reza Rad
Reza Rad is an Author, Trainer, Speaker and DW/BI Consultant. He has a BSc in Computer engineering; he has more than 15 years’ experience in databases, programming and development mostly on Microsoft technologies. He is a Microsoft MVP in Data Platform for seven continues years (from 2011 till now) for his dedication in Microsoft BI. He is author of some SQL Server and BI books, and also Power BI online book; from Rookie to Rock Star.

6 Comments

  • Hi Reza,
    I follow your instructions and I have one question
    The process did not generate new table (Sales Agg)
    It convert it to a smaller table with specified columns you picked
    I had to duplicate the fact table in order to create aggregation table
    Do I miss something?
    Also when you try to convert the unitPrice_Count and SalesAmount_Count
    it asked me if I want to change all table to import mode?

    Thanks

    • Hi Oded
      You are right. there was a step missing in my explanation. You have to create a REFERENCE from FactInternetSales table and name it as Sales Agg. I added that step in the blog post. let me know if you have any other questions
      Cheers
      Reza

  • Hi Reza!

    When I finished the steps, PBI told me “This step results in a query that is not supported in Directquery mode.” and presents a button labeled “Toggle All tables to import mode.”

    What I should do?

    Thanks

    Rolando

  • I haven’t tried this yet, but does the aggregated table benefit from query folding to ensure it’s creation is efficient? And what if additional query steps were added to the Fact table query? My thinking is all about whether the creation of the aggregation table is efficient (or when can it be inefficient) in PQ.

    • Hi Matt
      The aggregated table can be created anywhere. Using T-SQL, or in Power Query, or even in DAX. Depends on the main table’s size and Group By parameters, you may decide to do it in Power Query or in T-SQL.

Leave a Reply

Your email address will not be published. Required fields are marked *