Power BI Fast and Furious with Aggregations

Facebooktwittergoogle_plusredditpinterestlinkedintumblrmailFacebooktwittergoogle_plusredditpinterestlinkedintumblrmail

Power BI is not only a solution for small datasets, but It also caters for big datasets. If the volume of data is huge, you can switch to the DirectQuery mode for it. Because DirectQuery does not store a copy of that data in the memory of the machine that runs Power BI model, then Power BI will send queries back to the data source for every page render in the report. If you want to learn more about Power BI, read Power BI book from Rookie to Rock Star.

DirectQuery mode, however, is slow. Consider a page with three, four, or five visuals. That page will send five queries to the data source with every change in the context of filtering, such as changing the slicer, clicking on a column chart to highlight part of it, or refreshing the page. The process of sending queries to the data source for every interaction makes the DirectQuery mode very slow. Because of that DirectQuery is the last recommended way of connection in Power BI. I have written a blog post about the comparison between connection modes in Power BI, and my suggestion is always to use DirectQuery as the last hope when other methods cannot be used.

Recently, a new mode of connection is announced in Power BI, called Composite mode. I have already written a blog post about it, which you can read here. Composite mode, allows part of the model to use DirectQuery (for large tables), and part of the model to use Import Data (for smaller tables). This way you can get better performance when you work with smaller tables because they are querying the in-memory structure of the data.

The table(s) that is part of the DirectQuery connection, however, is still slow. Composite mode comes with a fantastic feature called Aggregations. Aggregations are the solution to speed up the DirectQuery sourced tables within a Composite model. With aggregations, you can create layers of pre-aggregated values, which can be stored in the memory, and hence would perform faster.

How Does Aggregation work?

Imagine a Fact table with 250 millions of rows. Such a fact table is big enough to be considered as a good candidate for DirectQuery connection. You don’t want to load such a big table into the memory, and most probably, the Power BI file size exceeds the 1GB limitation. Now, think about your reporting solution for a second. Do you always query this fact table at the finest or minimum granular level? I mean do you always look at every single transaction in this table when you do report on it?

The answer is No. In most of the times, you are querying the data by other fields or columns. As an example; you query the Sales value in the fact table, by Year. Some other times, you query the fact table’s values by Customer’s education category. Some other times, you query the values in the fact table, by each product. When you look at real-world scenarios, most of the time, you are querying the fact table by aggregations of dimension tables.

The visual above is querying SalesAmount from the Fact table by the CalendarYear from DimDate;

Aggregated Table

So, if we are querying only by CalendarYear, then we can create an aggregated table. The aggregated table can pre-calculate the Sum of SalesAmount for every CalendarYear. In the case above, We just would have a table with five rows. One row for each year; from 2010 to 2014. Such aggregated table is so small (five rows only), that can be easily imported into the memory, and whenever we query that table, we would get a super-fast result.

By now, you may say; I would also query it by Quarter too. In fact, you may want to drill down into Month, week or even every single Day. That is absolutely fine. Considering 365 days in each year, you will have a table which has 5×365 rows, this would be a table including 1827 rows maximum (you may have one or two leap years in the period as well). The aggregated table size is still very tiny (only less than 2000 rows) compared to a fact table with 250 millions of rows. You can still import your aggregated table into the memory. Such a table will cover all data analysis you want by every single day.

Layers of Aggregation

You may need to create other aggregations by other dimensions too. Usually, because dimensions are much smaller than the fact table, your aggregated tables would be always smaller than the fact table. These aggregated tables are your layers of aggregation in the model.

The golden rule for a composite model in Power BI is; Do NOT use a lower level table, if there is an aggregation on top of it, which can be used for this query.

If you are querying even a DirectQuery table with 250 millions of rows, but you are only querying it by Date, then Power BI act differently. Power BI will not send a query to the data source of the fact table. It will query the aggregated table in the memory instead, and you get a fast response. Power BI only will switch to the table underneath, if aggregated tables cannot answer the question.

Doing it in Action

The overall process for having an aggregation is;

  • Creating the aggregation table

  • Creating relationships necessary for the aggregation table and the DirectQuery table

  • Setting the Storage mode of aggregated and Dimension tables

  • Configuring aggregation

I will write a blog post about each section separately in details. The next one in this series would be creating the aggregation table.

It is not fair to speak of Aggregations and doesn’t mention Christian Wade, Program Manager of Power BI Team who is the person in charge for this awesome work. Christian has a great demo of analyzing trillions of rows of data (a quarter of a petabyte data) in Power BI using aggregations in a instant response time and speed of clicky clicky, draggy, droppy. Watch his wonderful presentations of the Microsoft Business Applications Summit 2018 here;

Summary

Aggregation is a game changer feature in the performance and speed of Power BI Solutions when the data source table is huge. With the help of aggregations, you can have layers of pre-calculations, stored in memory, and ready to respond queries from the user instantly. The DirectQuery data source will be used only for the most atomic transaction queries. Stay tuned for the next blog post about How to create an aggregated table. Stay tuned for the rest of the blog series coming soon about Aggergations.

Facebooktwittergoogle_plusredditpinterestlinkedintumblrmailFacebooktwittergoogle_plusredditpinterestlinkedintumblrmail
rssyoutuberssyoutube
Reza Rad
Reza Rad is an Author, Trainer, Speaker and DW/BI Consultant. He has a BSc in Computer engineering; he has more than 15 years’ experience in databases, programming and development mostly on Microsoft technologies. He is a Microsoft MVP in Data Platform for seven continues years (from 2011 till now) for his dedication in Microsoft BI. He is author of some SQL Server and BI books, and also Power BI online book; from Rookie to Rock Star.

4 Comments

Leave a Reply

Your email address will not be published. Required fields are marked *