Power BI Fast and Furious with Aggregations


Power BI is not only a solution for small datasets; it also caters to big datasets. If the volume of data is huge, you can switch to DirectQuery mode. Because DirectQuery does not store a copy of the data in the memory of the machine that runs the Power BI model, Power BI sends queries back to the data source for every page render in the report. If you want to learn more about Power BI, read the Power BI from Rookie to Rock Star book.

DirectQuery mode, however, is slow. Consider a page with five visuals: that page sends five queries to the data source with every change in the filter context, such as changing a slicer, clicking on a column chart to highlight part of it, or refreshing the page. Sending queries to the data source for every interaction makes DirectQuery mode very slow. Because of that, DirectQuery is the last recommended connection mode in Power BI. I have written a blog post comparing the connection modes in Power BI, and my suggestion is always to use DirectQuery as the last resort, when other methods cannot be used.

Recently, a new connection mode was announced in Power BI, called Composite mode. I have already written a blog post about it, which you can read here. Composite mode allows part of the model to use DirectQuery (for large tables) and part of the model to use Import Data (for smaller tables). This way, you get better performance when you work with the smaller tables, because those queries hit the in-memory copy of the data.

The tables that remain in DirectQuery mode, however, are still slow. Composite mode comes with a fantastic feature called aggregations, which are the solution for speeding up the DirectQuery-sourced tables within a composite model. With aggregations, you can create layers of pre-aggregated values that are stored in memory and therefore perform much faster.

How Does Aggregation Work?

Imagine a fact table with 250 million rows. Such a fact table is big enough to be considered a good candidate for a DirectQuery connection: you don't want to load such a big table into memory, and most probably the Power BI file size would exceed the 1 GB limitation. Now, think about your reporting solution for a second. Do you always query this fact table at its finest, most granular level? In other words, do you always look at every single transaction in this table when you report on it?

The answer is no. Most of the time, you query the data by other fields or columns. As an example, you query the Sales value in the fact table by Year. Other times, you query the fact table's values by the customer's education category, or by each product. In real-world scenarios, most of the time you are querying the fact table grouped by attributes of the dimension tables.

The visual above queries SalesAmount from the fact table by CalendarYear from DimDate.

Aggregated Table

So, if we are querying only by CalendarYear, we can create an aggregated table that pre-calculates the sum of SalesAmount for every CalendarYear. In the case above, we would have a table with only five rows: one row for each year, from 2010 to 2014. Such an aggregated table is so small (five rows only) that it can easily be imported into memory, and whenever we query it, we get a super-fast result.
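Just as a rough sketch, this is the kind of query the aggregated table would pre-compute. The names here (FactSales, DimDate[CalendarYear], SalesAmount) follow AdventureWorks-style naming; FactSales in particular is a hypothetical name for the 250-million-row fact table:

    -- DAX query sketch: one row per calendar year, with pre-summed sales
    EVALUATE
    SUMMARIZECOLUMNS (
        DimDate[CalendarYear],
        "SalesAmount", SUM ( FactSales[SalesAmount] )
    )

Running something like this (or building the equivalent view in the source database) produces the five-row table described above.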

By now, you may say: I would also query it by Quarter. In fact, you may want to drill down into Month, Week, or even every single Day. That is absolutely fine. With 365 days in each year, you get a table of about 5×365 = 1,825 rows, plus a row for each leap day in the period, so 1,827 rows at most. The aggregated table is still tiny (fewer than 2,000 rows) compared to a fact table with 250 million rows, and you can still import it into memory. Such a table covers every analysis you want, down to the single day.

Layers of Aggregation

You may need to create other aggregations by other dimensions too. Because dimensions are usually much smaller than the fact table, your aggregated tables will almost always be smaller than the fact table as well. These aggregated tables are your layers of aggregation in the model.

The golden rule for a composite model in Power BI is: do NOT use a lower-level table if there is an aggregation on top of it that can answer the query.

Even when you query a DirectQuery table with 250 million rows, if you query it only by Date, Power BI acts differently. It will not send a query to the fact table's data source; it queries the in-memory aggregated table instead, and you get a fast response. Power BI only switches to the table underneath when the aggregated tables cannot answer the question.
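This routing is transparent to the report author: you write the measure against the fact table once, and the engine decides which table answers each query. A minimal sketch, with FactSales again as the hypothetical fact table name:

    -- One measure, defined over the DirectQuery fact table.
    -- No reference to the aggregated table appears in the formula.
    Total Sales = SUM ( FactSales[SalesAmount] )

Grouped by DimDate[CalendarYear], a query using this measure can be answered from the in-memory aggregated table; grouped by a column the aggregation does not cover (an order number, for example), it falls back to the DirectQuery source.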

Aggregations in Action

The overall process for setting up an aggregation is:

  • Creating the aggregation table

  • Creating the relationships needed between the aggregation table and the DirectQuery table

  • Setting the storage mode of the aggregated and dimension tables

  • Configuring the aggregation

I will write a blog post about each of these steps separately, in detail. The next one in this series covers creating the aggregation table.

It would not be fair to speak of aggregations without mentioning Christian Wade, the Program Manager on the Power BI team in charge of this awesome work. Christian has a great demo of analyzing trillions of rows of data (a quarter of a petabyte) in Power BI using aggregations, with instant response times at the speed of clicky clicky, draggy droppy. Watch his wonderful presentation from the Microsoft Business Applications Summit 2018 here:

Summary

Aggregations are a game-changing feature for the performance and speed of Power BI solutions when the source table is huge. With the help of aggregations, you can have layers of pre-calculated values, stored in memory and ready to respond to user queries instantly. The DirectQuery data source is used only for the most atomic, transaction-level queries. Stay tuned for the next blog post, on how to create an aggregated table, and read the rest of this blog series on aggregations below.


Reza Rad
Trainer, Consultant, Mentor
Reza Rad is a Microsoft Regional Director, an author, trainer, speaker, and consultant. He has a BSc in Computer Engineering and more than 20 years' experience in data analysis, BI, databases, programming, and development, mostly on Microsoft technologies. He has been a Microsoft Data Platform MVP for nine continuous years (from 2011 till now) for his dedication to Microsoft BI. Reza is an active blogger and co-founder of RADACAD, and also co-founder and co-organizer of the Difinity conference in New Zealand.
His articles on different aspects of technologies, especially on MS BI, can be found on his blog: https://radacad.com/blog.
He has written several books on MS SQL BI and is writing more. He was an active member of online technical forums such as MSDN and Experts-Exchange, was a moderator of the MSDN SQL Server forums, and holds the MCP, MCSE, and MCITP (BI) certifications. He leads the New Zealand Business Intelligence users group. He is also the author of the very popular book Power BI from Rookie to Rock Star, which is free with more than 1700 pages of content, and of Power BI Pro Architecture, published by Apress.
He is an international speaker at Microsoft Ignite, Microsoft Business Applications Summit, Data Insight Summit, PASS Summit, SQL Saturday, and SQL user groups, and he is a Microsoft Certified Trainer.
Reza's passion is to help you find the best data solution; he is a data enthusiast.

12 thoughts on “Power BI Fast and Furious with Aggregations”

  • Hi Reza,
    How do you actually create the Aggregation table?
    All examples start with the aggregation table already existing.
    Thanks,
    Oded Dror

  • Hi there. One problem I see with this (but it might be that I just don’t have enough knowledge about the feature yet) is that while the big fat fact table in the database is being constantly updated, and therefore you get the latest data through native queries, it’s not so with the aggregated in-memory tables, which, by the way, can get out of sync with the main DirectQuery table pretty quickly. Am I missing anything here? Thanks.

    • That would only happen if someone went through every single record in your fact table, got its values, and compared them with the grand total, because the grand total comes from the aggregation table while the row details come from the fact table. That normally isn’t something users would do with a huge fact table.
      Cheers
      Reza

  • Hi Reza! I was at your talk today in Copenhagen (I asked the question about many-to-many, which I’m testing now!). I was thinking of another use case, but I’m not sure if it will work. Say perhaps that the users need to access detailed information for recent data, but aggregated for older data (for example, order lines for the last 12 months, but only aggregated per month for older orders). Do you think this could be solved with aggregations? Or would we need to use a different approach? In my head I can think of two solutions:
    1) that we have a single table, which is both aggregated and detailed. Historical data would be aggregated to a single date per month, with null values for order number etc. Detailed data would of course remain at the most granular level.
    2) Separate tables for aggregated and detailed, with the DAX measures determining which table we summarise depending on the date context.
    I have come across this scenario before and used solution #1, but I’d be interested to hear your thoughts.

    • Hi Alistair,

      It was great to meet you in Copenhagen 🙂
      Regarding the many-to-many scenario using aggregations: I confirm that, at the moment, aggregations won’t work with many-to-many. There is a workaround for it, which I will write about soon.

      Regarding the historical aggregated data and the recent detailed data: method #1 is definitely simpler to implement, as you won’t need to bring more logic into your calculations. However, in terms of performance, I would have to check. It is a great idea for another blog post, by the way (I might write about it sometime). Method #2, however, can be a good option to use with the aggregated table: it might not need to hit the details table when only aggregated data is needed (although I have to check that in practice).
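      Just as a rough, unverified sketch, a method #2 switching measure could look something like this (the table names and the 12-month cutoff are hypothetical):

      -- Recent dates read the detail table; older dates read the monthly
      -- aggregated table. FactSalesDetail and FactSalesMonthly are
      -- hypothetical table names.
      Sales Amount =
      VAR Cutoff = EDATE ( TODAY (), -12 )    -- detail grain kept for 12 months
      RETURN
          IF (
              MIN ( DimDate[Date] ) >= Cutoff,
              SUM ( FactSalesDetail[SalesAmount] ),
              SUM ( FactSalesMonthly[SalesAmount] )
          )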

      Cheers
      Reza

  • Hi,

    I have created aggregated tables, and my queries are being mapped to them correctly. When I started to build up my DAX formulas, though, I found that FILTER() didn’t seem to be supported: whenever it was used, the query always ran against DirectQuery. Should this be the case? If so, could you explain why it doesn’t hit the aggregated table when that should be preferred?

    Thanks,
    Craig

    • Hi Craig
      There are some limitations to the DAX functions that work with the aggregated-table scenario. FILTER, for example, is an iterator function: it iterates through each row in the DirectQuery table, tests the filter criteria, and then produces the result. Because it has to go through every row, the aggregated table won’t be hit. You should use functions that don’t have this behavior.
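      As a rough sketch of the difference (FactSales is a hypothetical fact table name):

      -- FILTER iterates the DirectQuery fact table row by row,
      -- so the aggregated table is bypassed:
      Sales 2013 Slow =
      CALCULATE (
          SUM ( FactSales[SalesAmount] ),
          FILTER ( FactSales, RELATED ( DimDate[CalendarYear] ) = 2013 )
      )

      -- A simple column predicate on a dimension column covered by the
      -- aggregation can be answered from the in-memory aggregated table:
      Sales 2013 Fast =
      CALCULATE (
          SUM ( FactSales[SalesAmount] ),
          DimDate[CalendarYear] = 2013
      )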
      Cheers
      Reza

  • Hi Reza,

    I know there is a 1-million-row limit when using DirectQuery in Power BI Desktop, but what is the row limit in the Power BI Pro, Pro+, and Premium versions that a DirectQuery can fetch using parameters/filters?

    • The 1M-row limit exists regardless of the license.
      Also note that this is a limit on the rows a single query returns, not a limit on the amount of data you can work with.
      As an example, when you have a column chart with year on the axis and sales as the value, your query only returns one row per year; in total, that is probably up to 10 or 20 rows.
      So to hit that 1M-row limit, you would need a visual with a million rows in it. Even if that much data could come back in one query, it would not perform fast, regardless of the BI system you use. Filtering visuals down to only the required data is always recommended.
      Cheers
      Reza
