Dual Storage Mode; The Most Important Configuration for Aggregations! Step 2 Power BI Aggregations

Facebooktwittergoogle_plusredditpinterestlinkedintumblrmailFacebooktwittergoogle_plusredditpinterestlinkedintumblrmail

An aggregated table can be a layer on top of the source DirectQuery table. This table needs to have a proper relationship set up with other tables, and also the proper storage mode configuration. Storage Mode configuration of the composite model, in fact, is an important configuration and critical part of setting up the aggregation. The storage mode configuration of Import Data and DirectQuery are self-explanatory, but what about Dual Storage Mode. I’m going to explain in details what the Dual mode is. Let’s dig in. This is the third article about aggregations, in the first article you learned what aggregation is, and in the second article, you learned how to create the aggregated table. This article is talking about the next step which is storage mode configuration in the model. If you want to learn more about Power BI, read Power BI book from Rookie to Rock Star.

Sample model

In the previous step, we ended up with a data model including tables and relationships below;

Sales Agg table is an aggregated table created in the previous step, and FactInternetSales is a table with DirectQuery source. All of the tables in this model (except Sales Agg) are DirectQuery sourced tables.

What is Storage Mode

Storage mode in Power BI tables is determining where the data of that table is stored, and how queries will be sent to the data source. You can find the storage mode of a table either by hovering the mouse on it in the Power BI Desktop in the fields section;

Or by right-clicking on the table, and selecting properties;

In the properties pane, you can see the storage mode option drop-down list;

There are three storage modes;

  • Import
  • DirectQuery
  • Dual

Import

Import and DirectQuery are the obvious options in the above list. For example; If a table’s storage mode is Import, then it means the data of that table will be stored in the in-memory storage of the Power BI server (the machine that runs the Power BI engine), and every query to the data, will be a query to the in-memory structure, not to the data source.

If I have a table that is sourced from SQL Server but has the storage mode of Import. Then a copy of that data will be stored in the memory engine of Power BI. Whenever you refresh a visualization in Power BI report, it will just query the in-memory structure, rather than sending the query to the SQL Server data source.

DirectQuery

Tables with the DirectQuery storage mode will keep the data in the data source. In our example dataset, FactInternetSales stores the data in the SQL Server data source. If we have a visualization from a table with this storage mode, Power BI will send a T-SQL query to the data source and get the result back. For example, below visualization in the Power BI report is just a Card visual on the SalesAmount field in the FactInternetSales;

Because this table has the storage mode of DirectQuery, if we run SQL Profiler simultaneously (to capture queries sent to the data source), here is the query captured to the SQL Server database:

Query from Multiple DirectQuery tables

If we have multiple tables with DirectQuery storage mode, the result is there will be a query sent to the data source for that combination. Here we have a column chart by SalesAmount from FactInternetSales (DirectQuery), and CalendarYear from DimDate (DirectQuery);

And this is the query sent for that:

Caution: Combining DirectQuery and Import

So far with the above examples, everything was expected. Nothing strange happened. However, it would be strange when you combine fields from DirectQuery sourced tables with Import storage mode tables. As an example; The Sales Agg table is an import table. The whole purpose of creating this table was to query that faster from the in-memory structure. If you create a visual that just contains something from Sales Agg table, there will be no query send to the data source, It all happens in memory. As expected of course.

And SQL Profiler doesn’t catch any queries sent to the data source;

However, the behavior is different If we have a visual getting data from the Sales Agg table, and a DirectQuery table such as DimDate;

This time, however, we see a T-SQL query tracked in SQL Profiler, and it is querying the DimDate in the SQL Server database.

This is not what we actually expect to see. The whole purpose of Sales Agg table is to speed up the process from DirectQuery mode, but we are still querying the DimDate from the database. So, what is the solution? Do we change the storage mode of DimDate to Import? If we do that, then what about the connection between DimDate and FactInternetSales? We want that connection to work as DirectQuery of course.

Now that you learned about the challenge, is a good time to talk about the third storage mode; Dual.

Dual Storage Mode

Dual storage mode is built to cover a scenario like above. With Dual storage mode, one table can act either as DirectQuery or Import respective to the relationship to other tables. Dual storage mode is the secret sauce of Composite mode and aggregation in Power BI. Let’s see how this mode works.

If we change the storage mode of DimDate to Import, then we get the issue of sending the query to the data source when we are querying FactInternetSales. If we change the storage mode of DimDate to DirectQuery, then even for Sales Agg we still sending the query to the data source. The solution is to change the storage mode of common dimension tables to Dual.

When you set the storage mode of a table to Dual, for example, DimDate, you get a warning that this process may take some time. The reason is that with changing the storage mode to Dual, you will get a copy of that table in the memory.

The copy of that table in the memory acts like an Import table. There will be another version of that which works through DirectQuery though. It looks like you have two identical tables, one used for Import, another for DirectQuery.

What would be the impact of that in the reporting? The same visual we had before in the previous example, is now working all through the in-memory engine;

As you can see, no queries sent to the data source for this table;

Because this table is connected to an Import table (Sales Agg), and as a result, the Dual storage mode acts like an Import table storage. Everything will be queried from the memory.

If you have the Dual storage mode table, used in a visualizing alongside a DirectQuery table, similar to below:

This visualization will send a query to the data source, as you can see in the SQL Profiler track below;

In this scenario the Dual storage mode acts like a DirectQuery because it is coming with a connection to a DirectQuery sourced table;

So, now you can tell why this mode is called Dual. Sometimes it acts like Import and sometimes, like DirectQuery, Depending on the table that combined through the visualization.

That’s too Technical, What should I do for setup for the Aggregation?

Well, It was a long post and description was quite technical. You may say; I don’t care about the details, just tell me how to get the aggregations working? Well, the answer is easy: All tables that are connected to both Sales Agg (our aggregation table which is Import), and FactInternetSales (our big fact table which is DirectQuery), should be set to storage mode of Dual. This way, they can act both ways, depend on the situation they are used in.

All tables connected to both Import table (aggregation), and DirectQuery table (big fact table), should be set as Dual storage mode.

In our example, we have to set the Dual storage mode for; DimDate, DimProductSubcategory, DimCustomer. However, when you apply that on DimCustomer, you will notice that a message pops up saying that the DimGeography will be also set as Storage mode. That is correct, Other tables that are related to this through Many to One needs to be Dual too.

The same process will happen with DimProductSubcategory and DimProductCategory. In our sample diagram and model all tables selected below are Dual Storage Mode;

Now your model is ready to perform Step 3; Configuring aggregations. Stay tuned for that article.

Facebooktwittergoogle_plusredditpinterestlinkedintumblrmailFacebooktwittergoogle_plusredditpinterestlinkedintumblrmail
rssyoutuberssyoutube
Reza Rad
Reza Rad is an Author, Trainer, Speaker and DW/BI Consultant. He has a BSc in Computer engineering; he has more than 15 years’ experience in databases, programming and development mostly on Microsoft technologies. He is a Microsoft MVP in Data Platform for seven continues years (from 2011 till now) for his dedication in Microsoft BI. He is author of some SQL Server and BI books, and also Power BI online book; from Rookie to Rock Star.

4 Comments

  • Hi Reza Rad
    Thank you for this very informative and well explained series on aggregation. I have been trying to follow your steps using a database I have on SQL server. In your 1st article, you created the ‘Sales Agg’ table by referencing the ‘FactInternetSales’ which is a Direct Query. In this article, you mention that the ‘Sales Agg’ table is the only table in Import mode. When I created in Power query a similar aggregation table referencing a direct query table, the aggregated table is also being shown as ‘Direct Query’ mode instead of ‘Import mode’ as in your example. Am I missing something here?

    • Hi Gopa

      Sales Agg would be first defined as DirectQuery mode automatically because it is sourced from the FactInternetSales. But then after applying the change in Query Editor, when you come to Power BI Desktop, you can change the storage mode of Sales Agg to Import with right click on the table, properties, storage mode.

  • Hi Reza,
    I followed the steps you have mentioned and tried with my model but the queries aren’t hitting the agg. table. My model connects to SQL server on-prem and all tables are table views except the date table which I created using DAX. Further, could you explain the purpose of creating Sum and count of Unit Price?
    Thanks

    • Hi. Have you read my two other blog posts earlier? the very first article explains why we do aggregations. The aggregated table will be a layer on top of the DirectQuery table. Also to do the complete example, and test the functionality, you have to read the next step too. then you can test to see your queries will be hitting the agg table.

Leave a Reply

Your email address will not be published. Required fields are marked *