An aggregated table can be a layer on top of the source DirectQuery table. This table needs to have a proper relationship set up with other tables, and also the proper storage mode configuration. Storage Mode configuration of the composite model, in fact, is an important configuration and critical part of setting up the aggregation. The storage mode configuration of Import Data and DirectQuery are self-explanatory, but what about Dual Storage Mode. I’m going to explain in details what the Dual mode is. Let’s dig in. This is the third article about aggregations, in the first article you learned what aggregation is, and in the second article, you learned how to create the aggregated table. This article is talking about the next step which is storage mode configuration in the model. If you want to learn more about Power BI, read Power BI book from Rookie to Rock Star.
In the previous step, we ended up with a data model including tables and relationships below;
Sales Agg table is an aggregated table created in the previous step, and FactInternetSales is a table with DirectQuery source. All of the tables in this model (except Sales Agg) are DirectQuery sourced tables.
What is Storage Mode
Storage mode in Power BI tables is determining where the data of that table is stored, and how queries will be sent to the data source. You can find the storage mode of a table either by hovering the mouse on it in the Power BI Desktop in the fields section;
Or by right-clicking on the table, and selecting properties;
In the properties pane, you can see the storage mode option drop-down list;
There are three storage modes;
Import and DirectQuery are the obvious options in the above list. For example; If a table’s storage mode is Import, then it means the data of that table will be stored in the in-memory storage of the Power BI server (the machine that runs the Power BI engine), and every query to the data, will be a query to the in-memory structure, not to the data source.
If I have a table that is sourced from SQL Server but has the storage mode of Import. Then a copy of that data will be stored in the memory engine of Power BI. Whenever you refresh a visualization in Power BI report, it will just query the in-memory structure, rather than sending the query to the SQL Server data source.
Tables with the DirectQuery storage mode will keep the data in the data source. In our example dataset, FactInternetSales stores the data in the SQL Server data source. If we have a visualization from a table with this storage mode, Power BI will send a T-SQL query to the data source and get the result back. For example, below visualization in the Power BI report is just a Card visual on the SalesAmount field in the FactInternetSales;
Because this table has the storage mode of DirectQuery, if we run SQL Profiler simultaneously (to capture queries sent to the data source), here is the query captured to the SQL Server database:
Query from Multiple DirectQuery tables
If we have multiple tables with DirectQuery storage mode, the result is there will be a query sent to the data source for that combination. Here we have a column chart by SalesAmount from FactInternetSales (DirectQuery), and CalendarYear from DimDate (DirectQuery);
And this is the query sent for that:
Caution: Combining DirectQuery and Import
So far with the above examples, everything was expected. Nothing strange happened. However, it would be strange when you combine fields from DirectQuery sourced tables with Import storage mode tables. As an example; The Sales Agg table is an import table. The whole purpose of creating this table was to query that faster from the in-memory structure. If you create a visual that just contains something from Sales Agg table, there will be no query send to the data source, It all happens in memory. As expected of course.
And SQL Profiler doesn’t catch any queries sent to the data source;
However, the behavior is different If we have a visual getting data from the Sales Agg table, and a DirectQuery table such as DimDate;
This time, however, we see a T-SQL query tracked in SQL Profiler, and it is querying the DimDate in the SQL Server database.
This is not what we actually expect to see. The whole purpose of Sales Agg table is to speed up the process from DirectQuery mode, but we are still querying the DimDate from the database. So, what is the solution? Do we change the storage mode of DimDate to Import? If we do that, then what about the connection between DimDate and FactInternetSales? We want that connection to work as DirectQuery of course.
Now that you learned about the challenge, is a good time to talk about the third storage mode; Dual.
Dual Storage Mode
Dual storage mode is built to cover a scenario like above. With Dual storage mode, one table can act either as DirectQuery or Import respective to the relationship to other tables. Dual storage mode is the secret sauce of Composite mode and aggregation in Power BI. Let’s see how this mode works.
If we change the storage mode of DimDate to Import, then we get the issue of sending the query to the data source when we are querying FactInternetSales. If we change the storage mode of DimDate to DirectQuery, then even for Sales Agg we still sending the query to the data source. The solution is to change the storage mode of common dimension tables to Dual.
When you set the storage mode of a table to Dual, for example, DimDate, you get a warning that this process may take some time. The reason is that with changing the storage mode to Dual, you will get a copy of that table in the memory.
The copy of that table in the memory acts like an Import table. There will be another version of that which works through DirectQuery though. It looks like you have two identical tables, one used for Import, another for DirectQuery.
What would be the impact of that in the reporting? The same visual we had before in the previous example, is now working all through the in-memory engine;
As you can see, no queries sent to the data source for this table;
Because this table is connected to an Import table (Sales Agg), and as a result, the Dual storage mode acts like an Import table storage. Everything will be queried from the memory.
If you have the Dual storage mode table, used in a visualizing alongside a DirectQuery table, similar to below:
This visualization will send a query to the data source, as you can see in the SQL Profiler track below;
In this scenario the Dual storage mode acts like a DirectQuery because it is coming with a connection to a DirectQuery sourced table;
So, now you can tell why this mode is called Dual. Sometimes it acts like Import and sometimes, like DirectQuery, Depending on the table that combined through the visualization.
That’s too Technical, What should I do for setup for the Aggregation?
Well, It was a long post and description was quite technical. You may say; I don’t care about the details, just tell me how to get the aggregations working? Well, the answer is easy: All tables that are connected to both Sales Agg (our aggregation table which is Import), and FactInternetSales (our big fact table which is DirectQuery), should be set to storage mode of Dual. This way, they can act both ways, depend on the situation they are used in.
All tables connected to both Import table (aggregation), and DirectQuery table (big fact table), should be set as Dual storage mode.
In our example, we have to set the Dual storage mode for; DimDate, DimProductSubcategory, DimCustomer. However, when you apply that on DimCustomer, you will notice that a message pops up saying that the DimGeography will be also set as Storage mode. That is correct, Other tables that are related to this through Many to One needs to be Dual too.
The same process will happen with DimProductSubcategory and DimProductCategory. In our sample diagram and model all tables selected below are Dual Storage Mode;
Now your model is ready to perform Step 3; Configuring aggregations. Stay tuned for that article.