Not Folding; the Black Hole of Power Query Performance

Facebooktwittergoogle_plusredditpinterestlinkedintumblrmailFacebooktwittergoogle_plusredditpinterestlinkedintumblrmail

2016-11-15_11h04_37

Have you ever thought does your Power Query Transformations happens in the data source side (server side), or on the local memory (client side)? When you use a relational data source, query can be run on the data source, but it depends on transformations. Some transformations can be translated to query language of the data source, some not. as an example; recently (I believe from last few releases) Power BI Desktop added a feature called Merge Columns. Merge Columns concatenate columns to each other to either create a new column, or replace them with the new concatenated result. Previously you could do the concatenation with adding concatenation simply with & character, what you’ve done was adding a new custom column, and writing M expression to concatenate columns. Now with the new Merge Column this are much easier, you select columns and apply Merge Columns. This easiness does come with a price, a high price I’d say, price of reducing the performance of Power Query and as a result Power BI! Merge Columns doesn’t support query folding and it means it will affect performance badly. In this post I’ll show you how this cause the performance issue, and how it can be solved. Note that Merge Columns is an example here ,this situation might happens with some other transformations as well. If you like to learn more about Power BI; read the Power BI online book from Rookie to Rock Star.

Query Folding

I can’t start talking about the issue without explaining what Query Folding is, so let’s start with that. Query Folding means translating Power Query (M) transformations into native query language of the data source (for example T-SQL). In other words; when you run Power Query script on top of a SQL Server database, query folding will translate the M script into T-SQL statements, and fetch the final results.

Here is an example of M Script:

And here is the folded version of that translated to native T-SQL query:

You can see as an example how the conditional column script in M translated to Case statement in T-SQL.

Is Query Folding Good or Bad?

Good obviously. Why? because performance is much higher to run transformations on billions of records in the data source, rather than bringing millions of records into cache and applying some transformations on it.

2016-11-15_11h04_37

The first diagram shows a M script fully folded (translated to T-SQL). This is the best situation. Server side operation apply all transformations on the data set and returns only desired result set.

The second diagram shows Query Folding partially supported (only up to specific step). In this case T-SQL brings the data before that step. and data will be loaded in the local cache, and rest of transformations happens on M engine side. You have to avoid this option as much as possible.

Can I see the Native Query?

The question that might comes into your mind right now is that; Can I see the Native Query that M script translates to it? The answer is Yes. If Query Folding is supported on a step, you can right click on that step and click on View Native Query.

2016-11-15_10h44_47

So Why Not Query Folding?

Query Folding is enabled by default. However in some cases it is not supported. For example if you are doing some transformations on a SQL Server table in Power Query and then join it with a web query query folding stops from the time you bring the external data source. That means transformations will happen on the data of SQL Server table. then before joining to web query it will be fetched into cache and then rest of steps happens by M engine. You would need to bring data from different data sources in Power BI, and this is the ability that Power Query gives to you. So sometimes you have to step beyond query folding, and there might be no better way of doing that.

There are also some Transformations in Power Query that Query Folding doesn’t support them. Example? Merge Columns! Fortunately there are workarounds for this situation. Let’s dig into this more in details.

Example: Merge Columns

Merge Columns concatenates columns to each other in the order of selection of columns. You can also specify the delimiter character(s). So simply if you want to create a full name from first name and last name, you can select them in right order, and from either Transform tab, or Add Column tab choose Merge Columns. Let’s see this through an example;

Prerequisite for running the example

You need to have AdvanetureWorksDW database installed on SQL Server. or alternatively you can use any table in SQL Server that has two string columns which can be concatenated.

Merge Column Transformation

Create a new Power BI file in Power BI Desktop, and Get Data from SQL Server with Import Data Mode. Get Data from DimCustomer table only, and click on Edit. When in Query Editor. Select First Name, Middle Name, and Last Name in the correct order, and then from Add Column Select Merge Columns.

2016-11-15_11h14_36

In the Merge Columns Specify the Separator to be space, and name the new column to be Full Name.

2016-11-15_11h16_18

You will see the new column generates simply and adds to the end of all columns. You also may notice that Merge Columns uses Text.Combine Power Query function to concatenate columns to each other.

2016-11-15_11h17_04

Now to see the problem with Query Folding, right click on Inserted Merge Column step in Applied Steps section. You will see that View Native Query is disabled.

2016-11-15_11h18_44

Rule of thumb is that: When View Native Query is not enabled, that step won’t be folded! that means the data will be loaded into cache up the step before this step. and then rest of the operation will happen locally. To understand how it works, right click on the step before which was Navigation. You will see the View Native Query. Click on that, and you can see the T-SQL query for that which is as below;

You can see that this is simple query from DimCustomer Table. What will happen here in this scenario is that Power Query cannot translate Text.Combine to T-SQL Query. So data up to step before will be loaded into the cache. It means the query for step before (which is above query) will run on the database server, the result will come to the cache, and then Text.Combine will happen on the local memory in cache.  Here is a diagram of how it works;

2016-11-15_11h23_14

In this example the data set is so small, but if data set be big, then not folding cause performance issues. It is taking much longer to load the whole data set into cache and then apply transformations, rather than doing transformations in the data source, and just loading the result into Power BI.

Solution: Simple Concatenate with Add Column

Now remove the step for Inserted Merged Column, and go to Add Column Tab, and select Custom Column

2016-11-15_11h26_23

In the Add Custom Column write below expression to generate Full Name;

2016-11-15_11h28_30

This expression used the concatenation character which is &. and also checked if Middle Name is null or not. Result in Power Query side is the same, and it generates the Full Name column like the previous example;

2016-11-15_11h30_16

However it is different for Query Folding. Right Click on the Added Custom step, and you will see the Native Query this time.

2016-11-15_11h31_32

Query is simply the same with a new concatenated column added.

This time there won’t be an intermediate cache. transformation happens in the data source with the T-SQL query, and result will be loaded into Power BI.

2016-11-15_11h36_14

How Do I know Which Transformations Folds?

Great Question. It is important to understand which step/transformation folds and which doesn’t. To understand that simply right click on every step and see if the View Native Query is enabled or not. If it is enabled Query Folding is supported for that step, otherwise not. Also Note that Query Folding is not supported for data sources such as web query, or CSV or things like that. Query Folding at the moment is only supported for data stores that supports a native query language. For Web, Folder, CSV… there is no native query language, so you don’t need to be worry about Query Folding.

*Important Note: At the time of writing this post Merge Columns doesn’t support Query Folding. I have reported this to Power Query Team, and they are working on it, to solve the issue. The Merge Columns is very likely to support query folding very soon as the result of bug fix. However there are always some other transformations that doesn’t support query folding. This post is written to give you an understanding what kind of issue might happen, and how to resolve it.

My advise to you as a performance best practice is that when working with a relational data source (such as SQL Server). always check the query folding. Sometimes it is not supported, so use another approach for the transformation. Don’t fall into the black hole of not folding, otherwise your query might take ages to run.

Save

Save

Save

Facebooktwittergoogle_plusredditpinterestlinkedintumblrmailFacebooktwittergoogle_plusredditpinterestlinkedintumblrmail
rssyoutuberssyoutube
Reza Rad
Reza Rad is an Author, Trainer, Speaker and DW/BI Consultant. He has a BSc in Computer engineering; he has more than 15 years’ experience in databases, programming and development mostly on Microsoft technologies. He is a Microsoft MVP in Data Platform for seven continues years (from 2011 till now) for his dedication in Microsoft BI. He is author of some SQL Server and BI books, and also Power BI online book; from Rookie to Rock Star.

3 Comments

  • Hi Reza,

    Great article on Query Folding. In a scenario that I use SQL Server as a data source with the Import option and my Power Query transformation doesn’t support Query Folding, the performance is impacted only during data import and data refresh, right?

    • Hi Cristhian,
      Thanks for your kind words.
      It will impact your development time and also the data refresh. Development time because query results first loads into your local cache and then other steps applies on top of it. and at the Refresh time as well as you mentioned. It won’t be like measure that affect performance with each slicer or filter selection, however because it consumes some more memory in overall you will see some performance decrease with large data sets.

Leave a Reply

Your email address will not be published. Required fields are marked *

You may use these HTML tags and attributes: <a href="" title=""> <abbr title=""> <acronym title=""> <b> <blockquote cite=""> <cite> <code class="" title="" data-url=""> <del datetime=""> <em> <i> <q cite=""> <strike> <strong> <pre class="" title="" data-url=""> <span class="" title="" data-url="">