Performance Tip for Power BI; Enable Load Sucks Memory Up

Facebooktwittergoogle_plusredditpinterestlinkedintumblrmailFacebooktwittergoogle_plusredditpinterestlinkedintumblrmail

2016-11-14_13h55_31

In the area of performance tuning a Power BI model many things has to be considered, most of them around consumption of the CPU and RAM. One of the most basic but important consideration is minimizing the usage of memory. By default all queries from Query Editor will be loaded into the memory of Power BI Model. In this post I’ll show you an example to disable the load for some queries, especially queries that used as intermediate transformation to produce the final query for the model. This is a very basic tip but very important when your model grows big. If you want to learn more about Power BI read the Power BI online book from Rookie to Rock Star.

Prerequisite

For running examples of this book you need to download the ZIP file here;

Download ZIP file for CSVs

Load Mechanism for Power Query

By Default all queries in Power Query will be loaded into the Power BI model. This behavior might be a desired behavior if you are connecting to a proper star schema modeled data warehouse, because normally you don’t need to make many changes in the structure of queries. However this brings some issues if you are connected to a transactional data store, some files, web source, and many other non-star schema modeled data sources. Normally when you get data from a data source, you apply transformations for rows and columns, and merge queries or append them, and you might end up to have 5 tables out of 10 queries as final queries. By default when you close and apply your query editor window all queries will be loaded into the model no matter if you want to use them in your final model or not.

2016-11-14_08h58_39

For every query that loads into model memory will be consumed. and Memory is our asset in the Model, less memory consumption leads to better performance in most of the cases. I have seen lots of models that people Hide the unwanted query from the model, This approach doesn’t help to the performance because hided query will still consume the memory. The best approach is to disable loading before closing query editor. Disabling Load doesn’t mean the query won’t be refreshed, it only means the query won’t be loaded into the memory. When you click on Refresh model in Power BI, or when a scheduled refresh happens even queries marked as Disable Load will be refreshed, but their data will be used as intermediate source for other queries instead of loading directly into the model. This is a very basic performance tuning tip, but very important when your Power BI model grows bigger and bigger. Let’s look at this through an example.

2016-11-14_08h40_41

Example Scenario

In this example Scenario I want to get list of all files from a directory. There are two types of CSV files in the directory, some Students files, and some course files, both are CSV files, but different structured. Also there might be some other files in the directory, such as Word files or HTML files which I don’t want to process.

2016-11-14_09h11_53

Aim is to load data rows of all students and courses as two separate tables into the model. Instead of fetching files from the folder twice, we use one query to fetch the files, and then use it as a reference for other queries. The referenced query itself doesn’t needs to be loaded into the model. Let’s build the solution and see how it works in action.

Build the Transformations

Get Data from Folder

Open a new Power BI file, and stat by Getting Data from Folder

2016-11-14_13h03_52

Enter the path of folder that contains all files (Files in this folder can be downloaded from ZIP file up in the prerequisite section of this post)

2016-11-14_13h05_41

Click on Edit in the preview showed in the navigator window to open the folder content in Query Editor. As you see there are number of different files in the folder.2016-11-14_13h07_42

Filter the Extension to only .csv. Note that both Course and Student files are .CSV files which is what we need.

2016-11-14_13h08_23

Now the subset includes Course files and Students files which are different in the structure. We have to apply transformations on each set individually. 2016-11-14_13h09_56

Students Query

Because I don’t want to repeat the process of getting all files from a folder, and I want to split the data into two data sets; one for Students and another one for Courses. I’ll generate a REFERENCE from the main query. Right click on the query (which called CSVs in my example), and select Reference.

2016-11-14_13h11_57

This will generate a new query named as CSVs (2). This new query is NOT A COPY of first query. This is only a reference from first query. which means if the first query changes, the source for this query will also change. Rename this new query to be Student. Filter the Name column to everything starting (or begins with) “Student”.

2016-11-14_13h14_15

The reason that I don’t type in in the search box and use Text Filters specifically is that the search box will filter the data statistically based on values that exists in the current data set. If in the future new values (file names) comes in the folder this won’t consider that. However the Text Filter will apply on any data set because it will be filtered dynamically. (I’ll write a post later to explain that in details). In the Filter Rows window type in Student as filter for begins with.

2016-11-14_13h16_46

Now you will see only Students files.

2016-11-14_13h18_09

Click on the Combine Binaries icon on the header of Content column to combine all CSV files into one.

2016-11-14_13h19_17

After combining binaries, Power Query will also import the CSV into the table and do automatic data type conversion. You can see all three steps in the Query Editor.

2016-11-14_13h20_44

The data table needs couple of changes before be ready. First; set the column names. The first row has column names. so use the menu option of “Use First Row As Headers” to promote the first row to be column headers.

2016-11-14_13h22_57

This simply brings column headers;

2016-11-14_13h24_11

Also you need to remove all extra header rows from the combined data set. There are some rows with “ID,First Name, Last Name, and City” as their values which should be removed from the data set. You can simply remove them with a Text Filter of Does Not Equal to on the ID column. values should not be equal to “ID” because every row with “ID” in the first column’s value is a header row and should be removed.

2016-11-14_13h25_52

Now you have you cleaned data set for Students. Note that I have also applied a data type conversion for column ID to type whole number;

2016-11-14_13h27_58

Courses Query

The other entity is Course which we need to create as a reference from the CSVs query. Create another reference from CSVs, and name it Course. 2016-11-14_13h31_22

Create a Text Filter for Begins with on the Name Column with the value of: “Course”

2016-11-14_13h32_05

And the you’ll have only Course files. Combine Binaries on that.

2016-11-14_13h34_47

Same as Student query, apply transformations in this order;

  • Promote headers with “Use First Row As Headers”
  • Create a Text Filter on the Course Number Column that values does not equal to “Course Number”.
  • Apply data type conversion on Course Number and Capacity to Whole Number and Date to Date.

Here is the final data set for course;

2016-11-14_13h37_22

Default Behavior: Enable Load

Now to see the problem, without any changes in the default load behavior, Close the Query Editor and Apply changes.

2016-11-14_13h38_31

You will see that after the refresh, three queries loads in the model; Student, Course, and CSVs.

2016-11-14_13h39_44

Student and Course are expected tables in the model. However CSVs is not useful. we already fetched everything we wanted from the query, and this is used as an intermediate query for loading Student and Course. Having CSVs as a separate table in our model has two problems;

2016-11-14_13h46_10

  • one extra table; confusion for users
  • extra memory consumption

The main issue is the memory consumption for this extra table. and normally memory consumption will reduce the performance and brings heavier load to the model. In this example CSVs table only has few rows. but this is just a sample data, in real world examples you might have intermediate tables with millions of rows. You need to remove every unused table from the Power BI model to enhance the memory consumption.

What about Hiding from Report?

The first issue “Confusion for users” can be solved by hiding the table from report view. You can do that in Relationship tab of the model, right click on the CSVs table, and click on Hide from report view. This method HIDEs the table from report view. However the table still exists, and it consumes memory! It is just hidden. Hiding a table from report view is good for tables that you need for the modeling. For example a relationship table (that creates many to many relationship) should be kept in the model but be hidden. because it creates the relationship, but it is not useful for the report viewer. Any other tables that is not used for relationship and is not used for reporting should be removed before loading into the model.

2016-11-14_13h47_02

Disable Load to Save Memory

Go back to Query Editor, and right click on CSVs query. You will see that by default every query is checked as Enable Load.

2016-11-14_13h50_52

Click on the Enable Load to disable it. You will see a message saying that any visuals attached to this query will not work after this change. Which is totally fine, because this query is not used (and won’t be) for any visualization. The message is: “Disabling load will remove the table from the report, and any visuals that use its columns will be broken.”

2016-11-14_13h52_42

Click on Continue, and you will see the load is disabled for the query now. the query name will be also in Italic font illustrating that it won’t be loaded into the model. Note that query will be refreshed and if new files comes to the directory it will pass it on to Course and Student queries. It just won’t be loaded into the model itself.

2016-11-14_13h55_31

You can also see in the View tab of Query Editor, on the Query Dependencies that this query is the source of the other two queries. and in that view it also shows which query is load disabled.

2016-11-14_13h57_07

Now close and apply the Query Editor. This time you see that CSVs query won’t be loaded into the model. Your relationship tab and the report tab only contains two tables; Course, and Student.

2016-11-14_13h58_29

Summary

In summary you’ve seen how an easy tip can save the memory and enhance the performance of Power BI model. Always remember that Query Editor is your ETL (Extract, Transform, Load) engine. It will apply all transformations before loading the data into the model, but once it finished the transformation all queries will be loaded into the model and they take memory. By default all queries are Enabled to Load into the model. Simply change that for queries that is not required in the model.

Save

Save

Save

Save

Facebooktwittergoogle_plusredditpinterestlinkedintumblrmailFacebooktwittergoogle_plusredditpinterestlinkedintumblrmail
rssyoutuberssyoutube
Reza Rad
Reza Rad is an Author, Trainer, Speaker and DW/BI Consultant. He has a BSc in Computer engineering; he has more than 15 years’ experience in databases, programming and development mostly on Microsoft technologies. He is a Microsoft MVP in Data Platform for seven continues years (from 2011 till now) for his dedication in Microsoft BI. He is author of some SQL Server and BI books, and also Power BI online book; from Rookie to Rock Star.

3 Comments

Leave a Reply to Angus Cancel reply

Your email address will not be published. Required fields are marked *

You may use these HTML tags and attributes: <a href="" title=""> <abbr title=""> <acronym title=""> <b> <blockquote cite=""> <cite> <code class="" title="" data-url=""> <del datetime=""> <em> <i> <q cite=""> <strike> <strong> <pre class="" title="" data-url=""> <span class="" title="" data-url="">