The default configuration for Power BI dataset is to wipe out the entire data and re-load it again. This can be a long process if you have a big dataset. In this article, I explain how you can set up an incremental refresh in Power BI, and what are the requirements for it. Incremental Refresh is not just in Power BI datasets, but also in Dataflows too. In this article, you learn how to load only part of the data that changes instead of loading the entire data each time. To learn more about Power BI, read Power BI book from Rookie to Rock Star.
What is Incremental Refresh?
When you load data from the source into the destination (Power BI), there are two methods: Full Load, or Incremental Refresh. Full Load means fetching the entire dataset each time and wiping out the previous data. When I say the entire dataset, I mean after all Power Query transformations, because there might be some filtering in Power Query, whatever data that loads into the Power BI dataset in this sentence is considered as the entire data.
If the dataset is small, or the refresh process is not taking a long time, then the full load is not a problem at all. The problem happens when either the dataset is big, or the refresh process takes longer than an acceptable timeframe. Consider you have a large dataset including 20 years of data. From that dataset, probably the data of 20 years ago, won’t change at all anymore, or even the data for 5 years ago, sometimes even a year ago. So why re-processing it again? why re-loading a data that doesn’t update? Incremental Refresh is the process of loading only part of the data that might change, and adding it to the previous dataset which is not changing anymore.
Requirements: What do you need to set up?
Table with Date Field(s)
For setting up the incremental refresh, you do need to have a table (or more) with date field(s) in it. The date field is the field that will have the impact of the partial refresh of the data. For example; Let’s say you have a FactSales table. You want to load all Sales which made earlier than a year ago just once, but everything from a year ago to now regularly. So you do need to have a date field in your table for it. Most of the time, this field can be created date, modified date, order date, publish date, etc. You do need to have a field like that as of date data type.
You also need to have a Power BI Premium capacity. You can set up the incremental refresh in Power BI Desktop even without that, but then when you publish it to the service, if the file is not in a workspace with premium capacity, you won’t be able to use this feature. The premium capacity at the moment is a limitation for many Power BI users. There are some discussions that the incremental refresh might be available somehow for Pro users too. However, sticking to the facts for now; Premium is the only way for the incremental refresh.
Limitations: Things to Know Beforehand
One limitation is the license as mentioned above, However, the other limitation which can be more important is that after setting up the incremental refresh, you cannot download the PBIX file from the service anymore, because the data is now partitioned. It also makes sense, because the size of the data is probably too large for downloading it as well.
How to Setup Incremental Refresh
Setting up Incremental Refresh has some steps in Power BI Desktop, and then in the Power BI Service. Let’s check them one by one.
Enable the Preview Feature
The first step is to enable the Incremental Refresh Policies in the Preview features of Power BI Desktop. If you read this article a year later, this feature might be released at that time, and you might not need this step then. To enable the preview feature, In Power BI Desktop, go to File -> Options and Settings -> Options;
Then go to Preview features tab, and select the Incremental Refresh Policies there. You do need to restart Power BI Desktop (Close and Open again) afterward.
Now you are ready to set up the Incremental Refresh.
Parameters in Power Query
For setting up the incremental refresh in Power BI, you need to use Power Query parameters. You need to create two parameters with the reserved names of RangeStart and RangeEnd (Note that Power Query is a case sensitive language). Go to Edit Queries in your Power BI Desktop solution,
And then click on New Parameters.
Then create two parameters of DateTime data type, with names of RangeStart and RangeEnd, set a default value for each too. The default value can be anything, but the name and the data type should be as mentioned here.
Filter Data Based on Parameters
After creating the two parameters, you need to filter the data of the date field based on these two parameters. For example, in the FactInternetSales, I can filter the OrderDate using the column filtering as below;
You can use the Between filter using the RangeStart and RangeEnd parameters as below;
After doing this action, your data in the table will be filtered based on the default values you have set for the RangeStart and RangeEnd parameters. However, don’t worry about that. these two parameters will be overwritten with the configuration you make in the Incremental Refresh setting of the Power BI Desktop.
Power BI Desktop Incremental Refresh Setup
The final step in Power BI Desktop is to close&apply the Power Query Editor window and set up the incremental refresh setting for the table. You have to right click on the table in the Power BI Desktop, and select Incremental Refresh.
In the Incremental Refresh settings window, you can choose the table first. If the table is a table that doesn’t have the two parameters of RangeStart and RangeEnd used in filter criteria, then you won’t be able to do the setting for it. For example, DimCustomer won’t give me the option to do the incremental refresh settings.
However, I can do the settings for FactInternetSales because I did filter the OrderDate field of this table based on the parameters. There are two things to notice at this step: If your data source coming from a data source that supports query folding, then Incremental load works as it best. If not, it is not recommended to use it. Most of the time, a huge data source is coming from a relational data store system, which supports query folding by the way.
another thing to note is that Incremental Refresh requires Premium licensing. and finally, you have to know that you cannot download a PBIX file from the service if it has incremental refresh setup on it.
Configuration for the incremental refresh is easy. You just set up the amount rows to Store (load only once, and store it), and the amount rows to Refresh (re-load every time);
Incremental Refresh for Multiple Tables
You can set up the incremental refresh for multiple tables. You don’t need more parameters, the two parameters of RangeStart and RangeEnd are enough. You just need to set up the filter in any other tables you want as well.
Then you need to set up the configuration in the Power BI Desktop for each table;
Note that although we are using the same parameters of RangeStart and RangeEnd, you can have totally different configurations of Store and Refresh for the incremental refresh setting for each table.
Publish to Service
After setting up everything, now you can publish the Power BI file to the service. Note that you can only publish it to Premium workspaces.
Detect Data Changes
Incremental Refresh will make the part of the dataset to refresh much smaller, and as a result, the process would be much faster. However, There is still one better way to do that. If you have a modified DateTime (or updated DateTime) in your table, then the process of incremental refresh can monitor that field, and only get rows that their date/time is after the latest date/time in that field in the previous refresh. To enable this process, you can enable the Detect Data Changes, and then choose the modified date or update date from the table. Notice that this is different from the OrderDate or transaction date. And not all tables have such a field.
Only Refresh Complete Period
Depends on the period you selected when you set up the incremental refresh, you can choose to only refresh when the period is complete. For example, in the FactInternetSales, I set the refresh period to Year. And then I can set the option to Only Refresh the Complete Period, which means even If we are in Feb 2019, it will only refresh the data up to Dec 2018 (because that is the latest date that we have a full complete year in it);
Incremental Refresh for Dataflows
Dataflows are ETL process in the cloud for the Power BI service. You can also set up the Incremental Refresh for Dataflows. And it is even easier to do it for the dataflow. You don’t need to create the RangeStart and RangeEnd parameters there. Just go to the dataflow Incremental Refresh setting directly.
You can see that the same set up of the incremental refresh setting is available for the dataflow;
Setting up the incremental refresh in Power BI means loading only part of the data on a regular basis, and storing the consistent data. This process will make your refresh time much faster, however, there are some requirements for it. You need to have a date field in your table, and at the moment, there is a licensing requirement for Power BI premium for it. You can do this configuration on a Power BI dataset, or in Power BI dataflows. If you do this on a dataset, then after publishing the dataset, you cannot download the PBIX file.
Have you ever considered using Incremental Refresh? If you haven’t done it so far, let me know why in the comments below, and I’m always happy to help you with that.