Loop Through On-Premises Files with Power BI and Schedule it to Refresh Automatically

Facebooktwittergoogle_plusredditpinterestlinkedintumblrmailFacebooktwittergoogle_plusredditpinterestlinkedintumblrmail

2016-10-07_13h28_10

Previously I’ve explained a method of looping through number of files in a folder which used Custom Functions. This method is preferred method buy not always working with Schedule Refresh and gateway setup, because scheduled refresh doesn’t always support custom functions in the code. So I this post I show you a second method that loops through the files dynamically, and it doesn’t use custom function, and you will see at the end of this post that schedule refresh works with this method. For better clarification of the scenario I will first do custom function approach, show you the problem with scheduled refresh, and then I’ll explain about the second method. If you would like to learn more about Power BI, read Power BI online book; from Rookie to Rock Star.

Prerequisite

For running example of this post you need to download set of CSV files from here: fitbitsourcefilesradacad

Defining the Challenge

In this example we have number of CSV files which has list of my Fitbit activities, one file for each month. All under the same folder, all same structure. We want to apply number of similar steps to each files and loop through them and fetch the data all into single data set. After building a report on top of that we want to publish it to Power BI service, and get that refreshed through a gateway configuration. As files has similar structure we are looking for a mechanism to apply same steps on all files.

You will see two approaches with this, first one is preferred approach because it uses custom functions, however because scheduled refresh doesn’t support custom functions always, so there is a second approach which will work with automatic refresh. If you are visiting this page at the time that problem of scheduled refresh and custom function resolved, then use the first approach, otherwise use the second.

First Approach: Custom Function

This is Preferred Approach but is not working with Schedule Refresh at the time of writing this post.

Process one File

Open a New Power BI Desktop, Start by Get Data, and Select CSV as a source.

2016-10-07_11h16_55

Choose of the files in Fitbit Source folder and continue;

2016-10-07_11h17_48

Go to Edit Queries Window with clicking on Edit button, and You will see data previewed there.

2016-10-07_11h21_53

As you can see in the preview the first record isn’t useful. it has just a title “Acitivities”, so let’s remove it with going to Remove Rows, Remove Top Rows. and enter 1 in the dialog box;

2016-10-07_11h23_17

Now the first row only contains column headers. So let’s promote it to column headers with clicking on Use First Row As Headers.

2016-10-07_11h24_55

There are also some blank rows, so let’s remove them with Remove Rows -> Remove Blank Rows.

2016-10-07_11h26_36

Now we’ve done all clean up work for this file. We have one row per each day with details of activities in that day. Next step is to loop through all files in the source folder, and apply same number of steps on them. Fortunately we can use Custom functions for this. Previously I’ve explained how Custom Functions are helpful in code re-usability, better maintenance, and consistency. Instead of repeating same steps for each file is it much wiser to create a function from these steps, and call the function for each file. So let’s do it;

Creating Parameter

For creating the function through GUI, the first step is to create a parameter. You can create Parameter from Menu option or with right click on the blank area in Queries section.

2016-10-07_11h33_15

Create the Parameter with name of FullPath, of type Text, and set a default value to be full path (including file name, and extension) of one of CSV files in your folder;

2016-10-07_11h35_07

After creating the function you should be able to see that with Parameter icon in Queries pane;

2016-10-07_11h36_06

Creating Function

Next step is to get the current query (in the screenshot named as fitbit_export_201507) sourced from the FilePath parameter instead of static path. Click on the query, and choose settings from steps in right hand side pane;

2016-10-07_11h38_17

Change the File path in the setting window to be sourced from Parameter, and because you have only one parameter FilePath parameter will be selected automatically.

2016-10-07_11h40_01

After clicking on OK, you will see result of first step before applying your changes, so simply click on the last step to see the final result. if the file path you have provided as the default value of parameter be wrong you will see an error, otherwise you will see the data of that particular file. Now let’s create the function;

Right click on the query that we just changed the source of it (fitbit_export_201705), and select Create Function.

2016-10-07_11h43_02

Set the function name as ProcessFile, and leave the parameter as is.

2016-10-07_11h43_50

Because the query used parameter as a source so it can be simply used to create a function. Now You will see a function listed in Queries pane as well.

2016-10-07_11h44_59

Note that the function uses M (Power Query Formula language) with Lambda syntax. But we have created it all just with graphical user interface option. If you are interested to learn more about code side of it, read this post. This function now has all the steps generated in the first query. To see the steps you can simply click on Advanced Editor while you have ProcessFile function selected in Queries pane.

2016-10-07_11h48_42

Don’t be afraid of this code! you don’t need to change anything here. It was just for letting you know that we have same steps copied into the function.

Now you can remove the first query, we don’t need that anymore, steps are already in the custom function.

2016-10-07_11h50_16

Get Data From Folder

Now that we have the function ready, let’s get list of files and apply it on all files. Get Data from a New Source, and More.

2016-10-07_11h52_07

Choose Folder this time;

2016-10-07_11h53_07

And select the folder which contains all CSV files;

2016-10-07_11h53_52

After clicking OK here and also on the preview, you will see list of all files in this folder with their details such as file name, extension, date created…. and folder path. You also see a column as Content, which is the Binary format of this files (the actual data in each file), I will talk about this column more in second approach.

2016-10-07_11h57_14

For this approach we just need a full file path which we can generate from Folder Path and Name , so only select these two columns and remove everything else.

2016-10-07_11h57_54

Now let’s concatenate these two and create a full file path, Select Folder Path first, and then hold ctrl key and select Name column then. Go to Transform Tab and Choose Merge Columns

2016-10-07_11h59_21

Name the new concatenated column as FullFilePath, You don’t need to put a separator for it, because Folder path already has an ending backslash in the string.

2016-10-07_12h00_44

This will replace existing two columns with the new FullFilePath concatenated column;

2016-10-07_12h02_02

Call Custom Function For Each File

Now we have everything ready for the custom function to be called for every record of this table. Go to Add Column tab, and click on Invoke Custom Function;

2016-10-07_12h03_42

In the Invoke Custom function dialog box, select the function as ProcessFile, and the input parameter this time comes from the FullFilePath column that we created few minutes ago. Name this column as Content.

2016-10-07_12h04_59

After clicking on OK, in an instant, you will see the new column appears which has a Table in each cell. This is the data table processed for each file. If you click on a blank area of one of table’s cells you will see the data preview of that in a details view pane underneath preview pane.

2016-10-07_12h08_22

We don’t need FullFilePath column anymore, so Remove that column. Now in the table we have just one column, which is a structured column. Means it has a structured value in each cell (in this case a table in each cell). We can now expand this structured column to all underlying columns with clicking on Expand button. Also remember to uncheck the box for Use Original Column Name as prefix.

2016-10-07_12h42_21

This expand button (or you might call it Magic button now, as it reveals the magic!) combines all data tables into a single big data table, including data rows from all CSV files.

2016-10-07_12h44_28

All we need to do now is some data type changing;

Data Type Conversion

Change data type of all columns except Date and Distance to Whole Number. Change data type of Distance column to Decimal. and Finally change data type Date field to Date. For the data type of Date field to be changed you need to choose locale. because the exported files has date values in New Zealand Date format, and if you change it with normal data type conversion, Power Query try to convert their data type based on your local system date format, which might not be New Zealand’s format. So right click on Date Column. under Data Type choose Using Locale.

2016-10-07_12h48_34

In the new dialog box, choose the data type to convert to as Date, and set locale as English New Zealand

2016-10-07_12h49_30

After clicking on OK you will see data types converted correctly without any error. As the final step now change the query name to Tracking Data;

2016-10-07_12h51_08

Notice that Date column showed in above screenshot is converted to a Date format (you can see the date format icon in the column header), but it shows in this window to me in my computer locale format which is English (United States). The way that it shows doesn’t matter. All matters is that this is of Date format.

Build Sample Report

Now let’s build a very simple Report with the data. Close and Apply Power Query window, and in the Report pane, add Steps as Total value in a Card Visualization. Also in the formatting section set the Format of data label to display as None (means without any millions, or K, or anything like that).

2016-10-07_12h54_56

The other visualization to add is a Column Chart, which Steps as Value, Date as Axis. It it will automatically brings Date Hierarchy, So remove Quarter and Day from the hierarchy.

2016-10-07_13h01_35

After removing Quarter and Day, click on Expand all down one level in the hierarchy in the top left corner of the column chart (screenshot above numbered 2);

Also create a measure to show the last date. Go to Modeling tab, New Measure, and use LastDate function to get the latest date of the date field in the Tracking Data table;

 

2016-10-07_13h06_49

Show this new measure as a Card visual in the report.

2016-10-07_13h08_07

We can make this report much richer with adding more insights and visualizations, but let’s keep that simple for now. Save the report and Publish. You can save it with a name like Fitbit Custom Function Method.

Set up Gateway

Because we want the report to be refreshed from on-premises folder we need to set up a gateway. Both Personal and On-Premises gateways can do this. Explanation of how to install and configure gateway is out of scope of this post, If I want to explain that I will make this post VERY LONG! If you want to learn more about gateways read this post about Personal gateway, or this one about On-Premises Gateway. I explain everything from after successful setup of gateway here, where we need to add a new data source.

New Data Source Configuration

We have to create a data source under the gateway for our local folder containing CSV files. Go to Manage Gateways in Power BI Service

2016-10-07_13h16_19

Now select your gateway, and add a new data source under it.

2016-10-07_13h17_48

Set up the new data source for Folder and set the local folder path there;

2016-10-07_13h19_47

Note that the local path should be accessible from the machine that has gateway installed on it. otherwise it should be a shared path. The windows username used here also should be for an account that has read access on the folder we are trying to access.

After successful setup you will see a message clarifying that;

2016-10-07_13h21_40

Scheduled Refresh

We’ve published the report earlier. Now it’s the time to Schedule it for refresh, and configure it to get its data from gateway.  Under Datasets find the Fitbit Custom Function dataset, and click on ellipsis button and choose Schedule Refresh.

2016-10-07_13h26_32

In the Schedule Refresh Setting section, You will see an error message that says you can not configure gateway for this data set. That’s because of custom function used here. (Note that this happens at the time of writing this blog post, It might be resolved by the time that you are reading this post)

The error message is: You can’t schedule refresh for this dataset because one or more sources currently don’t support refresh.

2016-10-07_13h28_10

Now you see the problem, It was a long way to show you the issue! but you need to know it before using many custom functions in your code. Let’s go through the second approach to resolve it.

Second Approach: Combine Files

This approach has some limitations and difficulties involved, However at the time of writing this post, this is the only way of getting the scheduled refresh working.

Get Data From Folder

With this method we start with getting data from Folder. Open a new Power BI Desktop solution, and Get Data From Folder, exactly as you’ve seen in the Get Data From Folder section earlier in this post.

2016-10-07_11h53_52

This time however we don’t use custom functions, we want to use binary data of the file content. Content column includes the Binary Data. Content column is all we need, so remove all other columns.

2016-10-07_13h34_52

If you click on a blank area in any Binary cell in the content column, you will see the CSV file in preview pane under it.

2016-10-07_13h35_59

Combine Binaries

This Content column is a structured column, and any structured column can be expanded. Expanding a structured column with Binary values Will combine them together. Click on Combine Binaries button on the top right hand side of the column.

2016-10-07_13h38_26

Combine Binaries will combine content of all files together, and will generate a big list of all data rows. You can also see automatic steps applied after combining binaries in the steps pane.

2016-10-07_13h39_27

Now that we have all data rows. Let’s do some cleanup;

As you can see in the preview the first record isn’t useful. it has just a title “Acitivities”, so let’s remove it with going to Remove Rows, Remove Top Rows. and enter 1 in the dialog box;

2016-10-07_11h23_17

Now the first row only contains column headers. So let’s promote it to column headers with clicking on Use First Row As Headers. There are also some blank rows, so let’s remove them with Remove Rows -> Remove Blank Rows.

There are still some additional rows with “Activities” or “Date” labels in their first column. These are heading rows in each CSV file. After combining content of all files these appears in the middle of the data set.

2016-10-07_13h43_42

We can remove them by applying a filter on Date column.  Click on Filter icon for Date column and from Text Filters choose Does Not Equal.

2016-10-07_13h45_51

Choose Does Not Equal, and with an AND condition type in both Activities and Date there.

2016-10-07_13h47_14

This will filter all extra labeled rows, and you will only see actual data rows.

2016-10-07_13h48_52

Now Apply Same data type conversion steps and build the same report as we’ve done earlier. So continue again from Step Data Type Conversion.

Apply the same steps from Step Data Type Conversion, and Build the Sample Report till saving the report.

Save this Power BI file as Fitbit Combine Binaries. and Publish.

Schedule Refresh

If you have already set up the Data Source for this folder, you don’t need to do it again. Just go directly to Schedule Refresh under Fitbit Combine Binaries dataset.

2016-10-07_13h55_10

This time in Schedule Refresh setting you won’t see any error, and you can simply choose the Gateway and apply. Because we haven’t used any custom functions here, so no error message anymore. All good to go.

2016-10-07_13h57_05

Under this section there is another section that you can Schedule report to be refreshed, daily or weekly depends on how you want it.

2016-10-07_13h59_38

Testing Automatic Refresh

Now let’s test the refresh. Download this new CSV file and paste it in the same folder of other CSV files (the folder that you’ve set up for gateway as a data source); fitbit_export_201605

Refresh the dataset with Refresh Now (Or wait untill schedule refresh happens! I’m not that patient, so I’ll do refresh now :) )

2016-10-07_14h03_25

If there be no errors in connection and gateway setup and access to the source folder, then refresh finish successfully. Go to report and refresh the report to see the new data.

You will notice that Data of month May is added. Total number of steps is now over 4 million, and last processed Date is 31st of May 2016.

2016-10-07_14h04_50

Congratulations, You have done the example successfully, and much harder; You have finished this very long post :)

Which Method is Best?

If the Gateway issue with custom functions resolves, I’d say first method with custom function is definitely the best method. Because you have vast flexibility with custom function. Combine Binaries doesn’t always produce an easy output, with some excel files it might generate some data rows which needs many more steps for cleansing the final data. But for now that gateway has this issue with custom functions, the only possible way is Combine Binaries.

Save

Save

Save

Save

Save

Save

Save

Facebooktwittergoogle_plusredditpinterestlinkedintumblrmailFacebooktwittergoogle_plusredditpinterestlinkedintumblrmail
rssyoutuberssyoutube
Reza Rad
Reza Rad is an Author, Trainer, Speaker and DW/BI Consultant. He has a BSc in Computer engineering; he has more than 15 years’ experience in databases, programming and development mostly on Microsoft technologies. He is a Microsoft MVP in Data Platform for seven continues years (from 2011 till now) for his dedication in Microsoft BI. He is author of some SQL Server and BI books, and also Power BI online book; from Rookie to Rock Star.

2 Comments

  • Great work, this will save me huge amounts of time when it comes to delivering reports!

    I do have one question: The resulting data set after merging all the files does, as far as I can see, not have any reference to the original file. I need a column in the dataset that would contain the filename of the original csv-file. I would believe that I could extract the filename in some of the first steps in this guide, and add it to a new column. Then when the function is called, It would do that for every single file.

    I can however not seem to figure out how to do this. I tried the Csv.Document-function, but it seem to deliver only the content rather than the filename I’m after. Is there any way to do this?

    • Hi Jan,
      You can use the FileName field for that. before combining files with each other; keep File Name, and the file content columns, then expand/combine.
      Cheers
      Reza

Leave a Reply

Your email address will not be published. Required fields are marked *

You may use these HTML tags and attributes: <a href="" title=""> <abbr title=""> <acronym title=""> <b> <blockquote cite=""> <cite> <code class="" title="" data-url=""> <del datetime=""> <em> <i> <q cite=""> <strike> <strong> <pre class="" title="" data-url=""> <span class="" title="" data-url="">