Be Fitbit BI Developer in Few Steps: Step 2 Loop Through All CSV Files


In the first step you learned how to manipulate and mash up the data with a few changes in the Power Query M script. Although Power BI Desktop couldn't recognize your CSV file correctly because of the header row, you managed to write a script and fetch the data as you required. However, this process needs to be repeated for the second file, then the third file, and who knows what happens if you have thousands of files. Real-world scenarios contain many files, sometimes millions, so a manual process won't be the answer. You have to create your data transformation process in a way that works automatically, no matter how many files there are.

Warning: This post contains scripting!

In this post we will write Power Query M script to loop through all CSV files in the source folder and process them one by one with the fetch-table script that we produced in the previous step. So be prepared to write some code, and don't be scared of it 😉 It is almost impossible to create a BI solution for real-world problem solving that doesn't include at least a bit of scripting, so you have to be familiar with it. Power BI is not just a nice drag-and-drop dashboard-building tool; it has many powerful features, and you have to spend time to learn them. So here you go, let's write some code.

Looping Through CSV Files

Unfortunately Power Query, or let's say Power BI, doesn't have a loop structure; that is because of the functional nature of the language. However, there are data structures such as Table and List that can easily be used with the EACH singleton function to work exactly as a loop structure does. Here in this post I will walk you through the process of looping through the files in a directory, processing them all, and finally combining them into one large table. You will also learn some Power Query M functions along the way.
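As a quick taste of that functional style (a minimal sketch, not part of the Fitbit solution): instead of writing a for loop, you hand a function to List.Transform and it is applied to every item of the list for you.

```
// A minimal sketch of "looping" in M: List.Transform applies
// a function to every item of a list; no loop keyword needed
let
    Numbers = {1, 2, 3, 4, 5},
    Doubled = List.Transform(Numbers, each _ * 2)
in
    Doubled  // {2, 4, 6, 8, 10}
```

We will use exactly this pattern later, just with tables instead of numbers.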

Where We’ve Been in the Previous Step

In the previous step we ended up with the script below:

let
    Source = Table.FromColumns({Lines.FromBinary(File.Contents("C:\Users\Reza\SkyDrive\Blog\PowerBIBook\PQ1\CSV\SourceFiles\fitbit_export_201507.csv"),null,null,1252)}),
    #"Removed Top Rows" = Table.Skip(Source,1),
    #"Split Column by Delimiter" = Table.SplitColumn(#"Removed Top Rows","Column1",Splitter.SplitTextByDelimiter(",", QuoteStyle.Csv),{"Column1.1", "Column1.2", "Column1.3", "Column1.4", "Column1.5", "Column1.6", "Column1.7", "Column1.8", "Column1.9", "Column1.10"}),
    #"Changed Type" = Table.TransformColumnTypes(#"Split Column by Delimiter",{{"Column1.1", type text}, {"Column1.2", type text}, {"Column1.3", type text}, {"Column1.4", type text}, {"Column1.5", type text}, {"Column1.6", type text}, {"Column1.7", type text}, {"Column1.8", type text}, {"Column1.9", type text}, {"Column1.10", type text}}),
    #"Promoted Headers" = Table.PromoteHeaders(#"Changed Type"),
    #"Removed Blank Rows" = Table.SelectRows(#"Promoted Headers", each not List.IsEmpty(List.RemoveMatchingItems(Record.FieldValues(_), {"", null}))),
    #"Changed Type1" = Table.TransformColumnTypes(#"Removed Blank Rows",{{"Steps", Int64.Type}, {"Calories Burned", Int64.Type}, {"Floors", Int64.Type}, {"Minutes Sedentary", Int64.Type}, {"Minutes Lightly Active", Int64.Type}, {"Minutes Fairly Active", Int64.Type}, {"Minutes Very Active", Int64.Type}, {"Activity Calories", Int64.Type}, {"Distance", type number}}),
    #"Added Custom" = Table.AddColumn(#"Changed Type1", "Dated", each Date.FromText([Date],"en-NZ")),
    #"Changed Type2" = Table.TransformColumnTypes(#"Added Custom",{{"Dated", type date}}),
    #"Removed Columns" = Table.RemoveColumns(#"Changed Type2",{"Date"}),
    #"Reordered Columns" = Table.ReorderColumns(#"Removed Columns",{"Dated", "Calories Burned", "Steps", "Distance", "Floors", "Minutes Sedentary", "Minutes Lightly Active", "Minutes Fairly Active", "Minutes Very Active", "Activity Calories"})
in
    #"Reordered Columns"

That generates the table below from an exported CSV file of Fitbit activities:

[screenshot: the resulting activities table]

Now there are multiple files in the source folder, and I want to repeat the process for all of them:

[screenshot: source CSV files in the folder]

Get Data From Folder

Let’s start the process by reading all files in a folder. Power BI Desktop can do that easily. Through the Get Data options, choose Folder, and then select the folder containing the source files above.

[screenshot: Get Data from Folder]

This will load all files and their general information into a single table. There are columns for the file name, folder path, date created, date modified, and so on. There is also a column named Content. This is an important column, as it contains the whole content of each file in Binary format.

[screenshot: files table with the Content column]

Custom Function

So far we have the list of all files and the script to convert their content into the desired table structure. We can create a custom function from that script and re-use it for each file’s content. The process of creating a custom function is easy. I’m not going to explain every part of creating custom functions, because that’s a whole separate topic. If you are interested in reading more about Creating Custom Functions in Power Query, read this post.

Go into the Advanced Editor, and create the function as in the highlighted section below:

let
    FetchTable = (table) => 
	let
   	 	Source = Table.FromColumns({Lines.FromBinary(table,null,null,1252)}),
   	 	#"Removed Top Rows" = Table.Skip(Source,1),
   	 	#"Split Column by Delimiter" = Table.SplitColumn(#"Removed Top Rows","Column1",Splitter.SplitTextByDelimiter(",", QuoteStyle.Csv),{"Column1.1", "Column1.2", "Column1.3", "Column1.4", "Column1.5", "Column1.6", "Column1.7", "Column1.8", "Column1.9", "Column1.10"}),
   	 	#"Changed Type" = Table.TransformColumnTypes(#"Split Column by Delimiter",{{"Column1.1", type text}, {"Column1.2", type text}, {"Column1.3", type text}, {"Column1.4", type text}, {"Column1.5", type text}, {"Column1.6", type text}, {"Column1.7", type text}, {"Column1.8", type text}, {"Column1.9", type text}, {"Column1.10", type text}}),
  	 	#"Promoted Headers" = Table.PromoteHeaders(#"Changed Type"),
  	 	#"Removed Blank Rows" = Table.SelectRows(#"Promoted Headers", each not List.IsEmpty(List.RemoveMatchingItems(Record.FieldValues(_), {"", null}))),
    		#"Changed Type1" = Table.TransformColumnTypes(#"Removed Blank Rows",{{"Steps", Int64.Type}, {"Calories Burned", Int64.Type}, {"Floors", Int64.Type}, {"Minutes Sedentary", Int64.Type}, {"Minutes Lightly Active", Int64.Type}, {"Minutes Fairly Active", Int64.Type}, {"Minutes Very Active", Int64.Type}, {"Activity Calories", Int64.Type}, {"Distance", type number}}),
    		#"Added Custom" = Table.AddColumn(#"Changed Type1", "Dated", each Date.FromText([Date],"en-NZ")),
    		#"Changed Type2" = Table.TransformColumnTypes(#"Added Custom",{{"Dated", type date}}),
    		#"Removed Columns" = Table.RemoveColumns(#"Changed Type2",{"Date"}),
    		#"Reordered Columns" = Table.ReorderColumns(#"Removed Columns",{"Dated", "Calories Burned", "Steps", "Distance", "Floors", "Minutes Sedentary", "Minutes Lightly Active", "Minutes Fairly Active", "Minutes Very Active", "Activity Calories"})
	in
    		#"Reordered Columns",


    Files = Folder.Files("C:\Users\Reza\SkyDrive\Blog\PowerBIBook\PQ1\CSV\SourceFiles")
in
    Files

Note that the highlighted part, from line 2 to line 16 in the code above, is the code for the custom function. The first line of the function (line 2 in the script above) contains the name of the function, FetchTable, and the input parameter, table. Lines 3 to 16 are the script we copied from the first step of this exercise.

To make sure that the function works properly, you can call it as below:

FetchTable(Files{[#"Folder Path"="C:\Users\Reza\SkyDrive\Blog\PowerBIBook\PQ1\CSV\SourceFiles\",Name="fitbit_export_201509.csv"]}[Content])

Put the single line of code above in the in section of the script, instead of Files, and you will see the result table below:

[screenshot: result table of the FetchTable call]

So this shows that the function definition and the function call work perfectly. Now let’s loop through the files.
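A side note on the lookup syntax used in that call: Table{[Column=value]} selects the single row whose column matches the value and returns it as a record, and a following [Field] extracts one field from that record. A minimal sketch with a made-up two-row table:

```
// Row lookup: T{[Name = "b.csv"]} returns the record of the matching row,
// and [Content] then extracts one field from that record
let
    T = #table({"Name", "Content"}, {{"a.csv", 10}, {"b.csv", 20}}),
    Value = T{[Name = "b.csv"]}[Content]
in
    Value  // 20
```

That is exactly what the test call above does: it picks one file’s row out of Files by its Folder Path and Name, and feeds its Content to FetchTable.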

EACH Singleton Function

EACH is a singleton function, which means it has one input parameter and one output. It can be applied to EACH record of a table or EACH item of a list. Using it, you can apply calculations and data transformations, add custom columns, and perform many other operations. For our case, I want to use EACH to apply the FetchTable function to every Content value of the Files table. EACH is easy to use, as below:
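Before we apply it, a quick aside on what EACH really is: each expr is simply shorthand for declaring a one-parameter function whose parameter is named _. The two lines in this minimal sketch are equivalent:

```
// "each expr" is shorthand for the one-parameter function (_) => expr
Doubled1 = List.Transform({1, 2, 3}, each _ * 2),   // {2, 4, 6}
Doubled2 = List.Transform({1, 2, 3}, (_) => _ * 2)  // {2, 4, 6}
```

So each FetchTable([Content]) below is a function that takes the current row and calls FetchTable on its Content field.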

    Files = Folder.Files("C:\Users\Reza\SkyDrive\Blog\PowerBIBook\PQ1\CSV\SourceFiles"),
    TableAdded= Table.AddColumn(Files,"Data",each FetchTable([Content])),
    DataColumnSelected=Table.SelectColumns(TableAdded,{"Data"})
in
    DataColumnSelected

As you can see in the single line above, the expression each FetchTable([Content]) applies the FetchTable function to every Content value in each record of the table. As a result we get a new column called Data that contains tables. Line 3 of the script above just selects the Data column (because we don’t need the other columns). Here is the result:

[screenshot: single Data column containing a table in every row]

As you can see in the result set above, we have one table with one column (named Data), and it contains a Table in every item. These are the tables that we want, so we have to combine all of them into one table. Fortunately there is a function for that in Power Query called Table.Combine.

Combining, Appending, or UNION of Tables

Table.Combine does a UNION of multiple tables into one table. This function gets a LIST of tables (they should all have the same structure) and returns a single table which is the appended (or UNION) version of all of them. Here is the definition of this function:

Table.Combine(tables as list) as table
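For example, here is a minimal sketch of Table.Combine with two tiny inline tables (built with the #table constructor), just to show the append behaviour:

```
// Table.Combine appends tables with the same structure into one table
let
    T1 = #table({"A"}, {{1}, {2}}),
    T2 = #table({"A"}, {{3}}),
    Combined = Table.Combine({T1, T2})  // one table with rows 1, 2, 3
in
    Combined
```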

As you probably noticed, this function needs a LIST as its input parameter, and so far we have built a TABLE (a single-column table), which needs to be converted to a list. There are different methods of converting a table into a list; I’ll show you one of them here. I first turn the table into a list of its columns with the Table.ToColumns function, and then get the first (and only) column’s values with the List.First function. Finally I pass that list to Table.Combine. Here’s the code:

    Files = Folder.Files("C:\Users\Reza\SkyDrive\Blog\PowerBIBook\PQ1\CSV\SourceFiles"),
    TableAdded= Table.AddColumn(Files,"Data",each FetchTable([Content])),
    DataColumnSelected=Table.SelectColumns(TableAdded,{"Data"}),
    TablesColumn=Table.ToColumns(DataColumnSelected),
    TablesList=List.First(TablesColumn),
    AllTables=Table.Combine(TablesList)
in
    AllTables

Line 4 converts the table into a list of lists, one inner list per column, because a table might have more than one column.

Line 5 gets the first item of that list, which is the first column of the table. As our table has only a single column, exactly that column is fetched.

Line 6 finally combines all the tables in the list into one large table.
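As a side note, another common way (not used in this post) to get a single column of a table as a list is the column-access syntax Table[Column], which returns that column’s values directly as a list and would replace the Table.ToColumns / List.First pair. A sketch of the same two steps with that approach:

```
// Column access returns the column's values as a list; for a single-column
// table this is equivalent to Table.ToColumns followed by List.First
TablesList = DataColumnSelected[Data],
AllTables = Table.Combine(TablesList)
```

Both approaches produce the same list of tables; the method shown in the post just makes the table-to-list conversion explicit.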

And the result:

[screenshot: combined table with all months]

You can see that all the tables are now combined: all months are together in this final table.

Here is the full script of this demo so far:

let
    FetchTable = (table) => 
	let
   	 	Source = Table.FromColumns({Lines.FromBinary(table,null,null,1252)}),
   	 	#"Removed Top Rows" = Table.Skip(Source,1),
   	 	#"Split Column by Delimiter" = Table.SplitColumn(#"Removed Top Rows","Column1",Splitter.SplitTextByDelimiter(",", QuoteStyle.Csv),{"Column1.1", "Column1.2", "Column1.3", "Column1.4", "Column1.5", "Column1.6", "Column1.7", "Column1.8", "Column1.9", "Column1.10"}),
   	 	#"Changed Type" = Table.TransformColumnTypes(#"Split Column by Delimiter",{{"Column1.1", type text}, {"Column1.2", type text}, {"Column1.3", type text}, {"Column1.4", type text}, {"Column1.5", type text}, {"Column1.6", type text}, {"Column1.7", type text}, {"Column1.8", type text}, {"Column1.9", type text}, {"Column1.10", type text}}),
  	 	#"Promoted Headers" = Table.PromoteHeaders(#"Changed Type"),
  	 	#"Removed Blank Rows" = Table.SelectRows(#"Promoted Headers", each not List.IsEmpty(List.RemoveMatchingItems(Record.FieldValues(_), {"", null}))),
    		#"Changed Type1" = Table.TransformColumnTypes(#"Removed Blank Rows",{{"Steps", Int64.Type}, {"Calories Burned", Int64.Type}, {"Floors", Int64.Type}, {"Minutes Sedentary", Int64.Type}, {"Minutes Lightly Active", Int64.Type}, {"Minutes Fairly Active", Int64.Type}, {"Minutes Very Active", Int64.Type}, {"Activity Calories", Int64.Type}, {"Distance", type number}}),
    		#"Added Custom" = Table.AddColumn(#"Changed Type1", "Dated", each Date.FromText([Date],"en-NZ")),
    		#"Changed Type2" = Table.TransformColumnTypes(#"Added Custom",{{"Dated", type date}}),
    		#"Removed Columns" = Table.RemoveColumns(#"Changed Type2",{"Date"}),
    		#"Reordered Columns" = Table.ReorderColumns(#"Removed Columns",{"Dated", "Calories Burned", "Steps", "Distance", "Floors", "Minutes Sedentary", "Minutes Lightly Active", "Minutes Fairly Active", "Minutes Very Active", "Activity Calories"})
	in
    		#"Reordered Columns",

    Files = Folder.Files("C:\Users\Reza\SkyDrive\Blog\PowerBIBook\PQ1\CSV\SourceFiles"),
    TableAdded= Table.AddColumn(Files,"Data",each FetchTable([Content])),
    DataColumnSelected=Table.SelectColumns(TableAdded,{"Data"}),
    TablesColumn=Table.ToColumns(DataColumnSelected),
    TablesList=List.First(TablesColumn),
    AllTables=Table.Combine(TablesList)
in
    AllTables

That ends this step. You’ve learned some scripting skills here today, and I hope you’ve enjoyed it. With a few lines of script like this you can add massive power to your queries; you can automate things and get rid of extra manual work. Power Query is one of the most powerful components of Power BI, giving you the ability to apply many transformations and data mashups. The next step will be about visualization of the data we’ve prepared so far, so stay tuned!

 

Reza Rad
Trainer, Consultant, Mentor
Reza Rad is a Microsoft Regional Director, an Author, Trainer, Speaker and Consultant. He has a BSc in Computer engineering; he has more than 20 years’ experience in data analysis, BI, databases, programming, and development mostly on Microsoft technologies. He is a Microsoft Data Platform MVP for 12 continuous years (from 2011 till now) for his dedication in Microsoft BI. Reza is an active blogger and co-founder of RADACAD. Reza is also co-founder and co-organizer of Difinity conference in New Zealand, Power BI Summit, and Data Insight Summit.
Reza is the author of more than 14 books on Microsoft Business Intelligence, most of them published under the Power BI category, among them Power BI DAX Simplified, Pro Power BI Architecture, Power BI from Rookie to Rock Star, the Power Query book series, and Row-Level Security in Power BI.
He is an international speaker at Microsoft Ignite, Microsoft Business Applications Summit, Data Insight Summit, PASS Summit, SQL Saturday, and SQL user groups, and he is a Microsoft Certified Trainer.
Reza’s passion is to help you find the best data solution; he is a data enthusiast.
His articles on different aspects of technologies, especially on MS BI, can be found on his blog: https://radacad.com/blog.
