Custom Visuals in Power BI; Build Whatever You Want

Published Date : March 9, 2016

2016-03-09_23h47_05

In previous posts of Power BI from Rookie to Rock Star you’ve learned that Power BI has a bunch of useful built-in visualizations such as bar chart, column chart, waterfall chart, tree map, and many others. However, there is always a need for new types of visualization. You might want to visualize parts of the data on a custom image, or with some other visualization that is not built in. Fortunately, Power BI gives you the ability to create your own custom visualizations. You can then share your visualization with others in your organization, or even with the world through the public custom visuals website. In this post I’ll show you some great examples of custom visuals in Power BI and how to use them. In future posts we will go through the scenario of creating one.

2016-03-09_22h55_13

 

Synoptic Panel by SQL BI

This is one of the coolest custom visuals I’ve seen so far. The reason is that this custom visual is itself customizable! With this visual you can define regions in any picture or image, and map data points to those regions in your Power BI report. The image can be anything: a human body, an airplane seat layout, a shop floor layout, or a football field. You choose the image you want, then you define the regions. Let’s have a closer look at this visual.

You can download the latest version of Synoptic Panel from the Power BI visuals gallery or from the Synoptic Design website. Please note that custom visuals might be developed by third parties, not Microsoft, so you are using them at your own risk (in terms of support, if an issue happens). Synoptic Panel is a custom visual built by SQLBI. After downloading this visual you will have a *.pbiviz file.

2016-03-09_22h56_45

You can import that file into Power BI Desktop by clicking the ellipsis button in the visualization pane and choosing the *.pbiviz file.

2016-03-09_22h59_22

After importing, you will see the visual listed in the visualization pane.

Now let’s do an example with it. Consider a dataset of car accidents which has the date of the accident, the registration number of the car, the car part damaged in the accident, the accident type, and the severity of the accident. Here is the example data set:

2016-03-09_23h04_07

As you see in the dataset above, the Damage column shows which part of the car was damaged. So a very useful visualization would be a layout of the car’s exterior (assuming, for simplicity, that damage only happens on the exterior). With a simple Google search I found the image below:

1648382-SW-Car-Layout-for-presentation-vector-Stock-Photo-car-outline-rear

Source of the image: http://www.123rf.com/photo_1648382_sw-car--layout-for-presentation--vector.html

What I need to do is define regions on this image. Synoptic Panel also has a designer tool that allows me to draw regions, so let’s do that. Go to the Synoptic Designer website and upload your image there.

2016-03-09_23h10_17

Add the image there, and then use the designer tool to draw an area. After drawing the area you will see it highlighted and listed on the right-hand side.

2016-03-09_23h12_33

Define the other areas (for the bumper, roof, and so on), and name each area to match the values you have in your dataset’s Damage column.

2016-03-09_23h16_42

Now you can export this design to Power BI by clicking Export To Power BI. This creates a *.svg file for you to use in the Synoptic Panel visual.

2016-03-09_23h18_36

Import the *.svg file into Power BI Desktop’s Synoptic Panel visual. As you can see, I’ve mapped the Damage field to the Legend of the custom visual and the count of Rego to the Values field.

2016-03-09_23h21_48

After adding the *.svg file as the map, you will see the regions highlighted. This shows that there are values for those regions.

2016-03-09_23h25_42

You can set color states for this visual. Before doing this, you should map a field as the State field of the visual:

2016-03-09_23h29_31

Then you define rules for the coloring. For example, if the value is between 0 and 5, use state color 1 (green); if it is between 6 and 10, use state color 2; and so on. Here is what I defined:

2016-03-09_23h27_49

Nice visualization, isn’t it? There are some other customizations that you can apply to this visual. I’ll leave that part to you for further study if you are interested. Watch the SQLBI YouTube video which illustrates some of this visual’s features. There are also some pre-built maps for this visual that you can use if you want.

Chiclet Slicer

The other custom visual I want to talk about in this post is Chiclet Slicer. This is a custom visual built by Microsoft itself. The default slicer in Power BI is very basic; you can’t do much formatting on it. Chiclet Slicer allows you to do text formatting and box formatting, and also to add images to your slicer items.

2016-03-09_23h35_53

Let’s add this slicer to our existing car accident example. As you’ve seen in the dataset above, we have an accident type field which contains values such as Roof Damage, Motor Cycle Accident, Over Heat, and so on. It would be good to have a slicer that shows these accident types as images. So what I’ve done is create another spreadsheet with the accident types and a field called Logo, which is a URL of an image related to each item.

2016-03-09_23h38_49
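
If you’d rather not maintain a separate spreadsheet, a small lookup table like this could also be typed directly into Power Query. This is just a sketch of the idea; the URLs below are placeholders, not real logo addresses:

let
    // hypothetical lookup table: one row per accident type, with a logo URL
    AccidentTypes = Table.FromRecords({
        [AccidentType = "Roof Damage",          Logo = "http://example.com/logos/roof.png"],
        [AccidentType = "Motor Cycle Accident", Logo = "http://example.com/logos/motorcycle.png"],
        [AccidentType = "Over Heat",            Logo = "http://example.com/logos/overheat.png"]
    })
in
    AccidentTypes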

I’ve then loaded this dataset into my Power BI model and created a relationship based on AccidentType. Then I’ve changed the Data Category of the Logo field on my Data tab to Web URL.

2016-03-09_23h40_19

Now in the Chiclet Slicer custom visual I set the Category field to AccidentType, the Image to Logo, and the count of Rego as Values. Here is the result.

2016-03-09_23h42_23

For the result above I’ve also configured the number of columns in my Chiclet Slicer to be 2.

2016-03-09_23h44_16

Even if I don’t have an image, I can still use the Chiclet Slicer, as it presents text fields much better than the default slicer. I’ve used it for the Severity field as well, and the final result is here:

2016-03-09_23h47_05

You can see that adding two custom visuals made my report much better, in terms of both look and feel and how well it conveys the insight. In future posts of the Power BI from Rookie to Rock Star book I’ll explain some other useful custom visuals and the method of developing your own custom visual.

 


Power BI and Spark on Azure HDInsight; Step by Step Guide

Published Date : February 1, 2016

20

As you know, Power BI can connect to many data sources, and Spark on Azure HDInsight is one of them. When working with Big Data applications you will probably hear names such as Hadoop, HDInsight, Spark, Storm, Data Lake, and many others. Spark and Hadoop are both frameworks for working with big data, though they have some differences. In this post I’ll show you how you can use Power BI (either Power BI Desktop or the Power BI website) to connect to a sample Spark cluster that we build on the Azure HDInsight service. By completing this section you will be able to create a simple Spark cluster on Azure HDInsight, run a few Python scripts from Jupyter on it to load a sample table into Spark, and finally use Power BI to connect to the Spark server and load and visualize the data.

This post is a section of the Power BI from Rookie to Rock Star online book.

Prerequisites

To follow this guide you will need an Azure subscription (to create the HDInsight cluster), Power BI Desktop installed, and the Microsoft Spark ODBC Driver (required later to connect from Power BI Desktop to Spark).

A Bit of Introduction

What is Spark? What is HDInsight?

spark-logo-trademark

I’ve spoken about Power BI well enough in previous chapters. However, you might be new to big data names such as Spark and HDInsight and want to know what they are. To put it in the simplest way: HDInsight is Microsoft’s managed big data service on Azure; it gives you distributed storage for your big data, for massive scalability and availability. Spark, on the other hand, is a big data framework that works great for analytics and big data operations; however, it doesn’t come with its own distributed file storage. So a real-world big data solution most of the time runs Spark on top of HDFS (Hadoop Distributed File System), or here Azure HDInsight, to gain both scalability and powerful analytics. Spark is an Apache project, often called Apache Spark, and it can be installed on both Windows and Linux servers. In this sample, for simplicity, we will use Spark on a Linux server.

What is Jupyter?

jupyter

Jupyter is a live notebook engine for writing and running code such as Python (it supports more than 40 languages!) on servers (such as a Spark server).

 

Setup and Configure Spark on Azure HDInsight

Let’s start building the first Spark cluster on Azure HDInsight. You can skip this part if you already have a Spark on Azure HDInsight cluster.

  1. Login to Azure Portal through https://portal.azure.com.
  2. Go to New, and then under Data + Analytics choose HDInsight
  3. 2016-02-01_14h03_39
  4. In the New HDInsight Cluster definition section, set a name for the cluster (my sample is mysparkcluster), set the Cluster Type to Spark (still in the preview stage at the time of writing), and set the Cluster Operating System to Linux. When you choose Linux, it will be installed on Ubuntu 14.04 LTS.
  5. 2
  6. Choose one of your Azure subscriptions (if you have more than one), then assign a resource group. You can choose one of the existing resource groups or create a new one.
  7. 3
  8. Set the credentials for the Spark cluster: set a password for the admin user, then enter an SSH username and password. The SSH username and password are used to connect to the Spark cluster through the command line; in this example we won’t use the command line. Note that there are some reserved SSH usernames that you cannot use.
  9. 4
  10. Set the Data Source. You can choose an existing storage account or create a new one, and set the container and location.
  11. 5
  12. Choose the Node Pricing Tiers. In this example, because we are not processing a large amount of data, I’ll do it with a couple of nodes. You can see how much this cluster costs you per hour based on your selection of the number of nodes and the pricing tier for the worker nodes and head node.
  13. 6
  14. Now Create the HDInsight Cluster.
  15. It will take about 15 minutes for the Cluster to be created and visible in your Microsoft Azure dashboard.
  16. 7
  17. Now you have your Spark on Azure HDInsight ready for the next step.

8

 

Setup the Sample Table for Spark on Azure HDInsight

In this section we will load sample data into a table (inside our Spark on Azure HDInsight cluster) using some code that we run from Jupyter.

  1. Open Jupyter for your Spark on Azure HDInsight cluster by going to https://<your spark cluster name>.azurehdinsight.net/jupyter
  2. You will be asked for a username and password. Enter the admin username and password we created in the previous section.
  3. 10
  4. Click on New and choose a Python 2 type notebook.
  5. 9
  6. You can name the notebook something meaningful, and then copy and paste the code below into a cell:
  7. from pyspark import SparkContext
    from pyspark.sql import *
    from pyspark.sql import HiveContext
    from pyspark.sql import Row
    
    # Create Spark and Hive contexts
    sc = SparkContext('yarn-client')
    hiveCtx = HiveContext(sc)

     

  8. Then press Shift+Enter (to run the code and go to the next cell)
  9. You will see the Jupyter icon showing that the Kernel is busy after running commands.
  10. Kernel Busy
    Kernel Busy
  11. Wait for the command to run and the kernel to go back to the idle state.
  12. 12
    Kernel Idle
  13. Now add the script below in the second cell:
  14. # Create an RDD from sample data
    hvacText = sc.textFile("wasb:///HdiSamples/HdiSamples/SensorSampleData/hvac/HVAC.csv")
    
    # Create a schema for our data
    Entry = Row('Date', 'Time', 'TargetTemp', 'ActualTemp', 'BuildingID')
    
    # Parse the data and create a schema
    hvacParts = hvacText.map(lambda s: s.split(',')).filter(lambda s: s[0] != 'Date')
    hvac = hvacParts.map(lambda p: Entry(str(p[0]), str(p[1]), int(p[2]), int(p[3]), int(p[6])))
    
    # Infer the schema and create a table       
    hvacTable = hiveCtx.createDataFrame(hvac)
    hvacTable.registerTempTable('hvactemptable')
    dfw = DataFrameWriter(hvacTable)
    dfw.saveAsTable('hvac')

    The commands above will populate a Hive table called hvac from a sample CSV file located on the cluster’s storage.

  15. After pressing Shift+Enter and waiting for the kernel to go back to the idle state, you can list all Hive tables with the command below:
  16. hiveCtx.sql("SHOW TABLES").show()

     

  17. After Shift+Enter you will see the result below:
  18. +---------------+-----------+
    |      tableName|isTemporary|
    +---------------+-----------+
    |  hvactemptable|       true|
    |hivesampletable|      false|
    |           hvac|      false|
    +---------------+-----------+
  19. You can even query the hvac table to see the data:
  20. hiveCtx.sql("SELECT * FROM hvac LIMIT 10").show()

     

  21. And the result will be:
  22. +-------+--------+----------+----------+----------+
    |   Date|    Time|TargetTemp|ActualTemp|BuildingID|
    +-------+--------+----------+----------+----------+
    |6/10/13| 2:43:51|        68|        63|         3|
    |6/11/13| 3:43:51|        70|        55|        17|
    |6/12/13| 4:43:51|        69|        59|        17|
    |6/13/13| 5:43:51|        69|        68|        19|
    |6/14/13| 6:43:51|        67|        79|         2|
    |6/15/13| 7:43:51|        67|        71|         3|
    |6/16/13| 8:43:51|        69|        63|        16|
    |6/17/13| 9:43:51|        66|        56|         9|
    |6/18/13|10:43:51|        65|        61|        16|
    |6/19/13|11:43:51|        67|        74|         7|
    +-------+--------+----------+----------+----------+
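
As an optional extra check before leaving Jupyter (my own addition, not part of the original walkthrough), you can aggregate the new table with the same hiveCtx; for example, the average actual versus target temperature per building:

# optional sanity check: average actual vs. target temperature per building
hiveCtx.sql("""
    SELECT BuildingID,
           AVG(ActualTemp) AS AvgActualTemp,
           AVG(TargetTemp) AS AvgTargetTemp
    FROM hvac
    GROUP BY BuildingID
    ORDER BY BuildingID
""").show()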

Now we have a table with data ready for Power BI to work with. Let’s go to the next step.

13

Connection from Power BI Desktop to Spark on Azure HDInsight

As mentioned in the prerequisites at the top of this tutorial, you need to install the Microsoft Spark ODBC Driver to be able to connect from Power BI Desktop to Spark on Azure HDInsight.

  1. Open Power BI Desktop and go to Get Data Section.
  2. Choose Azure, and then Azure HDInsight Spark.
  3. 14
  4. Enter the Spark cluster URL as https://<your spark cluster name>.azurehdinsight.net
  5. 16
  6. If you don’t have the Microsoft Spark ODBC Driver installed, you will see an error like the one below:
  7. 17
  8. If you have the driver installed, you will be asked for the username and password (the admin user that we created for Spark on Azure HDInsight before).
  9. After a successful connection you will see the list of tables. Simply choose hvac and then Load.
  10. 18
  11. That was really simple from a Power BI point of view, wasn’t it?!
  12. Now you can visualize it as you want! For example, I’ve created an area chart with Time as the axis, and the average of ActualTemp and the average of TargetTemp as values. I’ve changed the colors of the data values as below.
  13. 22
  14. I’ve also added a slicer (horizontal) for the BuildingID.
  15. 23
  16. And here is the result.

19

You can simply publish your report to the Power BI website. Now let’s see how you can connect to Spark from the Power BI website itself.

 

Connection from Power BI website to Spark on Azure HDInsight

  1. Log in to the Power BI website, choose Get Data from Databases, and choose Spark on Azure HDInsight as the source.
  2. 20
  3. Type in the server name as https://<your spark cluster name>.azurehdinsight.net and then connect
  4. Sign in to the server with user name and password
  5. 21
  6. Now you can build the report you want from the hvac table.

 

 

References:

 https://powerbi.microsoft.com/en-us/documentation/powerbi-spark-on-hdinsight-with-direct-connect/

https://azure.microsoft.com/en-us/documentation/articles/hdinsight-apache-spark-use-bi-tools/

https://azure.microsoft.com/en-us/documentation/articles/hdinsight-apache-spark-jupyter-spark-sql/


Be Fitbit BI Developer in Few Steps: Step 3 Visualization

Published Date : December 3, 2015

13

In the previous steps you’ve learned how to extract and mash up the data from CSV files and automate the process of looping through all CSV files in the source folder. In this step we are going to build visualizations based on the data set we’ve built so far. In this post you will see how easy it is to build charts in Power BI Desktop and how customizable they are. You will also learn about using a few DAX calculations to produce new measures that help in building visualization elements.

Adding Date Dimension

Before going to the visualization part, let’s add a date dimension to our queries. I have previously explained an example of a date dimension with Power Query, so all you need to do is use that as the source. The date dimension query produces the table below as its result:

1

It is also configurable, as mentioned in that blog post, and it includes public holidays (New Zealand holidays actually; you would have to change it a bit to cover your country’s holidays, which is easy).
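
If you don’t have that post handy, a heavily trimmed sketch of such a date query could look like the code below. This is my own minimal illustration, not the full query from that post; the real one has many more columns (including the holiday flags), and the column names here are assumptions based on what this post uses later:

let
    StartDate = #date(2015, 1, 1),
    EndDate = #date(2015, 12, 31),
    // one row per day between the two dates
    Dates = List.Dates(StartDate, Duration.Days(EndDate - StartDate) + 1, #duration(1, 0, 0, 0)),
    AsTable = Table.FromList(Dates, Splitter.SplitByNothing(), {"FullDateAlternateKey"}),
    Typed = Table.TransformColumnTypes(AsTable, {{"FullDateAlternateKey", type date}}),
    // columns used later in this post: Year, Month (number), Month Name, Day
    AddYear = Table.AddColumn(Typed, "Year", each Date.Year([FullDateAlternateKey]), Int64.Type),
    AddMonth = Table.AddColumn(AddYear, "Month", each Date.Month([FullDateAlternateKey]), Int64.Type),
    AddMonthName = Table.AddColumn(AddMonth, "Month Name", each Date.ToText([FullDateAlternateKey], "MMMM"), type text),
    AddDay = Table.AddColumn(AddMonthName, "Day", each Date.Day([FullDateAlternateKey]), Int64.Type)
in
    AddDay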

I have the full date dimension query in an Excel file and I use that as a source in Power BI Desktop. So here is what I have in my Query Editor: a query for Date (which comes from the date dimension query example explained above), and the Tracking Data query (which we built in the previous step of this exercise):

2

As you see in the table above, FullDateAlternateKey is not showing the right date format. I just right-click on it and change it to the Date format so I have a proper date field there:

3

A Touch to Modelling

Now close the Query Editor and apply the changes. This will load the result of both queries into the in-memory model of the Power BI Power Pivot component. Go to the Data tab and expand Date under the Fields section. You will see that many columns have a sigma icon beside them; the reason is that Power BI Desktop, based on their data type (whole number or decimal), decided that these are values that can be summarized. You can change this behavior by choosing any of those columns and changing the default summarization to Do Not Summarize.

4

Now go to the Relationships tab and create a relationship between the two tables based on their full date columns, which are the FullDateAlternateKey column from Date and the Dated column in the Tracking Data table.

5

Adding a Few Static Measures

Based on my current weight and age (I assume), Fitbit calculates how many calories I have to burn each day. I don’t know that calculation, so I create a static measure called CaloriesCap with the value of 2989 for the number of calories I have to burn each day. I also create a StepsCap measure with the value 12000, showing that I have to walk 12,000 steps a day, and another one, FloorCap, with the value of 10. I also created a Calories HighEnd measure with 5000 calories as its value (I will die if I burn more than that!). You can create all these measures easily on the Data tab, as sketched below.
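
For reference, static measures like these are nothing more than constant DAX expressions. A sketch of the four measures used in this post (with the names used here) would be:

CaloriesCap = 2989
StepsCap = 12000
FloorCap = 10
Calories HighEnd = 5000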

6

Let’s Visualize

It’s time to visualize what we have so far in the data set. Go to the Report tab and start by building a gauge. Drag and drop CaloriesCap as the Target Value, and you can also put the Calories HighEnd measure as the Maximum. Then drag and drop the Calories measure into the Value, and change the aggregation from Sum to Average, as shown in the image below.

7

Easy! And lucky me, I burned (on average) more than the calories I have to burn each day. Do the same for floors with the FloorCap and Floors measures. Now add three slicers: for Year, Month Name, and Day. When you create the slicer for Month Name you will see the month names ordered alphabetically, which is not right. Go to the Data tab and click on the Month Name column of the Date table. Then, from the Modeling menu options, choose Sort By Column and then choose Month (which is the number of the month in the year):

8

Now that you’ve set the ordering, you will see the month names in the correct order in the slicer, as below:

9

Change the orientation of the Day slicer to horizontal rather than vertical in the formatting options of the slicer:

10

Now let’s add two column charts: one with Steps as the value and the Dated column (from the Tracking Data table) as the axis, and the other one with the same axis and Floors as the value.

11

Add a stacked column chart with Dated as axis, and three measures as value: Minutes Very Active, Minutes Fairly Active, and Minutes Lightly Active.

12

Also add the average of all three of the measures above in a pie chart to show the overall split, and add two Card visualizations: one for the average of Steps, and another for the average of Distance.

Well, you can customize the visualization as you want in the formatting options of each chart or element. Have a play with it and create something remarkable. I’m sure you can design it better than me; here is what I built in a few minutes:

13

We are using the date dimension here, and one of the benefits of having a date dimension is that I can analyze the data based on weekdays and public holidays, and see on which day of the week I usually perform best in walking! Here we go:

14

Interesting, isn’t it? I can get more insights out of this visualization than I see in the Fitbit dashboard. I can see that I perform terribly badly at taking floors on public holidays! Well, our home is a flat single floor. However, I’m doing a little bit better on weekends. My best days for floors are always weekdays, and there isn’t really much difference between them.

15

Now if I check my steps, I’m not doing that badly on public holidays; 7K on public holidays in comparison with 11K on other days is fine for me ;) And I’m even very close between weekends and weekdays. If I look at each day’s average I can see that I’m a bit tired on Sundays and prefer to rest more; however, Saturdays are good days for me, so my overall weekend average goes up because of that. And for some unknown reason I walk less on Tuesdays! That’s something I really have to look into and find out why it happens.

16

In spite of what I see in floors and steps, my very active minutes average follows my steps results a bit more closely, and my best active minutes come on Fridays. There is an obvious reason for it: on Fridays I play indoor soccer with some friends, and at some minutes in the game I really feel my heart is coming out of my chest! Here is the evidence; I get the most active minutes on Fridays!

17

 

We can go on and on building more visualizations on this data set. There is no limit to the amount of insight you can get from it, so I leave it to you to use your own creativity and build dashboards and charts that give you more insight. You’ve seen so far how easy it is to be a BI developer if you use Power BI ;)

 


Be Fitbit BI Developer in Few Steps: Step 2 Loop Through All CSV Files

Published Date : December 2, 2015

2

In the first step you’ve learned how to manipulate and mash up the data with a few changes in the Power Query M script. Even though Power BI Desktop was not able to recognize your CSV file correctly because of its header row, you managed to write a script and fetch the data as you required. However, this process needs to be repeated for the second file, and then the third file, and who knows what happens if you have thousands of files. Real-world scenarios contain many files, even millions, so a manual process won’t be the answer. You have to create your data transformation process in a way that works automatically, no matter how many files there are.

Warning: This post contains scripting!

In this post we will write Power Query M script to loop through all CSV files in the source folder and process them one by one with the fetch-table script that we produced in the previous step. So be prepared to write some code, and don’t be scared of it ;) It is almost impossible to create a BI solution for real-world problem solving that doesn’t include at least a bit of scripting, so you have to be familiar with it. Yes, Power BI is not just a nice drag-and-drop dashboard-building tool; it has many powerful features, and you have to spend time to learn them. So here you go, let’s write some code.

Looping Through CSV Files

Unfortunately, Power Query (or let’s say Power BI) doesn’t have a loop structure, because of the functional nature of the language. However, there are data structures such as Table and List that, used with the each singleton function, work exactly as a loop structure does (see the small sketch below). In this post I will take you through the process of looping through the files in a directory, processing them all, and finally combining them into one big table. You will also learn some Power Query M functions along the way.
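
As a tiny illustration of the idea (my own example, separate from this exercise), transforming every item of a list takes the place of a classic for loop:

// List.Transform visits every item in the list;
// "each _ * _" is shorthand for a one-parameter function that squares its input
List.Transform({1, 2, 3, 4}, each _ * _)  // returns {1, 4, 9, 16}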

Where We’ve Been in the Previous Step

In the previous step we ended up with the script below:

let
    Source = Table.FromColumns({Lines.FromBinary(File.Contents("C:\Users\Reza\SkyDrive\Blog\PowerBIBook\PQ1\CSV\SourceFiles\fitbit_export_201507.csv"),null,null,1252)}),
    #"Removed Top Rows" = Table.Skip(Source,1),
    #"Split Column by Delimiter" = Table.SplitColumn(#"Removed Top Rows","Column1",Splitter.SplitTextByDelimiter(",", QuoteStyle.Csv),{"Column1.1", "Column1.2", "Column1.3", "Column1.4", "Column1.5", "Column1.6", "Column1.7", "Column1.8", "Column1.9", "Column1.10"}),
    #"Changed Type" = Table.TransformColumnTypes(#"Split Column by Delimiter",{{"Column1.1", type text}, {"Column1.2", type text}, {"Column1.3", type text}, {"Column1.4", type text}, {"Column1.5", type text}, {"Column1.6", type text}, {"Column1.7", type text}, {"Column1.8", type text}, {"Column1.9", type text}, {"Column1.10", type text}}),
    #"Promoted Headers" = Table.PromoteHeaders(#"Changed Type"),
    #"Removed Blank Rows" = Table.SelectRows(#"Promoted Headers", each not List.IsEmpty(List.RemoveMatchingItems(Record.FieldValues(_), {"", null}))),
    #"Changed Type1" = Table.TransformColumnTypes(#"Removed Blank Rows",{{"Steps", Int64.Type}, {"Calories Burned", Int64.Type}, {"Floors", Int64.Type}, {"Minutes Sedentary", Int64.Type}, {"Minutes Lightly Active", Int64.Type}, {"Minutes Fairly Active", Int64.Type}, {"Minutes Very Active", Int64.Type}, {"Activity Calories", Int64.Type}, {"Distance", type number}}),
    #"Added Custom" = Table.AddColumn(#"Changed Type1", "Dated", each Date.FromText([Date],"en-NZ")),
    #"Changed Type2" = Table.TransformColumnTypes(#"Added Custom",{{"Dated", type date}}),
    #"Removed Columns" = Table.RemoveColumns(#"Changed Type2",{"Date"}),
    #"Reordered Columns" = Table.ReorderColumns(#"Removed Columns",{"Dated", "Calories Burned", "Steps", "Distance", "Floors", "Minutes Sedentary", "Minutes Lightly Active", "Minutes Fairly Active", "Minutes Very Active", "Activity Calories"})
in
    #"Reordered Columns"

That generates the table below from an exported CSV file of Fitbit activities:

19

Now, there are multiple files in the source folder, and I want to repeat the process for all of them:

3

Get Data From Folder

Let’s start the process by reading all files in a folder. Power BI Desktop can do that easily: through the Get Data options choose Folder, and then select the folder containing the source files shown above.

1

This will load all files and their general information into a single table. There are columns for the name of the file, the folder path, the dates created and modified, and so on. There is also a column named Content. This is an important column, as it contains the whole content of each file in binary format.

2

Custom Function

So far we have the list of all files and the script to convert their content into the desired table structure. We can create a custom function from that script and re-use it for each file’s content. The process of creating custom functions is easy. I’m not going to explain every part of creating a custom function because that’s a whole separate topic. If you are interested in reading more about creating custom functions in Power Query, read this post.

Go into the Advanced Editor and create the function as in the highlighted section below:

let
    FetchTable = (table) => 
	let
   	 	Source = Table.FromColumns({Lines.FromBinary(table,null,null,1252)}),
   	 	#"Removed Top Rows" = Table.Skip(Source,1),
   	 	#"Split Column by Delimiter" = Table.SplitColumn(#"Removed Top Rows","Column1",Splitter.SplitTextByDelimiter(",", QuoteStyle.Csv),{"Column1.1", "Column1.2", "Column1.3", "Column1.4", "Column1.5", "Column1.6", "Column1.7", "Column1.8", "Column1.9", "Column1.10"}),
   	 	#"Changed Type" = Table.TransformColumnTypes(#"Split Column by Delimiter",{{"Column1.1", type text}, {"Column1.2", type text}, {"Column1.3", type text}, {"Column1.4", type text}, {"Column1.5", type text}, {"Column1.6", type text}, {"Column1.7", type text}, {"Column1.8", type text}, {"Column1.9", type text}, {"Column1.10", type text}}),
  	 	#"Promoted Headers" = Table.PromoteHeaders(#"Changed Type"),
  	 	#"Removed Blank Rows" = Table.SelectRows(#"Promoted Headers", each not List.IsEmpty(List.RemoveMatchingItems(Record.FieldValues(_), {"", null}))),
    		#"Changed Type1" = Table.TransformColumnTypes(#"Removed Blank Rows",{{"Steps", Int64.Type}, {"Calories Burned", Int64.Type}, {"Floors", Int64.Type}, {"Minutes Sedentary", Int64.Type}, {"Minutes Lightly Active", Int64.Type}, {"Minutes Fairly Active", Int64.Type}, {"Minutes Very Active", Int64.Type}, {"Activity Calories", Int64.Type}, {"Distance", type number}}),
    		#"Added Custom" = Table.AddColumn(#"Changed Type1", "Dated", each Date.FromText([Date],"en-NZ")),
    		#"Changed Type2" = Table.TransformColumnTypes(#"Added Custom",{{"Dated", type date}}),
    		#"Removed Columns" = Table.RemoveColumns(#"Changed Type2",{"Date"}),
    		#"Reordered Columns" = Table.ReorderColumns(#"Removed Columns",{"Dated", "Calories Burned", "Steps", "Distance", "Floors", "Minutes Sedentary", "Minutes Lightly Active", "Minutes Fairly Active", "Minutes Very Active", "Activity Calories"})
	in
    		#"Reordered Columns",


    Files = Folder.Files("C:\Users\Reza\SkyDrive\Blog\PowerBIBook\PQ1\CSV\SourceFiles")
in
    Files

Note that the highlighted part, from line 2 to line 16 in the code above, is the code for the custom function. The first line of the function (line 2 in the script above) contains the name of the function, FetchTable, and the input parameter, table. Lines 3 to 16 are the script we copied from the first step of this exercise.

To make sure that the function works properly, you can call it as below:

FetchTable(Files{[#"Folder Path"="C:\Users\Reza\SkyDrive\Blog\PowerBIBook\PQ1\CSV\SourceFiles\",Name="fitbit_export_201509.csv"]}[Content])

Put the single line of code above in the in section of the script, instead of Files, and you will see the result table below:

3

So this shows the function definition and function call work perfectly. Now let’s loop through files.

EACH Singleton Function

each is a singleton function, which means it has one input parameter and one output. It can be applied to each record in a table or each item in a list, and with it you can apply calculations, data transformations, custom columns, and many other operations. For our case I want to use each to apply the FetchTable function to every Content value of the Files table. each is easy to use, as below (these steps replace everything after the FetchTable definition in our script):

    Files = Folder.Files("C:\Users\Reza\SkyDrive\Blog\PowerBIBook\PQ1\CSV\SourceFiles"),
    TableAdded= Table.AddColumn(Files,"Data",each FetchTable([Content])),
    DataColumnSelected=Table.SelectColumns(TableAdded,{"Data"})
in
    DataColumnSelected

As you see, the single line above, and especially the section each FetchTable([Content]), applies the FetchTable function to every Content value in each record of the table, and as a result we have a new column called Data that contains tables. Line 3 of the script above just selects the Data column (because we don’t need the other columns). Here is the result:

4

As you see in the result set above, we have one table with one column (named Data), and it contains a table in every item. These are the tables we want, so we have to combine all of them into one table. Fortunately there is a function for that in Power Query, called Table.Combine.

Combining, Appending, or UNION of Tables

Table.Combine performs a UNION of multiple tables into one table. This function gets a LIST of tables (they should all have the same structure) and returns a single table which is the appended (or UNION) version of all of them. Here is the definition of this function:

Table.Combine(tables as list) as table

As you have probably noticed, this function needs a LIST as its input parameter, and so far we have built a TABLE (a single-column table), which needs to be converted to a list. There are different methods of converting a table into a list; I’ll show you one method here. I first turn the table into a list of column-lists with the Table.ToColumns function, and then I get the first (and only) item of that list with the List.First function. Finally I use Table.Combine with the list parameter. Here’s the code (again, these steps sit after the FetchTable definition):

    Files = Folder.Files("C:\Users\Reza\SkyDrive\Blog\PowerBIBook\PQ1\CSV\SourceFiles"),
    TableAdded= Table.AddColumn(Files,"Data",each FetchTable([Content])),
    DataColumnSelected=Table.SelectColumns(TableAdded,{"Data"}),
    TablesColumn=Table.ToColumns(DataColumnSelected),
    TablesList=List.First(TablesColumn),
    AllTables=Table.Combine(TablesList)
in
    AllTables

Line 4 converts the table to a list, but a list of lists (one inner list per column), because a table might have more than one column.

Line 5 gets the first item of the list, which is the first column of the table. In our example we have only a single column, so that is the column that gets fetched.

Line 6 finally combines all the tables in the list into one large table.

And the result:

5

You can see that all the tables are now combined; I have all the months together in this final table.
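
As a side note, there is a shorter route to the same list: referencing a table column by name in M returns that column as a list directly, so the Table.ToColumns/List.First pair could be replaced as below. This is just a sketch (it assumes the FetchTable function from above is in scope); the full script that follows keeps the original approach:

let
    Files = Folder.Files("C:\Users\Reza\SkyDrive\Blog\PowerBIBook\PQ1\CSV\SourceFiles"),
    TableAdded = Table.AddColumn(Files, "Data", each FetchTable([Content])),
    // TableAdded[Data] returns the Data column as a list, ready for Table.Combine
    AllTables = Table.Combine(TableAdded[Data])
in
    AllTables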

Here is the full script of this demo so far:

let
    FetchTable = (table) => 
	let
   	 	Source = Table.FromColumns({Lines.FromBinary(table,null,null,1252)}),
   	 	#"Removed Top Rows" = Table.Skip(Source,1),
   	 	#"Split Column by Delimiter" = Table.SplitColumn(#"Removed Top Rows","Column1",Splitter.SplitTextByDelimiter(",", QuoteStyle.Csv),{"Column1.1", "Column1.2", "Column1.3", "Column1.4", "Column1.5", "Column1.6", "Column1.7", "Column1.8", "Column1.9", "Column1.10"}),
   	 	#"Changed Type" = Table.TransformColumnTypes(#"Split Column by Delimiter",{{"Column1.1", type text}, {"Column1.2", type text}, {"Column1.3", type text}, {"Column1.4", type text}, {"Column1.5", type text}, {"Column1.6", type text}, {"Column1.7", type text}, {"Column1.8", type text}, {"Column1.9", type text}, {"Column1.10", type text}}),
  	 	#"Promoted Headers" = Table.PromoteHeaders(#"Changed Type"),
  	 	#"Removed Blank Rows" = Table.SelectRows(#"Promoted Headers", each not List.IsEmpty(List.RemoveMatchingItems(Record.FieldValues(_), {"", null}))),
    		#"Changed Type1" = Table.TransformColumnTypes(#"Removed Blank Rows",{{"Steps", Int64.Type}, {"Calories Burned", Int64.Type}, {"Floors", Int64.Type}, {"Minutes Sedentary", Int64.Type}, {"Minutes Lightly Active", Int64.Type}, {"Minutes Fairly Active", Int64.Type}, {"Minutes Very Active", Int64.Type}, {"Activity Calories", Int64.Type}, {"Distance", type number}}),
    		#"Added Custom" = Table.AddColumn(#"Changed Type1", "Dated", each Date.FromText([Date],"en-NZ")),
    		#"Changed Type2" = Table.TransformColumnTypes(#"Added Custom",{{"Dated", type date}}),
    		#"Removed Columns" = Table.RemoveColumns(#"Changed Type2",{"Date"}),
    		#"Reordered Columns" = Table.ReorderColumns(#"Removed Columns",{"Dated", "Calories Burned", "Steps", "Distance", "Floors", "Minutes Sedentary", "Minutes Lightly Active", "Minutes Fairly Active", "Minutes Very Active", "Activity Calories"})
	in
    		#"Reordered Columns",

    Files = Folder.Files("C:\Users\Reza\SkyDrive\Blog\PowerBIBook\PQ1\CSV\SourceFiles"),
    TableAdded= Table.AddColumn(Files,"Data",each FetchTable([Content])),
    DataColumnSelected=Table.SelectColumns(TableAdded,{"Data"}),
    TablesColumn=Table.ToColumns(DataColumnSelected),
    TablesList=List.First(TablesColumn),
    AllTables=Table.Combine(TablesList)
in
    AllTables

That ends this step. You’ve learned some scripting skills here today, and I hope you’ve enjoyed it. With a few lines of scripting like this you can add massive power to your queries; you can automate things and get rid of extra manual work. Power Query is one of the most powerful components of Power BI, giving you the ability to apply many transformations and mash up data. The next step will be about visualization of the data we’ve prepared so far, so stay tuned!

 


Be Fitbit BI Developer in Few Steps: First Step Get Data from CSV

Published Date : December 1, 2015

0

Power BI makes things really easy; it isn’t hard to build dashboards and data analysis on almost any type of data source. Previously, in other sections of the online book Power BI from Rookie to Rock Star, I’ve mentioned a few data sources and ways of connecting to them. In this post I’d like to show you how easily Power BI works with CSV or text files. CSV or text files can come in different formats, but for this example I thought using Fitbit exported CSV data would make a great demo, because you have probably seen your Fitbit dashboard on the website, as below:

1

So in this post we are going to build that dashboard (not all of it, obviously, because we don’t have the data required for all of it), but most parts of it, with Power BI. You will see how easy and powerful Power BI is in this kind of scenario, and how you can be the BI developer of Fitbit in the few steps of building this demo.

Before starting, I have to apologize in advance to the Fitbit developer team; I don’t mean to suggest they are not doing anything. Obviously they are doing a great job, and the biggest part of this great job is hidden from our view: they mash up the data to build it in the way that can be visualized best. So I admire them for their efforts. In this post I just want to show how easy it is to build dashboards with Power BI when the data is available in CSV or text format.

So let’s get started. Fitbit devices are usually smart bands that measure your steps, number of floors, calories burned, heart rate, and some other information. You can always download an extracted version of your activities as a CSV file from your account page on the Fitbit website. This always gives you one month of data, so if you want the data exported for two different months, you will end up with two CSV files.

2

I have exported my Fitbit activity history from the time I got it as a gift (a great gift, I have to say :) ) till now, which is five months.

3

Here is a sample of the data set in these files:

4

I start reading the data from CSV in Power BI Desktop with Get Data from CSV. But this is the result I get:

5

Well, this is happening because of the very first record in the CSV file, which sits above the header row, as you see below:

6
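
In text form, the top of the exported file looks something like this (the column names come from this exercise; the title line and the sample values are reconstructed for illustration):

Activities
Date,Calories Burned,Steps,Distance,Floors,Minutes Sedentary,Minutes Lightly Active,Minutes Fairly Active,Minutes Very Active,Activity Calories
14-07-2015,2718,10056,7.61,12,1220,211,51,74,1408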

The first row doesn’t have any delimiter, and that is why Power BI Desktop THINKS that this is a one-column CSV file. To fix this issue you can use one of two approaches: remove that first row manually from every file, or change the M script to read the file as raw lines and do the parsing yourself.

The first approach isn’t my favorite, as it is manual and won’t be of much use in real-world cases when you have thousands of files, so let’s go with the second. Click on Edit to go to the Query Editor window. There you can click on Advanced Editor to change the code.

7

In the Advanced Editor window you see the M script that generates the output.

8

Change the code to the following:

let
    Source = Table.FromColumns({Lines.FromBinary(File.Contents("C:\Users\Reza\SkyDrive\Blog\PowerBIBook\PQ1\CSV\SourceFiles\fitbit_export_201507.csv"),null,null,1252)})
in
    Source

Now you can see the full data in one column, rather than only the first column with date values.

9

Follow the steps below to shape the data for best use in the report.

Get rid of the first record with Remove Rows, choosing 1 top row to remove:

10

You can also remove any blank rows at the end of the file:

11

Then select Column1, go to the Transform tab, and choose Split Column, then By Delimiter:

12

Leave the options at their defaults (delimiter Comma, and split at each occurrence of the delimiter) in the Split Column by Delimiter dialog box and continue. You will see all the columns split, as below:

13

Set the first row as the header by going to the Home tab in the Query Editor window and clicking the Use First Row As Headers option.

14

This option will rename all column headers to the values from the first row:

15

Now change the data type of the Distance column to Decimal Number, and all other columns (except Date) to Whole Number:

16

For the Date column itself you can’t always change the type that easily, because date conversion works with the locale of your system. As you see in the screenshot below, the date format is DD-MM-YYYY, which doesn’t match the locale of every system.

17

If you try to convert it you might get an error, so instead I convert it through M script. Click on Add Column and then choose Custom Column. Here is where you can write the M script that generates the new column. The script is:

18

Date.FromText([Date],"en-NZ")
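
To make the role of the culture parameter concrete, here is a small hypothetical comparison (illustrative dates, not values from the dataset):

// the same text parsed under two different locales
Date.FromText("03-12-2015", "en-NZ")  // 3 December 2015 (day-month-year)
Date.FromText("03-12-2015", "en-US")  // 12 March 2015 (month-day-year)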

The locale you are using might be different; there are lists of culture and locale codes you can look up. After doing that you will see a Dated column added to the end of the table with the Date data type. Now you can remove the original Date column and move the new Dated column to the beginning of the table, as below:

19

So our data is now ready to work with. And this is the final M script of our work:

let
    Source = Table.FromColumns({Lines.FromBinary(File.Contents("C:\Users\Reza\SkyDrive\Blog\PowerBIBook\PQ1\CSV\SourceFiles\fitbit_export_201507.csv"),null,null,1252)}),
    #"Removed Top Rows" = Table.Skip(Source,1),
    #"Split Column by Delimiter" = Table.SplitColumn(#"Removed Top Rows","Column1",Splitter.SplitTextByDelimiter(",", QuoteStyle.Csv),{"Column1.1", "Column1.2", "Column1.3", "Column1.4", "Column1.5", "Column1.6", "Column1.7", "Column1.8", "Column1.9", "Column1.10"}),
    #"Changed Type" = Table.TransformColumnTypes(#"Split Column by Delimiter",{{"Column1.1", type text}, {"Column1.2", type text}, {"Column1.3", type text}, {"Column1.4", type text}, {"Column1.5", type text}, {"Column1.6", type text}, {"Column1.7", type text}, {"Column1.8", type text}, {"Column1.9", type text}, {"Column1.10", type text}}),
    #"Promoted Headers" = Table.PromoteHeaders(#"Changed Type"),
    #"Removed Blank Rows" = Table.SelectRows(#"Promoted Headers", each not List.IsEmpty(List.RemoveMatchingItems(Record.FieldValues(_), {"", null}))),
    #"Changed Type1" = Table.TransformColumnTypes(#"Removed Blank Rows",{{"Steps", Int64.Type}, {"Calories Burned", Int64.Type}, {"Floors", Int64.Type}, {"Minutes Sedentary", Int64.Type}, {"Minutes Lightly Active", Int64.Type}, {"Minutes Fairly Active", Int64.Type}, {"Minutes Very Active", Int64.Type}, {"Activity Calories", Int64.Type}, {"Distance", type number}}),
    #"Added Custom" = Table.AddColumn(#"Changed Type1", "Dated", each Date.FromText([Date],"en-NZ")),
    #"Changed Type2" = Table.TransformColumnTypes(#"Added Custom",{{"Dated", type date}}),
    #"Removed Columns" = Table.RemoveColumns(#"Changed Type2",{"Date"}),
    #"Reordered Columns" = Table.ReorderColumns(#"Removed Columns",{"Dated", "Calories Burned", "Steps", "Distance", "Floors", "Minutes Sedentary", "Minutes Lightly Active", "Minutes Fairly Active", "Minutes Very Active", "Activity Calories"})
in
    #"Reordered Columns"

However, we have some other files to work with as well, and they all have the same structure. We can use M script to loop through all files in the source directory and load them one by one automatically, but that would make this a very long post, and it also requires a bit of scripting, which you might not like. I will explain that part in the next section. For now, just to keep you interested, here is a glance at a sample report at the end of this effort (we will build it together in the next steps):

23