Previously you have read how to use Power Query as the data transformation component of Power BI and you learned how to use it in BI applications. You can read more about Power Query usage in BI in Power BI online book from Rookie to Rock Star. In this series I will explain how to use Power Query for non-BI usage. This data transformation tool can be used anywhere to use input parameters, fetch the data from web or any other data sources, mash up the data to form the required shape, and automate the process. In this series I will show you a real use case of using Power Query for non-BI application.
Why?
As an introduction to this series, I want to take you to the path that leads me to use Power Query here. You might be aware that I am teaching Power BI courses, and most of my courses are online and Live. This means that courses are not recorded videos, it is me on the other side of the line with full interactive audio and video experience with students with Go2Meeting application. Students connecting to me from other places in the world. So I do need an event date/time scheduler that I can announce date and time of the event in different time zones.
Fortunately there is a very good website that helps to find a date/time in different time zones. In this website I can set my input parameters as the date/time of my event locally (in my city), and name of the event, and duration.
Then the website produces a result set of important cities in different time zones with the date and time of the event in their time zones. That’s brilliant, isn’t it?
The main issue here is that; I want to use this list in my course announcement web page, and list above is overwhelming! If I copy and paste list above, it will make my page so long. Audience will lose interest to read such a long page to find their city and local time zone. What I need is this:
- List of all above cities with the event date time on their time zone, categorized by time. That means; Los Angeles, San Francisco, Las Vegas, Seattle, and Portland, should be all listed in one row of my result set table.
- I want this list to be in Excel format, so I can send it to my mailing list or attach it as an excel table
- I don’t want to produce the data from website. I would like to change parameters in my Excel file, and it automatically fetches data from this website, and load result set back to my Excel spreadsheet.
So as a result I decided to use Power Query, because with Power Query I can;
- Have input parameters in an Excel sheet for local date/time, and duration of the event.
- Browse web URL and fetch result set.
- Mash up the output data to produce my desired data set.
- Load result set into Excel.
So Let’s now start building this solution step by step.
What You Will Learn?
By completing this scenario, you will learn a real-world use case of Power Query for non-BI solution. You will learn using Power Query for;
- Defining a table for input parameters in Excel sheet
- Using Power Query to read values from that table as input parameter and build web URL for timeanddate.com web query.
- Querying web URL and Fetching the data
- Data Manipulation with Power Query such as; Removing Columns, Replacing Values, Appending Tables, Grouping Rows, Adding Custom Columns, Changing Tables to Column, Columns to Lists, Lists to Tables, and Concatenating Table Columns with Comma.
- Automating the Power Query solution to take the parameters and produce result set when you hit Refresh button.
Prerequisite
This example has been done with Excel 2016 which has Power Query embedded. So if you have Excel 2016 you don’t need to install anything. If you are using Excel 2013 or 2010 you need to download and install Power Query for Excel Add-in. Power Query editor experience in all of these versions of Excel are the same. The only difference is that Power Query menu options in Excel 2016 is located under Data Tab;
And in Excel 2013 and 2010 are located under Power Query Tab;
Get Data from Web
I’ll start by the easy part of the work, which is using a static web URL like this:
In Power Query I can fetch tables produced in this web page. Follow the menu as: Go to Data Tab (In excel 2013/2010 Power Query Tab), New Query, From Other Data Sources, and then From Web
Enter the URL mentioned above in the dialog box
Power Query is smart enough to separate each table, and It shows you a preview of each table. Choose the table that has values with cities and date/times.
I can see in preview above that there are some extra columns, and also I need to apply some transformations. So I Click on Edit to lunch query editor for further data manipulation.
Apply Basic Transformations
When query editor lunches and Power Query loads the preview of table, it also automatically applies some data type changes as the very first step. Here is the data set when it loads in query editor for the first time;
There is a Header column which is not giving me any useful information, So I’ll remove it by right click on the column and Remove column
There are also some * characters in cities names (at the end of some cities, such as Barcelona*). So I remove them by clicking on the Location Column, then choosing Replace Values from icons under Home menu, and then in replace values dialog box I replace * with blank.
I’ll do the same step again Location2 Column this time. Here is what my result set looks like so far;
As you can see in screenshot above all applied steps so far are listed in the right pane (numbered 3 in screenshot above). This is one of the great features of Power Query that allows you rollback changes anytime you want and check preview of query for specific step.
In screenshot above I have to set of columns; first one numbered 1, and the second one numbered 2. These are not repetitive values. You can check city names in columns Location and Location2. The fact is that timeanddate.com website split the result set into two set of columns and produced HTML table like that. So for reading the full data set, I have to separate these two set of columns and then combine their values into a single set of columns.
So I’ll produce a result set for the second set of columns with removing the first set;
This will give me the second set of columns as below;
Then I rename the column names for this data set. The reason for this rename is that later on when I want to combine two queries together column names should be the same. I’ll rename Columns to be Location, Start Time, End Time. for rename I just right click on each column and Rename
And here is my result set for the second data set:
Insert a Step
Now I want to produce another variable for the first set of columns. Instead of creating the whole process again from Get data from web, I start from one step ago which was before removing columns. As I’ve mentioned earlier I can grab that status easily from the Applied Steps section in Query Settings pane. There is a step named Replaced Value1. This step belongs to where I replaced * characters with blank for Location2 columns.
I Click on Replaced Value1 step and then choose second set of columns to remove.
When I click on Remove columns as above, I see a message says; Are you sure you want to insert a step? I get this message because this is not the latest step in my query editor. As long as I know what I am doing adding an extra step is fine.
I hit Insert for the message above and a new step will be created called Removed Columns2. With the result set as below;
Fixing Insert Step reference issue
When I insert a step in Power Query, the step after that will source from inserted step. This will cause some issues in my scenario. I inserted an step and removed Location2, Start time2, and End time2. and then in Removed Columns1 I want to remove other columns and in Renamed Columns rename columns that does not exists anymore. So I have to fix the reference issue happened by inserting the step. Here is how I fix it:
I go to View tab, Advanced Editor (or I can find Advanced Editor button in Home tab). This will open Advanced Editor with M code for me. Then I replace the highlighted part to be referenced from Replaced Value1.
Now my data sets exists as below:
Removed Columns2: the data set for first set of columns
Renamed Columns: the data set for second set of columns
Append Queries
Now It’s the time to combine these two data sets. There is a menu option that combines two data sets. However, it only appends two QUERIES. As I have everything here in one query (but different steps or let’s say variables), so I can’t use that option. Fortunately, I can do that with a single line of M scripting.
I click on Advanced Editor to see the M code generated behind the scene. Then I Insert code line below at the end of LET clause, after adding a single comma (which is line separator in Power Query) in the line before it.
#"Appended Query" = Table.Combine({#"Removed Columns2", #"Renamed Columns" })
Statement above creates a new data set as a variable called Appended Query. I’ve used Table.Combine to append queries (or UNION them in SQL and Database terminologies). This function is a simple function and gets data sets to be combined which are Removed Columns1 and Renamed Columns1.
Then I show this step in the result set with writing that in IN section:
in #"Appended Query"
Now my result set have both data sets combined into one. (I can check this includes 144 columns for 144 cities; combined list of 72 cities in each of previous data sets).
Next Steps
It will be really long if I mention all steps in a single post, So stay tuned. Next step would be Grouping rows and applying some transformations to get the desired output. Finally I will show you how to use parameters in the query and automate the process.
Power Query Not for BI; Event Date and Time Scheduler – Part 2
Power Query Not for BI; Event Date and Time Scheduler – Part 3
There is something wrong with the steps detailed. “Renamed Columns” step is only done once in the post, therefore, there is no generation of the step “Renamed Columns1”. Kindly, correct me if I’ve understood the same wrongly.
Would be great if you can post the M Query Code or Excel Workbook here.
Thanks.
Hi Deepak,
Wow, I impressed by how fast you started to follow the steps. Well Done.
Regarding the Renamed Columns1 Step; That is where I’ve mentioned: Then I rename the column names for this dataset. The reason for this rename is that later on when I want to combine two queries together column names should be the same. I’ll rename Columns to be Location, Start Time, End Time. (Right before Append Queries)
As renaming was easy step to do, I just did that explanation. Sorry if it is confusing.
I’ll share the whole M script at the end of series, Stay tuned.
Cheers,
Reza
Hi Reza,
Thanks for the response. However, when I rename the columns, the step created is “Renamed Columns” and not “Renamed Columns1”, therefore, I get an error also i.e. “Location2 could not be found”. Even if I modify the query,
#”Appended Query” = Table.Combine({#”Removed Columns1″, #”Renamed Columns1″ })
to
#”Appended Query” = Table.Combine({#”Removed Columns1″, #”Renamed Columns” }),
the code throws the same error.
However, this being said, I’ll go through the complete M script and see where is the issue. Thanks.
-Deepak
Moreover, when I insert a step after “Replaced Values1”, “Removed Columns2” comes before “Removed Columns”. However, in your part 2 of the series, “Removed Columns2” comes after “Removed Columns1”, therefore, the step sequence in query is not same. Kindly, correct me if I’ve wrongly understood something. Thanks.
-Deepak
[Typo in earlier reply]
Moreover, when I insert a step after “Replaced Values1″, “Removed Columns2″ comes before “Removed Columns1”. However, in your part 2 of the series, “Removed Columns2″ comes after “Removed Columns1″, therefore, the step sequence in query is not same. Kindly, correct me if I’ve wrongly understood something. Thanks.
-Deepak
Thanks Deepak for all your detailed comments.
You are absolutely right about naming and order of some steps.
However I believe naming and order of some steps are easy for readers like you to figure it out. If after renaming, your step named as Renamed Columns, then that is absolutely fine, use that in the query. The main point to consider here in this walk through is that: we have two set of columns; first one with names; Location, Start Time, End Time (name it whatever you like, I call it first data set), and second one with names; Location2, Start Time2, and End time2. rename this second data set to: Location, Start Time, End Time. and then finally append result of first data set and the second data set.
The reason that step names and orders are a bit different in my screenshots is that I’ve built the demo first, and then I wrote and generate screenshots.
Please let me know if it is still hard to figure out, or you face an issue yet, and I’ll do my best to correct steps with right naming convention and order.
Sorry for inconvenience
Cheers,
Reza
Hi Reza,
Thanks a lot for the quick response. I’m not able to go forward because I’m getting the error mentioned earlier in my reply even after correcting the step name. The issue is, how can Removed Columns2 be after Removed Columns1 when we are inserting the step after Replaced Values1, which places Removed Columns2 before Removed Columns1 and after that I follow the steps mentioned I get the same error again. Thanks.
-Deepak
Hi Reza,
Following is my M Script after Replaced Values step:
#”Replaced Value1″ = Table.ReplaceValue(#”Replaced Value”,”*”,””,Replacer.ReplaceText,{“Location2″}),
#”Removed Columns2″ = Table.RemoveColumns(#”Replaced Value1”,{“Location”, “Start time”, “End time”}),
#”Renamed Columns” = Table.RenameColumns(#”Removed Columns2″,{{“Location2”, “Location”}, {“Start time2”, “Start time”}, {“End time2”, “End time”}}),
#”Removed Columns1″ = Table.RemoveColumns(#”Renamed Columns”,{“Location2”, “Start time2”, “End time2″}),
#”Appended Query” = Table.Combine({#”Removed Columns2″, #”Renamed Columns” })
in
#”Appended Query”
The Script is not appending the tables one below the other. If I replace “Removed Columns2” with “Removed Columns1” in Appended Query, I get an error that Location2 could not be found. I’m not able to debug the same. Request your help. Thanks.
-Deepak
Hi Deepak,
I’ve applied changes in naming, screenshots, and order of steps. Now you should be able to follow it without an issue. Let me know if you have any questions
Cheers,
Reza
Hi Reza,
This is amazingly simple now. Thanks a lot for highlight reference issue (I guess the root cause of problem).
-Deepak
Thanks Deepak for your feedback. I appreciate it.
Cheers,
Reza