Data Wrangling in Power Query

Facebooktwittergoogle_plusredditpinterestlinkedintumblrmailFacebooktwittergoogle_plusredditpinterestlinkedintumblrmail

Before machine learning, we need to go through two main processes: Data preparation and Data wrangling. Data preparation is about the collecting data from resources, integrate and load data in one resource, that is so like ETL (extract, transform, and load process) [2].

According to [2], Data wrangling is when we loaded the data and applying the data cleaning, formatting, and integration in the application layer.

In python language, there is some popular data wrangling process such as:

Reshaping data

One of the main approaches for reshaping data is pivoting, that is about converting a column data to the column name. As you can see in figure 6-5 [3].

 

 

The related code in Python is:

Also in R, we can do this via below codes

Now let’s see how we can be done this process in Power Query without writing any code. To pivot data using Power Query, we need to enter data using “Enter Data Manually” option.

By clicking on the “Enter Data”, you able to put data manually in Power Query.

We have the same data as we have in Python. Then we need to just click on the type column, Then click on the “Pivot Colum” in Transformation tab. The result of pivoting is shown in below picture.

Then you need to specify which column you are going to put a value and type.

 

 

Merging DataSet

Another data preparation that is so popular is about merging the data. In Python, if we want to do that, there is a function named “merge”, that we can use it to connect two different datasets. For instance, we have a dataset as below:

The same process can be done in Power Query. I have created two different datasets “Data1” and “Data2”. After creating a dataset, you need to click on the “Merge” column to merge “Data1” with “Data2” . 

In merge window, we able to specify which column we are going to merge based on and what type of merge we are interested. After merging, you need to specify what columns you want to show to users. Please follow steps explained in Figure 6-19. Then the output will be there and you able to see the results

 

 

 

These are some basic data transformation that you can do in Power Query. There is some more complex process that you can do using Power Query menu or M or R languages.

 [1] http://www.informationbuilders.es/sites/www.informationbuilders.com/files/intl/co.uk/presentations/four_types_of_analytics.pdf?redir=true

[2]  https://blogs.sas.com/content/datamanagement/2016/09/06/data-prep-and-data-wrangling1/

[3] https://s3.amazonaws.com/assets.datacamp.com/blog_assets/Python_Pandas_Cheat_Sheet_2.pdf

[4] https://www.kaggle.com/uciml/breast-cancer-wisconsin-data/downloads/data.csv

[5] http://radacad.com/online-book-power-bi-from-rookie-to-rockstar

 

 

 

 

Facebooktwittergoogle_plusredditpinterestlinkedintumblrmailFacebooktwittergoogle_plusredditpinterestlinkedintumblrmail
rssyoutuberssyoutube
Leila Etaati
Dr. Leila Etaati is Principal Data Scientist, BI Consultant, and Speaker. She has over 10 years’ experience working with databases and software systems. She was involved in many large-scale projects for big sized companies. Leila has PhD of Information System department, University of Auckland, MS and BS in computer science. Leila is Microsoft Data Platform MVP.

Leave a Reply

Your email address will not be published. Required fields are marked *