Choose the Right Merge Join Type in Power BI

Facebooktwittergoogle_plusredditpinterestlinkedintumblrmailFacebooktwittergoogle_plusredditpinterestlinkedintumblrmail

2017-07-26_10h29_00

Previously I have written a blog post explaining two ways of combining data sets with each other; Append vs Merge. In this post I want to explain in details what is the difference between all types of Merge Type and explaining how to choose the right merge (or Join) type. These Merge types are very similar to join types in relational databases. Because many of people who work with Power BI might not have experience working with relational databases, so I think this post is a good explanation in details what are these types and when to use them. If you want to learn more about Power BI; read Power BI book from Rookie to Rock Star.

Sample Data Set

Download Data set from here


Enter Your Email to download the file (required)

What is Merge?

Combining two data sets with each other can be done in multiple ways. One of the ways of combining data sets with each other is Merging data sets. Merge is similar to Join in relational databases. Merging two data sets with each other requires some joining fields, and the result will be combined set of columns from both data sets.

2017-07-26_09h51_56

Let’s go through it with an example;

Consider two data tables as below
*Data tables are sourced from this web page. Download it from the link at the top of this post.

Customers Table:

2017-07-25_22h37_14

Orders Table:

2017-07-25_22h41_42

Merging these two tables with each other, gives you a data set with combined set of columns like below;

2017-07-25_22h58_35

For merging data sets with each other, you need to have some joining fields. In this example joining field is customer_id. To get a data set that includes all columns from both tables based on customer_id relationship, this is how you can join tables to each other:

How to Merge Queries

Select the first query (for this example: Orders), and then from home tab, Merge queries. there are two options there:

  • Merge Queries: This will amend the existing query (orders), to be the result of Merging.
  • Merge Queries as New: This will not change the existing query. It will create a reference from it, and the result of merging would be another query.

Select Merge Queries as New.

2017-07-26_09h21_46

In the Merge configuration window, select the second table (Customers), then select joining field in each table (Customer_id). You will also see number of matching rows as extra information there.

2017-07-26_09h23_20

Result of this operation will be a new query named as Merge1, which has the combined result of these two queries.

2017-07-26_09h28_42

The table in Customers column is a sub table from customer table for records matching with that customer_id. You can then expand it to columns you want;

2017-07-26_09h30_10

and the final result will be now all columns in one query;

2017-07-26_09h31_04

This process is called Merge Join. However, there are some configurations you can do for this;

Merging Based on Multiple Columns

You can easily use multiple columns for merging two data sets. just select them in an order with holding Ctrl key of the keyboard.

2017-07-26_09h34_45

This method can be really helpful, because the relationship tab in Power BI Desktop doesn’t allow you to create relationship based on multiple columns. In Power Query however, you can create the merge and create a unique field for the relationship in Power BI Desktop. Here is my blog post about it.

Merge Types

In addition to the merging column (or joining field); the type of Merge is very important. you can get totally different result set with choosing different type of merge. Here is where you can see the Join Kind (or Merge Type);

2017-07-26_09h37_41

At the moment of writing this blog post, there are 6 types of joins. About a year ago or even before that fewer number of joins were available. However you could always choose your Join type in Power Query M script.

Every one of these joins get a different result set, let’s see what is their difference.

Left and Right

To start, you need to know the concept of Left and Right tables (or queries). When you merge two data sets with each other, the first query is considered as LEFT and the second as RIGHT.

2017-07-26_09h40_32

In example above; Orders is LEFT query, and Customers is the RIGHT query. you can change them if you want of course. understanding this is imporatnt, because most of Join Kinds works with the concept of left or right or both.

Left Outer (All from first, matching from second)

The first type of Join/Merge is Left Outer. This means LEFT query is the important one. All records from this query (LEFT or FIRST) will be showed in the result set plus their matching rows in the right (or second table).  This type of join is the default type. If you don’t specify the Join Kind it will be always Left Outer.

For example in the first Merge example we have done, you can see that the result set is 4 records, representing 4 records from the left table (orders), and their matching rows in the customer table.

2017-07-26_11h13_43

As you can see in the screenshot above; there are two customers which won’t be in the result set. Customers with id 4 and 5. because these rows are not matching with customer_id field in the orders table. In LEFT Outer merge, only records from Left table with matching rows of the right table will be selected.

Right Outer (all rows from second, matching from first)

Sometimes you need to fetch all rows from second table, regardless if they exists in the first table or not. In that case, you would need to use another type of Join called Right Outer. With this type of Join you get all rows from the RIGHT (or second) table, with their matching rows from left (or first table). Here is an example:

2017-07-26_10h20_27

As you can see in above screenshot; all rows from customers table is showed in the result set, however only 4 rows of that is matching with the orders table. if there is a record in orders table that doesn’t match it will come as Null (two red rows in the result set).

Full Outer (all rows from both)

This type of join/merge is normally the safest among all. this will return all rows from both tables (matching and non-matching). you will have all rows from first table, and all rows from the second table, and all matching rows. with this method you won’t loose any records.

2017-07-26_10h29_00

This result set as you can see is 7 rows. 4 rows matching in both tables. 2 rows only in customer table, but not in orders table. 1 row only in orders table, but not in customers table.

Inner (only matching rows)

This method only selects matching rows. You will not have any record with null values (because these records generate as a result of not matching). Here is an example;

2017-07-26_10h44_11

There are only 4 rows that are matching between two tables. rows with customer_id 1, 2, and 3. all not matching rows will be excluded from the result set.

Let Anti (rows only in first)

If you are only interested in rows from the LEFT (first) table, then this is the option to select. This means rows that are in the first table and DO NOT match with the second table. So, only Not matching rows from the first table. With Anti options you always get null for the second data set, because these rows doesn’t exists there. Anti options are good for finding rows that exists in one table but not in the other one. Here is an example:

2017-07-26_10h52_21

This will find the only one row that exists in the Orders table and does not match with any of rows in the customers table.

Right Anti (rows only in second)

Similar to Left Anti; this method will give you only not matching rows. However, this time from the second (Right) table. You can find out what rows in the right table are not matching with the left table. Here is the example;

2017-07-26_11h00_29 This result set is all rows from the customer table (right table) that does NOT match with orders (first table).

Summary

Different Join Kinds in Merge returns different result set. make sure to select the right join kind to avoid any issues later. There are six types of joins as below;

  • Left Outer: Rows from left table and matching with the right
  • Right Outer: Rows from right table and matching with the left
  • Full Outer: Rows from both tables (matching or not matching)
  • Inner: Only matching rows from both tables
  • Left Anti: Not matching rows from left table
  • Right Anti: Not matching rows from right table

2017-07-26_11h10_08

Sample Data Set

Download Data set from here


Enter Your Email to download the file (required)

Facebooktwittergoogle_plusredditpinterestlinkedintumblrmailFacebooktwittergoogle_plusredditpinterestlinkedintumblrmail
rssyoutuberssyoutube
Reza Rad

Reza Rad is an Author, Trainer, Speaker and DW/BI Consultant. He has a BSc in Computer engineering; he has more than 15 years’ experience in databases, programming and development mostly on Microsoft technologies. He is a Microsoft MVP in Data Platform for seven continues years (from 2011 till now) for his dedication in Microsoft BI. He is author of some SQL Server and BI books, and also Power BI online book; from Rookie to Rock Star.


Leave a Reply

Your email address will not be published. Required fields are marked *