Previously I have written a blog post explaining two ways of combining data sets with each other; Append vs Merge. In this post I want to explain in details what is the difference between all types of Merge Type and explaining how to choose the right merge (or Join) type. These Merge types are very similar to join types in relational databases. Because many of people who work with Power BI might not have experience working with relational databases, so I think this post is a good explanation in details what are these types and when to use them. If you want to learn more about Power BI; read Power BI book from Rookie to Rock Star.
Sample Data Set
Download Data set from here
What is Merge?
Combining two data sets with each other can be done in multiple ways. One of the ways of combining data sets with each other is Merging data sets. Merge is similar to Join in relational databases. Merging two data sets with each other requires some joining fields, and the result will be combined set of columns from both data sets.
Let’s go through it with an example;
Consider two data tables as below
*Data tables are sourced from this web page. Download it from the link at the top of this post.
Merging these two tables with each other, gives you a data set with combined set of columns like below;
For merging data sets with each other, you need to have some joining fields. In this example joining field is customer_id. To get a data set that includes all columns from both tables based on customer_id relationship, this is how you can join tables to each other:
How to Merge Queries
Select the first query (for this example: Orders), and then from home tab, Merge queries. there are two options there:
- Merge Queries: This will amend the existing query (orders), to be the result of Merging.
- Merge Queries as New: This will not change the existing query. It will create a reference from it, and the result of merging would be another query.
Select Merge Queries as New.
In the Merge configuration window, select the second table (Customers), then select joining field in each table (Customer_id). You will also see number of matching rows as extra information there.
Result of this operation will be a new query named as Merge1, which has the combined result of these two queries.
The table in Customers column is a sub table from customer table for records matching with that customer_id. You can then expand it to columns you want;
and the final result will be now all columns in one query;
This process is called Merge Join. However, there are some configurations you can do for this;
Merging Based on Multiple Columns
You can easily use multiple columns for merging two data sets. just select them in an order with holding Ctrl key of the keyboard.
This method can be really helpful, because the relationship tab in Power BI Desktop doesn’t allow you to create relationship based on multiple columns. In Power Query however, you can create the merge and create a unique field for the relationship in Power BI Desktop. Here is my blog post about it.
In addition to the merging column (or joining field); the type of Merge is very important. you can get totally different result set with choosing different type of merge. Here is where you can see the Join Kind (or Merge Type);
At the moment of writing this blog post, there are 6 types of joins. About a year ago or even before that fewer number of joins were available. However you could always choose your Join type in Power Query M script.
Every one of these joins get a different result set, let’s see what is their difference.
Left and Right
To start, you need to know the concept of Left and Right tables (or queries). When you merge two data sets with each other, the first query is considered as LEFT and the second as RIGHT.
In example above; Orders is LEFT query, and Customers is the RIGHT query. you can change them if you want of course. understanding this is imporatnt, because most of Join Kinds works with the concept of left or right or both.
Left Outer (All from first, matching from second)
The first type of Join/Merge is Left Outer. This means LEFT query is the important one. All records from this query (LEFT or FIRST) will be showed in the result set plus their matching rows in the right (or second table). This type of join is the default type. If you don’t specify the Join Kind it will be always Left Outer.
For example in the first Merge example we have done, you can see that the result set is 4 records, representing 4 records from the left table (orders), and their matching rows in the customer table.
As you can see in the screenshot above; there are two customers which won’t be in the result set. Customers with id 4 and 5. because these rows are not matching with customer_id field in the orders table. In LEFT Outer merge, only records from Left table with matching rows of the right table will be selected.
Right Outer (all rows from second, matching from first)
Sometimes you need to fetch all rows from second table, regardless if they exists in the first table or not. In that case, you would need to use another type of Join called Right Outer. With this type of Join you get all rows from the RIGHT (or second) table, with their matching rows from left (or first table). Here is an example:
As you can see in above screenshot; all rows from customers table is showed in the result set, however only 4 rows of that is matching with the orders table. if there is a record in orders table that doesn’t match it will come as Null (two red rows in the result set).
Full Outer (all rows from both)
This type of join/merge is normally the safest among all. this will return all rows from both tables (matching and non-matching). you will have all rows from first table, and all rows from the second table, and all matching rows. with this method you won’t loose any records.
This result set as you can see is 7 rows. 4 rows matching in both tables. 2 rows only in customer table, but not in orders table. 1 row only in orders table, but not in customers table.
Inner (only matching rows)
This method only selects matching rows. You will not have any record with null values (because these records generate as a result of not matching). Here is an example;
There are only 4 rows that are matching between two tables. rows with customer_id 1, 2, and 3. all not matching rows will be excluded from the result set.
Left Anti (rows only in first)
If you are only interested in rows from the LEFT (first) table, then this is the option to select. This means rows that are in the first table and DO NOT match with the second table. So, only Not matching rows from the first table. With Anti options you always get null for the second data set, because these rows doesn’t exists there. Anti options are good for finding rows that exists in one table but not in the other one. Here is an example:
This will find the only one row that exists in the Orders table and does not match with any of rows in the customers table.
Right Anti (rows only in second)
Similar to Left Anti; this method will give you only not matching rows. However, this time from the second (Right) table. You can find out what rows in the right table are not matching with the left table. Here is the example;
Different Join Kinds in Merge returns different result set. make sure to select the right join kind to avoid any issues later. There are six types of joins as below;
- Left Outer: Rows from left table and matching with the right
- Right Outer: Rows from right table and matching with the left
- Full Outer: Rows from both tables (matching or not matching)
- Inner: Only matching rows from both tables
- Left Anti: Not matching rows from left table
- Right Anti: Not matching rows from right table
Sample Data Set
Download Data set from here