Choose the Right Merge Join Type in Power BI

2017-07-26_10h29_00

Previously I have written a blog post explaining two ways of combining data sets with each other; Append vs Merge. In this post I want to explain in details what is the difference between all types of Merge Type and explaining how to choose the right merge (or Join) type. These Merge types are very similar to join types in relational databases. Because many of people who work with Power BI might not have experience working with relational databases, so I think this post is a good explanation in details what are these types and when to use them. If you want to learn more about Power BI; read Power BI book from Rookie to Rock Star.

Sample Data Set

Download Data set from here

    Enter Your Email to download the file (required)

    What is Merge?

    Combining two data sets with each other can be done in multiple ways. One of the ways of combining data sets with each other is Merging data sets. Merge is similar to Join in relational databases. Merging two data sets with each other requires some joining fields, and the result will be combined set of columns from both data sets.

    2017-07-26_09h51_56

    Let’s go through it with an example;

    Consider two data tables as below
    *Data tables are sourced from this web page. Download it from the link at the top of this post.

    Customers Table:

    2017-07-25_22h37_14

    Orders Table:

    2017-07-25_22h41_42

    Merging these two tables with each other, gives you a data set with combined set of columns like below;

    2017-07-25_22h58_35

    For merging data sets with each other, you need to have some joining fields. In this example joining field is customer_id. To get a data set that includes all columns from both tables based on customer_id relationship, this is how you can join tables to each other:

    How to Merge Queries

    Select the first query (for this example: Orders), and then from home tab, Merge queries. there are two options there:

    • Merge Queries: This will amend the existing query (orders), to be the result of Merging.
    • Merge Queries as New: This will not change the existing query. It will create a reference from it, and the result of merging would be another query.

    Select Merge Queries as New.

    2017-07-26_09h21_46

    In the Merge configuration window, select the second table (Customers), then select joining field in each table (Customer_id). You will also see number of matching rows as extra information there.

    2017-07-26_09h23_20

    Result of this operation will be a new query named as Merge1, which has the combined result of these two queries.

    2017-07-26_09h28_42

    The table in Customers column is a sub table from customer table for records matching with that customer_id. You can then expand it to columns you want;

    2017-07-26_09h30_10

    and the final result will be now all columns in one query;

    2017-07-26_09h31_04

    This process is called Merge Join. However, there are some configurations you can do for this;

    Merging Based on Multiple Columns

    You can easily use multiple columns for merging two data sets. just select them in an order with holding Ctrl key of the keyboard.

    2017-07-26_09h34_45

    This method can be really helpful, because the relationship tab in Power BI Desktop doesn’t allow you to create relationship based on multiple columns. In Power Query however, you can create the merge and create a unique field for the relationship in Power BI Desktop. Here is my blog post about it.

    Merge Types

    In addition to the merging column (or joining field); the type of Merge is very important. you can get totally different result set with choosing different type of merge. Here is where you can see the Join Kind (or Merge Type);

    2017-07-26_09h37_41

    At the moment of writing this blog post, there are 6 types of joins. About a year ago or even before that fewer number of joins were available. However you could always choose your Join type in Power Query M script.

    Every one of these joins get a different result set, let’s see what is their difference.

    Left and Right

    To start, you need to know the concept of Left and Right tables (or queries). When you merge two data sets with each other, the first query is considered as LEFT and the second as RIGHT.

    2017-07-26_09h40_32

    In example above; Orders is LEFT query, and Customers is the RIGHT query. you can change them if you want of course. understanding this is imporatnt, because most of Join Kinds works with the concept of left or right or both.

    Left Outer (All from first, matching from second)

    The first type of Join/Merge is Left Outer. This means LEFT query is the important one. All records from this query (LEFT or FIRST) will be showed in the result set plus their matching rows in the right (or second table).  This type of join is the default type. If you don’t specify the Join Kind it will be always Left Outer.

    For example in the first Merge example we have done, you can see that the result set is 4 records, representing 4 records from the left table (orders), and their matching rows in the customer table.

    2017-07-26_11h13_43

    As you can see in the screenshot above; there are two customers which won’t be in the result set. Customers with id 4 and 5. because these rows are not matching with customer_id field in the orders table. In LEFT Outer merge, only records from Left table with matching rows of the right table will be selected.

    Right Outer (all rows from second, matching from first)

    Sometimes you need to fetch all rows from second table, regardless if they exists in the first table or not. In that case, you would need to use another type of Join called Right Outer. With this type of Join you get all rows from the RIGHT (or second) table, with their matching rows from left (or first table). Here is an example:

    2017-07-26_10h20_27

    As you can see in above screenshot; all rows from customers table is showed in the result set, however only 4 rows of that is matching with the orders table. if there is a record in orders table that doesn’t match it will come as Null (two red rows in the result set).

    Full Outer (all rows from both)

    This type of join/merge is normally the safest among all. this will return all rows from both tables (matching and non-matching). you will have all rows from first table, and all rows from the second table, and all matching rows. with this method you won’t loose any records.

    2017-07-26_10h29_00

    This result set as you can see is 7 rows. 4 rows matching in both tables. 2 rows only in customer table, but not in orders table. 1 row only in orders table, but not in customers table.

    Inner (only matching rows)

    This method only selects matching rows. You will not have any record with null values (because these records generate as a result of not matching). Here is an example;

    2017-07-26_10h44_11

    There are only 4 rows that are matching between two tables. rows with customer_id 1, 2, and 3. all not matching rows will be excluded from the result set.

    Left Anti (rows only in first)

    If you are only interested in rows from the LEFT (first) table, then this is the option to select. This means rows that are in the first table and DO NOT match with the second table. So, only Not matching rows from the first table. With Anti options you always get null for the second data set, because these rows doesn’t exists there. Anti options are good for finding rows that exists in one table but not in the other one. Here is an example:

    2017-07-26_10h52_21

    This will find the only one row that exists in the Orders table and does not match with any of rows in the customers table.

    Right Anti (rows only in second)

    Similar to Left Anti; this method will give you only not matching rows. However, this time from the second (Right) table. You can find out what rows in the right table are not matching with the left table. Here is the example;

    2017-07-26_11h00_29 This result set is all rows from the customer table (right table) that does NOT match with orders (first table).

    Summary

    Different Join Kinds in Merge returns different result set. make sure to select the right join kind to avoid any issues later. There are six types of joins as below;

    • Left Outer: Rows from left table and matching with the right
    • Right Outer: Rows from right table and matching with the left
    • Full Outer: Rows from both tables (matching or not matching)
    • Inner: Only matching rows from both tables
    • Left Anti: Not matching rows from left table
    • Right Anti: Not matching rows from right table

    2017-07-26_11h10_08

    Sample Data Set

    Download Data set from here

      Enter Your Email to download the file (required)

      Reza Rad on FacebookReza Rad on LinkedinReza Rad on TwitterReza Rad on Youtube
      Reza Rad
      Trainer, Consultant, Mentor
      Reza Rad is a Microsoft Regional Director, an Author, Trainer, Speaker and Consultant. He has a BSc in Computer engineering; he has more than 20 years’ experience in data analysis, BI, databases, programming, and development mostly on Microsoft technologies. He is a Microsoft Data Platform MVP for 12 continuous years (from 2011 till now) for his dedication in Microsoft BI. Reza is an active blogger and co-founder of RADACAD. Reza is also co-founder and co-organizer of Difinity conference in New Zealand, Power BI Summit, and Data Insight Summit.
      Reza is author of more than 14 books on Microsoft Business Intelligence, most of these books are published under Power BI category. Among these are books such as Power BI DAX Simplified, Pro Power BI Architecture, Power BI from Rookie to Rock Star, Power Query books series, Row-Level Security in Power BI and etc.
      He is an International Speaker in Microsoft Ignite, Microsoft Business Applications Summit, Data Insight Summit, PASS Summit, SQL Saturday and SQL user groups. And He is a Microsoft Certified Trainer.
      Reza’s passion is to help you find the best data solution, he is Data enthusiast.
      His articles on different aspects of technologies, especially on MS BI, can be found on his blog: https://radacad.com/blog.

      7 thoughts on “Choose the Right Merge Join Type in Power BI

      • hmm wondering if commenting works at all on this site, let me try to submit again:

        Hey Reza, thank you for the article ! … could you please explain briefly why “In LEFT Outer merge, only records from Left table with matching rows of the right table will be selected.” Doesn’t it mean that it is the same as the Inner type (and if so, your summary is wrong because it’s not actually all rows from LEFT…. )
        It seems that if I really want all rows from my LEFT table, I need to reverse the order so my original LEFT is the RIGHT and choose the option RIGHT Outer.
        Thank you!

        • Hi Istvan.
          Left JOIN (or Left Outer Join) means all rows from the left table, plus rows that exist in both left and right table
          Inner join means only rows that exist in both left and right table
          A left join will include the inner join in it, but it is a larger dataset if you have unmatching rows.
          So if you want all rows from the left table, Left Join is the choice you should make
          Cheers
          Reza

        • You can, but you have to do two tables at a time. merge table1 and table2 based on columnX, and then merge the result of that with table based on columnX …
          Cheers
          Reza

      Leave a Reply