Be Careful When Merging on Text Fields in Power BI Using Power Query

FacebooktwitterredditpinterestlinkedintumblrmailFacebooktwitterredditpinterestlinkedintumblrmail
FacebooktwitterlinkedinrssyoutubeFacebooktwitterlinkedinrssyoutube

Merging two tables in Power Query is one of the most useful transformations to combine data from different datasets, and also create a flatten data structure. However, if you use a text field for merge, then you might see some unexpected results. Here in this post, I share some important considerations if you are using a text field for merge.

What is Merge?

Merge action in Power Query is a way to have two tables with one or more joining fields to match their records and create a flattened table.

2017-01-05_21h17_33

In the screenshot below, you can see that the two tables are merged based on the Title (in the left table), and Course (in the right table). As a result, you can see records that are matched (yellow and green), and those not matched (red). I have written about Merge and different types of Merge, which I recommend you to read to have a better understanding of it and see how it works.

Merging on the Text Field

In an ideal data model design, you merge tables based on an ID field which is numeric. However, it happens that you want to merge tables with their text fields, for those scenarios, you have to remember a very important tip:

Power Query is case sensitive. One and one won’t be matched in Power Query.

When the Merge on the Text Field Doesn’t Work

Below is an example of two tables matched using Power Query merge, and you can see that the match didn’t work as we want it;

As you can see in the screenshot above, I have two tables, one is Courses, which has two courses: Math and Physics. The other one is Registrations, which says which student signed up for which course. There is no course value in the Course column of the registration table that does not match with the Course table from our view.

I used the Full Outer join as the Join Kind which means I will get all rows matching and not matching. However, as you can see. some of the rows are not matching. Here is the result of merge:

It seems there are rows in the Registration table with Course values that do not match with the course table, and it creates blank rows (rows with null values in their columns). To expand to the columns of the second table, I can use the expand column, and here is what I get:

You can see that two rows with the course value of “math” don’t match. This is due to the fact I mentioned earlier. Power Query is case sensitive. “Math” in the Course table doesn’t match with “math” in the Registrations table.

Change to UPPER CASE or lower case Before Merge

The easy solution for this is to first change the case of the column in both tables to either UPPER or lower case. (they both to be the same). There is also an option to capitalize each word.

After doing the same transformation in both tables before the Merge, then the result should be better:

This time I got three rows matching. However, there is still one row not matching. That is because of the value in the Registration table is “MATH ” with space at the end. which doesn’t match with “MATH”.

Trimming spaces

In the text fields, trimming is most of the time helpful. Trim the text column in both tables before the merge;

After doing the trimming in both tables, my results are now all matching;

Other transformations to keep the two text fields the same

There are a few other transformations that can be helpful to make sure the two text fields are the same. Clean is another good transformation to use. Usually getting the Length of the text field is useful for spotting the differences.

These facts are not just for merging on text fields, I have explained in another article that when you want to do Remove Duplicate on text fields, you have to be aware of these too.

Keep the Original Column Intact

If you wish to keep the original column (like Course in this example) intact and don’t want it to become UPPER or lower case after the merge. then you can have a copy of that column before doing the UPPER or lower case transformation. or choose these transformations from the Add columns tab, so it creates a new column that you use for the merge, and remove it after the merge.

Consider Fuzzy Matching, But Carefully

Text values can be different, no matter what you do. Because usually they come from a data entry form, and users might enter different values, sometimes, they might enter Reza Rad, sometimes,  enter Reza Raad, and many other variations.

There is a very useful transformation in Power Query called Fuzzy Merge, which matches the rows based on their similarity, and you can adjust the threshold of that similarity. I have written about that in an article here.

When you use Fuzzy Matching, however, be careful of the number of rows in both tables before the merge. Fuzzy matching checks every row of one table against every row of the second table and checks for similarity. it can be slow, better to first do all your normal matching, and then do the fuzzy matching.

Summary

Merge in Power Query is best to be done in numeric fields. However, sometimes you don’t have any other options. If you are merging based on text fields, be mindful of doing some transformations such as UPPER or lower and Trim and Clean, and use Length to make sure the fields match with each other’s format. You can enhance the results even better with Fuzzy Matching, but better to do exact matching first to get the best performance.

Download Sample Power BI File

Download the sample Power BI report here:

Enter Your Email to download the file (required)

Video

FacebooktwitterredditpinterestlinkedintumblrmailFacebooktwitterredditpinterestlinkedintumblrmail
FacebooktwitterlinkedinrssyoutubeFacebooktwitterlinkedinrssyoutube
Reza Rad on FacebookReza Rad on LinkedinReza Rad on TwitterReza Rad on Youtube
Reza Rad
Trainer, Consultant, Mentor
Reza Rad is a Microsoft Regional Director, an Author, Trainer, Speaker and Consultant. He has a BSc in Computer engineering; he has more than 20 years’ experience in data analysis, BI, databases, programming, and development mostly on Microsoft technologies. He is a Microsoft Data Platform MVP for nine continuous years (from 2011 till now) for his dedication in Microsoft BI. Reza is an active blogger and co-founder of RADACAD. Reza is also co-founder and co-organizer of Difinity conference in New Zealand.
His articles on different aspects of technologies, especially on MS BI, can be found on his blog: https://radacad.com/blog.
He wrote some books on MS SQL BI and also is writing some others, He was also an active member on online technical forums such as MSDN and Experts-Exchange, and was a moderator of MSDN SQL Server forums, and is an MCP, MCSE, and MCITP of BI. He is the leader of the New Zealand Business Intelligence users group. He is also the author of very popular book Power BI from Rookie to Rock Star, which is free with more than 1700 pages of content and the Power BI Pro Architecture published by Apress.
He is an International Speaker in Microsoft Ignite, Microsoft Business Applications Summit, Data Insight Summit, PASS Summit, SQL Saturday and SQL user groups. And He is a Microsoft Certified Trainer.
Reza’s passion is to help you find the best data solution, he is Data enthusiast.

2 thoughts on “Be Careful When Merging on Text Fields in Power BI Using Power Query

  • Dear agha Reza;
    thanks for the subject, that is very useful.
    May I have a question?
    I have 2 Tables:
    Table 1 –> ID, Name, col A, col B, col C
    Table 2 –> ID, Name, col A, col Y, col Z

    I need to merge these two tables and need to replace the col a from Table 2 into Table 1, the matching columns are ID and Name but it just keep giving me the error:

    Expression.Error: The field ‘col A’ already exists in the record.
    Details:
    Name=col A
    Value=

    Any help will be appreciated.
    Thank you so much
    Maryam

    • Hi Maryam
      before merge it is better to rename the Col A in table 2. this helps that you don’t get the error saying that this column exists.
      then you can use a Conditional column to generate a new column A based on the result of the col A from the two tables.
      Cheers
      Reza

Leave a Reply

%d bloggers like this: