Merging two tables in Power Query is one of the most useful transformations to combine data from different datasets, and also create a flatten data structure. However, if you use a text field for merge, then you might see some unexpected results. Here in this post, I share some important considerations if you are using a text field for merge.
What is Merge?
Merge action in Power Query is a way to have two tables with one or more joining fields to match their records and create a flattened table.
In the screenshot below, you can see that the two tables are merged based on the Title (in the left table), and Course (in the right table). As a result, you can see records that are matched (yellow and green), and those not matched (red). I have written about Merge and different types of Merge, which I recommend you to read to have a better understanding of it and see how it works.
Merging on the Text Field
In an ideal data model design, you merge tables based on an ID field which is numeric. However, it happens that you want to merge tables with their text fields, for those scenarios, you have to remember a very important tip:
Power Query is case sensitive. One and one won’t be matched in Power Query.
When the Merge on the Text Field Doesn’t Work
Below is an example of two tables matched using Power Query merge, and you can see that the match didn’t work as we want it;
As you can see in the screenshot above, I have two tables, one is Courses, which has two courses: Math and Physics. The other one is Registrations, which says which student signed up for which course. There is no course value in the Course column of the registration table that does not match with the Course table from our view.
I used the Full Outer join as the Join Kind which means I will get all rows matching and not matching. However, as you can see. some of the rows are not matching. Here is the result of merge:
It seems there are rows in the Registration table with Course values that do not match with the course table, and it creates blank rows (rows with null values in their columns). To expand to the columns of the second table, I can use the expand column, and here is what I get:
You can see that two rows with the course value of “math” don’t match. This is due to the fact I mentioned earlier. Power Query is case sensitive. “Math” in the Course table doesn’t match with “math” in the Registrations table.
Change to UPPER CASE or lower case Before Merge
The easy solution for this is to first change the case of the column in both tables to either UPPER or lower case. (they both to be the same). There is also an option to capitalize each word.
After doing the same transformation in both tables before the Merge, then the result should be better:
This time I got three rows matching. However, there is still one row not matching. That is because of the value in the Registration table is “MATH ” with space at the end. which doesn’t match with “MATH”.
In the text fields, trimming is most of the time helpful. Trim the text column in both tables before the merge;
After doing the trimming in both tables, my results are now all matching;
Other transformations to keep the two text fields the same
There are a few other transformations that can be helpful to make sure the two text fields are the same. Clean is another good transformation to use. Usually getting the Length of the text field is useful for spotting the differences.
These facts are not just for merging on text fields, I have explained in another article that when you want to do Remove Duplicate on text fields, you have to be aware of these too.
Keep the Original Column Intact
If you wish to keep the original column (like Course in this example) intact and don’t want it to become UPPER or lower case after the merge. then you can have a copy of that column before doing the UPPER or lower case transformation. or choose these transformations from the Add columns tab, so it creates a new column that you use for the merge, and remove it after the merge.
Consider Fuzzy Matching, But Carefully
Text values can be different, no matter what you do. Because usually they come from a data entry form, and users might enter different values, sometimes, they might enter Reza Rad, sometimes, enter Reza Raad, and many other variations.
There is a very useful transformation in Power Query called Fuzzy Merge, which matches the rows based on their similarity, and you can adjust the threshold of that similarity. I have written about that in an article here.
When you use Fuzzy Matching, however, be careful of the number of rows in both tables before the merge. Fuzzy matching checks every row of one table against every row of the second table and checks for similarity. it can be slow, better to first do all your normal matching, and then do the fuzzy matching.
Merge in Power Query is best to be done in numeric fields. However, sometimes you don’t have any other options. If you are merging based on text fields, be mindful of doing some transformations such as UPPER or lower and Trim and Clean, and use Length to make sure the fields match with each other’s format. You can enhance the results even better with Fuzzy Matching, but better to do exact matching first to get the best performance.
Download Sample Power BI File
Download the sample Power BI report here: