Have you ever wanted to match two tables not only on exact matches, but also on a threshold of similarity? If your answer is yes, then this feature is built for you. Let’s explore in detail how fuzzy matching works in Power BI. To learn more about Power BI, read Power BI from Rookie to Rock Star.

Enable the Preview Feature

At the time of writing this blog post, fuzzy matching is a preview feature, and you have to enable it in Power BI Desktop -> File -> Options and settings -> Options.

In the Options window, under Preview Features, select the checkbox beside “Enable fuzzy merge”.

After this step, you’ll need to close Power BI Desktop and open it again.

Sample Dataset

For this example, I will be using a sample dataset with the two very simple tables below.

A “source” table, which holds employees and their departments. Notice that the Department field has data quality issues: we have department values such as “Sales” and “Sale”, or, as another example, “Managmnt” and “Management”.

A “Department” table, which has a list of all departments.

As you can see, the Department Names are clean in this table, and this is the table that should be used to clean the “source” table. Now let’s see how this is possible.

Fuzzy Merge

Fuzzy Merge is a way of joining two tables based not on exact matching criteria, but on a similarity threshold. If you want to learn what the Merge operation is and how it differs from Append, read my blog post here. If you want more detail about Merge and the different types of join, read my other blog post here. Merge, or Join, is simply the act of combining two tables that have different structures but share link/join columns, so that you can access columns of one table from the other.

To use the Merge operation on the “source” query, click the Merge Queries as New option in the Home tab of the Power Query Editor window.

Then select the second table and choose Department as the joining field.

This process gives you the output below (the result is shown after expanding the merge’s output column):

You can see that the Merge operation only finds EXACT matches. The department “Sale” doesn’t match anything in the Department table, because it is missing an “s” at the end to match “Sales”.
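
The exact-match behaviour can be sketched outside Power Query. This is a minimal Python illustration, not the Power Query implementation: the employee names, the department list, and the function name `exact_left_join` are made up for the example, and a left outer join is assumed.

```python
# Hypothetical sample rows: (employee, department as typed in the source table).
source = [("Alice", "Sales"), ("Bob", "Sale"), ("Carol", "Managmnt")]

# Hypothetical clean list of department names.
departments = {"Sales", "Management"}


def exact_left_join(rows, lookup):
    """Keep every source row; attach the department only on an exact match."""
    return [(name, dept, dept if dept in lookup else None) for name, dept in rows]


merged = exact_left_join(source, departments)
for row in merged:
    print(row)
```

Only “Sales” finds a partner here. “Sale” and “Managmnt” come back with an empty match, which is the gap that fuzzy matching is meant to close.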

Now, let’s see how fuzzy matching works here. To use Fuzzy Merge, just select the checkbox in the Merge dialog box.

When you enable fuzzy matching, you can configure it under the “fuzzy merge options”. Every option is optional, so you can leave them all empty or set values. Let’s first look at the sample output of this operation and then go through the options. This is the sample output of Fuzzy Merge:

You can see that the three highlighted records, which were not recognized as exact matches in the normal merge operation, now match in the output of the fuzzy merge. Fuzzy merge checks the similarity between the joining fields, and if that similarity reaches the configured threshold, it treats the pair as a successful match. You can see that “Managmnt” matches “Management” with this threshold configuration, but “Mangmt” doesn’t, which shows that the similarity of these two text values is below the threshold.
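
To make the threshold idea concrete, here is a small Python sketch. It computes a Jaccard index (the measure mentioned in the comments on this post) over two-character chunks of each text value. The chunking into character pairs, the threshold of 0.4, the department list, and the function names are all assumptions for illustration; the scores Power Query produces internally will differ.

```python
def bigrams(text):
    """Return the set of two-character substrings of text."""
    return {text[i:i + 2] for i in range(len(text) - 1)}


def jaccard(a, b):
    """Jaccard index of the two bigram sets: shared items / all items."""
    set_a, set_b = bigrams(a), bigrams(b)
    if not set_a and not set_b:
        # Both values are too short to have bigrams; fall back to equality.
        return 1.0 if a == b else 0.0
    return len(set_a & set_b) / len(set_a | set_b)


def fuzzy_match(value, candidates, threshold):
    """Return the candidates whose similarity to value reaches the threshold."""
    return [c for c in candidates if jaccard(value, c) >= threshold]


# Hypothetical clean department names and an illustrative threshold.
departments = ["Sales", "Management", "Finance"]
for value in ["Sale", "Managmnt", "Mangmt"]:
    print(value, "->", fuzzy_match(value, departments, threshold=0.4))
```

In this sketch “Sale” scores 0.75 against “Sales”, “Managmnt” scores about 0.45 against “Management”, and “Mangmt” scores only about 0.17, so with a threshold of 0.4 the first two match and the last one does not. Raising or lowering the threshold moves that cut-off.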

You can play with Options of Fuzzy Merge and get different outputs. Here is an explanation of these options:

Option	Acceptable Value	Description
Threshold	a value between 0.00 and 1.00	If the similarity of the two text values is equal to or greater than the threshold, the pair is considered a successful match. A value of 1.00 means exact match.
Ignore Case	true/false	Select this option if you want the similarity algorithm to work regardless of upper or lower case letters.
Ignore Space	true/false	Select this option if you want the similarity algorithm to work regardless of the number of spaces in the text.
Maximum Number of Matches	an integer between 0 and 2147483647	The maximum number of rows that can be matched to one value.
Transformation Table	table	This is like a mapping table, covered a bit later in this post. It gives you the option to use your own mapping table. This table should have at least the two columns “To” and “From”.
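
The options above can be read as knobs on a lookup function. The Python sketch below is only an analogy: the parameter names, their defaults, and the bigram-based similarity are assumptions chosen for the example, not Power Query's internals.

```python
def normalize(text, ignore_case=True, ignore_space=True):
    """Apply the Ignore Case and Ignore Space options before comparing."""
    if ignore_case:
        text = text.lower()
    if ignore_space:
        text = "".join(text.split())
    return text


def similarity(a, b):
    """Jaccard index over character bigrams (an illustrative stand-in)."""
    grams_a = {a[i:i + 2] for i in range(len(a) - 1)}
    grams_b = {b[i:i + 2] for i in range(len(b) - 1)}
    union = grams_a | grams_b
    if not union:
        return 1.0 if a == b else 0.0
    return len(grams_a & grams_b) / len(union)


def fuzzy_lookup(value, candidates, threshold=0.8, ignore_case=True,
                 ignore_space=True, max_matches=None):
    """Return matching candidates, best first, capped at max_matches."""
    key = normalize(value, ignore_case, ignore_space)
    scored = []
    for candidate in candidates:
        score = similarity(key, normalize(candidate, ignore_case, ignore_space))
        if score >= threshold:
            scored.append((score, candidate))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    matches = [candidate for _, candidate in scored]
    return matches if max_matches is None else matches[:max_matches]


print(fuzzy_lookup("S A L E S", ["Sales"], threshold=1.0))
print(fuzzy_lookup("SALES", ["Sales"], threshold=1.0, ignore_case=False))
print(fuzzy_lookup("sale", ["Sales", "Sale", "Sales Team"], threshold=0.3, max_matches=1))
```

With both ignore options on, “S A L E S” is treated as identical to “Sales” even at a threshold of 1.0. Turning off case-insensitivity makes the same pair fail, and `max_matches=1` keeps only the best-scoring candidate.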

Power Query Functions

In addition to the option added in the graphical interface of Power Query, there are also two Power Query functions that perform the fuzzy merge:

Table.FuzzyJoin

Table.FuzzyNestedJoin

Both functions have the same fuzzy configuration options. Their only difference is that one gives you the expanded output (FuzzyJoin), while the other gives you the same output that you see in the graphical interface, with a table column produced by the merge (FuzzyNestedJoin). If you use these two functions directly in M script, you will have a couple more parameters to set, which are for concurrency and culture settings.

These are the parameters of the two functions above:
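
The difference between the two output shapes can be sketched in Python. The sample rows and the precomputed `matches` dictionary are hypothetical, and both helpers assume left outer behaviour purely so the results are easy to compare; the real functions take a join kind parameter.

```python
# Hypothetical source rows and precomputed fuzzy matches per department value.
rows = [("Alice", "Sale"), ("Bob", "Mangmt")]
matches = {"Sale": ["Sales"]}


def nested_join(rows, matches):
    """One output row per source row, with all matches kept in a nested list."""
    return [(name, dept, matches.get(dept, [])) for name, dept in rows]


def flat_join(rows, matches):
    """One output row per (source row, match) pair; unmatched rows get None."""
    output = []
    for name, dept in rows:
        for match in matches.get(dept, []) or [None]:
            output.append((name, dept, match))
    return output


print(nested_join(rows, matches))
print(flat_join(rows, matches))
```

The nested shape mirrors what the Merge dialog gives you before you expand the table column, while the flat shape corresponds to the already expanded result.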

Transformation Table

Sometimes in the merge operation, you need a mapping table. Here this table is called the Transformation Table. Here is an example of a mapping table:

Note that this table should have at least the two columns “To” and “From”. And don’t forget that Power Query is case sensitive!

Now you can select this table in the fuzzy configuration of your Merge operation, as below:

This process is like merging the “source” table, which is the first table in our Merge, with the “Department” table based on the “Department” and “Department Name” columns, and then merging the result with the “mapping” table based on the “To” column and “Department Name”. The output will bring in the “To” column of the mapping table. Here is the sample output:
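
A simplified way to picture the transformation table is as a translation step applied to the source value before the lookup. In the Python sketch below, the mapping row (“Mangmt” to “Management”), the sample data, and the function names are hypothetical, and the comparison after translation is an exact one to keep the example short; the real feature combines the mapping with similarity scoring.

```python
# Hypothetical mapping table with the two required columns.
mapping = [{"From": "Mangmt", "To": "Management"}]


def apply_transformation(value, mapping_rows):
    """Return the "To" value for a matching "From" (case sensitive), else value."""
    for row in mapping_rows:
        if row["From"] == value:
            return row["To"]
    return value


def merge_with_mapping(rows, departments, mapping_rows):
    """Translate each department through the mapping, then look it up."""
    result = []
    for name, dept in rows:
        translated = apply_transformation(dept, mapping_rows)
        matched = translated if translated in departments else None
        result.append((name, dept, matched))
    return result


source = [("Bob", "Mangmt"), ("Dana", "Sales")]
print(merge_with_mapping(source, {"Sales", "Management"}, mapping))
```

Here “Mangmt”, which is too dissimilar to match on its own, is resolved through the mapping row, while values without a mapping row pass through unchanged.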

Summary

Matching based on a similarity threshold, or fuzzy matching, is a fantastic feature added to Power Query and Power BI. However, it is still a preview feature, and more configuration options may be coming. Please try it on your dataset, and let me know in the comments below if you have any questions.

Reza Rad

Trainer, Consultant, Mentor

Reza Rad is a Microsoft Regional Director, an Author, Trainer, Speaker and Consultant. He has a BSc in Computer Engineering and more than 20 years’ experience in data analysis, BI, databases, programming, and development, mostly on Microsoft technologies. He has been a Microsoft Data Platform MVP for 12 continuous years (from 2011 till now) for his dedication to Microsoft BI. Reza is an active blogger and co-founder of RADACAD. Reza is also co-founder and co-organizer of Difinity conference in New Zealand, Power BI Summit, and Data Insight Summit.
Reza is the author of more than 14 books on Microsoft Business Intelligence, most of which are published under the Power BI category. Among these are books such as Power BI DAX Simplified, Pro Power BI Architecture, Power BI from Rookie to Rock Star, the Power Query books series, and Row-Level Security in Power BI.
He is an international speaker at Microsoft Ignite, Microsoft Business Applications Summit, Data Insight Summit, PASS Summit, SQL Saturday and SQL user groups. He is also a Microsoft Certified Trainer.
Reza’s passion is to help you find the best data solution; he is a data enthusiast.
His articles on different aspects of technology, especially on MS BI, can be found on his blog: https://radacad.com/blog.

9 thoughts on “Fuzzy Matching in Power BI and Power Query; Match based on Similarity Threshold”

Nir says:

October 23, 2018 at 6:17 am

Thanks for well explained blog post!

didier terrien says:

October 25, 2018 at 12:12 am

Oh that’s a great explanation ! I waited for this impatiently.
Thank you Reza

Ali Sharifi says:

October 25, 2018 at 11:44 pm

Very useful feature and good explanation

kazshak says:

October 26, 2018 at 2:25 am

thanks for the post, and you tutorial. Do you know the math used to determine the similarity threshold? If so, I would appreciate an explanation of that as well.

- Reza Rad says:
  
  October 31, 2018 at 3:14 am
  
  Fuzzy matching in Power Query uses the Jaccard index method, explained here: https://en.wikipedia.org/wiki/Jaccard_index
  
LUCA says:

October 26, 2018 at 10:20 am

Nice feature and easy to use.
Thanks for the detailed post.

Ghuiles says:

May 11, 2019 at 3:46 am

+1

Srikanth says:

August 30, 2019 at 3:58 am

Hello, Can someone help me with which algorithm does Power BI to perform fuzzy match? I would like to replicate that algorithm on the data base side. Please help me with this.
Thanks

- Reza Rad says:
  
  September 2, 2019 at 4:45 am
  
  Sure. As I mentioned in previous comments, it is Jaccard Index
  Cheers
  Reza
  