Remove Duplicate Doesn’t Work in Power Query for Power BI? Here is the Solution!

One of the most common transformations in Power Query is the Remove Duplicates. This transformation is used in many scenarios, one of the examples, is to create a dimension table with unique IDs in it, so can be used as the source of a one to many relationships to fact tables in the relationship diagram of Power BI. Remove duplicate will give you the unique output, however, sometimes this won’t function as you expect. Sometimes, you apply a remove duplicate, but there are still duplicates! how to fix that? read the rest of this article to learn about it. If you want to learn more about Power BI, read Power BI book from Rookie to Rock Star.

How does Remove Duplicate work?

Remove duplicate is a transformation in Power Query, which will remove duplicate entries in the selected column(s). As an example, if you consider a dataset such as the below image:

If we use the Remove Duplicate transformation on this column, as the below screenshot shows;

Then we will have the unique list of countries as the output as below;

As you can see in the above screenshot, Remove Duplicate is an easy way to achieve the unique list of values in Power Query.

Remove Duplicate Helps in Modeling

Now, let’s have a look at another example. Let’s say we have two tables; one table for Product, and one for Sales. Here are entries of the Product table;

and the Sales Table as below;

If we load these two tables as is into Power BI and try to create a relationship between them, we get this:

We get the error saying “This relationship has cardinality Man-Many. this should only be used if it is expected that neither column (Product ID and Product ID) contains unique values, and that the significantly different behavior of Many-many relationships is understood”.

If you haven’t used many to many relationships in Power BI previously, I’d recommend to not use it from now on too! this type of relationship is also called Weak relationship because it implies modeling issues. I suggest using one to many and many to one relationship instead. In our case, we need the relationship to be one-to-many from the Product table to the Sales table, because the Product table should have the UNIQUE list of products in it, not duplicate entries.

When the Remove Duplicate does not Work!

So the above example shows that we need to use the Remove Duplicate to get rid of duplicate entries in the product table.  which is what you can do in Power Query Editor as below; right-click on the Product ID column and select Remove Duplicates;

After this action, everything should be right, but this time you would get the same error also!

It seems there are still duplicate entries in the product table. when you look at the table, you will see that Yes, there are duplicate entries, despite the fact that we said Remove Duplicate!

Solution: Be cautious of Case SeNsItIvItY!

Power Query is a case sensitive language. the two values of 324Abd and 324abd are considered as two different values as a result. This has nothing to do with Remove Duplicate, anywhere in Power Query, these two values are considered as two different values. If you are going to do remove duplicates despite their case of letters, then you have to apply a transformation to change them all to one case; either UPPERCASE or lowercase.

After getting this done, then you can do the Remove Duplicate, and this time, this would be the output:

So as a result, you can create relationship easily;

Other Scenarios? Characters that you don’t see!

case sensitivity is not the only issue when it comes to removing duplicates. some characters that you don’t see can be also an issue. For example; space characters at the end of a text! Event with changing it to UPPERCASE, you would still have the extra space, which makes the two texts different. Let’s say the above example works for now, but later on, when I refresh the model, I get an error like this:

As you can see, in the error message, it says product ID 324ABD has duplicate entries. and here is what we got!

The reason, in this case, is that the last entry has space at the end. you can see it when I take the length of this column;

so these two values are different. You have to use TRIM transformation to get rid of any extra spaces at the beginning or end of text values.

Now, this time, we would have a unique list again:

Caution: Remove Duplicate for Text Values

You may not have faced any of these issues so far in your model, mainly because you do remove duplicate on numeric fields. In a properly designed data warehouse, you will deal with numeric IDs mostly, and that is why you won’t face these issues often. However, if you are doing data transformations in Power Query, then you will face issues like this. In this post, I mentioned only two of scenarios that happens like that. However, there are other cautions when you are removing duplicate on text values;

  1. use TRIM transformation to make sure there are no extra spaces at the beginning or end of text values
  2. use CLEAN transformation to make sure there are no characters that won’t see in the text values
  3. use UPPERCASE or lowercase transformation to make sure you do remove duplicate despite the case sensitivity

Here is an example of steps that I apply before removing duplicates;

You can also create a custom function including all these steps, and use it in your modeling practices.

Summary

Remove Duplicate transformation has no issues, and it simply does the job you asked for; removing duplicates. However, sometimes you have extra characters (such as space at the end of the text) which you need to remove before using the Remove Duplicate. Power Query is a case sensitive language, you need to make sure you use uppercase or lowercase transformations if you want to remove duplicate despite the case sensitivity. Have you faced any of these kind of issues? please write your scenario down in the comments

Reza Rad on FacebookReza Rad on LinkedinReza Rad on TwitterReza Rad on Youtube
Reza Rad
Trainer, Consultant, Mentor
Reza Rad is a Microsoft Regional Director, an Author, Trainer, Speaker and Consultant. He has a BSc in Computer engineering; he has more than 20 years’ experience in data analysis, BI, databases, programming, and development mostly on Microsoft technologies. He is a Microsoft Data Platform MVP for 12 continuous years (from 2011 till now) for his dedication in Microsoft BI. Reza is an active blogger and co-founder of RADACAD. Reza is also co-founder and co-organizer of Difinity conference in New Zealand, Power BI Summit, and Data Insight Summit.
Reza is author of more than 14 books on Microsoft Business Intelligence, most of these books are published under Power BI category. Among these are books such as Power BI DAX Simplified, Pro Power BI Architecture, Power BI from Rookie to Rock Star, Power Query books series, Row-Level Security in Power BI and etc.
He is an International Speaker in Microsoft Ignite, Microsoft Business Applications Summit, Data Insight Summit, PASS Summit, SQL Saturday and SQL user groups. And He is a Microsoft Certified Trainer.
Reza’s passion is to help you find the best data solution, he is Data enthusiast.
His articles on different aspects of technologies, especially on MS BI, can be found on his blog: https://radacad.com/blog.

22 thoughts on “Remove Duplicate Doesn’t Work in Power Query for Power BI? Here is the Solution!

  • Good article. I have unfortunately tripped on all of these issues in the past.
    One additional one I found was dealing with the presence of a ZWSP (zero white space) unicode character. ZWSP is invisible by visual inspection and is not removed with Trim or Clean.

    My solution was to explicitly remove it: Text.Replace([WithZWSP], “#(200B)”, “”)

    • Hi Dale.
      Thanks for mentioning it. I haven’t ever worked with a dataset that has ZWSP in it. seems useful trick. Thanks for letting me know.

      Cheers
      Reza

  • Instead of using UPPERCASE you could use “Comparer.OrdinalIgnoreCase” as an (optional) second parameter in Table.Distinct()

  • Dear Reza

    I have managed to make a calendar from Power Query with all the columns
    I have linked the calendar table as a dimension table to 2 fact tables.
    But when I want to plan the number of case from the fact tables per month nothing shows up.
    Could you please provide more insight on this
    Best regards
    Biey

    • Do you have DATE values in your fact table? or Month values? I cannot answer much in details unless I see the model and part of the data.

  • Dear Reza,

    Very nice article! Thanks for everything your doing on Power BI.
    This article leaves me with one question.
    You use the Trim-Clean-Uppercase function for the ProductID in the products table. After that you remove the duplicates from the Product Table.
    Do I also need to perform the Trim-Clean-Uppercase action in the sales table for the Product ID? This so the fields can be matched? Or isn’t this Case-Sensitive?

    I would to hear from you soon.

    Best Regards,
    Peter

    • Hi Peter
      Yes, You do need to do trim/clean in both. The case sensitivity doesn’t apply after loading data into Power BI, and the relationship would work fine for that, but I do recommend doing that also to keep it consistent.

      Cheers
      Reza

  • Hi! I have a question about Table.Distinct behavior and can’t find an answer anywhere.
    Say, I have a table {{1,1},{1,2}}. I want to remove duplicates in the first column. The output I get is a table {1, 1}.
    The same output is if the original table is {{1,2},{1,1}}. I’d expect it to be {1,2}

    Is there a way to control which rows are removed after Table.Distinct?

    • Hi
      When you use the Distinct, it is important which column(s) you select. If you want the distinct row across all columns, then you have to select all columns first, and then use Distinct.
      Cheers
      Reza

  • Hi Reza,
    I’m not able to remove duplicates of a DateTime column, even though the records should have the same Date Time. They differ on milliseconds, but they are the same row!
    The use case is a query that fetches the 50 last played songs. When i refresh and another song gets added, the date times of the other ones get changed… so i can’t remove duplicate entries.
    Do you know any workaround?

    Cheers.

    • If the time part of that is not important for you, you can just get the date only part of it, and then remove duplicate. When you click on a datetime field, under transform, under date, there is a Date Only transformation to select.
      Cheers
      Reza

  • How come I’ve got this erro without any difference between duplicated keys? Turns out ‘lower case’ saved my day, but this one got me scratching my head! haha Thank you so much for this post!

  • It’s always the simple things! After hours wasted, I came across your post. Tried so hard to understand what could be the discrepancy – it’s so hard when it all displays fine in Excel itself. Solution was case sensitivity. Thank you, this post was easy to understand and will save people many hours of work.

Leave a Reply