Warning! Misleading Power Query Filtering

2016-12-15_10h34_48

Filtering in Power Query component of Power BI is easy, however it can be misleading very easily as well. I have seen the usage of this filtering inappropriately in many cases. Many people simply believe in what they see, rather than seeing behind the scenes. Power Query filtering is totally different when you do basic filtering or advanced filtering, and the result of filtering will be different too. In this post I’ll show you through a very simple example how misleading it can be and what it the correct way to do filtering in Power Query. This post is a must read for everyone who use filtering in Power Query. If you like to learn more about Power BI, read the Power BI online book from Rookie to Rock Star.

Prerequisite

For this example I’ll use AdventureWorksDW SQL Server database example. only one table which is DimCustomer.

Filtering in Power Query

Filtering rows is one of the most basic requirements in a data preparation tool such as Power Query, and Power Query do it strongly with an Excel like behavior for filtering. To see what it looks like open Power BI Desktop (or Excel), and Get Data from the database and table above; DimCustomer, Click on Edit to open Query Editor window.

In the Query Editor window there is a button on top right hand side of every column that gives you access to filtering rows

2016-12-15_10h23_11

You can simply filter values in the search box provided.

Basic Filtering

For basic filtering all you need to do is typing in the text, number, or date you want in the search box and find values to select from the list. For example if you want to filter all data rows to be only for the FirstName of David, you can simply type it and select and click on OK.

2016-12-15_10h26_35

Result then will be only records with David as first name.

2016-12-15_10h27_15

Exactly as you expect. Now let’s see when it is misleading.

Misleading Behavior of Basic Filtering

If you want to do equity filtering, basic filtering is great. For example you only want to pick David, and that’s what we’ve done in above example. However what if you want to do similarity filtering? For example, let’s say you are interested in FirstNames that has three characters of “Dan” in it. You can do it in the same search box of basic filtering, and this will give you list of all first names that has “Dan” in it.

2016-12-15_10h32_04

And then you can click on OK. the result will be showing you all first names with “Dan” characters in it. So everything looks exactly as expected, right?

2016-12-15_10h33_18

However as a Power Query geek I always look at the M script of each step to see what is happening behind the scene. For this case, here is the Power Query M script;

2016-12-15_10h34_48

The script tells the whole story. Despite the fact that you typed in “Dan” and Power Query showed you all FirstNames that has “Dan” in it. the script still use equity filters for every individual FirstName. For this data set there won’t be any issue obviously, because all FirstNames with “Dan” is already selected. However if new data rows coming in to this table in the future, and they have records with FirstNames that are not one of these values, for example Dandy, it won’t be picked! As a result the filter won’t work exactly as you expect. That’s why I say this is misleading.

Advanced Filtering: Correct Filtering

Basic filtering is very handy with that search box and suggestions of items while you typing, but it is misleading easily as you have seen above, and can’t give you what you expect in many real-world scenarios. Advanced Filtering is simply what you can see on top of the Search box, which depends on the data type of column can be Text Filters, Number Filters, Date Filters and etc. If we want to do the same “Dan” filter here, I can go to Text Filtering and select Contains.

2016-12-15_10h48_39

You can even choose filters such as Begins With or Ends With or many other choices. For this example I’ll select Contains, and type in Dan in the text box of Contains.

2016-12-15_10h51_04

After clicking on OK, you will see same result set (like the time that you did basic filtering with “Dan”), however the M script this time is different;

2016-12-15_10h52_14

You can see the big different now, This time M script used Text.Contains function. This function will pick every new values with “Dan” in the FirstName. There will be no values missing in this way. You will be amazed how many Power Query scenarios are not working correctly because of this very simple reason. There are advanced filters for all types of data types.

Summary

Basic Filtering is good only if you want to do equity filtering for values that exists in the current data set, however it won’t work correctly if you want to check ranges, or contains or things that is not an exact equity filter. Advanced Filtering is the correct way of filtering in Power Query, and there are advanced filters for all types of data types; Numbers, Text, Date….  This is very simple fact, but not considering that will bring lots of unwanted behavior and incorrect insight to your Power BI solution.

Save

Reza Rad on FacebookReza Rad on LinkedinReza Rad on TwitterReza Rad on Youtube
Reza Rad
Trainer, Consultant, Mentor
Reza Rad is a Microsoft Regional Director, an Author, Trainer, Speaker and Consultant. He has a BSc in Computer engineering; he has more than 20 years’ experience in data analysis, BI, databases, programming, and development mostly on Microsoft technologies. He is a Microsoft Data Platform MVP for 12 continuous years (from 2011 till now) for his dedication in Microsoft BI. Reza is an active blogger and co-founder of RADACAD. Reza is also co-founder and co-organizer of Difinity conference in New Zealand, Power BI Summit, and Data Insight Summit.
Reza is author of more than 14 books on Microsoft Business Intelligence, most of these books are published under Power BI category. Among these are books such as Power BI DAX Simplified, Pro Power BI Architecture, Power BI from Rookie to Rock Star, Power Query books series, Row-Level Security in Power BI and etc.
He is an International Speaker in Microsoft Ignite, Microsoft Business Applications Summit, Data Insight Summit, PASS Summit, SQL Saturday and SQL user groups. And He is a Microsoft Certified Trainer.
Reza’s passion is to help you find the best data solution, he is Data enthusiast.
His articles on different aspects of technologies, especially on MS BI, can be found on his blog: https://radacad.com/blog.

16 thoughts on “Warning! Misleading Power Query Filtering

  • Great blog post Reza, and I would even go as far as to say that I now always used the Advanced Filter.

    It gives you a lot more options, as well as the ability to use a parameter as part of the filter criteria. Which then makes it very powerful as well as dynamic in terms of filtering your data.

  • Thanks Reza,
    For such a nice article and just got the idea of basic filter misleading otherwise would be trapped for a long.

  • Reza, this great blog could be even better if it accounted for case sensitivity of the Text.Contains function.

    • Correct. Everything in Power Query is case sensitive. you need to use Lower or Upper functions to do comparison in most of the cases.
      Cheers
      Reza

      • Reza,
        I actually had Comparer.OrdinalIgnoreCase in mind, but Text.Contains(Text.Lower([FirstName]), “dan”) is also a viable option here.
        Anyway, the point is that a straightforward Text.Contains against “dan” is not equivalent to the original Basic Filter even for the original dataset, missing ‘Dana’, ‘Danny’, etc.
        Missed a bracket in the previous comment attempt (

        • Exactly. only Text.Contains needs the proper case to work with. usually Text.Contains works with Text.Lower or Text.Upper to get consistent result set.
          Thanks for your comment 🙂
          Cheers
          Reza

  • Hi Sir
    I would like to filer a table based on a column which contains numbers from 0 to 100 or so. I want leave the rows between number 13 to 24 or any other range. What is the M-Code for this
    Thanks in advance.

      • Thank you sir for your prompt reply,
        May be I have poorly worded my problem. The rows with those numbers [13, 14 15, …] should not appear in the resultant table. [I can uncheck those numbers but I have to repeat for every new problem.
        Thank you in adavnce

        • Hi Steve,
          That is what I mentioned actually. You can use Filter by not selecting numbers, by using “Number Filter” right under “Remove Empty” in your menu options of that column, then choose “is greater than or equal to”. in the editor that comes up, you can put “is less than or equal to” 12, OR “is greater than or equal to” 16 or whatever value you want. This will write the M code for you. and you can then copy it from advanced editor window or Formula bar. it will be similar to this code:
          Table.SelectRows(#”Changed Type”, each [ProductKey] <= 12 or [ProductKey] >= 16)

          Cheers
          Reza

  • Hi Reza,
    Great article! I definitely use the advanced filter options. I have noted that at times that when I have a list of more than 1000 rows it does not do the filtering on my entire list…
    = Table.SelectRows(#”Filtered Rows”, each not Text.Contains([Name], “service”) or not Text.Contains([Name], “taken”))
    I have checked this by referencing a single subset and refreshing the data yet it did not pick it up the original filtering… I deleted the filter and re-inserted it and thus worked.

    Any suggestions or ideas as to how to ensure this for the entire data set?

    • Hi Sylvian.
      The nature of Power Query Editor is to only show you PREVIEW of the data, which usually is 1000 rows (sometimes 200 rows or less depends on the number of columns). When you do Close & Apply in the Query editor, then all the transformations will be applied on the entire dataset. This is just for making the development pace of data transformation in the Power Query editor faster.
      Cheers
      Reza

  • Hi Reza, having filtered your records, what if I want to do a calculation on the filtered records, and then un-filter the table again (retaining the calculation for the previously filtered records)?

    • Hi James
      Not sure if I understand your question correctly.
      but if you mean this:
      filtering records, then adding a column with a calculation as expression, then wanting to apply the calculation on unfiltered too?
      if that is the question, then you need to remove the Filter Rows step right before the Add custom column, and this will apply on everything
      Cheers
      Reza

Leave a Reply