Warning! Misleading Power Query Filtering

Facebooktwittergoogle_plusredditpinterestlinkedintumblrmailFacebooktwittergoogle_plusredditpinterestlinkedintumblrmail

2016-12-15_10h34_48

Filtering in Power Query component of Power BI is easy, however it can be misleading very easily as well. I have seen the usage of this filtering inappropriately in many cases. Many people simply believe in what they see, rather than seeing behind the scenes. Power Query filtering is totally different when you do basic filtering or advanced filtering, and the result of filtering will be different too. In this post I’ll show you through a very simple example how misleading it can be and what it the correct way to do filtering in Power Query. This post is a must read for everyone who use filtering in Power Query. If you like to learn more about Power BI, read the Power BI online book from Rookie to Rock Star.

Prerequisite

For this example I’ll use AdventureWorksDW SQL Server database example. only one table which is DimCustomer.

Filtering in Power Query

Filtering rows is one of the most basic requirements in a data preparation tool such as Power Query, and Power Query do it strongly with an Excel like behavior for filtering. To see what it looks like open Power BI Desktop (or Excel), and Get Data from the database and table above; DimCustomer, Click on Edit to open Query Editor window.

In the Query Editor window there is a button on top right hand side of every column that gives you access to filtering rows

2016-12-15_10h23_11

You can simply filter values in the search box provided.

Basic Filtering

For basic filtering all you need to do is typing in the text, number, or date you want in the search box and find values to select from the list. For example if you want to filter all data rows to be only for the FirstName of David, you can simply type it and select and click on OK.

2016-12-15_10h26_35

Result then will be only records with David as first name.

2016-12-15_10h27_15

Exactly as you expect. Now let’s see when it is misleading.

Misleading Behavior of Basic Filtering

If you want to do equity filtering, basic filtering is great. For example you only want to pick David, and that’s what we’ve done in above example. However what if you want to do similarity filtering? For example, let’s say you are interested in FirstNames that has three characters of “Dan” in it. You can do it in the same search box of basic filtering, and this will give you list of all first names that has “Dan” in it.

2016-12-15_10h32_04

And then you can click on OK. the result will be showing you all first names with “Dan” characters in it. So everything looks exactly as expected, right?

2016-12-15_10h33_18

However as a Power Query geek I always look at the M script of each step to see what is happening behind the scene. For this case, here is the Power Query M script;

2016-12-15_10h34_48

The script tells the whole story. Despite the fact that you typed in “Dan” and Power Query showed you all FirstNames that has “Dan” in it. the script still use equity filters for every individual FirstName. For this data set there won’t be any issue obviously, because all FirstNames with “Dan” is already selected. However if new data rows coming in to this table in the future, and they have records with FirstNames that are not one of these values, for example Dandy, it won’t be picked! As a result the filter won’t work exactly as you expect. That’s why I say this is misleading.

Advanced Filtering: Correct Filtering

Basic filtering is very handy with that search box and suggestions of items while you typing, but it is misleading easily as you have seen above, and can’t give you what you expect in many real-world scenarios. Advanced Filtering is simply what you can see on top of the Search box, which depends on the data type of column can be Text Filters, Number Filters, Date Filters and etc. If we want to do the same “Dan” filter here, I can go to Text Filtering and select Contains.

2016-12-15_10h48_39

You can even choose filters such as Begins With or Ends With or many other choices. For this example I’ll select Contains, and type in Dan in the text box of Contains.

2016-12-15_10h51_04

After clicking on OK, you will see same result set (like the time that you did basic filtering with “Dan”), however the M script this time is different;

2016-12-15_10h52_14

You can see the big different now, This time M script used Text.Contains function. This function will pick every new values with “Dan” in the FirstName. There will be no values missing in this way. You will be amazed how many Power Query scenarios are not working correctly because of this very simple reason. There are advanced filters for all types of data types.

Summary

Basic Filtering is good only if you want to do equity filtering for values that exists in the current data set, however it won’t work correctly if you want to check ranges, or contains or things that is not an exact equity filter. Advanced Filtering is the correct way of filtering in Power Query, and there are advanced filters for all types of data types; Numbers, Text, Date….  This is very simple fact, but not considering that will bring lots of unwanted behavior and incorrect insight to your Power BI solution.

Save

Facebooktwittergoogle_plusredditpinterestlinkedintumblrmailFacebooktwittergoogle_plusredditpinterestlinkedintumblrmail
rssyoutuberssyoutube
Reza Rad
Reza Rad is an Author, Trainer, Speaker and DW/BI Consultant. He has a BSc in Computer engineering; he has more than 15 years’ experience in databases, programming and development mostly on Microsoft technologies. He is a Microsoft MVP in Data Platform for seven continues years (from 2011 till now) for his dedication in Microsoft BI. He is author of some SQL Server and BI books, and also Power BI online book; from Rookie to Rock Star.

8 Comments

  • Great blog post Reza, and I would even go as far as to say that I now always used the Advanced Filter.

    It gives you a lot more options, as well as the ability to use a parameter as part of the filter criteria. Which then makes it very powerful as well as dynamic in terms of filtering your data.

      • Reza,
        I actually had Comparer.OrdinalIgnoreCase in mind, but Text.Contains(Text.Lower([FirstName]), “dan”) is also a viable option here.
        Anyway, the point is that a straightforward Text.Contains against “dan” is not equivalent to the original Basic Filter even for the original dataset, missing ‘Dana’, ‘Danny’, etc.
        Missed a bracket in the previous comment attempt (

        • Exactly. only Text.Contains needs the proper case to work with. usually Text.Contains works with Text.Lower or Text.Upper to get consistent result set.
          Thanks for your comment :)
          Cheers
          Reza

Leave a Reply

Your email address will not be published. Required fields are marked *

You may use these HTML tags and attributes: <a href="" title=""> <abbr title=""> <acronym title=""> <b> <blockquote cite=""> <cite> <code class="" title="" data-url=""> <del datetime=""> <em> <i> <q cite=""> <strike> <strong> <pre class="" title="" data-url=""> <span class="" title="" data-url="">