Filtering in Power Query component of Power BI is easy, however it can be misleading very easily as well. I have seen the usage of this filtering inappropriately in many cases. Many people simply believe in what they see, rather than seeing behind the scenes. Power Query filtering is totally different when you do basic filtering or advanced filtering, and the result of filtering will be different too. In this post I’ll show you through a very simple example how misleading it can be and what it the correct way to do filtering in Power Query. This post is a must read for everyone who use filtering in Power Query. If you like to learn more about Power BI, read the Power BI online book from Rookie to Rock Star.
Prerequisite
For this example I’ll use AdventureWorksDW SQL Server database example. only one table which is DimCustomer.
Filtering in Power Query
Filtering rows is one of the most basic requirements in a data preparation tool such as Power Query, and Power Query do it strongly with an Excel like behavior for filtering. To see what it looks like open Power BI Desktop (or Excel), and Get Data from the database and table above; DimCustomer, Click on Edit to open Query Editor window.
In the Query Editor window there is a button on top right hand side of every column that gives you access to filtering rows
You can simply filter values in the search box provided.
Basic Filtering
For basic filtering all you need to do is typing in the text, number, or date you want in the search box and find values to select from the list. For example if you want to filter all data rows to be only for the FirstName of David, you can simply type it and select and click on OK.
Result then will be only records with David as first name.
Exactly as you expect. Now let’s see when it is misleading.
Misleading Behavior of Basic Filtering
If you want to do equity filtering, basic filtering is great. For example you only want to pick David, and that’s what we’ve done in above example. However what if you want to do similarity filtering? For example, let’s say you are interested in FirstNames that has three characters of “Dan” in it. You can do it in the same search box of basic filtering, and this will give you list of all first names that has “Dan” in it.
And then you can click on OK. the result will be showing you all first names with “Dan” characters in it. So everything looks exactly as expected, right?
However as a Power Query geek I always look at the M script of each step to see what is happening behind the scene. For this case, here is the Power Query M script;
The script tells the whole story. Despite the fact that you typed in “Dan” and Power Query showed you all FirstNames that has “Dan” in it. the script still use equity filters for every individual FirstName. For this data set there won’t be any issue obviously, because all FirstNames with “Dan” is already selected. However if new data rows coming in to this table in the future, and they have records with FirstNames that are not one of these values, for example Dandy, it won’t be picked! As a result the filter won’t work exactly as you expect. That’s why I say this is misleading.
Advanced Filtering: Correct Filtering
Basic filtering is very handy with that search box and suggestions of items while you typing, but it is misleading easily as you have seen above, and can’t give you what you expect in many real-world scenarios. Advanced Filtering is simply what you can see on top of the Search box, which depends on the data type of column can be Text Filters, Number Filters, Date Filters and etc. If we want to do the same “Dan” filter here, I can go to Text Filtering and select Contains.
You can even choose filters such as Begins With or Ends With or many other choices. For this example I’ll select Contains, and type in Dan in the text box of Contains.
After clicking on OK, you will see same result set (like the time that you did basic filtering with “Dan”), however the M script this time is different;
You can see the big different now, This time M script used Text.Contains function. This function will pick every new values with “Dan” in the FirstName. There will be no values missing in this way. You will be amazed how many Power Query scenarios are not working correctly because of this very simple reason. There are advanced filters for all types of data types.
Summary
Basic Filtering is good only if you want to do equity filtering for values that exists in the current data set, however it won’t work correctly if you want to check ranges, or contains or things that is not an exact equity filter. Advanced Filtering is the correct way of filtering in Power Query, and there are advanced filters for all types of data types; Numbers, Text, Date…. This is very simple fact, but not considering that will bring lots of unwanted behavior and incorrect insight to your Power BI solution.
Great blog post Reza, and I would even go as far as to say that I now always used the Advanced Filter.
It gives you a lot more options, as well as the ability to use a parameter as part of the filter criteria. Which then makes it very powerful as well as dynamic in terms of filtering your data.
Hi Gilbert,
Thanks for your feedback. Same on my side, I don’t use Basic Filtering anymore 🙂
Thanks Reza,
For such a nice article and just got the idea of basic filter misleading otherwise would be trapped for a long.
Thanks Asif for your kind words 🙂
Reza, this great blog could be even better if it accounted for case sensitivity of the Text.Contains function.
Correct. Everything in Power Query is case sensitive. you need to use Lower or Upper functions to do comparison in most of the cases.
Cheers
Reza
Reza,
I actually had Comparer.OrdinalIgnoreCase in mind, but Text.Contains(Text.Lower([FirstName]), “dan”) is also a viable option here.
Anyway, the point is that a straightforward Text.Contains against “dan” is not equivalent to the original Basic Filter even for the original dataset, missing ‘Dana’, ‘Danny’, etc.
Missed a bracket in the previous comment attempt (
Exactly. only Text.Contains needs the proper case to work with. usually Text.Contains works with Text.Lower or Text.Upper to get consistent result set.
Thanks for your comment 🙂
Cheers
Reza
Hi Sir
I would like to filer a table based on a column which contains numbers from 0 to 100 or so. I want leave the rows between number 13 to 24 or any other range. What is the M-Code for this
Thanks in advance.
Hi Steve
You can use Number filtering, and then choose between
Cheers
Reza
Thank you sir for your prompt reply,
May be I have poorly worded my problem. The rows with those numbers [13, 14 15, …] should not appear in the resultant table. [I can uncheck those numbers but I have to repeat for every new problem.
Thank you in adavnce
Hi Steve,
That is what I mentioned actually. You can use Filter by not selecting numbers, by using “Number Filter” right under “Remove Empty” in your menu options of that column, then choose “is greater than or equal to”. in the editor that comes up, you can put “is less than or equal to” 12, OR “is greater than or equal to” 16 or whatever value you want. This will write the M code for you. and you can then copy it from advanced editor window or Formula bar. it will be similar to this code:
Table.SelectRows(#”Changed Type”, each [ProductKey] <= 12 or [ProductKey] >= 16)
Cheers
Reza
Hi Reza,
Great article! I definitely use the advanced filter options. I have noted that at times that when I have a list of more than 1000 rows it does not do the filtering on my entire list…
= Table.SelectRows(#”Filtered Rows”, each not Text.Contains([Name], “service”) or not Text.Contains([Name], “taken”))
I have checked this by referencing a single subset and refreshing the data yet it did not pick it up the original filtering… I deleted the filter and re-inserted it and thus worked.
Any suggestions or ideas as to how to ensure this for the entire data set?
Hi Sylvian.
The nature of Power Query Editor is to only show you PREVIEW of the data, which usually is 1000 rows (sometimes 200 rows or less depends on the number of columns). When you do Close & Apply in the Query editor, then all the transformations will be applied on the entire dataset. This is just for making the development pace of data transformation in the Power Query editor faster.
Cheers
Reza
Hi Reza, having filtered your records, what if I want to do a calculation on the filtered records, and then un-filter the table again (retaining the calculation for the previously filtered records)?
Hi James
Not sure if I understand your question correctly.
but if you mean this:
filtering records, then adding a column with a calculation as expression, then wanting to apply the calculation on unfiltered too?
if that is the question, then you need to remove the Filter Rows step right before the Add custom column, and this will apply on everything
Cheers
Reza