Power BI reports can get slow if you have huge amount of data. One of the ways to speed up the performance is to get the smaller part of the data, only the part that is needed for analysis. Most of the times, a date condition is a good filter for that. You might only want to analyze the last three years of data, or the last 6 quarters, or everything after a specific date. In Power BI this is simply possible using Power Query and filtering on the date field. In this short article, I’m going to explain how to do that. If you want to learn more about Power BI, read Power BI book from Rookie to Rock Star.
Date Based Criteria
The date is one of the most important factors for filtering your data. Most of the time, you want to analyze only part of the data, and that part of the data has something to do with the dates. For example; you might have 20 years of transactional data stored in the database, but you only want to do an analysis of the last three years. Or because the data has been migrated from an old system, the better quality data you’ve got is after a specific date, and you want only to do analysis for that part of the data. In my experience, a date based filtering is one of the most common scenarios happening in an analysis system.
Power Query: Transforming Data Before Load
When you use Power BI (at least these days), the default option after selecting your tables is Load. I never use that option! The reason is that you are getting the entire table without any filtering or transformation. What I always do is to use Transform Data (It was called Edit previously).
If you don’t use this option, and just load data directly, then the entire data rows in that table (it might be 20 years of data or 50 million rows), will be loaded into the memory. Which is most probably what you do NOT want to happen.
If you click on Transform Data, then you get into the Power Query Editor window, where you can filter the data table BEFORE loading it into the memory. But here is an important tip.
When you see the data rows in Power Query Editor, you might think that the data is already loaded into the memory, and you want the filtering to happen beforehand. However, remember that Power Query Editor is working on a PREVIEW of data (it can be 1000 rows, or 200 rows, based on the number of columns, but it will be always preview), and not the entire dataset. After doing all the transformation in Power Query Editor, when you click on Close & Apply, at that moment, the entire REMAINING data after filtering will be loaded into Power BI.
Date Based Filtering in Power Query
To filter a column in Power Query, very similar to Excel, you can click on the column header and see all the filtering options
There are many options for filtering based on a date and time field;
In the Previous Periods (Years, Quarters, Month, Days) Filtering
One of the most common ways of filtering the date field is based on the previous periods, you can choose In the Previous
Then you can specify the period and how many occurrences of that period you want as below;
This would create logic in M script that always compares the date in the field of that table with the date and time of refreshing the Power BI dataset. It means that, after publishing this Power BI file to the service and scheduling it to refresh, it will be always the last three years of data from the time of refreshing the report.
After, Before, or Between
Another useful filtering is to get data rows only after a specific date and time, or before, or even between a start and end date. This is particularly useful when you want to filter out part of the data table because the quality of data was not good up until that date and time, or something happened in the business at a certain date (or period) and you only care about analyzing data after that or within that period.
Then you can choose dates in the Filter Rows section.
On important thing to know is that you can also use Power Query parameters to parametrize the values you use for filtering.
There are many other ways of filtering based on a date field, which you can explore
I second that again that any filtering you apply in the Power Query Editor, will be applied on the dataset BEFORE loading it to the data model, and as a result, it will reduce the memory consumption, and speed up the overall performance of your model.
Not Just for Power BI
Remember, this is not just to be used in Power BI. It can be used anywhere that you have Power Query access, which means:
- Analysis Services
- Excel
- Power BI Dataflows
- Power Query Source for SSIS (and ADF)
- …
Summary
There are many reasons why a report is slow. I have explained some of them in other articles, which you can find their links down below. Loading the entire dataset without filtering is one of them. In this article, you learned how easy it is to filter your data rows based on a date filter. Is there any other ways that you want to filter your dataset but you don’t know how to do it? ask me in the comments below.
Other resources to read if you want to speed up your report:
How to get/append only the new rows after last refresh instead of get data in some few periods or whole table?
Hi Mendes
What you are asking about is possible through Incremental Refresh.
Cheers
Reza
Salam Reza, How can I filter a column formatted as date to the latest two dates? MRC
Hi Mawson
You can filter the date field for the PREVIOUS, and choose DAY as the period, and enter 2 as the number
Hi
Is there a way to select data for last rolling 3 years period by current date?
Thanks
Ramani
Hi Ramani.
That is exactly what this filtering option will do. It will be always three years back from the date of a refresh. so will be rolling all the time.
Cheers
Reza
Thank you Reza
most excellent for me.
I have a dataset with 1.5mil rows and a set of 20 columns.
as expected it takes a while to load the data (a couple 100records a second) but when I would filter on a date field in the query it takes even longer! why is that and how to workaround?
Hi Johan
There can be many reasons that cause the load process to be slow, here are some: complex data transformations, the volume of data (rows and columns), the data source itself (excel file, database, etc)…
You mentioned that it becomes even slower after putting a filter. I believe the data source is something that doesn’t support query folding then. probably it accesses the entire data first, and then filter it. If it was a database system or something that would have supported query folding, it would have even become faster. Also, other transformations in Power Query matters too. they might cause stopping the folding to happen.
Cheers
Reza
Hello Reza,
Nice thing to know Radcad. I am using a table which has 1 date column and we are having some 15 years of transactional Data. How can i filter the data before load based on Fiscal Years, lets say from FY start date till today. We maintain Our Fiscal Year start and end date in a seperate table.
Hi Ali
everything that I mentioned in this post will be filtered BEFORE loading data into the model.
if you want to do this filtering based your fiscal year, you can go two years back, and then find the start of the fiscal year for that period (based on whatever logic calculated), and then filter everything after that. It would need a custom column to be added in Power Query first, and then filtering based on that. Here I explained a bit about how you can have the fiscal calculation
Cheers
Reza
Hi Reza,
I have a PBI-Desktop report connected to Azure Cosmos DB, using the URL and Primary key. The data stored in my Cosmos DB database tho, has changed over time and some fields have been added, but because of the ‘1000 rows limit’ of query editor while transforming the data that is being imported, PBI does not see the columns corresponding to the newer fields. On the other hand, because the DB is NoSQL, I can not use the ‘SQL statement’ option at the time of importing, to force those new columns, or filter the data to be imported from after the time the fields were added.
The DB is of the production environment, and I can’t really do anything there.
I’m stuck! Do you have any suggestions?
Hi.
You can try some approaches to pass the preview caching rows of Power Query. Have you tried Table.Buffer() function for example?
Cheers
Reza
Thanks! This helps me
This was really helpful, thank you! Is there a way to use this feature to read in only the most recent data from each month? For example, I have a data set for each week of the year, but I only want to read in the most recent data for each month? Is there a way to filter to “Is Latest” for every month?
Hi Sara
It is certainly possible. However, you do need to apply a bit more transformations.
for example, first, you need to fetch the Month and Year from the date field as a custom column. then group by data by that new column, and then sort it descending and get the first item of each group.
Cheers
Reza
I have been using this method to load data from SQL Server databases. But when I am in Query editor mode I cannot go to data view. I can see only Model and Report icons no Data icon. Is there a way I can add data view of my filtered data?
Hi Saliya
the fact that you can only see the Report and Model tabs means two things:
1- you are using DirectQuery to your SQL Server database backend
2- you are in the Power BI Desktop environment, not in the Power Query Editor
Having a DirectQuery connection means that some of the functionalities I explained here might not work for you, because Power Query transformations are limited when it comes to the DirectQuery mode. you can test it by clicking on “Transform Data” which will take you to the Power Query Editor window, make the change and see if it accepts it when you say close and apply
Cheers
Reza
Hi Reza,
I tired to get a SQL table into my Power BI dashboard and using your recommendation I clicked on “Transform Data” and after doing all the filtering and removing duplication and applying all changes in Power Query Editor, I clicked on Close & Apply. My concern is that it remains spinning to load about 46 millions of row which is the size of the table in the original format before applying all the steps that I did in the power query and at the end when I go to the Data view in the Power BI Desktop my table has only 600,000 rows.
How can I fix this? Does it really load the 46 millions of row into memory then it applies all other filtration even though I did a “Transform Data”?
Please advise.
Thank you!
Maryam
Hi Maryam
It depends on the source, and the transformations. Based on what you are saying, I guess the query folding process doesn’t happen. If you have that many rows, it is better to make sure the Query folding happens. One way to check it is to run SQL Profiler and track the query sent to the data source, if that query is the unfiltered data, then query folding doesn’t happen. In that case, it is better that you do your filtering/transformation in the SQL database itself using SQL queries and then load it into Power BI.
Cheers
Reza