I have written about the Date dimension in Power BI in many posts, and it makes sense now to explain one of the most common performance challenges that I see in models. Sometimes your model size grows significantly, and you have no idea why. In this article, I will show you one of the performance issues that can be caused by specific date values, and how to fix it. I have run into this many times in my consulting gigs with clients. However, a great presentation by my friend Rui Romano (B|T) made me realize that there are still many people who are not aware of this and need guidance on it, so it is time to write about it. If you would like to learn more about Power BI, read the Power BI book from Rookie to Rock Star.
I have a sample dataset with one table and three rows, as below:
As you can see, this is not a big table at all. However, when I save the *.pbix file, the file size seems a bit crazy!
Yes, you saw it right. You can try it yourself, and you would probably get the same experience. Now let’s see why this happens and how to fix it.
Why? Investigating Using Power BI Helper
If you have a large *.pbix file, you can investigate which columns and tables are causing the highest storage consumption using Power BI Helper. You can download Power BI Helper for free from here. I opened the file above in Power BI Helper, and in the Modeling Advise tab, this is what I see:
As you can see in the output above, the Date field in the Date table is the biggest column in this dataset, taking 150MB of runtime memory! And that is with only three distinct values in the column! Seems a bit strange, doesn't it? Let's dig deeper into the reason.
Why? Default Date Dimension
I have previously written about the default date dimension in Power BI and how it differs from a custom date dimension. To understand the behavior described in this article, you need to know what the default date dimension is and how it works. I will go through it very quickly:
Power BI creates a default date dimension for every single date field in your dataset. Values in that date table would be from the minimum date value in that field (column) to the maximum date value, with one row for each day. There is an option to disable the default date dimension (Auto date/time, under Options > Data Load in Power BI Desktop) if you want to create your own date dimension instead. However, having the default date dimension makes many date calculations simpler and easier, and that is why many people use it that way. Well, this was a very quick wrap-up of the article below (which you can read for a more in-depth understanding):
There are a few parts of the previous paragraph that you need to read more carefully:
Power BI creates a default date dimension for every single date field in your dataset.
This means that the Date field in our sample table is not just a field; behind the scenes, it is a whole table, which we call the default date dimension table.
Values in that date table would be from the minimum date value in that field (column) to the maximum date value, with one row for each day.
The minimum value in that column is the 1st of Jan 1990, and the maximum value is the 31st of Dec 9999! Wow, the maximum value changes everything, doesn't it? This means that in the date dimension we will have all days (one row per day) from the 1st of Jan 1990 to the 31st of Dec 9999. That is about 8,009 years, which makes almost three million rows in the default date table. The Date column of that table will have almost three million distinct values, and that is why we have such a big runtime memory for this column;
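If you want to sanity-check that row count yourself, you can paste an expression like the one below into a blank query in the Power Query Editor. This is just a sketch to verify the arithmetic (the step name `DayCount` is mine, not something Power BI generates):

```
// Count the days the default date table must cover,
// one row per day from 1990-01-01 to 9999-12-31 inclusive
let
    DayCount = Duration.Days(#date(9999, 12, 31) - #date(1990, 1, 1)) + 1
in
    DayCount  // 2,925,592 rows
```

Almost three million rows, generated automatically behind a three-row table.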
Now that you know why the file size is so big, let’s see how we can fix it with a simple solution.
Solution: Not Applicable Date Value
The 31st of Dec 9999 makes no sense when it comes to date calculations. This value probably comes from an operational system that fills in a default date (when no date is entered, or to represent a far-future, "never" date) using a date thousands of years in the future.
If you are using the default date dimension, I recommend replacing any value like that with null. You can do that in Power Query with a simple Replace Values transformation. Here is how to do it:
Open Edit Queries to go to the Power Query Editor window;
Then right-click on the column, and select Replace Values;
Replace the 31st of Dec 9999 with null;
Then click on Close and Apply in the Power Query Editor window. Our data table now looks like the below (null values are shown as blank in Power BI):
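For those who prefer to see the M code behind the UI, the Replace Values step above generates an expression along these lines in the Advanced Editor (a sketch; the step name, the previous step `Source`, and the column name "Date" are assumed from this sample, so adjust them to your own query):

```
// Replace the "not applicable" sentinel date with null in the Date column
#"Replaced Value" = Table.ReplaceValue(
    Source,
    #date(9999, 12, 31),
    null,
    Replacer.ReplaceValue,
    {"Date"}
)
```

If your source system uses a different sentinel (such as 9999-01-01), replace that value instead.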
And now, if we save the *.pbix file again, this is the new size:
The file size shrank from 26MB to 66KB! That is almost 400 times smaller, all from such a simple change. And now all columns are small in their in-memory storage consumption;
Thanks to Rui for giving me the idea to write about this as a blog post. As you have seen, the default date table doesn't know that the far-future date value you have in the year 9999 is just a "not applicable" value. The default date table will consider it your biggest date value and build a whole table based on it, which can cause performance issues like the one you have seen in this example. Change your "not applicable" date values to null to fix it very simply. Do you have any other date value challenges that you would like me to write about? Please let me know in the comments below.