Performance Design Tip for Power BI: Be Careful of Dates

Facebooktwittergoogle_plusredditpinterestlinkedintumblrmailFacebooktwittergoogle_plusredditpinterestlinkedintumblrmail
FacebooktwitterlinkedinrssyoutubeFacebooktwitterlinkedinrssyoutube

I have written about Date Dimension in Power BI in many posts, and it makes sense now to explain one of the most common performance challenges that I see in some of the models. Sometimes, your model size grows significantly big, and you have no idea why!? In this article, I will show you one of the performance issues which might cause because of specific date values, and how to fix it. I have run into this many times in my consulting gigs with clients, However, a friend of mine Rui Romano’s (B|T) great presentation made me think that there are still many people who are not aware of this, and need guidance on it, so time to write about it then. If you like to learn more about Power BI, read the Power BI book from Rookie to Rock Star.

Sample Dataset

I have a sample dataset, with one table, and three rows! as below;

This is not a big table at all as you can see. However, when I save the *.pbix file, the size of the file seems a bit crazy!

Yes, you saw it right. You can try it yourself, and you would probably get the same experience. Now let’s see why this happens and how to fix it.

Why? Investigating Using Power BI Helper

If you have a large *.pbix file, you can investigate what are the columns and tables that causing the highest storage consumption, using Power BI Helper. You can download Power BI Helper for free from here. I opened the file above in Power BI Helper, and in the Modeling Advise tab, this is what I see:

As you can see in the above output, the Date field in the Date table is the biggest column in this dataset. taking 150MB runtime memory! This is considering that we have only three distinct values in the column! Seems a bit strange, isn’t? let’s dig into the reason more in deep.

Why? Default Date Dimension

I have previously written about the default date dimension in Power BI and the difference of that with a custom date dimension. For understanding the current behavior mentioned in this article, you need to know what is the default date dimension and how it works. I go through it very quickly;

Power BI creates a default date dimension for every single date field in your dataset. Values in that date table would be from the minimum date value in that field (column) to the maximum date value, with one row for each day. There is an option to disable the default date dimension if you want and create your own date dimension. However, having the default date dimension will make many of date calculations simpler and easier, and that is why many people are using it that way. Well, this was a very quick wrap of the article below (which you can read to get a more in-depth understanding of it);

There are a few parts of the previous paragraph that you need to read more carefully:

Power BI creates a default date dimension for every single date field in your dataset.

This now means that the Date field in our sample table, is not just a field, but it is a table. Which we call it the default date dimension table.

Values in that date table would be from the minimum date value in that field (column) to the maximum date value, with one row for each day.

The minimum value in that column is 1st of Jan 1990, and the maximum value is 31st of Dec 9999! Wow, the maximum value changes everything isn’t? This means that in the date dimension we will have all days (one day per row) from 1st of Jan 1990 to 31st of Dec 9999. That is about 8,009 years, which makes almost 3 Million rows in the default date table. The date column of that table, will have 3 million distinct values, and that is why we have such a big running memory for this column;

Now that you know why the file size is so big, let’s see how we can fix it with a simple solution.

Solution: Not Applicable Date Value

The 31st of Dec of 9999 makes no sense when it comes to date calculations. This value is probably coming from an operational system which puts the default value of the dates (if not entered, or to show a far date which is inaccessible) using a date which is thousands of years in the future.

If you are using the default date dimension, I do recommend not to use this approach and replace any value like that with null. You can do that in Power Query with a simple Replace Value option. Here is how to do it:

Open Edit Queries, to go to Power Query Editor window;

Then right click on the column, and do the Replace Values

Replace the 31st of Dec 9999 with null;

Then click on Close and Apply in the Power Query Editor window. Our data table now looks like below (null values will be shown as blank in Power BI)

And now, if we save the *.pbix file again. this is a new size:

The file size shrank from 26MB to 66KB! That is almost 400 times smaller! All of that with such simple change. And now all columns are small in the in-memory storage consumption;

Summary

Thanks to Rui for giving me the idea to write about this as a blog post. As you have seen, the default date table doesn’t know that the furthest date value you have in 9999 is just a “not applicable” value. The default date table will consider that as your biggest date value and build a whole table based on that, which may cause performance issues like what you have seen in this example. Change your “not applicable” date values to null to fix it very simply. Do you have any other date value challenges that you like me to write about it? please let me know in the comment below

Facebooktwittergoogle_plusredditpinterestlinkedintumblrmailFacebooktwittergoogle_plusredditpinterestlinkedintumblrmail
FacebooktwitterlinkedinrssyoutubeFacebooktwitterlinkedinrssyoutube
Reza Rad on FacebookReza Rad on LinkedinReza Rad on TwitterReza Rad on Youtube
Reza Rad
Trainer, Consultant, Mentor
Reza Rad is a Microsoft Regional Director, an Author, Trainer, Speaker and Consultant. He has a BSc in Computer engineering; he has more than 20 years’ experience in data analysis, BI, databases, programming, and development mostly on Microsoft technologies. He is a Microsoft Data Platform MVP for nine continuous years (from 2011 till now) for his dedication in Microsoft BI. Reza is an active blogger and co-founder of RADACAD. Reza is also co-founder and co-organizer of Difinity conference in New Zealand.
His articles on different aspects of technologies, especially on MS BI, can be found on his blog: https://radacad.com/blog.
He wrote some books on MS SQL BI and also is writing some others, He was also an active member on online technical forums such as MSDN and Experts-Exchange, and was a moderator of MSDN SQL Server forums, and is an MCP, MCSE, and MCITP of BI. He is the leader of the New Zealand Business Intelligence users group. He is also the author of very popular book Power BI from Rookie to Rock Star, which is free with more than 1700 pages of content and the Power BI Pro Architecture published by Apress.
He is an International Speaker in Microsoft Ignite, Microsoft Business Applications Summit, Data Insight Summit, PASS Summit, SQL Saturday and SQL user groups. And He is a Microsoft Certified Trainer.
Reza’s passion is to help you find the best data solution, he is Data enthusiast.

5 thoughts on “Performance Design Tip for Power BI: Be Careful of Dates

  • Isn’t the best practice is to turn off “Auto Date/Time” and create date dimension in the model, without turning it off, it will create date table for each date field in the model.

    Although it is still valid in case you don’t turn off “auto date/time”.

    • Hi Parv
      I would most of the time create a custom date dimension and use it. However, calling that a best practice, means considering a lot of scenarios.
      One of the scenarios is that many business analysts are using Power BI, and for many of them, it might be just much easier to use the default date dimension. Using approaches like what mentioned here, makes sure that even the default date dimension for them works in a good way.
      Cheers
      Reza

      • I understand that there are scenarios where user doesn’t know or doesn’t create date dimension but I think adding a note in the blog and guide users to create date dimension will help users.

        Anyhow, it is just my opinion.

        Thanks,
        P

Leave a Reply

Your email address will not be published. Required fields are marked *