It is common that you face a status table in your data analysis scenarios. A status table normally has a column for the status of the items and the date of status change. This type of table, although informative, can be hard for analyzing data. In this article and video, I’ll explain why it is better to change the structure of this table to a From-To table instead.
Video
Status log table
A status log table normally has a column per status change per item. An example of such table is the employment status changes table as below;
The table above tells me at what date, what employee has what role in the company based on the status change. For example, this is the information I can read from the table above:
- Leila hired as a consultant at 1st of April 2018
- Leila left on the 5th of July 2020
- Leila is hired again as a senior consultant on 1st of Feb 2021
I have explained in another article, how we can analyze this data and achieve the latest role of each employee for a given date. Read more about it here;
The method mentioned in that article, however, doesn’t take into account that the employee might leave the company. I have to change my DAX expressions in order to find that out.
Finding the latest status requires a scan of the table
One of the problems with the data structure above is that whenever we want to find the latest status up to a given date, we have to find the latest status right before that date.
This means a scan of the table, finding all the status dates before the given date,
and then getting the maximum (latest) date from that list. This is scanning the table.
Calculating duration also requires double scanning the table
Once you found the latest status, if you are interested to calculate the duration of days between that status and the status before it, then you have to re-scan the table for the latest previous status again and then calculate the difference between the two dates.
Each of the operations above can be much faster if the structure of the table is different. I call it a From-To table.
From-To Table
A from-to table will have two date columns, one to serve as the start date or FROM DATE, and another to serve as the end date or TO DATE. The status and other columns will be also there. The from-to version of the table above will look like below;
The structure above is telling us the same information overall. but reading one line of the table gives us more information than the status table. By reading one line of the table, you can understand Leila has been a consultant from the 1st of April 2018 until the 5th of July 2020.
If the status is the most up-to-date status, then the ToDate column is null. Some practices use a very future ToDate in cases like that, for example, they say 31/12/9999. Although that gives you the same meaning as using null and tells you that this is the most up-to-date record, I do not recommend this approach. Because if you are using a default date table in Power BI, This will cause a huge date table and a performance problem that I explained here. I suggest leaving these values null.
The latest status can be find faster and simpler
The table above will be much more effective and efficient to calculate the most recent status. All you need is to find the record that the given date falls between the FromDate and ToDate columns. This is simple where statement. No double-scanning of the table is required.
Duration calculation is simple, no need for scanning the table
Also, if you want the duration of days between FromDate and ToDate, you can easily calculate it using a custom column in Power Query or a calculated column in Power BI, without needing to scan the table again for the previous record.
Although, the from-to table is easier to analyze than the status log table. However, you still need to perform some calculations to achieve things such as active employees for a given period of time, etc. If you are interested to learn how to analyze data of a from-to table, check out my article about subscription patter here.
Status log or From-To table
The few reasons mentioned in this article are only some of the reasons why the from-to table works better for data analysis than the status log table. I strongly recommend changing your status log table to a from-to table whenever possible.
Changing the structure of a status log table to a from-to table can be done using Power Query transformations. I will explain it in another article and video later. This article here focused more on the modeling concept of WHY we need such a table, the next article explains HOW to build that table.
If you are interested to learn more about how to build a data model and tables in the right structure, I strongly suggest reading my book on Power BI Modeling here.
Face to this kind, I used a slowly changing dimension aproach; since table is virtual, calculations were very fast.