Using Merge in Power Query gives you the ability to join on a EQUAL join with one or more fields between two tables. However, in some situations you need to do the Merge Join not based on equality of values, based on other comparison options. One of the very common use cases is to Merge Join two queries based on dates between. In this example I am going to show you how to use Merge Join to merge based on dates between. If you want to learn more about joining tables in Power Query read this blog post. To learn more about Power BI, read Power BI book from Rookie to Rock Star.
Download Sample Data Set
Download the data set and sample from here:
Problem Definition
There are some situations that you need to join two tables based on dates between not exact match of two dates. For example; consider scenario below:
There are two tables; Sales table includes sales transactions by Customer, Product, and Date. and Customer table has the detailed information about customer including ID, Name, and City. Here is a screenshot of Sales Table:
Customer’s table has the history details of changes through the time. For example, the customer ID 2, has a track of change. John was living in Sydney for a period of time, then moved to Melbourne after that.
The problem we are trying to solve is to join these two tables based on their customer ID, and find out the City related to that for that specific period of time. We have to check the Date field from Sales Table to fit into FromDate and ToDate of the Customer table.
Grain Matching
One of the easiest ways of matching two tables is to bring them both to the same grain. In this example Sales Table is at the grain of Customer, Product, and Date. However, the Customer table is at the grain of Customer and a change in properties such as City. We can change the grain of customer table to be on Customer and Date. That means Having one record per every customer and every day.
Before applying this change, there is a little warning I would like to explain; with changing grain of a table to more detailed grain, number of rows for that table will increase significantly. It is fine to do it as an intermediate change, but if you want to make this change as final query to be loaded in Power BI, then you need to think about your approach more carefully.
Step 1: Calculating Duration
The first step in this approach is to find out how many days is the duration between FromDate and ToDate in the customer table for each row. That simply can be calculated with selecting two columns (First ToDate, then FromDate), then From Add Column Tab, under Date, Subtract Days.
Then you will see the new column added which is the duration between From and To dates
Step 2: Creating List of Dates
Second step is to create a list of dates for every record, starting from FromDate, adding one day at a time, for the number of occurrence in DateDifference column.
There is a generator that you can easily use to create a list of dates. List.Dates is a Power Query function which will generate list of dates. Here is the syntax for this table;
List.Dates(<start date>,<occurrence>,<duration>)
- start date in this scenario will come from FromDate column
- Occurrence would come from DateDifference plus one.
- Duration should be in a day Level. Duration has 4 input arguments:
#duration(<day>,<hour>,<minute>,<second>)
a daily duration would be: #duration(1,0,0,0)
So, we need to add a custom column to our table;
The custom column expression can be as below;
List.Dates([FromDate],[DateDifference]+1,#duration(1,0,0,0))
I named this column as Dates.
Here is the result:
The Dates column now have a list in every row. this list is a list of dates. next step is to expand it.
Step 3: Expand List to Day Level
Last step to change the grain of this table, is to expand the Dates column. To expand, just click on Expand button.
Expanding to new rows will give you a data set with all dates;
Now you can remove FromDate, ToDate, and DateDifference. We don’t need these three columns anymore.
Table above is the same customer table but on different grain. we can now easily see on which dates John was in Sydney, and which dates in Melbourne. This table now can be easily merged with the sales table.
Merging Tables on the Same Grain
When both tables are at the same grain, then you can easily merge them together.
Merge should be between two tables, based on CustomerID and Dates. You need to hold Ctrl key to select more than one column. and make sure you select them in the same order in both tables. After merge then you can expand and only select City and Name from the other table;
The final result shows that two sales transactions for John happened at two different times that John has been in two different cities of Sydney and Melbourne.
Final Step: Cleansing
You won’t need first two tables after merging them together, you can disable their load to avoid extra memory consumption (especially for Customer table which should be big after grain change). To learn more about Enable Load and solving performance issues, read this blog post.
Summary
There are multiple ways of joining two tables based on non-equality comparison. Matching grain is one of them and works perfectly fine, and easy to implement. In this post you’ve learned how to use grain matching to do this joining and get the join result based on dates between comparison. with this method, be careful to disable the load of the table that you’ve changed the grain for it to avoid performance issues afterwards.
Download Sample Data Set
Download the data set and sample from here:
Wow, great tutorial! This is very helpfull for me. Many Thanks!
An alternative for creating a list of dates between start and end date: transform the dates to numbers, next create the list {[startdate]..[enddate]}, expand and transform back to dates.
Not sure if this is more or less efficient in execution, but it is quite simple in coding: no need to have the duration calculated first,
Hi Marcel,
That would work as well. but it is the same type of approach (creating list, and expanding = grain matching). in terms of the execution it should be the same, because duration data type is also a numeric data type behind the scene.
Cheers
Reza
Nice post. A lot simpler than a complicated DAX query approach. FYI, I failed to download the sample file. I entered my email, clicked button, watched it spin, then it said a link was below, but I didn’t see anything. Using FireFox, if that helps.
Works in Chrome! Go figure, the browser wars continue.
Thanks Donald,
This plugin for download still has some issues. we are working on it to be fixed
Cheers
Reza
Awesome post! Incredibly helpful.
Reza:
Can you give an example where the table requiring expansion is from a view which is over a system-versioned table? Here there are potentially multiple values on a given day and we’re only interested in the last value for a given day.
Thank you for your consideration.
Respectfully,
Darryll Petrancuri
Hi Darryll,
there are two ways to do this;
– Power Query Way: You get the entire dataset in Power Query, then do ordering based on the date, then you can do Group By and only pick the first item (or even you can use Remove duplicates). here is an example of Group By I explained:
https://radacad.com/grouping-in-power-query-getting-the-last-item-in-each-group
– T-SQL way; If you are using system versioned History tables which are Temporal tables SQL Server 2016+, then you can write your query in a way that gives you only the current record
Cheers
Reza
REZA, you are a G-D send !!! I can’t thank you enough. I have read countless blogs on a variety of MS BI topics, and with yours I sense the spirit of giving permeate through and through, making the quality of your contribution exquisite.
Bless you 🙂
Isaac Mashiah
ISRAEL
Thanks Issac for your kind words 🙂
Great solution..!! Simple and easy to understand.
Excelent Post! Loved it!
Great! Thank you!
Amazing! Great work! I have a question about a special case inside that one: what if the “date” fall before the “fromDate” and in that case I would like to return the closest “fromDate”. Example: if date in ID 2 was 31/12/2016 and I would like to return “sydney”.
To advise you exactly, I need to see the dataset, but as general advice, I can say after the merge, you can use duration to check the duration between that date and other dates in the underlying table and then pick the one with Min duration.
Cheers
Reza
Hi Reza,
Thanks you for all your nice posts!!!
Do you know perhaps what the limit (in terms of performance is) of your solution?
Kind regards,
Catherine
Hi Catherine
this method users Merge operation in Power Query, which depends on how you use it, and the size of data can be slow. apart from that, no limitations.
Cheers
Reza
Hi Reza, thanks for sharing! I have exactly the same problem, but it is not date, but time. So the time in the 1st table (i.e. i.e. 11/08/2021 8:54:56 am) need to match the time period in the 2nd table (i.e. start time 10/08/2021 8:50:12 am, end time 20/08/2021 9:46:35pm). Could you please share the solution? Thanks!
You can use List.DateTimes function to do that, it works very similar;
List.DateTimes(#datetime(2011, 12, 31, 23, 55, 0), 10, #duration(0, 0, 1, 0))
but the performance might be very slow, because there will be many time points. writing a custom function to check if that value is in between might work faster in that case
Cheers
Reza