Dates Between Merge Join in Power Query

Facebooktwittergoogle_plusredditpinterestlinkedintumblrmailFacebooktwittergoogle_plusredditpinterestlinkedintumblrmail

2017-08-15_19h19_13

Using Merge in Power Query gives you the ability to join on a EQUAL join with one or more fields between two tables. However, in some situations you need to do the Merge Join not based on equality of values, based on other comparison options. One of the very common use cases is to Merge Join two queries based on dates between. In this example I am going to show you how to use Merge Join to merge based on dates between. If you want to learn more about joining tables in Power Query read this blog post. To learn more about Power BI, read Power BI book from Rookie to Rock Star.

Download Sample Data Set

Download the data set and sample from here:


Enter Your Email to download the file (required)

Problem Definition

There are some situations that you need to join two tables based on dates between not exact match of two dates. For example; consider scenario below:

There are two tables; Sales table includes sales transactions by Customer, Product, and Date. and Customer table has the detailed information about customer including ID, Name, and City. Here is a screenshot of Sales Table:

2017-08-15_17h58_14

Customer’s table has the history details of changes through the time. For example, the customer ID 2, has a track of change. John was living in Sydney for a period of time, then moved to Melbourne after that.

2017-08-15_18h01_12

The problem we are trying to solve is to join these two tables based on their customer ID, and find out the City related to that for that specific period of time. We have to check the Date field from Sales Table to fit into FromDate and ToDate of the Customer table.

Grain Matching

One of the easiest ways of matching two tables is to bring them both to the same grain. In this example Sales Table is at the grain of Customer, Product, and Date. However, the Customer table is at the grain of Customer and a change in properties such as City. We can change the grain of customer table to be on Customer and Date. That means Having one record per every customer and every day.

Before applying this change, there is a little warning I would like to explain; with changing grain of a table to more detailed grain, number of rows for that table will increase significantly. It is fine to do it as an intermediate change, but if you want to make this change as final query to be loaded in Power BI, then you need to think about your approach more carefully.

Step 1: Calculating Duration

The first step in this approach is to find out how many days is the duration between FromDate and ToDate in the customer table for each row. That simply can be calculated with selecting two columns (First ToDate, then FromDate), then From Add Column Tab, under Date, Subtract Days.

2017-08-15_18h31_41

Then you will see the new column added which is the duration between From and To dates

2017-08-15_18h34_45

Step 2: Creating List of Dates

Second step is to create a list of dates for every record, starting from FromDate, adding one day at a time, for the number of occurrence in DateDifference column.

There is a generator that you can easily use to create a list of dates. List.Dates is a Power Query function which will generate list of dates. Here is the syntax for this table;

  • start date in this scenario will come from FromDate column
  • Occurrence would come from DateDifference plus one.
  • Duration should be in a day Level. Duration has 4 input arguments:

a daily duration would be: #duration(1,0,0,0)

So, we need to add a custom column to our table;

2017-08-15_18h47_51

The custom column expression can be as below;

I named this column as Dates.

Here is the result:

2017-08-15_18h58_52

The Dates column now have a list in every row. this list is a list of dates. next step is to expand it.

Step 3: Expand List to Day Level

Last step to change the grain of this table, is to expand the Dates column. To expand, just click on Expand button.

2017-08-15_18h55_27

Expanding to new rows will give you a data set with all dates;

2017-08-15_19h00_08

Now you can remove FromDate, ToDate, and DateDifference. We don’t need these three columns anymore.

2017-08-15_19h02_36

Table above is the same customer table but on different grain. we can now easily see on which dates John was in Sydney, and which dates in Melbourne. This table now can be easily merged with the sales table.

Merging Tables on the Same Grain

When both tables are at the same grain, then you can easily merge them together.

2017-08-15_19h05_18

Merge should be between two tables, based on CustomerID and Dates. You need to hold Ctrl key to select more than one column. and make sure you select them in the same order in both tables. After merge then you can expand and only select City and Name from the other table;

2017-08-15_19h08_20

The final result shows that two sales transactions for John happened at two different times that John has been in two different cities of Sydney and Melbourne.

2017-08-15_19h10_34

Final Step: Cleansing

You won’t need first two tables after merging them together, you can disable their load to avoid extra memory consumption (especially for Customer table which should be big after grain change). To learn more about Enable Load and solving performance issues, read this blog post.

2017-08-15_19h13_41

Summary

There are multiple ways of joining two tables based on non-equality comparison. Matching grain is one of them and works perfectly fine, and easy to implement. In this post you’ve learned how to use grain matching to do this joining and get the join result based on dates between comparison. with this method, be careful to disable the load of the table that you’ve changed the grain for it to avoid performance issues afterwards.

Download Sample Data Set

Download the data set and sample from here:


Enter Your Email to download the file (required)

Facebooktwittergoogle_plusredditpinterestlinkedintumblrmailFacebooktwittergoogle_plusredditpinterestlinkedintumblrmail
rssyoutuberssyoutube
Reza Rad
Reza Rad is an Author, Trainer, Speaker and DW/BI Consultant. He has a BSc in Computer engineering; he has more than 15 years’ experience in databases, programming and development mostly on Microsoft technologies. He is a Microsoft MVP in Data Platform for seven continues years (from 2011 till now) for his dedication in Microsoft BI. He is author of some SQL Server and BI books, and also Power BI online book; from Rookie to Rock Star.

6 Comments

  • An alternative for creating a list of dates between start and end date: transform the dates to numbers, next create the list {[startdate]..[enddate]}, expand and transform back to dates.
    Not sure if this is more or less efficient in execution, but it is quite simple in coding: no need to have the duration calculated first,

    • Hi Marcel,
      That would work as well. but it is the same type of approach (creating list, and expanding = grain matching). in terms of the execution it should be the same, because duration data type is also a numeric data type behind the scene.
      Cheers
      Reza

  • Nice post. A lot simpler than a complicated DAX query approach. FYI, I failed to download the sample file. I entered my email, clicked button, watched it spin, then it said a link was below, but I didn’t see anything. Using FireFox, if that helps.

Leave a Reply

Your email address will not be published. Required fields are marked *

You may use these HTML tags and attributes: <a href="" title=""> <abbr title=""> <acronym title=""> <b> <blockquote cite=""> <cite> <code class="" title="" data-url=""> <del datetime=""> <em> <i> <q cite=""> <strike> <strong> <pre class="" title="" data-url=""> <span class="" title="" data-url="">