Dates Between Merge Join in Power Query

2017-08-15_19h19_13

Using Merge in Power Query gives you the ability to join on a EQUAL join with one or more fields between two tables. However, in some situations you need to do the Merge Join not based on equality of values, based on other comparison options. One of the very common use cases is to Merge Join two queries based on dates between. In this example I am going to show you how to use Merge Join to merge based on dates between. If you want to learn more about joining tables in Power Query read this blog post. To learn more about Power BI, read Power BI book from Rookie to Rock Star.

Download Sample Data Set

Download the data set and sample from here:

    Enter Your Email to download the file (required)

    Problem Definition

    There are some situations that you need to join two tables based on dates between not exact match of two dates. For example; consider scenario below:

    There are two tables; Sales table includes sales transactions by Customer, Product, and Date. and Customer table has the detailed information about customer including ID, Name, and City. Here is a screenshot of Sales Table:

    2017-08-15_17h58_14

    Customer’s table has the history details of changes through the time. For example, the customer ID 2, has a track of change. John was living in Sydney for a period of time, then moved to Melbourne after that.

    2017-08-15_18h01_12

    The problem we are trying to solve is to join these two tables based on their customer ID, and find out the City related to that for that specific period of time. We have to check the Date field from Sales Table to fit into FromDate and ToDate of the Customer table.

    Grain Matching

    One of the easiest ways of matching two tables is to bring them both to the same grain. In this example Sales Table is at the grain of Customer, Product, and Date. However, the Customer table is at the grain of Customer and a change in properties such as City. We can change the grain of customer table to be on Customer and Date. That means Having one record per every customer and every day.

    Before applying this change, there is a little warning I would like to explain; with changing grain of a table to more detailed grain, number of rows for that table will increase significantly. It is fine to do it as an intermediate change, but if you want to make this change as final query to be loaded in Power BI, then you need to think about your approach more carefully.

    Step 1: Calculating Duration

    The first step in this approach is to find out how many days is the duration between FromDate and ToDate in the customer table for each row. That simply can be calculated with selecting two columns (First ToDate, then FromDate), then From Add Column Tab, under Date, Subtract Days.

    2017-08-15_18h31_41

    Then you will see the new column added which is the duration between From and To dates

    2017-08-15_18h34_45

    Step 2: Creating List of Dates

    Second step is to create a list of dates for every record, starting from FromDate, adding one day at a time, for the number of occurrence in DateDifference column.

    There is a generator that you can easily use to create a list of dates. List.Dates is a Power Query function which will generate list of dates. Here is the syntax for this table;

    List.Dates(<start date>,<occurrence>,<duration>)
    • start date in this scenario will come from FromDate column
    • Occurrence would come from DateDifference plus one.
    • Duration should be in a day Level. Duration has 4 input arguments:
    #duration(<day>,<hour>,<minute>,<second>)

    a daily duration would be: #duration(1,0,0,0)

    So, we need to add a custom column to our table;

    2017-08-15_18h47_51

    The custom column expression can be as below;

    List.Dates([FromDate],[DateDifference]+1,#duration(1,0,0,0))

    I named this column as Dates.

    Here is the result:

    2017-08-15_18h58_52

    The Dates column now have a list in every row. this list is a list of dates. next step is to expand it.

    Step 3: Expand List to Day Level

    Last step to change the grain of this table, is to expand the Dates column. To expand, just click on Expand button.

    2017-08-15_18h55_27

    Expanding to new rows will give you a data set with all dates;

    2017-08-15_19h00_08

    Now you can remove FromDate, ToDate, and DateDifference. We don’t need these three columns anymore.

    2017-08-15_19h02_36

    Table above is the same customer table but on different grain. we can now easily see on which dates John was in Sydney, and which dates in Melbourne. This table now can be easily merged with the sales table.

    Merging Tables on the Same Grain

    When both tables are at the same grain, then you can easily merge them together.

    2017-08-15_19h05_18

    Merge should be between two tables, based on CustomerID and Dates. You need to hold Ctrl key to select more than one column. and make sure you select them in the same order in both tables. After merge then you can expand and only select City and Name from the other table;

    2017-08-15_19h08_20

    The final result shows that two sales transactions for John happened at two different times that John has been in two different cities of Sydney and Melbourne.

    2017-08-15_19h10_34

    Final Step: Cleansing

    You won’t need first two tables after merging them together, you can disable their load to avoid extra memory consumption (especially for Customer table which should be big after grain change). To learn more about Enable Load and solving performance issues, read this blog post.

    2017-08-15_19h13_41

    Summary

    There are multiple ways of joining two tables based on non-equality comparison. Matching grain is one of them and works perfectly fine, and easy to implement. In this post you’ve learned how to use grain matching to do this joining and get the join result based on dates between comparison. with this method, be careful to disable the load of the table that you’ve changed the grain for it to avoid performance issues afterwards.

    Download Sample Data Set

    Download the data set and sample from here:

      Enter Your Email to download the file (required)

      Reza Rad on FacebookReza Rad on LinkedinReza Rad on TwitterReza Rad on Youtube
      Reza Rad
      Trainer, Consultant, Mentor
      Reza Rad is a Microsoft Regional Director, an Author, Trainer, Speaker and Consultant. He has a BSc in Computer engineering; he has more than 20 years’ experience in data analysis, BI, databases, programming, and development mostly on Microsoft technologies. He is a Microsoft Data Platform MVP for 12 continuous years (from 2011 till now) for his dedication in Microsoft BI. Reza is an active blogger and co-founder of RADACAD. Reza is also co-founder and co-organizer of Difinity conference in New Zealand, Power BI Summit, and Data Insight Summit.
      Reza is author of more than 14 books on Microsoft Business Intelligence, most of these books are published under Power BI category. Among these are books such as Power BI DAX Simplified, Pro Power BI Architecture, Power BI from Rookie to Rock Star, Power Query books series, Row-Level Security in Power BI and etc.
      He is an International Speaker in Microsoft Ignite, Microsoft Business Applications Summit, Data Insight Summit, PASS Summit, SQL Saturday and SQL user groups. And He is a Microsoft Certified Trainer.
      Reza’s passion is to help you find the best data solution, he is Data enthusiast.
      His articles on different aspects of technologies, especially on MS BI, can be found on his blog: https://radacad.com/blog.

      20 thoughts on “Dates Between Merge Join in Power Query

      • An alternative for creating a list of dates between start and end date: transform the dates to numbers, next create the list {[startdate]..[enddate]}, expand and transform back to dates.
        Not sure if this is more or less efficient in execution, but it is quite simple in coding: no need to have the duration calculated first,

        • Hi Marcel,
          That would work as well. but it is the same type of approach (creating list, and expanding = grain matching). in terms of the execution it should be the same, because duration data type is also a numeric data type behind the scene.
          Cheers
          Reza

      • Nice post. A lot simpler than a complicated DAX query approach. FYI, I failed to download the sample file. I entered my email, clicked button, watched it spin, then it said a link was below, but I didn’t see anything. Using FireFox, if that helps.

      • Reza:

        Can you give an example where the table requiring expansion is from a view which is over a system-versioned table? Here there are potentially multiple values on a given day and we’re only interested in the last value for a given day.

        Thank you for your consideration.

        Respectfully,

        Darryll Petrancuri

        • Hi Darryll,
          there are two ways to do this;
          – Power Query Way: You get the entire dataset in Power Query, then do ordering based on the date, then you can do Group By and only pick the first item (or even you can use Remove duplicates). here is an example of Group By I explained:
          https://radacad.com/grouping-in-power-query-getting-the-last-item-in-each-group
          – T-SQL way; If you are using system versioned History tables which are Temporal tables SQL Server 2016+, then you can write your query in a way that gives you only the current record

          Cheers
          Reza

      • REZA, you are a G-D send !!! I can’t thank you enough. I have read countless blogs on a variety of MS BI topics, and with yours I sense the spirit of giving permeate through and through, making the quality of your contribution exquisite.
        Bless you 🙂
        Isaac Mashiah
        ISRAEL

      • Amazing! Great work! I have a question about a special case inside that one: what if the “date” fall before the “fromDate” and in that case I would like to return the closest “fromDate”. Example: if date in ID 2 was 31/12/2016 and I would like to return “sydney”.

        • To advise you exactly, I need to see the dataset, but as general advice, I can say after the merge, you can use duration to check the duration between that date and other dates in the underlying table and then pick the one with Min duration.

          Cheers
          Reza

      • Hi Reza,
        Thanks you for all your nice posts!!!
        Do you know perhaps what the limit (in terms of performance is) of your solution?
        Kind regards,
        Catherine

        • Hi Catherine
          this method users Merge operation in Power Query, which depends on how you use it, and the size of data can be slow. apart from that, no limitations.
          Cheers
          Reza

      • Hi Reza, thanks for sharing! I have exactly the same problem, but it is not date, but time. So the time in the 1st table (i.e. i.e. 11/08/2021 8:54:56 am) need to match the time period in the 2nd table (i.e. start time 10/08/2021 8:50:12 am, end time 20/08/2021 9:46:35pm). Could you please share the solution? Thanks!

        • You can use List.DateTimes function to do that, it works very similar;
          List.DateTimes(#datetime(2011, 12, 31, 23, 55, 0), 10, #duration(0, 0, 1, 0))
          but the performance might be very slow, because there will be many time points. writing a custom function to check if that value is in between might work faster in that case
          Cheers
          Reza

      Leave a Reply