Lost Customers DAX Calculation for Power BI

Facebooktwittergoogle_plusredditpinterestlinkedintumblrmailFacebooktwittergoogle_plusredditpinterestlinkedintumblrmail

2016-11-22_09h07_06

One of the sample scenarios that DAX can be used as a calculation engine is customer retention. In the area of customer retention businesses might be interested to see who there lost customers or new customers are in the specific period. Obviously this can be calculated in the Power Query or in the data source level, however if we want to keep the period dynamic then DAX is the great asset to do it. In this post I’ll explain how you can write a simple DAX calculation to find new customers and lost customers over a dynamic period of time in Power BI. If you are interested to learn more about Power BI read the Power BI online book; from Rookie to Rock Star.

Defining Terms

Before start let’s define terms that we use here. There might be different definitions for Lost or New Customers, the definition that I use is the very simple definition that I have seen some business use to easily measure their customer retention.

New Customer(s): Any customer who purchased any product in the last specified period of time considering the fact that the customer didn’t purchased anything from this business before this period.

Lost Customer(s): Any customer who haven’t purchased any product in the last specified period of time considering the fact that he/she previously purchased from this business.

The method explained below through an example.

Prerequisite

For running this example you need to have AdventureWorksDW database or the Excel version of it.

Scenario

In the AdventureWorksDW, we want to find out who our new customers and also lost customers are. We want to have the flexibility to select the period (that determines if customer is lost or new) in the slicer. We need these tables in AdventureWorksDW; FactInternetSales, DimDate, DimCustomer, and DimProduct.

The Model

Let’s start by getting data from AdventureWorksDW SQL Server database. Tables that we need for this model are: FactInternetSales, DimDate, DimCustomer, and DimProduct. You don’t need to apply any Power Query transformation. Just load the tables in the Power BI. and check the relationship to be as below. Note that relationship between DimDate and FactInternetSales is only one active relationship based on OrderDateKey (remove other relationships between these two tables).

2016-11-22_07h25_41

Period Slicer

Also we need to have a Period Slicer. This is for being able to dynamically select the period (for example last 30 days, or last 365 days). You can create this table any where, in SQL Server db. in Power Query, in Excel…. I have created that in my model itself through Enter Data. Here is how the table looks like;

2016-11-22_07h34_22

Table Name is Period, and the column name is also Period.

Method

What I do for the calculation is to calculate the total revenue from each customer first. This is the total revenue regardless of period. Means sum of SalesAmount for everything that this customer bought for all times (I call this Total Revenue). Then I create another measure to calculate sum of SalesAmount only for the selected period (in Days), Let’s call this Last Period Revenue. Once we have these calculations finding new or lost costumers are easy as below;

Lost Customer(s): Customers that their Total Revenue is greater than zero, but their Last Period Revenue is not.

New Customer(s): Customers that their Total Revenue is equal to their Last Period Revenue, and greater than zero.

The reason for greater than zero condition is that in the database we might have some customer records that haven’t purchased anything at all (seems odd, but we have to consider it, as there might be some exceptions in some business). I do all of steps one by one in a separate measure, this is just to give you an understanding of the process. Obviously these all can be done in one measure.

To start the very first thing is to identify the selected value in the slicer through measure.

Measure for the Selected Value in the Slicer: Selected Period

First things first is to identify what item is selected in the Period slicer. DAX doesn’t understand the SELECTED item, however it understand the CONTEXT in which the DAX expression evaluates. This means; if I select one or more values in the slicer, DAX expression will execute on top of that selection. This selection is the filter context of DAX expression. That’s simple. Now for this particular slicer we want user to always select one item, so it is also good to check if multiple items is selected or not. So let’s create the measure (You can create a measure in the Data Tab, under Modeling, New Measure), named Selected Period under Period table with calculation below (Note that this should be a measure, not calculated column);

Expression above simply just check if number of records under the selected context (which will be selected slicer items) is one, then it will fetch the MAX of period. Why maximum when it is only one row? because DAX always works with a set of rows. even though we know it will be one row after passing this condition, DAX still works with a set of rows. the set of rows might have one or more rows. when it has only one row, we can use MIN or MAX or things like that to fetch it as a single value from the set of rows. And finally when multiple items are selected it will return blank.

Let’s see how the measure works in action;

Create a sample report with going to Report tab, and create a slicer with Period. Also add a Card visualization with Selected Period measure (created above) as the value. Now if you select items in the slicer  you will see what item is selected in the card visualization.

2016-11-22_07h56_52

This Selected Period measure in the calculation of Sales Amount for selected period. Which we will do it in next steps.

Total Revenue

There are multiple ways of calculating the total revenue for each customer. I use a method that use a new Measure. create a new measure in Data tab, under DimCustomer and name it Total Revenue. write DAX expression below for it;

This DAX expression uses SumX function which will calculate sum of an expression (FactInternetSales[SalesAmount]) on a filtered table. The filtered table in this example is RelatedTable(FactInternetSales). RelatedTable will go through each Customer record in DimCustomer, and considering the existing relationship between DimCustomer and FactInternetSales in the model (As showed in the diagram earlier in this post), it will generate a sub set of FactInternetSales for each customer, which is only the subset that has this customerKey in it. So the result of sum of SalesAmount for this subset will be total revenue for each customer.

Let’s see how this works in action;

Go back to the Report page, and create a Table with Full Name from DimCustomer, and also the new measure Total Revenue

2016-11-22_08h11_00

 

In front of each customer you can see the total revenue for that customer. Now let’s calculate selected period’s revenue.

Total Revenue for the Selected Period in the Slicer

Let’s call this Last Period Revenue. In this last period revenue all we want to calculate is the total sales for each customer in the period of last X days, where X is coming from the selection of period slicer. We can use few functions to help along the way. Here is the measure with full DAX expression. I’ll explain details after;

Let’s Start from Calculate; Calculation function is a simple function in terms of the basics of how it works, and it is also the motherhood of all functions in DAX, there are heaps of calculations you can do with this function. How this works? Calculate simply calculates an expression (first parameter) on a set of filters (second parameter, third parameter….). In the example above the expression is Sum(FactInternetSales[SalesAmount]), and the filter is all the parameters after that. In above expression you can see that we have only one filter which is DatesBetween. So calculate simply resolve sum of SalesAmount for the subset of data that comes out of DatesBetween function. Let’s see what DatesBetween Does;

DatesBetween is self explanatory function. It will return a subset of data for the dates between start date and end date! It has three parameters; date field, start date, and end date;

In example above end date is LASTDATE(DimDate[FullDateAlternateKey]). This means the maximum date that we have considering what filter context we are in. The main important thing is calculating the start date. Start date is the last date minus selected period. For doing this calculation I used DATEADD function which reduce number of days with (-1*[Selected Period]) in the highlighted line above in the expression.

As a result this whole DAX expression returns total sales amount only for the selected period. let’s see how it works in the report;

Go back to the report and add the new measure to table.

2016-11-22_08h23_28

Last Period Revenue by default when no period is selected in the slicer will be blank. When you select values, this will bring total revenue for that number of days from the last date in DimDate. In my sample data set I can only start something if I select last 1461 days which is last 4 years. In different version of AdventureWorksDW you might get different results because their data is different. Now let’s find what customers are lost.

Lost Customer(s) for the Selected Period

What table above says is that for a selected period, some customers have not purchased anything. For example Aaron Allen has the total revenue of #3,400 but in the last 1461 days he didn’t purchased anything. So what this says is that this customer is lost in the selected period! So Lost Customer(s) calculation is as simple as this:

This is a simple IF expression which says if customer has a Total Revenue greater than zero, and did not purchased anything in the last period (Last Period Revenue is less than or equal to zero), then this customer is lost.

New Customer(s) for the Selected Period

New customers on the other hand are customers who only purchased in the last period. This means their Total Revenue is equal to Last Period Revenue, and greater than zero. Here is the calculation;

Let’s see the final result in the report; Add Lost Customers and New Customers to the table and you will see the result;

2016-11-22_08h35_46

As you can see our formula successfully determined that Aaron Allen is lost in the last 1461 days period (if you increase or decrease the period you will see different result). Also those customers which their total revenue is equal to last period revenue are considered as New Customers (that’s why you see so many new customers in above table). Note that a customer might not fall in any of these categories. for example;

2016-11-22_08h38_16

Aaron Li and Aaron Nelson are neither lost or new customers. They are customers who purchased before this period, and still purchased in the last period as well.

Bringing Other Dimensions

These measures also works if you bring another table which is related to other tables in the model as a new dimension. For example you can bring Product as another slicer and then you will see product by product which customers are lost and which are new customers. Hope this calculations helps you to expand customer retention in your Power BI Model.

 AlSave

Save

Save

Facebooktwittergoogle_plusredditpinterestlinkedintumblrmailFacebooktwittergoogle_plusredditpinterestlinkedintumblrmail
rssyoutuberssyoutube
Reza Rad
Reza Rad is an Author, Trainer, Speaker and DW/BI Consultant. He has a BSc in Computer engineering; he has more than 15 years’ experience in databases, programming and development mostly on Microsoft technologies. He is a Microsoft MVP in Data Platform for seven continues years (from 2011 till now) for his dedication in Microsoft BI. He is author of some SQL Server and BI books, and also Power BI online book; from Rookie to Rock Star.

26 Comments

  • Hi Their,

    Its really a nice blog and a option I was looking from last quite some time. But here I have one doubt as not able to see any relation between Period(Table create in Power BI) with DIM_DATE,
    so how its going to work without mapping on selection of last 30 days, 60 days and all.

    Thank you.

    Regards
    Ashish Gupta

    • Hi Ashish,
      Thanks for your feedback.
      You are absolutely right. there is no relation between Period and any other table in the model.
      However the measure “Last Period Revenue” calculates based on the selection result of the Period slicer. As you can read in this post I have created a “Selected Period” measure that returns value of selected item (30, 60, or anything else). then “last Period Revenue” calculates based on that with DateAdd function inside DatesBetween. Remember that this is a measure and measures calculates on the fly with every change in the filter context. So this is how it works.

      Cheers
      Reza

    • Good question Igor,
      It is a bit hidden! in your slicer click on three dots button on top right hand side corner and in the popup menu you will see it. It should be more obvious I know and told this as feedback to Microsoft, and I believe they are working on that.
      Cheers
      Reza

  • Hi Raja,
    Excellent article! hope to see more solutions like this one for other practical scenarios.

    One question in mind, can you please explain why for the Total Revenue you used SUMX with RELATEDTABLE, it seems that a simple SUM(‘FactInternetSales'[SalesAmount]) gets the same results.

    thanks

    • Hi Ariel.
      Thanks for your kind feedback.
      You are correct. In this case it would work the same. and simple sum might be better. SUMX came from the original solution where I’ve added that as a column to the customer table itself to have the calculation at the refresh time rather than runtime. That method won’t support total revenue for each product though. but if we want to do only total for each customer that might be better. That said in this example I haven’t even created that calculated column here, so all used measures for calculation. So long story short, you are absolutely right, in this example both are the same, in some other cases there is a slight difference.
      Cheers
      Reza

  • hi reza and best wishes for 2017
    I’ve tried to reproduce your example in a power bi file.
    But when iam trying to create ‘Selected Period’ measured value placed on Period table part (i’ve renamed ‘Period’ table as ‘Period_data’) following warning message is appearing :
    “single value for column ‘Period’ in table ‘Period_data’ cannot be determined. This can happen when a measure formula refers to a column that contains many values without specifying an aggregation such as min, max, count, or sum to get a single result.”
    I’ve also tried to locate ‘Selected Period’ measured value on Fact table area but result is the same.
    It’s a blocking point as when i try to use ‘Selected Period’ measured value for populating Card visualization but it’s impossible as i get following error message :
    ‘Error Message:Something’s wrong with one or more fields: (Period_data) Selected Period’
    Many thanks in advance for your explanation on this issue.

    By the way DAX formula for ‘Selected Period’ works with ‘;’ as parameter separator not with ‘,’ as mentioned in your source code.

      • hi again reza and thanks for your feedback
        Here is the content of DAX formula of SelectedPeriod measure
        SelectedPeriod = IF(COUNTROWS(Period_data)=1;MAX(Period_data[Period];BLANK()))

        Initially i copy paste your formula but with , as separator i’ve got following message ‘The syntax for ‘MAX’ is incorrect. (DAX(IF(COUNTROWS(Period_data)=1.MAX(Period_data[Period],BLANK())))).’
        When i replaced ‘,’ by ‘;’ as separator, autocomplete menu appeared for each expression typed in each parameter entered which was not the case when i’ve used ‘,’ as separator.
        If you prefer i can send you the .pbix file created but for each source tables from AdventureWorksDW2012 source db i’ve used direct query mode instead of import data (except for Period Data table for which i’ve imported a .csv file).

        • Hi Stephane,
          The problem is that closing bracket for MAX(Period_data[Period] is not closed right after that in your expression, it is closed after blank(). Here is the correct expression for you:
          SelectedPeriod = IF(COUNTROWS(Period_data)=1,MAX(Period_data[Period]),BLANK())
          Cheers
          Reza

  • Hi Reza,

    Thank you for a very helpful use case. I am working on a similar task regarding instead lost sales and new sales, and i was wondering if you could give me some tips.

    I am working on getting the amount of new sales and lost sales with regards to sales previous year and sales year to date. I am trying to show this in a table and then filtering that table with a year slicer.
    Below are the formulas that i have used:

    SalesPY = CALCULATE(SUM(SalesData[Value]),SAMEPERIODLASTYEAR(‘Calendar'[DateKey]))

    SalesYTD = TOTALYTD(SUM(SalesData[Value]), ‘Calendar'[DateKey])

    NewSalesUpdate = SUMX(VALUES(SalesData[CustomerName]),IF([SalesYTD] > 0 && [SalesPY] = 0, [SalesYTD]))

    LostSalesUpdate = SUMX(VALUES(SalesData[CustomerName]),IF([SalesYTD] = 0 && [SalesPY] > 0, -[SalesPY]))

    LostSalesOld = IF([SalesPY] > 0 && [SalesYTD] = 0, -[SalesPY])

    The NewSalesUpdate formula works as it should and sums up correctly. However LostSalesUpdate does not work, despite having pretty much the opposite formula compared with NewSalesUpdate. It seems like the IF statement never becomes true. That is strange because the LostSalesOld formula shows the right value, but it does not show the total.

    All tips are appreciated!

    • Hi Fred,
      Have you checked to see if SalesYTD and SalesPY returning correct values as expected?
      Also the IF statement for LostSalesUpdate is not exactly opposite of NewSalesUpdate. If you want that to be exact opposite do this: IF([SalesYTD]>0 || [SalesPY]=0,…). When you use && for an operation, the opposite of that would be not (&&), which will be similar to (not first condition || not second condition). This will also include having some records that you don’t have a sales for them in this year, and even in last year.
      However I think the problem still might be in the way that calculation of PY and YTD works. Let me know if this still doesn’t help and you need to go more into the details.
      Cheers
      Reza

      Cheers
      Reza

  • Hi Reza,

    Nice one. I am currently exploring your book – congrats for your work.
    Regarding this particular case, isn’t worth calculating the first purchase date ever in a calc. column (in the Customers Table) ?

    1. In the Customers Table, I am computing the first purchase date ever in this calc.column :
    = Calculate ( Min (FactTable[OrderDate] ) )

    2. I am creating a relationship (inactive) between Calendar[Date] and Customers[F.PurchaseDateEver].

    3. I can then leverage UseRelationship function.
    New Customers = Calculate ( [NumOfCustomers] , USERELATIONSHIP(Customers[DateofFirstPurchaseEver] , ‘Calendar'[Date]) )
    Customers are either returning or new.
    So NumOfCustomers = New Customers + Returning Customers (**)
    Thanks to (**) :
    Returning Customers = New Customers – NumOfCustomers

    What do you think about this approach ? Might be even better to calculate First Purchase Date ever through Power Query rather than DAX Calc. Column !

    • Hi Tristan,
      Nice solution, Thanks!
      However with this method you need to consider all other possibilities. If we bring other dimensions, such as Product, then with Power Query you need to consider first purchase date for every product for every customer. If another dimension comes in, we need to change the way of calculation. it is best to be done in DAX side. Which can be done with Calculated column as you mentioned, or measures. Good thing about DAX calculations is that there are always more than one way to achieve the result :)

      Cheers
      Reza

  • Hi Reza,
    I am new to DAX. Thanks for the nice article.
    In the above example, while calculating the measure Last period Revenue, the SUM(FactInternetSales[SalesAmount]) is calculated between last date and last date-selected period. But how it is related to customer? Last period Revenue measure do not have any field from Dimcustomer table right? How come the formula is calculating the Last period revenue for each customer from this measure then? Please explain.

    Thanks,
    PBI

    • Hi,
      For answering that you need to know about FILTER CONTEXT in DAX. Filter context will affect all your measure’s calculation.
      In the table that I put in the report visualization, customer name is listed. that means customer name is filter context. so that means calculation will be done based on that.
      Cheers
      Reza

  • Hi Reza,
    I am a reader of your great e-book from rookie to rockstar. Is this method can be use to employee turnover. I mean, how we count in current month the number of new employee, and the number of lost employee based on monthly employee data. I hope you can understand what I mean.

    Many thanks,
    Raden

    • Hi Raden,
      You can use similar approach.
      Just instead of Sum(Sales) in example above, you need to get COUNTX of employees by their months of joining.
      Cheers
      Reza

Leave a Reply

Your email address will not be published. Required fields are marked *

You may use these HTML tags and attributes: <a href="" title=""> <abbr title=""> <acronym title=""> <b> <blockquote cite=""> <cite> <code class="" title="" data-url=""> <del datetime=""> <em> <i> <q cite=""> <strike> <strong> <pre class="" title="" data-url=""> <span class="" title="" data-url="">