One of the sample scenarios that DAX can be used as a calculation engine is customer retention. In the area of customer retention businesses might be interested to see who there lost customers or new customers are in the specific period. Obviously this can be calculated in the Power Query or in the data source level, however if we want to keep the period dynamic then DAX is the great asset to do it. In this post I’ll explain how you can write a simple DAX calculation to find new customers and lost customers over a dynamic period of time in Power BI. If you are interested to learn more about Power BI read the Power BI online book; from Rookie to Rock Star.
Defining Terms
Before start let’s define terms that we use here. There might be different definitions for Lost or New Customers, the definition that I use is the very simple definition that I have seen some business use to easily measure their customer retention.
New Customer(s): Any customer who purchased any product in the last specified period of time considering the fact that the customer didn’t purchased anything from this business before this period.
Lost Customer(s): Any customer who haven’t purchased any product in the last specified period of time considering the fact that he/she previously purchased from this business.
The method explained below through an example.
Prerequisite
For running this example you need to have AdventureWorksDW database or the Excel version of it.
Scenario
In the AdventureWorksDW, we want to find out who our new customers and also lost customers are. We want to have the flexibility to select the period (that determines if customer is lost or new) in the slicer. We need these tables in AdventureWorksDW; FactInternetSales, DimDate, DimCustomer, and DimProduct.
The Model
Let’s start by getting data from AdventureWorksDW SQL Server database. Tables that we need for this model are: FactInternetSales, DimDate, DimCustomer, and DimProduct. You don’t need to apply any Power Query transformation. Just load the tables in the Power BI. and check the relationship to be as below. Note that relationship between DimDate and FactInternetSales is only one active relationship based on OrderDateKey (remove other relationships between these two tables).
Period Slicer
Also we need to have a Period Slicer. This is for being able to dynamically select the period (for example last 30 days, or last 365 days). You can create this table any where, in SQL Server db. in Power Query, in Excel…. I have created that in my model itself through Enter Data. Here is how the table looks like;
Table Name is Period, and the column name is also Period.
Method
What I do for the calculation is to calculate the total revenue from each customer first. This is the total revenue regardless of period. Means sum of SalesAmount for everything that this customer bought for all times (I call this Total Revenue). Then I create another measure to calculate sum of SalesAmount only for the selected period (in Days), Let’s call this Last Period Revenue. Once we have these calculations finding new or lost costumers are easy as below;
Lost Customer(s): Customers that their Total Revenue is greater than zero, but their Last Period Revenue is not.
New Customer(s): Customers that their Total Revenue is equal to their Last Period Revenue, and greater than zero.
The reason for greater than zero condition is that in the database we might have some customer records that haven’t purchased anything at all (seems odd, but we have to consider it, as there might be some exceptions in some business). I do all of steps one by one in a separate measure, this is just to give you an understanding of the process. Obviously these all can be done in one measure.
To start the very first thing is to identify the selected value in the slicer through measure.
Measure for the Selected Value in the Slicer: Selected Period
First things first is to identify what item is selected in the Period slicer. DAX doesn’t understand the SELECTED item, however it understand the CONTEXT in which the DAX expression evaluates. This means; if I select one or more values in the slicer, DAX expression will execute on top of that selection. This selection is the filter context of DAX expression. That’s simple. Now for this particular slicer we want user to always select one item, so it is also good to check if multiple items is selected or not. So let’s create the measure (You can create a measure in the Data Tab, under Modeling, New Measure), named Selected Period under Period table with calculation below (Note that this should be a measure, not calculated column);
Selected Period = IF(COUNTROWS(Period)=1, MAX(Period[Period]), BLANK() )
Expression above simply just check if number of records under the selected context (which will be selected slicer items) is one, then it will fetch the MAX of period. Why maximum when it is only one row? because DAX always works with a set of rows. even though we know it will be one row after passing this condition, DAX still works with a set of rows. the set of rows might have one or more rows. when it has only one row, we can use MIN or MAX or things like that to fetch it as a single value from the set of rows. And finally when multiple items are selected it will return blank.
Let’s see how the measure works in action;
Create a sample report with going to Report tab, and create a slicer with Period. Also add a Card visualization with Selected Period measure (created above) as the value. Now if you select items in the slicer you will see what item is selected in the card visualization.
This Selected Period measure in the calculation of Sales Amount for selected period. Which we will do it in next steps.
Total Revenue
There are multiple ways of calculating the total revenue for each customer. I use a method that use a new Measure. create a new measure in Data tab, under DimCustomer and name it Total Revenue. write DAX expression below for it;
Total Revenue = SUMX(RELATEDTABLE(FactInternetSales),FactInternetSales[SalesAmount])
This DAX expression uses SumX function which will calculate sum of an expression (FactInternetSales[SalesAmount]) on a filtered table. The filtered table in this example is RelatedTable(FactInternetSales). RelatedTable will go through each Customer record in DimCustomer, and considering the existing relationship between DimCustomer and FactInternetSales in the model (As showed in the diagram earlier in this post), it will generate a sub set of FactInternetSales for each customer, which is only the subset that has this customerKey in it. So the result of sum of SalesAmount for this subset will be total revenue for each customer.
Let’s see how this works in action;
Go back to the Report page, and create a Table with Full Name from DimCustomer, and also the new measure Total Revenue
In front of each customer you can see the total revenue for that customer. Now let’s calculate selected period’s revenue.
Total Revenue for the Selected Period in the Slicer
Let’s call this Last Period Revenue. In this last period revenue all we want to calculate is the total sales for each customer in the period of last X days, where X is coming from the selection of period slicer. We can use few functions to help along the way. Here is the measure with full DAX expression. I’ll explain details after;
Last Period Revenue = CALCULATE( SUM(FactInternetSales[SalesAmount]), DATESBETWEEN( DimDate[FullDateAlternateKey], DATEADD(LASTDATE(DimDate[FullDateAlternateKey]),-1*[Selected Period],DAY), LASTDATE(DimDate[FullDateAlternateKey]) ) )
Let’s Start from Calculate; Calculation function is a simple function in terms of the basics of how it works, and it is also the motherhood of all functions in DAX, there are heaps of calculations you can do with this function. How this works? Calculate simply calculates an expression (first parameter) on a set of filters (second parameter, third parameter….). In the example above the expression is Sum(FactInternetSales[SalesAmount]), and the filter is all the parameters after that. In above expression you can see that we have only one filter which is DatesBetween. So calculate simply resolve sum of SalesAmount for the subset of data that comes out of DatesBetween function. Let’s see what DatesBetween Does;
DatesBetween is self explanatory function. It will return a subset of data for the dates between start date and end date! It has three parameters; date field, start date, and end date;
DATESBETWEEN( DimDate[FullDateAlternateKey], DATEADD(LASTDATE(DimDate[FullDateAlternateKey]),-1*[Selected Period],DAY), LASTDATE(DimDate[FullDateAlternateKey]) )
In example above end date is LASTDATE(DimDate[FullDateAlternateKey]). This means the maximum date that we have considering what filter context we are in. The main important thing is calculating the start date. Start date is the last date minus selected period. For doing this calculation I used DATEADD function which reduce number of days with (-1*[Selected Period]) in the highlighted line above in the expression.
As a result this whole DAX expression returns total sales amount only for the selected period. let’s see how it works in the report;
Go back to the report and add the new measure to table.
Last Period Revenue by default when no period is selected in the slicer will be blank. When you select values, this will bring total revenue for that number of days from the last date in DimDate. In my sample data set I can only start something if I select last 1461 days which is last 4 years. In different version of AdventureWorksDW you might get different results because their data is different. Now let’s find what customers are lost.
Lost Customer(s) for the Selected Period
What table above says is that for a selected period, some customers have not purchased anything. For example Aaron Allen has the total revenue of #3,400 but in the last 1461 days he didn’t purchased anything. So what this says is that this customer is lost in the selected period! So Lost Customer(s) calculation is as simple as this:
Lost Customers = IF([Total Revenue]>0 && [Last Period Revenue]<=0 ,1 ,0)
This is a simple IF expression which says if the customer has a Total Revenue greater than zero, and did not purchased anything in the last period (Last Period Revenue is less than or equal to zero), then this customer is lost.
New Customer(s) for the Selected Period
New customers on the other hand are customers who only purchased in the last period. This means their Total Revenue is equal to Last Period Revenue, and greater than zero. Here is the calculation;
New Customers = IF([Total Revenue]>0 && [Last Period Revenue]=[Total Revenue] ,1 ,0)
Let’s see the final result in the report; Add Lost Customers and New Customers to the table and you will see the result;
As you can see our formula successfully determined that Aaron Allen is lost in the last 1461 days period (if you increase or decrease the period you will see the different result). Also, those customers which their total revenue is equal to last period revenue are considered as New Customers (that’s why you see so many new customers in the above table). Note that a customer might not fall in any of these categories. for example;
Aaron Li and Aaron Nelson are neither lost or new customers. They are customers who purchased before this period and still purchased in the last period as well.
Bringing Other Dimensions
These measures also work if you bring another table which is related to other tables in the model as a new dimension. For example, you can bring Product as another slicer and then you will see product by product which customers are lost and which are new customers. Hope this calculation helps you to expand customer retention in your Power BI Model.
Video
You can watch the video of this blog post here:
Hi Their,
Its really a nice blog and a option I was looking from last quite some time. But here I have one doubt as not able to see any relation between Period(Table create in Power BI) with DIM_DATE,
so how its going to work without mapping on selection of last 30 days, 60 days and all.
Thank you.
Regards
Ashish Gupta
Hi Ashish,
Thanks for your feedback.
You are absolutely right. there is no relation between Period and any other table in the model.
However the measure “Last Period Revenue” calculates based on the selection result of the Period slicer. As you can read in this post I have created a “Selected Period” measure that returns value of selected item (30, 60, or anything else). then “last Period Revenue” calculates based on that with DateAdd function inside DatesBetween. Remember that this is a measure and measures calculates on the fly with every change in the filter context. So this is how it works.
Cheers
Reza
Thank you really use full, I implemented in one of my project and its working fine.
Thanks Ashish for your kind words 🙂
Cheers
Reza
Reza, how did you get to add that search box on the EnglishProductName slicer?
Good question Igor,
It is a bit hidden! in your slicer click on three dots button on top right hand side corner and in the popup menu you will see it. It should be more obvious I know and told this as feedback to Microsoft, and I believe they are working on that.
Cheers
Reza
See http://www.daxpatterns.com/new-and-returning-customers/ for an exhaustive discussion of this subject.
Hi Raja,
Excellent article! hope to see more solutions like this one for other practical scenarios.
One question in mind, can you please explain why for the Total Revenue you used SUMX with RELATEDTABLE, it seems that a simple SUM(‘FactInternetSales'[SalesAmount]) gets the same results.
thanks
Hi Ariel.
Thanks for your kind feedback.
You are correct. In this case it would work the same. and simple sum might be better. SUMX came from the original solution where I’ve added that as a column to the customer table itself to have the calculation at the refresh time rather than runtime. That method won’t support total revenue for each product though. but if we want to do only total for each customer that might be better. That said in this example I haven’t even created that calculated column here, so all used measures for calculation. So long story short, you are absolutely right, in this example both are the same, in some other cases there is a slight difference.
Cheers
Reza
Hi Reza,
Extremely helpful use case; thank you as always for your contribution to the data science community.
Cheers,
Sam Lexington
Hi Sam.
Thanks for your kind feedback 🙂
Cheers
Reza
hi reza and best wishes for 2017
I’ve tried to reproduce your example in a power bi file.
But when iam trying to create ‘Selected Period’ measured value placed on Period table part (i’ve renamed ‘Period’ table as ‘Period_data’) following warning message is appearing :
“single value for column ‘Period’ in table ‘Period_data’ cannot be determined. This can happen when a measure formula refers to a column that contains many values without specifying an aggregation such as min, max, count, or sum to get a single result.”
I’ve also tried to locate ‘Selected Period’ measured value on Fact table area but result is the same.
It’s a blocking point as when i try to use ‘Selected Period’ measured value for populating Card visualization but it’s impossible as i get following error message :
‘Error Message:Something’s wrong with one or more fields: (Period_data) Selected Period’
Many thanks in advance for your explanation on this issue.
By the way DAX formula for ‘Selected Period’ works with ‘;’ as parameter separator not with ‘,’ as mentioned in your source code.
Hi Stephane,
Happy New Year. Can you send me DAX code of your measures? you can send it to my email reza at radacad dot com, or here in comments.
Cheers
Reza
hi again reza and thanks for your feedback
Here is the content of DAX formula of SelectedPeriod measure
SelectedPeriod = IF(COUNTROWS(Period_data)=1;MAX(Period_data[Period];BLANK()))
Initially i copy paste your formula but with , as separator i’ve got following message ‘The syntax for ‘MAX’ is incorrect. (DAX(IF(COUNTROWS(Period_data)=1.MAX(Period_data[Period],BLANK())))).’
When i replaced ‘,’ by ‘;’ as separator, autocomplete menu appeared for each expression typed in each parameter entered which was not the case when i’ve used ‘,’ as separator.
If you prefer i can send you the .pbix file created but for each source tables from AdventureWorksDW2012 source db i’ve used direct query mode instead of import data (except for Period Data table for which i’ve imported a .csv file).
Hi Stephane,
The problem is that closing bracket for MAX(Period_data[Period] is not closed right after that in your expression, it is closed after blank(). Here is the correct expression for you:
SelectedPeriod = IF(COUNTROWS(Period_data)=1,MAX(Period_data[Period]),BLANK())
Cheers
Reza
Many thanks for your help reza. I need to be more carefull with brackets positionning 😉
Thanks Stephane,
Always glad to help
Cheers
Reza
Hi Reza,
Thank you for a very helpful use case. I am working on a similar task regarding instead lost sales and new sales, and i was wondering if you could give me some tips.
I am working on getting the amount of new sales and lost sales with regards to sales previous year and sales year to date. I am trying to show this in a table and then filtering that table with a year slicer.
Below are the formulas that i have used:
SalesPY = CALCULATE(SUM(SalesData[Value]),SAMEPERIODLASTYEAR(‘Calendar'[DateKey]))
SalesYTD = TOTALYTD(SUM(SalesData[Value]), ‘Calendar'[DateKey])
NewSalesUpdate = SUMX(VALUES(SalesData[CustomerName]),IF([SalesYTD] > 0 && [SalesPY] = 0, [SalesYTD]))
LostSalesUpdate = SUMX(VALUES(SalesData[CustomerName]),IF([SalesYTD] = 0 && [SalesPY] > 0, -[SalesPY]))
LostSalesOld = IF([SalesPY] > 0 && [SalesYTD] = 0, -[SalesPY])
The NewSalesUpdate formula works as it should and sums up correctly. However LostSalesUpdate does not work, despite having pretty much the opposite formula compared with NewSalesUpdate. It seems like the IF statement never becomes true. That is strange because the LostSalesOld formula shows the right value, but it does not show the total.
All tips are appreciated!
Hi Fred,
Have you checked to see if SalesYTD and SalesPY returning correct values as expected?
Also the IF statement for LostSalesUpdate is not exactly opposite of NewSalesUpdate. If you want that to be exact opposite do this: IF([SalesYTD]>0 || [SalesPY]=0,…). When you use && for an operation, the opposite of that would be not (&&), which will be similar to (not first condition || not second condition). This will also include having some records that you don’t have a sales for them in this year, and even in last year.
However I think the problem still might be in the way that calculation of PY and YTD works. Let me know if this still doesn’t help and you need to go more into the details.
Cheers
Reza
Cheers
Reza
Hi Reza,
Nice one. I am currently exploring your book – congrats for your work.
Regarding this particular case, isn’t worth calculating the first purchase date ever in a calc. column (in the Customers Table) ?
1. In the Customers Table, I am computing the first purchase date ever in this calc.column :
= Calculate ( Min (FactTable[OrderDate] ) )
2. I am creating a relationship (inactive) between Calendar[Date] and Customers[F.PurchaseDateEver].
3. I can then leverage UseRelationship function.
New Customers = Calculate ( [NumOfCustomers] , USERELATIONSHIP(Customers[DateofFirstPurchaseEver] , ‘Calendar'[Date]) )
Customers are either returning or new.
So NumOfCustomers = New Customers + Returning Customers (**)
Thanks to (**) :
Returning Customers = New Customers – NumOfCustomers
What do you think about this approach ? Might be even better to calculate First Purchase Date ever through Power Query rather than DAX Calc. Column !
Hi Tristan,
Nice solution, Thanks!
However with this method you need to consider all other possibilities. If we bring other dimensions, such as Product, then with Power Query you need to consider first purchase date for every product for every customer. If another dimension comes in, we need to change the way of calculation. it is best to be done in DAX side. Which can be done with Calculated column as you mentioned, or measures. Good thing about DAX calculations is that there are always more than one way to achieve the result 🙂
Cheers
Reza
Hi Reza,
I am new to DAX. Thanks for the nice article.
In the above example, while calculating the measure Last period Revenue, the SUM(FactInternetSales[SalesAmount]) is calculated between last date and last date-selected period. But how it is related to customer? Last period Revenue measure do not have any field from Dimcustomer table right? How come the formula is calculating the Last period revenue for each customer from this measure then? Please explain.
Thanks,
PBI
Hi,
For answering that you need to know about FILTER CONTEXT in DAX. Filter context will affect all your measure’s calculation.
In the table that I put in the report visualization, customer name is listed. that means customer name is filter context. so that means calculation will be done based on that.
Cheers
Reza
Hi Reza,
I am a reader of your great e-book from rookie to rockstar. Is this method can be use to employee turnover. I mean, how we count in current month the number of new employee, and the number of lost employee based on monthly employee data. I hope you can understand what I mean.
Many thanks,
Raden
Hi Raden,
You can use similar approach.
Just instead of Sum(Sales) in example above, you need to get COUNTX of employees by their months of joining.
Cheers
Reza
Hi Reza
Thank you really use full
Hi Reza,
I really like your solutions and your explenations on this matter. But I have a question. If want to implement another rule for when a customer is considered new. Say for example that a customer is considered a new customer once he/she has made at least 2 purchases? Is that possible? How would i go about setting this up?
Cheers,
Dan
Hi Dan,
Yes it is possible.
In that case instead of calculating Sum of sales in the Calculate, you can calculate Count of transactions within calculate, if that is more than 2, then you consider it, otherwise not.
Cheers
Reza
Hi Reza,
Thank you, it worked like a charm! 🙂
Cheers,
Dan
Thanks very much Reza,
You have helped me alot with your blogs.
Cheers.
Arthur
Hi Reza,
Thank you for this great explanation!
Is it possible to count lost customers ? If so, how we can do that?
Cheers,
Hi Lucas
Thanks for your comment.
Yes, absolutely possible. You can use Iterator functions such as SUMX to calculate row by row, and then you will get the total. or you can use the same approach mentioned here, with few changes to see if nothing is filtered using ISFILTERED and then calculate the count of lost customers. I may write about it in the future.
Cheers
Reza
Hi Rad,
I have a problem here. In your example, you brought in the customer name, sales, last sales and then the measure of new/lost customers and then it shows 1 or 0. That’s fine so far for me, it’s giving correct result.
Now, i want to find out the total no. of new buyers based on month or quarter selection. That means in my filter i will select as an example december and i only need to get total no of. new buyers. I will not drag in any customer name or revenue into the table.
Also how do i sum up those achieved 1’s in your table. Please eagerly waiting.
Hi Naveen.
There are multiple methods in which you can calculate the aggregated total. Here is my latest blog article, which is the next part of this current post actually, I explained how you can do the total of lost vs new customers.
Cheers
Reza
Found the sales for last and new buyers also. New Buyer sales:=calculate(total sales,filter(values(customer name),new buyer)
Thank you! This really helps.