One of the sample scenarios that DAX can be used as a calculation engine is customer retention. In the area of customer retention businesses might be interested to see who there lost customers or new customers are in the specific period. Obviously this can be calculated in the Power Query or in the data source level, however if we want to keep the period dynamic then DAX is the great asset to do it. In this post I’ll explain how you can write a simple DAX calculation to find new customers and lost customers over a dynamic period of time in Power BI. If you are interested to learn more about Power BI read the Power BI online book; from Rookie to Rock Star.
Before start let’s define terms that we use here. There might be different definitions for Lost or New Customers, the definition that I use is the very simple definition that I have seen some business use to easily measure their customer retention.
New Customer(s): Any customer who purchased any product in the last specified period of time considering the fact that the customer didn’t purchased anything from this business before this period.
Lost Customer(s): Any customer who haven’t purchased any product in the last specified period of time considering the fact that he/she previously purchased from this business.
The method explained below through an example.
For running this example you need to have AdventureWorksDW database or the Excel version of it.
In the AdventureWorksDW, we want to find out who our new customers and also lost customers are. We want to have the flexibility to select the period (that determines if customer is lost or new) in the slicer. We need these tables in AdventureWorksDW; FactInternetSales, DimDate, DimCustomer, and DimProduct.
Let’s start by getting data from AdventureWorksDW SQL Server database. Tables that we need for this model are: FactInternetSales, DimDate, DimCustomer, and DimProduct. You don’t need to apply any Power Query transformation. Just load the tables in the Power BI. and check the relationship to be as below. Note that relationship between DimDate and FactInternetSales is only one active relationship based on OrderDateKey (remove other relationships between these two tables).
Also we need to have a Period Slicer. This is for being able to dynamically select the period (for example last 30 days, or last 365 days). You can create this table any where, in SQL Server db. in Power Query, in Excel…. I have created that in my model itself through Enter Data. Here is how the table looks like;
Table Name is Period, and the column name is also Period.
What I do for the calculation is to calculate the total revenue from each customer first. This is the total revenue regardless of period. Means sum of SalesAmount for everything that this customer bought for all times (I call this Total Revenue). Then I create another measure to calculate sum of SalesAmount only for the selected period (in Days), Let’s call this Last Period Revenue. Once we have these calculations finding new or lost costumers are easy as below;
Lost Customer(s): Customers that their Total Revenue is greater than zero, but their Last Period Revenue is not.
New Customer(s): Customers that their Total Revenue is equal to their Last Period Revenue, and greater than zero.
The reason for greater than zero condition is that in the database we might have some customer records that haven’t purchased anything at all (seems odd, but we have to consider it, as there might be some exceptions in some business). I do all of steps one by one in a separate measure, this is just to give you an understanding of the process. Obviously these all can be done in one measure.
To start the very first thing is to identify the selected value in the slicer through measure.
Measure for the Selected Value in the Slicer: Selected Period
First things first is to identify what item is selected in the Period slicer. DAX doesn’t understand the SELECTED item, however it understand the CONTEXT in which the DAX expression evaluates. This means; if I select one or more values in the slicer, DAX expression will execute on top of that selection. This selection is the filter context of DAX expression. That’s simple. Now for this particular slicer we want user to always select one item, so it is also good to check if multiple items is selected or not. So let’s create the measure (You can create a measure in the Data Tab, under Modeling, New Measure), named Selected Period under Period table with calculation below (Note that this should be a measure, not calculated column);
Selected Period = IF(COUNTROWS(Period)=1, MAX(Period[Period]), BLANK() )
Expression above simply just check if number of records under the selected context (which will be selected slicer items) is one, then it will fetch the MAX of period. Why maximum when it is only one row? because DAX always works with a set of rows. even though we know it will be one row after passing this condition, DAX still works with a set of rows. the set of rows might have one or more rows. when it has only one row, we can use MIN or MAX or things like that to fetch it as a single value from the set of rows. And finally when multiple items are selected it will return blank.
Let’s see how the measure works in action;
Create a sample report with going to Report tab, and create a slicer with Period. Also add a Card visualization with Selected Period measure (created above) as the value. Now if you select items in the slicer you will see what item is selected in the card visualization.
This Selected Period measure in the calculation of Sales Amount for selected period. Which we will do it in next steps.
There are multiple ways of calculating the total revenue for each customer. I use a method that use a new Measure. create a new measure in Data tab, under DimCustomer and name it Total Revenue. write DAX expression below for it;
Total Revenue = SUMX(RELATEDTABLE(FactInternetSales),FactInternetSales[SalesAmount])
This DAX expression uses SumX function which will calculate sum of an expression (FactInternetSales[SalesAmount]) on a filtered table. The filtered table in this example is RelatedTable(FactInternetSales). RelatedTable will go through each Customer record in DimCustomer, and considering the existing relationship between DimCustomer and FactInternetSales in the model (As showed in the diagram earlier in this post), it will generate a sub set of FactInternetSales for each customer, which is only the subset that has this customerKey in it. So the result of sum of SalesAmount for this subset will be total revenue for each customer.
Let’s see how this works in action;
Go back to the Report page, and create a Table with Full Name from DimCustomer, and also the new measure Total Revenue
In front of each customer you can see the total revenue for that customer. Now let’s calculate selected period’s revenue.
Total Revenue for the Selected Period in the Slicer
Let’s call this Last Period Revenue. In this last period revenue all we want to calculate is the total sales for each customer in the period of last X days, where X is coming from the selection of period slicer. We can use few functions to help along the way. Here is the measure with full DAX expression. I’ll explain details after;
Last Period Revenue = CALCULATE( SUM(FactInternetSales[SalesAmount]), DATESBETWEEN( DimDate[FullDateAlternateKey], DATEADD(LASTDATE(DimDate[FullDateAlternateKey]),-1*[Selected Period],DAY), LASTDATE(DimDate[FullDateAlternateKey]) ) )
Let’s Start from Calculate; Calculation function is a simple function in terms of the basics of how it works, and it is also the motherhood of all functions in DAX, there are heaps of calculations you can do with this function. How this works? Calculate simply calculates an expression (first parameter) on a set of filters (second parameter, third parameter….). In the example above the expression is Sum(FactInternetSales[SalesAmount]), and the filter is all the parameters after that. In above expression you can see that we have only one filter which is DatesBetween. So calculate simply resolve sum of SalesAmount for the subset of data that comes out of DatesBetween function. Let’s see what DatesBetween Does;
DatesBetween is self explanatory function. It will return a subset of data for the dates between start date and end date! It has three parameters; date field, start date, and end date;
DATESBETWEEN( DimDate[FullDateAlternateKey], DATEADD(LASTDATE(DimDate[FullDateAlternateKey]),-1*[Selected Period],DAY), LASTDATE(DimDate[FullDateAlternateKey]) )
In example above end date is LASTDATE(DimDate[FullDateAlternateKey]). This means the maximum date that we have considering what filter context we are in. The main important thing is calculating the start date. Start date is the last date minus selected period. For doing this calculation I used DATEADD function which reduce number of days with (-1*[Selected Period]) in the highlighted line above in the expression.
As a result this whole DAX expression returns total sales amount only for the selected period. let’s see how it works in the report;
Go back to the report and add the new measure to table.
Last Period Revenue by default when no period is selected in the slicer will be blank. When you select values, this will bring total revenue for that number of days from the last date in DimDate. In my sample data set I can only start something if I select last 1461 days which is last 4 years. In different version of AdventureWorksDW you might get different results because their data is different. Now let’s find what customers are lost.
Lost Customer(s) for the Selected Period
What table above says is that for a selected period, some customers have not purchased anything. For example Aaron Allen has the total revenue of #3,400 but in the last 1461 days he didn’t purchased anything. So what this says is that this customer is lost in the selected period! So Lost Customer(s) calculation is as simple as this:
Lost Customers = IF([Total Revenue]>0 && [Last Period Revenue]<=0 ,1 ,0)
This is a simple IF expression which says if the customer has a Total Revenue greater than zero, and did not purchased anything in the last period (Last Period Revenue is less than or equal to zero), then this customer is lost.
New Customer(s) for the Selected Period
New customers on the other hand are customers who only purchased in the last period. This means their Total Revenue is equal to Last Period Revenue, and greater than zero. Here is the calculation;
New Customers = IF([Total Revenue]>0 && [Last Period Revenue]=[Total Revenue] ,1 ,0)
Let’s see the final result in the report; Add Lost Customers and New Customers to the table and you will see the result;
As you can see our formula successfully determined that Aaron Allen is lost in the last 1461 days period (if you increase or decrease the period you will see the different result). Also, those customers which their total revenue is equal to last period revenue are considered as New Customers (that’s why you see so many new customers in the above table). Note that a customer might not fall in any of these categories. for example;
Aaron Li and Aaron Nelson are neither lost or new customers. They are customers who purchased before this period and still purchased in the last period as well.
Bringing Other Dimensions
These measures also work if you bring another table which is related to other tables in the model as a new dimension. For example, you can bring Product as another slicer and then you will see product by product which customers are lost and which are new customers. Hope this calculation helps you to expand customer retention in your Power BI Model.
You can watch the video of this blog post here: