What every marketer needs to know about customer lifetime value

Who are your most valuable customers?

Your answer to this question greatly affects your customer acquisition strategy, which in turn affects where most of your marketing dollars are spent.

That is why customer lifetime value (CLV) is an important metric you cannot ignore. Getting this metric right could make a night-and-day difference for your business.

CLV comes into play whether you are identifying your most valuable customer segment, testing the returns on your marketing campaigns, or simply projecting future revenue.

Our guide to lifecycle marketing has examined some of the ways you can use CLV to make smarter marketing decisions. In this article, we will look more closely at the process behind computing CLV.

While a simple concept, CLV can be calculated in a variety of ways, ranging in sophistication and accuracy.

We hope you will walk away from this article with 2 takeaways:

  • Knowing how to estimate CLV using basic methods
  • Understanding how predictive models differ from historical models, and their potential application

What is CLV?

Simply put, CLV tells you how much each customer is worth to your business.

Technically speaking, CLV is the present value of cash flows expected from a customer. In other words, it is the revenue you can expect when you acquire a customer.

While CLV is the monetary benefit of acquiring customers, Customer Acquisition Cost (CAC) is the monetary cost of acquiring customers.

CAC can be as low as zero if you acquire customers by asking your friends to spread the word, or it can be as high as whatever you pay Google or Facebook for advertisements.

You always want to keep CLV higher than CAC, so that you are earning money from acquiring customers.

Overview of methods

There are many methods for calculating CLV, ranging from simple to complex.

The first two methods, average revenue per user (ARPU) and cohort analysis, are historical models: they estimate CLV by extrapolating from historical purchasing data. The Naive Bayes and Pareto/NBD models, on the other hand, are predictive models that aim to forecast how much customers will spend in the future. Given that CLV is a measure of future cash flows, predictive models typically provide a more accurate estimate of CLV.

Predictive CLV can be 25% more accurate than historical CLV methods.

The table below sets forth the different methods:

| Method | Summary | Pros | Cons |
| --- | --- | --- | --- |
| Average revenue per user (ARPU) | 12-month CLV equals historical average monthly revenue per user * 12 months | Easy to calculate | Backward looking, assumes that past performance will reflect future performance |
| Cohort analysis | Improves on ARPU by also considering how long people have been customers | Easy to calculate. Takes into account the fact that customers tend to buy less over time. Can see CLV trend over time | Backward looking, assumes that past performance will reflect future performance |
| Naive Bayes | Uses Bayes' theorem to find the most likely CLV given the traits of a new customer | Is a predictive model that is not simply based on past trends. Provides a better estimate of future purchasing patterns | Assumes customer traits are independent of one another, which rarely holds in the real world |
| Pareto/NBD | Uses Poisson, exponential, and Gamma distributions to model number of orders, customer lifetime, and order size | Is a predictive model that is not simply based on past trends. Provides a better estimate of future purchasing patterns. Considers more variables such as order history, frequency, and recency to form a more complete model of customer behavior. Takes into account the time value of money (discount rate). Has been used in practice to achieve significant conversion uplifts in marketing campaigns | Complex to execute, needs statistical models |

Average revenue per user

This method is based on the historical average of how much customers have bought. Say you have 2 customers, John and Mary:

| Month | John sales | Mary sales |
| --- | --- | --- |
| Month 1 | 50 | 60 |
| Month 2 | 40 | 0 |
| Month 3 | 30 | 30 |
| Average | 40 | 30 |

Based on their past purchases, John spends an average of $40 a month, while Mary spends an average of $30 a month. Hence, on average, your customers spend $35 a month.

To calculate the CLV for 12 months, you simply multiply $35 by 12, which gives $420.
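The arithmetic above can be sketched in a few lines of Python. The sales figures come from the John and Mary table; the function name `arpu_clv` and its `horizon_months` parameter are illustrative choices, not part of any particular library.

```python
# Monthly sales for the two example customers (figures from the table above).
sales = {
    "John": [50, 40, 30],
    "Mary": [60, 0, 30],
}


def arpu_clv(monthly_sales, horizon_months=12):
    """Return (average monthly revenue per user, CLV over the horizon)."""
    # Average each customer's monthly spend, then average across customers.
    per_customer_avg = [sum(s) / len(s) for s in monthly_sales.values()]
    arpu = sum(per_customer_avg) / len(per_customer_avg)
    return arpu, arpu * horizon_months


arpu, clv_12m = arpu_clv(sales)
print(f"ARPU: ${arpu:.2f} per month, 12-month CLV: ${clv_12m:.2f}")
```

Note that months with no purchases, such as Mary's second month, are kept as zeros so that they pull the average down.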

Advantages:

  • Easy to calculate

Disadvantages:

  • Assumes that past buying behavior reflects future buying behavior. If your marketing policies improve and cause customers to start buying more, using a historical average would understate your future earnings.

Cohort analysis

By taking a historical average of monthly revenue, the ARPU method ignores the variance in revenue between different months. Most customers tend to spend less as time passes, a trend which is not reflected by the ARPU method.

To express the point graphically, the spending patterns of a typical customer at shop ABC may follow the red line shown below.

If shop ABC acquired a new customer, the expected revenue from the customer in the first month would be $55, and a one-month-old customer would be expected to spend $50 in their second month. The annual CLV is equal to the area under the graph for the next 12 months. On the other hand, if the ARPU method is used, average revenue is assumed to be $25 regardless of how ‘old’ the customer is.

Cohort analysis is also useful for examining trends in revenue per month. If revenue falls rapidly with time, companies may consider using customer win-back strategies like special promotions to retain customers. We can also use cohort analysis to check whether revenue per month has increased after a marketing campaign has been executed.
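As a rough illustration of the cohort idea, the Python sketch below averages revenue by customer ‘age’ (months since acquisition) and sums the resulting curve. The records are hypothetical, chosen only so that the first two months match the $55 and $50 figures mentioned above; the function name is likewise an assumption.

```python
from collections import defaultdict

# Hypothetical records: (customer, months since acquisition, revenue).
# These figures are made up for illustration only.
records = [
    ("A", 0, 60), ("A", 1, 50), ("A", 2, 40),
    ("B", 0, 50), ("B", 1, 50), ("B", 2, 20),
    ("C", 0, 55), ("C", 1, 50),
]


def revenue_by_customer_age(records):
    """Average revenue per customer for each month of customer age."""
    totals = defaultdict(float)
    customers = defaultdict(set)
    for customer, age, revenue in records:
        totals[age] += revenue
        customers[age].add(customer)
    # Customers who bought nothing in a month must still appear with revenue 0,
    # otherwise the average for that month is overstated.
    return {age: totals[age] / len(customers[age]) for age in sorted(totals)}


curve = revenue_by_customer_age(records)
# CLV over the observed horizon is the sum of the curve (the area under it).
clv_observed = sum(curve.values())
print(curve, clv_observed)
```

With real data you would extend the curve to 12 months and could compute it separately for customers acquired before and after a campaign to compare the two.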

Advantages:

  • Easy to calculate
  • Takes into account trends in revenue over time

Disadvantages:

  • Like the ARPU model, it is still based on historical data, so its predictive power is limited

Naive Bayes classifier

The following two models are more complex and involve more difficult statistical analysis. We will discuss the concept of how they work without going too much into the mathematics.

To apply the Naive Bayes Classifier, you need a database of customer purchase history and your customers’ characteristics. Based on this data, you can predict the CLV of a new customer.

Let us provide a simple example to illustrate the concept. Assume you had 10 customers over the past year, and on top of collecting how much they bought, you also asked them for their gender. Over the past year, you were able to observe their spending, and so you can calculate their 12-month trailing CLV. Imagine that their CLV falls into 3 categories: 1000, 2000, or 3000. You can sort your customers as shown below:

| CLV: 1000 | CLV: 2000 | CLV: 3000 |
| --- | --- | --- |
| Male | Male | Female |
| Male | Male | Female |
| Male | | Female |
| Female | | Male |

Now, let’s say you acquire a new customer, who is Male. The Naive Bayes Classifier helps you make the best guess for what his CLV will be - 1000, 2000, or 3000?

This is what Bayes' theorem says: given that the customer is male, the probability that his CLV is 1000 equals:

  • The probability that someone is male if their CLV is 1000 (out of people whose CLV equals 1000, ¾ are male, so this value is ¾)...

Multiplied by

  • The probability that a random person’s CLV is 1000 (out of 10 people, 4 have CLV of 1000, so this value is 4/10)...

Divided by

  • The probability that a random person is male (out of 10 people, 6 are male, so this value is 6/10)

The final result equals: 0.75 * 0.4 / 0.6 = 0.5

So based on the data you have, there is a 50% chance that a new male customer has a CLV of 1000. You repeat this process to test the probability that his CLV is 2000 or 3000 and find the most likely CLV (in this case, there is a ⅓ chance that his CLV is 2000, and a ⅙ chance that his CLV is 3000, so the most likely CLV is 1000).
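The calculation above can be reproduced for all three CLV brackets with a short Python sketch. The customer list encodes the ten example customers (three males and one female at 1000, two males at 2000, three females and one male at 3000); the function name `posterior` is an illustrative choice.

```python
from collections import Counter
from fractions import Fraction

# The ten example customers as (gender, CLV bracket) pairs.
customers = (
    [("Male", 1000)] * 3 + [("Female", 1000)]
    + [("Male", 2000)] * 2
    + [("Female", 3000)] * 3 + [("Male", 3000)]
)


def posterior(gender, customers):
    """P(CLV | gender) for each CLV bracket, via Bayes' theorem."""
    n = len(customers)
    clv_counts = Counter(clv for _, clv in customers)
    gender_count = sum(1 for g, _ in customers if g == gender)
    evidence = Fraction(gender_count, n)            # P(gender)
    result = {}
    for clv, count in clv_counts.items():
        matches = sum(1 for g, c in customers if g == gender and c == clv)
        likelihood = Fraction(matches, count)       # P(gender | CLV)
        prior = Fraction(count, n)                  # P(CLV)
        result[clv] = likelihood * prior / evidence
    return result


probs = posterior("Male", customers)
best = max(probs, key=probs.get)
print(probs, best)
```

Using `Fraction` keeps the results exact, so they can be compared directly with the ½, ⅓, and ⅙ quoted in the text.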

In real life, there are many possible CLVs, so many hypotheses will be formed and tested to see which is the most likely. Additionally, there are many customer traits you can consider apart from gender. The Naive Bayes approach extends to multiple traits by multiplying, for each candidate CLV, the share of customers with that CLV by the probability of each trait among those customers. Adding on to the previous example, suppose 30% of your customers with a CLV of 1000 are aged 18-25. The score for a CLV of 1000 for a new 18-25 year old male customer is then 0.4 * 0.75 * 0.3 = 0.09. You compute the same score for a CLV of 2000 and 3000, and the highest score marks the most likely CLV. Dividing each score by the sum of all three turns the scores into probabilities.

The key thing to know about the Naive Bayes Classifier is that it helps you find the most likely CLV of a new customer, by comparing their traits against the traits of your existing customers and their CLVs. As your database of customer information grows over time, the Naive Bayes classifier tends to provide increasingly accurate predictions.

Advantages:

  • Is a predictive model that does not assume past trends reflect future activity.
  • As far as statistical models go, this is relatively easy to implement and is scalable for large amounts of data.

Disadvantages:

  • The ‘naive’ part of the Naive Bayes Classifier is the assumption that customer traits are independent of one another among customers with the same CLV. For example, if you use both gender and age, the model assumes that, within a given CLV bracket, knowing a customer's gender tells you nothing about their age. Real customer traits are often correlated, and this inherent assumption affects how well the Naive Bayes Classifier models the real world.

Pareto/NBD model

As with the Naive Bayes Classifier, you need an existing database of customer information to use the Pareto/NBD model. Specifically, you need the following 3 pieces of information:

  1. The frequency of purchases (which is given by the number of transactions made divided by how long the individual has been a customer)
  2. How long ago the last purchase was made
  3. The average monetary value of purchases

These 3 inputs are used to build statistical models of:

  1. Number of purchases within a given period of time
  2. Customer lifetime
  3. Average monetary value of purchases

These statistical models provide a hypothesis of customer activity, and their parameters are adjusted to provide the best fit with our existing database of information.

To calculate CLV for the next 12 months, we use our statistical models to find the expected number of purchases that will be made in the next 12 months, and the expected average value of purchases. As a simple approximation that treats the revenue as arriving at the end of the year, 12-month CLV = expected purchases in 12 months * expected average value of purchases / (1 + annual discount rate).
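A minimal Python sketch of this final step follows. It assumes the standard present-value convention of dividing by one plus the annual discount rate, and it treats the whole year's revenue as arriving at year end, which is a simplification. The input numbers and the function name are hypothetical.

```python
def discounted_clv(expected_purchases, expected_value, annual_discount_rate):
    """12-month CLV, treating the year's revenue as received at year end."""
    undiscounted = expected_purchases * expected_value
    # Present value of a cash flow one year out: divide by (1 + rate).
    return undiscounted / (1 + annual_discount_rate)


# Hypothetical inputs: 6 expected purchases of $40 each, 10% discount rate.
clv = discounted_clv(6, 40.0, 0.10)
print(round(clv, 2))
```

A finer-grained version would discount each month's expected revenue separately, but the structure of the calculation stays the same.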

The number of purchases made in a given period of time is modeled by the Poisson distribution, which looks like this:

[Figure: Poisson distribution. Source: Tmath]

By its mathematical properties, the Poisson distribution is good for modeling the number of times an event occurs in an interval of time. The x-axis represents the number of purchases, and the y-axis represents the probability of exactly that number of purchases being made. So in the graph above, there is a 0.18 chance that exactly 5 purchases will be made. Poisson distributions are described entirely by one parameter, lambda, which is the average number of purchases expected in the period (in this case, 5). The ‘peak’ in the graph, the most likely number of purchases, sits at or just below lambda.
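To make the numbers concrete, here is a small Python sketch of the Poisson probability formula, using only the standard library. The value lambda = 5 is taken from the example in the text; the function name is an illustrative choice.

```python
import math


def poisson_pmf(k, lam):
    """Probability of exactly k purchases when the expected number is lam."""
    return math.exp(-lam) * lam**k / math.factorial(k)


lam = 5
# Probabilities for 0 to 15 purchases; larger counts are very unlikely here.
probs = [poisson_pmf(k, lam) for k in range(16)]
p5 = poisson_pmf(5, lam)
print(round(p5, 4))
```

The probability of exactly 5 purchases comes out at roughly 0.18, matching the value read off the graph.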

The graph above represents the frequency of purchase for one customer, and since customers are heterogeneous, each customer will have a different Poisson distribution. A Gamma distribution is used to model the differences between customers, such that lambda across customers follows a Gamma distribution. Gamma distributions are shaped like this:


[Figure: Gamma distribution. Source: Easycalculation]

The x-axis represents the value of lambda and the y-axis represents the probability. In the graph above, you can see that most people will have a lambda between 2 and 5. There is no maximum value of lambda, but the probability decreases exponentially as lambda increases.

The Gamma distribution is used because it takes only positive values: as seen above, the graph does not extend to the left of the y-axis. This makes sense because the rate of purchases can only be positive. Additionally, the graph is positively skewed, which means that the mean of the distribution is greater than the median. Colloquially, you can see above that the ‘hump’ of the graph is to the left. This is a good approximation of spending behavior, because most customers will have a relatively low number of purchases, and the number of customers making a large number of purchases is small and decreases exponentially.
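The combination of a per-customer Poisson distribution and a Gamma distribution across customers can be illustrated with a small simulation. This is only a sketch: the Gamma parameters are hypothetical, and the Poisson sampler is a simple textbook method (Knuth's), written out because Python's standard library does not provide one.

```python
import math
import random
import statistics


def sample_poisson(lam, rng):
    """Draw one Poisson(lam) value using Knuth's multiplication method."""
    threshold = math.exp(-lam)
    k, product = 0, rng.random()
    while product > threshold:
        k += 1
        product *= rng.random()
    return k


rng = random.Random(42)
shape, scale = 2.0, 1.5  # hypothetical Gamma parameters; mean lambda = 3.0

# Each simulated customer gets their own lambda, then a purchase count.
lambdas = [rng.gammavariate(shape, scale) for _ in range(20000)]
purchases = [sample_poisson(lam, rng) for lam in lambdas]

mean_purchases = statistics.mean(purchases)
var_purchases = statistics.pvariance(purchases)
print(mean_purchases, var_purchases)
```

For a single Poisson distribution the variance equals the mean. In the simulated population the variance is clearly larger than the mean, which reflects the differences between customers that the Gamma distribution introduces.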

The lifetime of a customer is modeled by a negative exponential distribution, which looks like the graph below:


[Figure: Negative exponential distribution. Source: Metisa]

The x-axis represents time, and the y-axis represents the probability that the customer is still active. As can be seen, as time increases, the probability of a customer being active decreases exponentially. This matches our expectations of customer behavior, making the negative exponential distribution a good model of reality.
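As a sketch of what this curve means numerically, the Python snippet below evaluates the probability of a customer still being active under an exponential lifetime. The dropout rate of 0.1 per month is a hypothetical value chosen for illustration, and the function name is an assumption.

```python
import math


def prob_still_active(months, dropout_rate):
    """P(customer is still active after `months`) under an exponential lifetime."""
    return math.exp(-dropout_rate * months)


dropout_rate = 0.1  # hypothetical: expected lifetime of 1 / 0.1 = 10 months
survival = [prob_still_active(t, dropout_rate) for t in (0, 6, 12)]
print([round(p, 3) for p in survival])
```

A higher dropout rate gives a steeper curve, which is the per-customer difference the next paragraph describes.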

However, the rate at which each customer becomes inactive, or the steepness of the slope shown above, is unique to each customer. Hence, a Gamma distribution is used to model the differences between customers, in the same way as for the number of purchases. The Gamma distribution is used for the same reasons as mentioned above.

Finally, the expected average profit per purchase is modeled by a Gamma distribution. The differences in the expected average profit between customers are modeled by another Gamma distribution.

Here is a quick summary:

| | Number of purchases made in a period of time | Lifetime of customer | Expected average profit per purchase |
| --- | --- | --- | --- |
| Each customer is modeled by | Poisson distribution | Negative exponential distribution | Gamma distribution |
| Differences between customers are modeled by | Gamma distribution | Gamma distribution | Gamma distribution |

Metisa’s experience with the Pareto/NBD model has shown that it allows marketers to more accurately measure CLV, and target new customers who have small initial order sizes but may be loyal customers with high CLVs. Marketing campaigns structured from insights of the Pareto/NBD model were on average 25% better in identifying and engaging at-risk customers.

Advantages:

  • Is a predictive model that uses existing data to build models that can chart out future activity. Does not assume that past trends can be extrapolated into the future.
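  • Statistical models are chosen to best represent the reality of customer behavior. Does not rely on the Naive Bayes assumption that customer traits are independent of one another, although it makes assumptions of its own (for example, that purchase rates and dropout rates vary independently across customers).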
  • Considers more variables such as order history, frequency, and recency to form a more complete model of customer behavior
  • Takes into account the time value of money (discount rate)
  • Has been used in practice to achieve significant conversion uplifts in marketing campaigns

Disadvantages:

  • Complex to execute, requires expertise and statistical models.

Over to you

To summarize, knowing your customers’ CLV is integral to making the best marketing decisions and knowing which customers to target. There is a host of methods to calculate CLV, ranging in sophistication and accuracy.

If you want to incorporate predictive CLV in your marketing workflow, check out Metisa. If you have a Shopify, Magento or BigCommerce store, you can start for free in 5 minutes or less. If you have a custom store, please reach out for a demo.

Justin Yek


Partner & cofounder at Altitude Labs, creator of Metisa, former investment banker, public speaker, hobbyist musician
