Estimate customer lifetime value

The Python package lifetimes supports recency/frequency analysis of customers and the estimation of customer lifetime value. Two models are used here: the BG/NBD model, which predicts future purchases from recency and frequency but ignores monetary value, and the Gamma-Gamma model, which models monetary value and, combined with BG/NBD, estimates customer lifetime value.

Prepare data

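The initial dataset, with one row per transaction (order_id), needs to be transformed into one row per client with four columns: frequency (number of repeat purchases, i.e. days with a purchase after the first one), recency (days between the first and the last purchase), T (the customer's age: days between the first purchase and the end of the observation period), and monetary_value (average value of the repeat purchases).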
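To make these definitions concrete, here is a minimal pandas sketch of the summary on a hypothetical toy dataset. It assumes daily periods (the default freq='D') and, as lifetimes does by default, excludes the first purchase from monetary_value. The helper rfm_summary() and the toy data are illustrative only; in practice lifetimes' summary_data_from_transaction_data() does this work.

```python
import pandas as pd

# Hypothetical toy transactions (not the e-shop data)
tx = pd.DataFrame({
    "client_id": ["a", "a", "a", "b", "c", "c"],
    "date": pd.to_datetime([
        "2019-01-01 10:00", "2019-01-01 18:00", "2019-01-11 09:00",
        "2019-02-01 12:00", "2019-01-15 08:00", "2019-03-16 20:00",
    ]),
    "total": [10.0, 5.0, 30.0, 20.0, 40.0, 60.0],
})
end = pd.Timestamp("2019-03-31")  # plays the role of observation_period_end


def rfm_summary(tx, end):
    """Hypothetical sketch of a daily frequency/recency/T/monetary_value summary."""
    # Collapse to one row per client per day, summing same-day orders
    daily = (
        tx.assign(day=tx["date"].dt.normalize())
        .groupby(["client_id", "day"], as_index=False)["total"]
        .sum()
    )
    rows = {}
    for client, g in daily.groupby("client_id"):
        g = g.sort_values("day")
        first, last = g["day"].iloc[0], g["day"].iloc[-1]
        frequency = len(g) - 1  # purchase days after the first one
        repeat = g["total"].iloc[1:]  # the first purchase day is excluded
        rows[client] = {
            "frequency": float(frequency),
            "recency": float((last - first).days),  # first to last purchase
            "T": float((end - first).days),  # first purchase to end of window
            "monetary_value": float(repeat.mean()) if frequency > 0 else 0.0,
        }
    cols = ["frequency", "recency", "T", "monetary_value"]
    return pd.DataFrame.from_dict(rows, orient="index")[cols]


print(rfm_summary(tx, end))
```

For client a, the two orders on 1 January count as a single purchase day, so frequency is 1.0, recency is 10.0, T is 89.0 and monetary_value is 30.0 (only the repeat day counts). Client b, with a single purchase, gets frequency, recency and monetary_value of 0.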

# Import libraries
import pandas as pd
import matplotlib.pyplot as plt
import lifetimes
from lifetimes.plotting import plot_frequency_recency_matrix, plot_probability_alive_matrix

# Import list of transactions
df = pd.read_csv('./exclude/eshop_transactions.csv', parse_dates=['date'])
df.head()
   order_id                         client_id                 date  total
0      1007  fe1a03b2b0e021bbac0ea050a1d216a7  2017-06-21 05:14:35   39.0
1      1008  c9bdedeb9ac367f11c77dc5753b2b939  2017-06-21 05:30:19   39.0
2      1009  47f4ef0684413a1f5a429e251cbc7261  2017-06-21 07:58:02   75.0
3      1010  9a4ee011e9639539955423f54b7d46ec  2017-06-21 08:23:34   75.0
4      1011  490037f2983ad86fab82bcb309a33ee1  2017-06-21 08:30:59   42.0
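# Transform data to the appropriate shape: one row per client
df_rfm = lifetimes.utils.summary_data_from_transaction_data(
    df,
    customer_id_col='client_id',
    datetime_col='date',
    monetary_value_col='total',
    observation_period_end='2019-03-31',  # end of the observation window
    freq='D',  # recency and T are measured in days (the default)
)
df_rfm.head(7)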
                                  frequency  recency      T  monetary_value
client_id
0020c81355c2057acfb019eb2c18a9a1        1.0    236.0  619.0            94.5
003b1637ce45163ccb9b0ff278464fd1        0.0      0.0  535.0             0.0
00444e8c950c199b637e5dcdfe401e0c        0.0      0.0  226.0             0.0
0055eb3281238fa3388a6db46d7d2d01        0.0      0.0  163.0             0.0
006e783513f582e9657d18334d06df49        1.0    103.0  634.0            53.0
007062115a3cbd21d0b52af5e42e10cf        0.0      0.0  299.0             0.0
0085fabd8952f67be3efdeaddfbf6d43        2.0    327.0  439.0            94.5

Recency and frequency with BG/NBD model

# Fit BG/NBD model
bgf = lifetimes.BetaGeoFitter(penalizer_coef=0.0)
bgf.fit(df_rfm['frequency'], df_rfm['recency'], df_rfm['T'])
<lifetimes.BetaGeoFitter: fitted with 3408 subjects, a: 0.82, alpha: 141.01, b: 3.52, r: 0.30>
# Expected future purchases given frequency/recency
fig, ax = plt.subplots(figsize=(8, 6))
ax = plot_frequency_recency_matrix(bgf, T=365)

[Figure: expected number of purchases over the next 365 days, by frequency and recency]

# Probability of being alive given frequency/recency
fig, ax = plt.subplots(figsize=(16, 6))
ax = plot_probability_alive_matrix(bgf)

[Figure: probability of being alive, by frequency and recency]

For every customer, conditional_expected_number_of_purchases_up_to_time() predicts the expected number of purchases over the next t periods, and conditional_probability_alive() gives the probability that the customer is still alive.

# Predict for every customer

## Number of periods (days) forward to predict the number of purchases
t = 365

## Create prediction dataframe
df_clv = df_rfm.copy()
df_clv['predicted_purchases'] = bgf.conditional_expected_number_of_purchases_up_to_time(
    t, df_rfm['frequency'], df_rfm['recency'], df_rfm['T']
)
df_clv['proba_alive'] = bgf.conditional_probability_alive(
    frequency=df_rfm['frequency'],
    recency=df_rfm['recency'],
    T=df_rfm['T']
)

## Show results
df_clv.head()
                                  frequency  recency      T  monetary_value  predicted_purchases  proba_alive
client_id
0020c81355c2057acfb019eb2c18a9a1        1.0    236.0  619.0            94.5             0.366430     0.632533
003b1637ce45163ccb9b0ff278464fd1        0.0      0.0  535.0             0.0             0.153914     1.000000
00444e8c950c199b637e5dcdfe401e0c        0.0      0.0  226.0             0.0             0.271410     1.000000
0055eb3281238fa3388a6db46d7d2d01        0.0      0.0  163.0             0.0             0.321769     1.000000
006e783513f582e9657d18334d06df49        1.0    103.0  634.0            53.0             0.277443     0.487711
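For intuition about the proba_alive column: in the BG/NBD model, the probability of being alive has a closed form in x = frequency, t_x = recency and T, given the fitted parameters r, alpha, a and b: P(alive) = 1 / (1 + [x > 0] * a / (b + x - 1) * ((alpha + T) / (alpha + t_x)) ** (r + x)). The sketch below plugs in the rounded parameters printed by BetaGeoFitter above, so it should only approximately match the library's output; the function name bgnbd_prob_alive is illustrative, not part of lifetimes.

```python
def bgnbd_prob_alive(x, t_x, T, r, alpha, a, b):
    """P(alive | x, t_x, T) under the BG/NBD model (illustrative sketch)."""
    if x == 0:
        # In BG/NBD, dropout can only happen right after a repeat purchase
        return 1.0
    odds_dead = (a / (b + x - 1)) * ((alpha + T) / (alpha + t_x)) ** (r + x)
    return 1.0 / (1.0 + odds_dead)


# Rounded parameters printed by BetaGeoFitter above
params = dict(r=0.30, alpha=141.01, a=0.82, b=3.52)

# Client 0020c813...: frequency 1, recency 236, T 619
print(round(bgnbd_prob_alive(1, 236, 619, **params), 3))
```

This gives about 0.633 for client 0020c813..., close to the 0.632533 in the table; the small gap comes from the rounded parameters. The formula also shows why proba_alive drops when T grows much larger than recency: a long silence since the last purchase raises the odds of having dropped out.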

Monetary value with Gamma-Gamma model

The Gamma-Gamma model takes each customer's past transaction values into account to estimate the expected average value of their future transactions. Combined with the purchase predictions of the BG/NBD model, it yields the remaining customer lifetime value over a given period (12 months in this example).

The model assumes that there is no relationship between the monetary value and the purchase frequency. In practice, we check that the Pearson correlation between the two vectors is close to 0 before using it. Both the check and the fit use returning customers only (frequency > 0), since monetary_value is computed from repeat purchases.

# Keep only returning customers
df_rfm_return = df_rfm[df_rfm['frequency'] > 0]

# Check (absence of) correlation between frequency and monetary value
df_rfm_return[['frequency', 'monetary_value']].corr()
                frequency  monetary_value
frequency        1.000000        0.021878
monetary_value   0.021878        1.000000
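Keeping one-time clients in this check would be misleading: their frequency and monetary_value are both 0, which pulls the two columns toward zero together and creates a correlation unrelated to spending behaviour. A toy illustration with made-up numbers (not the e-shop data):

```python
import pandas as pd

# Made-up summary: six returning clients with no frequency/value relationship,
# plus three one-time clients (frequency 0, monetary_value 0)
toy = pd.DataFrame({
    "frequency":      [1, 2, 3, 1, 2, 3, 0, 0, 0],
    "monetary_value": [50, 50, 50, 70, 70, 70, 0, 0, 0],
}, dtype=float)

returning = toy[toy["frequency"] > 0]
corr_returning = returning["frequency"].corr(returning["monetary_value"])
corr_all = toy["frequency"].corr(toy["monetary_value"])

print(f"returning customers only: {corr_returning:.3f}")
print(f"all customers:            {corr_all:.3f}")
```

The correlation is 0 among returning clients, but about 0.78 once the one-time clients are included, which would wrongly suggest the Gamma-Gamma assumption is violated.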
# Fit Gamma-Gamma model
ggf = lifetimes.GammaGammaFitter()
ggf.fit(df_rfm_return['frequency'], df_rfm_return['monetary_value'])
<lifetimes.GammaGammaFitter: fitted with 972 subjects, p: 15.64, q: 9.52, v: 44.88>
# Predict expected value per transaction
df_clv['exp_avg_value'] = ggf.conditional_expected_average_profit(
    df_rfm['frequency'], df_rfm['monetary_value']
)

# Compare the observed average value of returning customers with the mean prediction over all customers
print("Actual average profit: {:.2f}".format(df_rfm_return['monetary_value'].mean()))
print("Predicted expected value: {:.2f}".format(df_clv['exp_avg_value'].mean()))
Actual average profit: 82.45
Predicted expected value: 82.40
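The exp_avg_value column is a weighted average of the population mean transaction value and the client's own observed mean, with more weight on the client's history as the number of repeat purchases grows. A minimal sketch of the standard Gamma-Gamma conditional expectation, using the rounded p, q and v printed by GammaGammaFitter above (so results only approximately match the table); the function name gg_expected_avg_value is illustrative:

```python
def gg_expected_avg_value(x, m, p, q, v):
    """Gamma-Gamma conditional expected average transaction value (sketch)."""
    population_mean = p * v / (q - 1)  # expected value for a client with no history
    weight = p * x / (p * x + q - 1)  # grows with the number of repeat purchases x
    return (1 - weight) * population_mean + weight * m


params = dict(p=15.64, q=9.52, v=44.88)  # rounded GammaGammaFitter output

# (frequency, monetary_value) for clients 0020c813..., 003b1637..., 006e7835...
for x, m in [(1, 94.5), (0, 0.0), (1, 53.0)]:
    print(x, m, round(gg_expected_avg_value(x, m, **params), 2))
```

A client with one repeat purchase of 94.5 gets about 90.23, pulled toward the population mean of about 82.39; clients without repeat purchases simply get the population mean, which is why they all share the same exp_avg_value in the table.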
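# Predict residual customer lifetime value over the next 12 months
df_clv['clv'] = ggf.customer_lifetime_value(
    bgf,  # BG/NBD model used to predict the number of purchases
    df_rfm['frequency'],
    df_rfm['recency'],
    df_rfm['T'],
    df_rfm['monetary_value'],
    time=12,  # months
    discount_rate=0.01,  # monthly discount rate (library default)
    freq='D',  # unit of recency and T (days)
)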
df_clv.head()
                                  frequency  recency      T  monetary_value  predicted_purchases  proba_alive  exp_avg_value        clv
client_id
0020c81355c2057acfb019eb2c18a9a1        1.0    236.0  619.0            94.5             0.366430     0.632533      90.215145  30.654873
003b1637ce45163ccb9b0ff278464fd1        0.0      0.0  535.0             0.0             0.153914     1.000000      82.352228  11.747738
00444e8c950c199b637e5dcdfe401e0c        0.0      0.0  226.0             0.0             0.271410     1.000000      82.352228  20.741905
0055eb3281238fa3388a6db46d7d2d01        0.0      0.0  163.0             0.0             0.321769     1.000000      82.352228  24.602803
006e783513f582e9657d18334d06df49        1.0    103.0  634.0            53.0             0.277443     0.487711      63.353343  16.298747

In addition to the probability of being alive (proba_alive) and the expected number of purchases (predicted_purchases), we now have the expected value of each purchase (exp_avg_value) and, as a result, the estimated remaining customer lifetime value over the next 12 months (clv). For each of those months, customer_lifetime_value() takes the number of purchases the BG/NBD model expects in that month, multiplies it by the expected average purchase value, discounts the result at a monthly rate (discount_rate, 0.01 by default), and sums over the 12 months.
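The clv aggregation can be sketched as a month-by-month discounted sum. In this minimal sketch, expected_cum_purchases stands in for the fitted model's cumulative prediction of purchases over the next t days (here a hypothetical constant rate), and a month is assumed to be 30 days when freq='D'; the function name discounted_clv and its parameters are illustrative, not the lifetimes API.

```python
def discounted_clv(expected_cum_purchases, avg_value, months=12,
                   discount_rate=0.01, days_per_month=30):
    """Sum of monthly expected revenue, discounted at a monthly rate (sketch).

    expected_cum_purchases(t) returns the expected number of purchases in
    the next t days (cumulative), like a BG/NBD purchase prediction.
    """
    clv = 0.0
    for i in range(1, months + 1):
        # Expected purchases during month i only (difference of cumulative counts)
        n_month = (expected_cum_purchases(i * days_per_month)
                   - expected_cum_purchases((i - 1) * days_per_month))
        clv += avg_value * n_month / (1 + discount_rate) ** i
    return clv


# Hypothetical client: 0.01 expected purchases per day, 100 per purchase
print(round(discounted_clv(lambda t: 0.01 * t, avg_value=100.0), 2))  # about 337.65
```

Without discounting, this client would be worth 12 * 30 * 0.01 * 100 = 360; the monthly 1% discount brings it down to about 337.65. Differencing the cumulative prediction is what lets a model whose expected purchases slow down over time (as BG/NBD predictions do when customers may drop out) contribute less in later months.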