BTYD Customer Lifetime Value

Introduction

Customer Lifetime Value (CLV) is a metric that estimates the value (profit) a customer relationship represents for a business over a specific period of time. Among its benefits, CLV enables better-targeted marketing, encourages brand loyalty, and helps drive more sales.

The goal of this project is to estimate Customer Lifetime Value (CLV) so that the organization can implement future actions to maximize it.

For this implementation, we will use a Buy Till You Die (BTYD) model provided by an R package of the same name: BTYD.


Roadmap

Here are the stages this project will follow.


Business Understanding

Global Superstore is a fictitious retail company doing business in several countries, with a presence on every continent. The dataset for this company was created and made available by Tableau. The idea is to analyze the transactional data in order to learn about spending habits: sales by region, segment, product category, and so on. Additionally, segmentation and forecasting models can be built, among other kinds of analysis.

Global Superstore records commercial transactions in detail, in particular ORDERS data. Thus, we will be able to examine the value of customers over time and gain insights that benefit the organization, allowing it to make more data-driven decisions.

Data Acquisition

We will use the Global Superstore dataset, which contains information about products, sales, profits, and so on; data that can generate rich information to identify actionable insights and key areas for improvement within the company.

The dataset contains around 50,000 orders placed between 2014 and 2017 across multiple markets worldwide.

Click here to access the dataset.

Loading the orders dataset…

Here is a sample of the orders dataset.

# Reading the Orders sheet from the Global Superstore workbook
library(readxl)
library(DT)

data.order <- data.frame(read_excel("Global Superstore 2018.xlsx", sheet = "Orders"))

# Extract the order year (first four characters of the order date) for later grouping
data.order$Order.Year <- substr(data.order$Order.Date, 1, 4)

datatable(data.order[1:20, ], filter = 'top', options = list(
  pageLength = 5, scrollX = TRUE, scrollY = "200px", autoWidth = TRUE),
  caption = 'Table 1: Orders dataset.')

Checking missing values…

prepare_missing_values_graph(data.order, ts_id = "Order.Year") + scale_fill_gradient(low="black", high="darkred")
Missing Values Review

We can see that only Postal Code has missing values, so there are no big issues with the data: we mainly need the order date, customer ID, and sales columns, which will be the pillars of our CLV model.

Data Preparation

As a next step, we will group the data (order lines) and put the transaction date into the correct format. Then, we will present the complete transaction records of 30 customers.

[1] "2014-01-01" "2017-12-31"
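The preparation step can be sketched in base R. This is a minimal sketch with made-up rows, assuming the raw order lines carry Customer.ID, Order.Date, and Sales columns (not the exact code used for this report):

```r
# Toy order lines: the same customer/day may appear on several rows
orders <- data.frame(
  Customer.ID = c("AA-1", "AA-1", "BB-2"),
  Order.Date  = c("2014-01-05", "2014-01-05", "2014-03-10"),
  Sales       = c(100, 50, 80)
)

# Group order lines into one event per customer per day
elog <- aggregate(Sales ~ Customer.ID + Order.Date, data = orders, FUN = sum)
names(elog) <- c("cust", "date", "sales")

# Put the transaction date into Date format and check the range
elog$date <- as.Date(elog$date)
range(elog$date)
```

The result has one row per customer per purchase day, which is the event-log shape the BTYD functions expect.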

Data Modeling

We will implement a BG/NBD (Beta-Geometric/Negative Binomial Distribution) model, which is designed for non-contractual settings in which customers can make purchases at any time.

Click here to access the research paper.

print('Checking a sample of the final data structure.')
[1] "Checking a sample of the final data structure."

# Convert the transaction date from character (yyyymmdd) to Date format
elog$date <- as.Date(elog$date, "%Y%m%d")

elog[1:3,]
# A tibble: 3 x 3
  cust       date       sales
  <chr>      <date>     <dbl>
1 PO-8865138 2014-10-02 161. 
2 EB-4110138 2016-10-14 229. 
3 MY-7380138 2016-12-31  23.6

Merging all transactions that occurred on the same day: transaction-flow models such as the BG/NBD are concerned with interpurchase time.

Dividing the data into a calibration period and a holdout period. We will use December 31, 2015 as the cutoff date, so 104 weeks fall on each side, splitting the dataset in half.

# Keep only transactions on or before the cutoff date
end.of.cal.period <- as.Date("2015-12-31")
elog.cal <- elog[which(elog$date <= end.of.cal.period), ]

# Separate each customer's first purchase from their repeat transactions
split.data <- dc.SplitUpElogForRepeatTrans(elog.cal)
clean.elog <- split.data$repeat.trans.elog

The next step is to create a customer-by-time matrix. This is simply a matrix with a row for each customer and a column for each date.
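As a minimal illustration (a sketch with made-up customers, not the project data), base R's table() builds exactly this kind of matrix from an event log:

```r
# Toy repeat-transaction event log
elog <- data.frame(
  cust = c("DL-1", "DL-1", "DB-2"),
  date = as.Date(c("2014-02-18", "2014-04-01", "2014-04-01"))
)

# Customer-by-time (frequency) matrix: rows = customers, columns = dates,
# cells = number of transactions by that customer on that date
freq.cbt <- table(elog$cust, elog$date)
freq.cbt
```

In the project itself this matrix is produced by the BTYD helper functions; the sketch above just shows the shape of the object.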

              date
cust           2014-01-27 2014-02-18 2014-03-07 2014-03-22 2014-04-01
  DB-135551408          0          0          0          0          0
  DL-133151402          0          1          0          0          0
  Dl-136001404          0          0          0          0          0
  DN-136901406          0          0          0          0          0
  DP-130001402          0          0          0          0          0
              date
cust           2014-04-02
  DB-135551408          0
  DL-133151402          0
  Dl-136001404          0
  DN-136901406          0
  DP-130001402          0
The estimated BG/NBD parameters (r, alpha, a, b) and the corresponding log-likelihood:

[1] 1.606620e+00 3.955141e+02 4.579787e-03 5.676831e+02
[1] -10586.29

Re-running the estimation with the previous estimates as starting values yields nearly identical parameters, indicating convergence:

         r    alpha           a        b        LL
1 1.606620 395.5141 0.004579787 567.6831 -10586.29
2 1.606584 395.5266 0.004579787 567.6832 -10586.29
3 1.606617 395.5282 0.004579786 567.6833 -10586.29
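For reference, the BG/NBD log-likelihood that these parameters maximize has a closed form per customer (Fader, Hardie & Lee, 2005), given the repeat-purchase count x, recency t.x, and calibration length T.cal. A base-R sketch, evaluated at the fitted parameters above (the function name is ours, not from the BTYD package):

```r
# Per-customer BG/NBD log-likelihood, log L(r, alpha, a, b | x, t.x, T.cal)
bgnbd.ll.one <- function(r, alpha, a, b, x, t.x, T.cal) {
  A1 <- lgamma(r + x) - lgamma(r) + r * log(alpha)
  A2 <- lgamma(a + b) + lgamma(b + x) - lgamma(b) - lgamma(a + b + x)
  A3 <- -(r + x) * log(alpha + T.cal)
  if (x > 0) {
    # Dropout term only applies to customers with at least one repeat purchase
    A4 <- log(a) - log(b + x - 1) - (r + x) * log(alpha + t.x)
    A1 + A2 + log(exp(A3) + exp(A4))
  } else {
    A1 + A2 + A3
  }
}

# Fitted parameters from the estimation output above
params <- c(r = 1.606620, alpha = 395.5141, a = 0.004579787, b = 567.6831)

# Log-likelihood contribution of a hypothetical customer with 2 repeat purchases
bgnbd.ll.one(params["r"], params["alpha"], params["a"], params["b"],
             x = 2, t.x = 30, T.cal = 104)
```

Summing this quantity over every row of the calibration customer-by-sufficient-statistic matrix gives the total log-likelihood reported above.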

(The two customer-by-time matrices printed here are too large to display; R truncated the output at getOption("max.print").)

Evaluation

Making estimates for individual customers.

[1] "Customer: AA-10315102"
[1] 0.2112213
    x   t.x T.cal 
    0     0    78 
[1] 0.1764288
[1] 1
x: 10    Expectation: 1.388937
x: 15    Expectation: 1.987265
x: 20    Expectation: 2.585584
x: 25    Expectation: 3.183893
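Under the BG/NBD, the probability that a customer is still "alive" also has a closed form; the BTYD function bgnbd.PAlive computes it. A hedged base-R sketch of that formula, using the fitted parameters (the function name here is ours):

```r
# P(alive | x, t.x, T.cal) for the BG/NBD model
bgnbd.palive.one <- function(r, alpha, a, b, x, t.x, T.cal) {
  # Customers with no repeat purchases are assumed alive under the model
  if (x == 0) return(1)
  1 / (1 + (a / (b + x - 1)) * ((alpha + T.cal) / (alpha + t.x))^(r + x))
}

params <- c(r = 1.606620, alpha = 395.5141, a = 0.004579787, b = 567.6831)

# A zero-repeat customer (like AA-10315102 above, with x = 0) has P(alive) = 1
bgnbd.palive.one(params["r"], params["alpha"], params["a"], params["b"],
                 x = 0, t.x = 0, T.cal = 78)
```

For customers with repeat purchases, the probability falls as the gap between the last purchase (t.x) and the end of the calibration period (T.cal) grows.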

               freq.0   freq.1   freq.2   freq.3   freq.4  freq.5+
n.x.actual   7010.000 1120.000 190.0000 35.00000 4.000000 2.000000
n.x.expected 6998.128 1106.972 194.2761 35.46713 6.610739 1.249961
           x t.x     T.cal x.star
PO-8865138 0   0  65.00000      0
AA-6453    0   0  88.42857      0
AC-4203    0   0 103.28571      1

          freq.0       freq.1      freq.2     freq.3   freq.4  freq.5+
act    0.4888730    0.7812500   0.9631579  0.7142857 2.000000 1.000000
exp    0.3807435    0.5951219   0.8043287  1.0232692 1.219812 1.478683
bin 7010.0000000 1120.0000000 190.0000000 35.0000000 4.000000 2.000000

             [,1]     [,2]     [,3]     [,4]    [,5]     [,6]
actual   0.000000 6.000000 0.000000 6.000000 9.00000 4.000000
expected 4.467566 4.712444 5.005485 5.466226 5.84979 6.256565

             [,1]     [,2]    [,3]     [,4]     [,5]     [,6]
actual   22.00000 28.00000 28.0000 34.00000 43.00000 47.00000
expected 42.96617 47.67861 52.6841 58.15032 64.00011 70.25668
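As a rough goodness-of-fit check on the cumulative tracking numbers above, one can compare the actual and expected series directly. This sketch just copies the six printed values and computes a mean absolute percentage error:

```r
# Cumulative transactions per period, copied from the tracking output above
actual   <- c(22, 28, 28, 34, 43, 47)
expected <- c(42.96617, 47.67861, 52.68410, 58.15032, 64.00011, 70.25668)

# Mean absolute percentage error between actual and expected cumulative counts
mape <- mean(abs(actual - expected) / actual)
round(mape, 3)
```

A lower MAPE indicates that the model tracks the holdout-period transaction flow more closely.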

Saul Ventura

Last update: 21 February, 2019