
Evaluating model performance - missing? #210

Closed
SSMK-wq opened this issue Nov 8, 2022 · 3 comments
Labels
use case General application question related to a user's use case.

Comments

@SSMK-wq

SSMK-wq commented Nov 8, 2022

Thanks for this wonderful package and awesome tutorial.

My question is about assessing model performance. Based on the screenshot from the walkthrough page, I have two questions:

[screenshot: actual vs. predicted customer spending from the walkthrough page]

a) There seems to be no train/test split for building and assessing the models. Is training on the full dataset recommended, or the right thing to do? If yes, can you share some insight into how training on the full dataset is useful for models like this (and how they differ from traditional ML models like random forests)?

b) I also see that the model overpredicts the spending of a customer whose actual value is ZERO, and this behavior is the same for all other customers with zero actual spending. Does the model use the population mean or something similar for prediction? If so, how do we assess the model performance?

Should we compute traditional metrics such as "R2", "RMSE", etc.?

Is there any built-in approach within the CLVTools package to assess model performance?

@bachmannpatrick
Owner

Hi!

a) There is an estimation period and a holdout period (if specified). The split is defined in clvdata() via the estimation.split argument; if estimation.split = NULL, there is no holdout period.
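As a minimal sketch of this split, using the apparelTrans sample data that ships with CLVTools (the 40-week estimation length here is illustrative, not a recommendation):

```r
library(CLVTools)

data("apparelTrans")

# Build a clv.data object with an explicit estimation/holdout split:
# the first 40 weeks are used for estimation, the remainder is the
# holdout period.
clv.apparel <- clvdata(apparelTrans,
                       date.format      = "ymd",
                       time.unit        = "week",
                       estimation.split = 40,
                       name.id          = "Id",
                       name.date        = "Date",
                       name.price       = "Price")

# With estimation.split = NULL (the default) there is no holdout period
# and the full transaction history is used for estimation.
summary(clv.apparel)
```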

b) A customer having 0 spending during the holdout period does not mean that this customer has not purchased during the estimation period. There were just no transactions (or only transactions with Price = 0) during the holdout period. The population mean is only used if a customer is not observed at all (i.e., no repeat transactions during the estimation period).

We do not include an approach in CLVTools to assess model performance. I would suggest using the accuracy() function from the forecast package.
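For instance, once you have per-customer actual and predicted holdout spending (the vectors below are made-up illustration data), RMSE and R-squared take only a few lines of base R; forecast::accuracy() reports RMSE, MAE, MAPE, etc. from the same two vectors:

```r
# Hypothetical per-customer holdout spending (illustration only)
actual    <- c(0, 12.5, 0, 30, 7.5)
predicted <- c(2.1, 10.0, 1.8, 25.0, 9.0)

# Root mean squared error
rmse <- sqrt(mean((actual - predicted)^2))

# R-squared: 1 - SS_residual / SS_total
r2 <- 1 - sum((actual - predicted)^2) / sum((actual - mean(actual))^2)

round(c(RMSE = rmse, R2 = r2), 3)   # RMSE ≈ 2.869, R2 ≈ 0.933

# Equivalently, with the forecast package installed:
# forecast::accuracy(predicted, actual)
```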

Best,
Patrick

@SSMK-wq
Author

SSMK-wq commented Nov 8, 2022

@bachmannpatrick - quick follow up questions

a) I tried using estimation.split, but I believe CLVTools restricts the data to a cohort of customers who start at the same point (same month, same quarter, etc.). Since our dataset spans 5 years and most customers do not start at the same time, my code fails for lack of sufficient data points in the estimation period. Is there any way to get around this? If I set estimation.split = NULL, there will be no test set. So, in probabilistic modelling, is it okay not to treat things like regular ML modelling with a train/test split?

b) In the screenshot above, for id = 1 and 10, let's assume they were part of the estimation set with 0 repeat purchase records. Will the value predicted for them be the population mean?

@mmeierer
Collaborator

mmeierer commented Nov 8, 2022

Most approaches to modeling CLV (no matter if probabilistic or not) start from the assumption that you split your data into cohorts (and thereby look at customers' tenure as the relevant time reference, not calendar time). For more information, see here: https://github.com/bachmannpatrick/CLVTools/issues/172#issuecomment-862151684 . This means you estimate a separate model for each cohort.

For every cohort dataset you can choose an individual train/test period. Depending on your data, we often recommend using at least 1 year for training.
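A minimal base-R sketch of this cohort logic (column names and data are hypothetical): assign each customer to a cohort based on the calendar quarter of their first transaction, then split the transactions so each cohort can be modeled separately.

```r
# Hypothetical transaction log (illustration only)
trans <- data.frame(
  Id   = c(1, 1, 2, 3, 3, 4),
  Date = as.Date(c("2018-01-10", "2018-06-01", "2018-02-20",
                   "2018-04-05", "2019-01-15", "2018-05-30"))
)

# First transaction per customer
trans.sorted   <- trans[order(trans$Date), ]
first.purchase <- trans.sorted[!duplicated(trans.sorted$Id), ]

# Cohort label = year and quarter of the first purchase
first.purchase$Cohort <- paste0(
  format(first.purchase$Date, "%Y"),
  "-Q", (as.integer(format(first.purchase$Date, "%m")) - 1) %/% 3 + 1
)

# Attach cohort labels back to all transactions of each customer
trans <- merge(trans, first.purchase[, c("Id", "Cohort")], by = "Id")

# One transaction subset per cohort; each subset would then get its own
# clvdata() object, train/test split, and model fit
cohorts <- split(trans, trans$Cohort)
```

Note that all of a customer's transactions stay with their cohort, even those that occur in later calendar periods: cohort membership is fixed at first purchase.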

The toy dataset that comes with CLVTools contains the purchase records of a single cohort of an apparel retailer (https://rdrr.io/cran/CLVTools/man/apparelTrans.html).

As far as I understand, following the standard workflow for modeling CLV should solve the issues that you describe above.

@mmeierer mmeierer added the use case General application question related to an user's use case. label Nov 8, 2022
@mmeierer mmeierer closed this as completed Jul 5, 2023