Hessian could not be derived. Setting all entries to NA. #132
Comments
Hi Patrik,
Thanks for the detailed and clear explanation – it makes a lot of sense!
I will do as you recommend, and I really look forward to following your work on this exciting package!
All the best
Christian
…---
Hi
thanks for your feedback.
This issue is likely related to the optimization method that is used to maximize the likelihood function.
By default, CLVTools currently uses the L-BFGS-B method, which stops immediately and returns NA coefficients if the likelihood function returns non-finite values. Unfortunately, the log-likelihood function for pnbd requires the notorious hypergeometric 2F1 function which, depending on the data and parameters, can yield NaN or Inf.
Hence, for a more stable optimization you likely want to use the Nelder-Mead method instead, which does not terminate when non-finite values are returned:
est <- pnbd(clv.data = cht.clv, optimx.args = list(method="Nelder-Mead"))
# or for more insights about the optimization
est <- pnbd(clv.data = cht.clv, optimx.args = list(method="Nelder-Mead", trace = 6))
See ?optimx for what other options are available and see also other examples in ?pnbd.
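To illustrate the difference between the two methods outside of CLVTools, here is a minimal sketch using base R's optim() rather than optimx. The objective toy_nll and the starting values are made up for illustration and only mimic a likelihood with a non-finite region; they are not the pnbd likelihood. Whether L-BFGS-B actually steps into the non-finite region depends on the path it takes, so its result is captured with tryCatch rather than assumed.

```r
# Hypothetical toy objective standing in for a negative log-likelihood.
# Minimum at (0.2, 2); returns NaN whenever the first parameter is negative.
toy_nll <- function(par) {
  if (par[1] < 0) return(NaN)
  (par[1] - 0.2)^2 + (par[2] - 2)^2
}

start <- c(0.9, 2)

# L-BFGS-B: optim() signals an error if the objective returns a non-finite
# value during the search, so the call is wrapped in tryCatch.
res_lbfgsb <- tryCatch(
  optim(start, toy_nll, method = "L-BFGS-B"),
  error = function(e) conditionMessage(e)
)

# Nelder-Mead: non-finite values after the starting point are treated as
# very bad points, and the simplex simply moves away from them.
res_nm <- optim(start, toy_nll, method = "Nelder-Mead")

print(res_lbfgsb)          # either an error message or a regular optim() result
print(res_nm$par)          # parameter estimates from Nelder-Mead
print(res_nm$convergence)  # 0 indicates successful convergence
```

Note that Nelder-Mead still needs a finite objective value at the starting parameters.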
See also this previous issue about the coefficients being NA, where some more details are provided: #112.
The error is therefore not really about the Hessian that could not be derived but about the estimation failing in general. In the next release, it will no longer be possible to predict() and plot() if there are NAs in the coefficients, because that is obviously rather confusing.
Feel free to already work with the latest version currently on the development branch as it also contains other (exciting!) new features such as fitting the Gamma-Gamma (gg()) spending model separately:
devtools::install_github(repo = "bachmannpatrick/CLVTools", ref = "development")
Note that it requires the GSL library in order to compile successfully from source.
Further, I would generally recommend also inspecting the model fit with summary(est) and plot(est) before proceeding to predict.
Another minor thing I just saw in your code:
cht.clv <- cht[, list(Id = cust, Date = date, Price = sales)]
cht.clv <- clvdata(cht.clv, date.format = "ymd", time.unit = "w", estimation.split = 145)
can be done with
cht.clv <- clvdata(cht, date.format = "ymd", time.unit = "w", estimation.split = 145, name.id = "cust", name.date = "date", name.price = "sales")
Patrik
Hi Patrik, Here is some more feedback. I first ran the following code:
Result:
Then this code: Result:
What is puzzling is that it was possible to estimate P(alive), the average spending, and CET, but not CLV, which should simply be the product of spending and CET, right? Thanks
Thanks for sharing. The fit is not too bad for the standard Pareto/NBD model. Looks like you could really benefit from controlling for seasonality ;-) Be aware that the estimation with time-varying covariates will take considerably more time than the estimation without covariates. Executing the code in the walkthrough (https://www.clvtools.com/articles/CLVTools.html) gives you a good overview of this. Can you share which industry this data comes from?
Hi Patrik, If you send me an email to [email protected], then I will share some info about the dataset. Christian
I also suspect that there are in fact different customer segments involved, but it isn't clear from the data. Thanks for the general tips!
Thank you for this wonderful new package, which includes the much-needed possibility of accounting for covariates.
I have tried the function pnbd(), but it results in NA predictions. The warning message is: "Hessian could not be derived. Setting all entries to NA."
I have used a real transaction dataset. From this, I filtered the data to include only a cohort defined as all customers starting during the first two months of a year.
I have compared with BTYDPlus using the same dataset and experienced no problems.
Code: