Estimation failed with NA coefs error #112
Comments
Hi Herman, without knowing much about your data and your use case, this likely happened because the optimization broke down. The default optimization method for pnbd without covariates is L-BFGS-B, which fails if the pnbd log-likelihood function returns NaN or Inf values during model fitting. You could therefore try another optimization method by specifying it in the
See Alternatively, you could try a different period definition, such as weeks instead of days, which may yield numerically more stable results. Does this help?
Hi, thank you for responding. Attached is the data file sample_trx.csv. Below is the code I ran.
All other optimization algorithms failed; only Rcgmin passed. The summary of the estimate is as follows:
The plot is as follows: Why is the estimate so bad? Herman
Hi Herman, thanks for your message. In your output, both KKT1 and KKT2 are FALSE. This indicates that the model did not fit the data properly.

Looking at your data, I wonder whether you are applying the model to a customer cohort or to a random sample of the customer base. I realize that the documentation at https://www.clvtools.com/articles/CLVTools.html does not currently provide many details on this. Please be aware that the standard approach for any probabilistic model is to apply it to a specific customer cohort, i.e. customers that have been acquired in the same week/month/quarter. In the future, we will add more details on this to the walkthrough.

An important question to look into is how many new customers you acquire per week/month/quarter/etc. This number likely varies as your firm grows. Ensure that you have at least 500 new customers in each cohort; more is better. If you target different segments (B2C / B2B), it makes sense to model these customers separately.

On a related note, in your case it seems like it would make sense to use "weeks" instead of "days" when creating the data object. Note that changing the argument to time.unit="weeks" only affects the scale. It does not change or aggregate your data (see the walkthrough document for further details).

If you follow this advice, your results should look more like the results provided here: https://www.clvtools.com/articles/CLVTools.html

Best,
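The cohort advice can be checked before fitting anything. The following is a minimal base R sketch and not part of CLVTools; the column names `Id` and `Date` and the toy transactions are assumptions made purely for illustration. It assigns each customer to the month of their first purchase and counts the new customers per cohort.

```r
# Toy transaction log (hypothetical data; replace with your own data.frame).
trx <- data.frame(
  Id   = c("A", "A", "B", "C", "C", "D"),
  Date = as.Date(c("2017-01-03", "2017-02-10", "2017-01-20",
                   "2017-02-05", "2017-03-01", "2017-02-17"))
)

# Sort by date so that the first row per customer is the first purchase.
trx <- trx[order(trx$Date), ]
first_purchase <- trx[!duplicated(trx$Id), c("Id", "Date")]

# Label each customer with the acquisition month (the cohort).
first_purchase$Cohort <- format(first_purchase$Date, "%Y-%m")

# Number of newly acquired customers per cohort.
cohort_sizes <- table(first_purchase$Cohort)
print(cohort_sizes)

# Keep only the transactions of a single cohort, here January 2017.
cohort_ids <- first_purchase$Id[first_purchase$Cohort == "2017-01"]
trx_cohort <- trx[trx$Id %in% cohort_ids, ]
```

If `cohort_sizes` shows customers spread over many acquisition months, the data is a mix of cohorts rather than a single one, and `trx_cohort` is the kind of subset that would be passed on to the model.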
Although the comment was marked as resolved, I leave this answer as a note for future users who come across this issue.

"The other algos" here refers to the optimization methods used to optimize the log-likelihood (LL) function. The LL for the pnbd model contains the hypergeometric function 2F1, which is notoriously difficult to compute. If the hypergeometric function cannot be calculated on your data for some parameter combination, the LL will return NA or Inf values. Unfortunately, not all optimization methods can handle NA/Inf being returned from the target function, and some will stop immediately, notably method BFGS. You can therefore not expect all optimization methods available in optimx to work with your data. Nelder-Mead can deal with non-finite returns and would therefore be recommended in your case. We now plan to set Nelder-Mead as the default for all methods, see #119.

As already pointed out previously by Markus and me, it would also be advisable to use weekly time.units instead of daily. Unless your customers often make multiple purchases per week, daily time.units are likely unnecessary and will result in large, unstable parameter estimates. When using weekly (or yearly) time units, the data is internally still represented using the

Using your single cohort sample data, the tracking plot with weekly time units also looks much better:
(note that the KKTs are also TRUE)

On a side note, "to a customer cohort of Jan 2017" does not seem to hold for your data:
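For future readers, the two suggestions in this comment (weekly time units and Nelder-Mead) might be combined as in the sketch below. It assumes the CLVTools package is installed and uses its bundled `apparelTrans` example data rather than the data from this issue; the `estimation.split` value of 40 is an arbitrary choice for illustration. Passing the method through `optimx.args` follows the package walkthrough as far as we can tell, so please check `?pnbd` for your installed version.

```r
library(CLVTools)

# Example data shipped with CLVTools (not the data from this issue).
data("apparelTrans")

# Weekly time unit: this only changes the scale, not the transaction data.
clv.apparel <- clvdata(apparelTrans,
                       date.format = "ymd",
                       time.unit = "week",
                       estimation.split = 40,  # assumed: 40 weeks for estimation
                       name.id = "Id",
                       name.date = "Date",
                       name.price = "Price")

# Fit the Pareto/NBD model with Nelder-Mead instead of the default method.
est.pnbd <- pnbd(clv.data = clv.apparel,
                 optimx.args = list(method = "Nelder-Mead"))

# Inspect the KKT conditions and the parameter estimates.
summary(est.pnbd)

# Tracking plot of actual vs. expected repeat transactions.
plot(est.pnbd)
```

If KKT1 and KKT2 in the summary are both TRUE and the tracking plot follows the actual transactions reasonably well, the fit is at least not obviously broken.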
Thanks for the input.
Hi,
Thank you for writing this package.
I tried it with my data and it returns "Estimation failed with NA coefs":
summary(clv_tbs)
What can I do to get the correct results?
Herman