Background
d in the pnbd dyncov model is the number of periods from a transaction until the end of the covariate interval that contains it (obtained with 'ceiling_date' on the time unit). Its definition remains an open issue because the math builds on continuous time, which leaves open how transactions on a covariate interval boundary should be treated.
In continuous time, it is a matter of definition whether a time point on the boundary of a covariate interval counts toward the earlier or the later interval. Hence, d can satisfy either 0 <= d < 1 or 0 < d <= 1. What we overlooked for a long time is that the correct definition of d depends on whether a transaction on the interval boundary counts toward the interval or not. This should conclusively determine the definition of d.
Due to the discrete temporal resolution (computers only have finite precision), all covariate intervals are closed. A transaction can hence count toward exactly one covariate interval, never toward two at the same time. A transaction on the lower or upper boundary of a closed interval counts toward that interval. In other words, interval boundaries never "overlap" and a transaction always belongs to a single interval.
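To make the "closed intervals never overlap" argument concrete, here is a minimal Python sketch. It assumes yearly covariate intervals at daily resolution; the names `INTERVALS` and `containing_intervals` are made up for illustration and are not part of the package.

```python
from datetime import date, timedelta

# Hypothetical yearly covariate intervals, closed on both ends, daily resolution.
INTERVALS = [
    (date(1999, 1, 1), date(1999, 12, 31)),
    (date(2000, 1, 1), date(2000, 12, 31)),
]


def containing_intervals(t):
    """Return the indices of all closed intervals [lo, hi] that contain t."""
    return [i for i, (lo, hi) in enumerate(INTERVALS) if lo <= t <= hi]


def every_day_in_exactly_one_interval():
    """Check that no day is shared by two intervals or falls between them."""
    day = INTERVALS[0][0]
    last = INTERVALS[-1][1]
    while day <= last:
        if len(containing_intervals(day)) != 1:
            return False
        day += timedelta(days=1)
    return True


print(containing_intervals(date(1999, 12, 31)))  # upper boundary of 1999
print(containing_intervals(date(2000, 1, 1)))    # lower boundary of 2000
print(every_day_in_exactly_one_interval())
```

Because the smallest time step is one day, the upper boundary of one interval and the lower boundary of the next are distinct points, so both boundaries can be included without any transaction being counted twice.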
Practical relevance
Running the code from the man page of SetDynamicCovariates on the example data provided in the package, once with change_on_boundary=True and once with change_on_boundary=False, shows:
The estimated coefficients differ considerably, but mostly for the dropout covariates (and much more so for negative values)
Nearly double the runtime, and not both KKT conditions being fulfilled for change_on_boundary=False, hint at a convergence issue
The final LL values differ slightly
The results are reproducible, hence the parameter variations are not due to model identification issues
Results
Estimated coefficients:

| change_on_boundary | r | alpha | s | beta | life.Marketing | life.Gender | life.Channel | trans.Marketing | trans.Gender | trans.Channel |
|---|---|---|---|---|---|---|---|---|---|---|
| True | 1.65 | 40.05 | 0.36 | 5.58 | -0.16 | 0.76 | -1.76 | 0.45 | 1.26 | 0.34 |
| False | 1.62 | 34.66 | 0.40 | 7.46 | -0.93 | 0.71 | -1.28 | 0.43 | 1.13 | 0.33 |

Optimization summary:

| change_on_boundary | value | fevals | gevals | niter | convcode | kkt1 | kkt2 | xtime |
|---|---|---|---|---|---|---|---|---|
| True | 4926.152 | 2203 | NA | NA | 0 | TRUE | TRUE | 42.75 |
| False | 4931.944 | 2737 | NA | NA | 0 | FALSE | TRUE | 82.653 |
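The size of these differences is easier to see as plain arithmetic. The following Python sketch only re-uses the numbers reported above; the variable names are made up for illustration, and nothing here re-estimates the model.

```python
# Coefficients as reported above for change_on_boundary=True and =False.
coef_true = {
    "r": 1.65, "alpha": 40.05, "s": 0.36, "beta": 5.58,
    "life.Marketing": -0.16, "life.Gender": 0.76, "life.Channel": -1.76,
    "trans.Marketing": 0.45, "trans.Gender": 1.26, "trans.Channel": 0.34,
}
coef_false = {
    "r": 1.62, "alpha": 34.66, "s": 0.40, "beta": 7.46,
    "life.Marketing": -0.93, "life.Gender": 0.71, "life.Channel": -1.28,
    "trans.Marketing": 0.43, "trans.Gender": 1.13, "trans.Channel": 0.33,
}

# Difference (False minus True) for every coefficient.
diff = {name: coef_false[name] - coef_true[name] for name in coef_true}

# Covariate coefficients only (dropout = "life.", transaction = "trans.").
covariates = [n for n in coef_true if n.startswith(("life.", "trans."))]
largest = max(covariates, key=lambda n: abs(diff[n]))

# Runtime ratio and gap in the reported "value" column (the final LL value).
runtime_ratio = 82.653 / 42.75
ll_gap = 4931.944 - 4926.152

for name in covariates:
    print(f"{name:16s} {diff[name]:+.2f}")
print("largest covariate change:", largest)
print(f"runtime ratio: {runtime_ratio:.2f}, LL gap: {ll_gap:.3f}")
```

The two largest changes among the covariate coefficients are both on the dropout ("life.") side, which matches the observation above.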
Example
The example uses years rather than weeks because the start and end of a week are ill-defined and themselves often a source of confusion.
Given the covariate intervals for 1999 and 2000, which span [01-01-1999, 31-12-1999] and [01-01-2000, 31-12-2000], what is the d associated with a transaction on:
01-01-1999 (lower boundary): either d=1 or d=0
31-12-1999 (upper boundary): d=1/365, always
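The two candidate definitions can be sketched in a few lines of Python. This is an illustration under stated assumptions (yearly intervals, daily resolution, d measured as a fraction of the year), not the package's implementation; `d_years` and its `change_on_boundary` argument are hypothetical names that mirror the flag discussed here.

```python
from datetime import date
from fractions import Fraction


def d_years(t, change_on_boundary):
    """Fraction of the year from t until the start of the next yearly interval.

    change_on_boundary=True : a transaction on 01-01 gets d = 1 (0 < d <= 1)
    change_on_boundary=False: a transaction on 01-01 gets d = 0 (0 <= d < 1)
    """
    start = date(t.year, 1, 1)
    next_start = date(t.year + 1, 1, 1)
    days_in_year = (next_start - start).days
    if t == start and not change_on_boundary:
        return Fraction(0)
    return Fraction((next_start - t).days, days_in_year)


for t in (date(1999, 1, 1), date(1999, 1, 2), date(1999, 12, 31)):
    print(t, d_years(t, True), d_years(t, False))
```

The two definitions agree on every day except the lower boundary 01-01-1999, where one gives 1 and the other gives 0; on 31-12-1999 both give 1/365.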
Resolving this issue
What we know that could help nail this down
all intervals are closed
transactions on the upper or lower boundary are both always counted towards the interval they are in
as we move backwards from 31-12-1999, d increases and then either jumps to 0 at 01-01-1999 or "continuously" increases to 1
a parameter recovery study (checking which definition gives better recovery and a better summed LL value) could help
Note that the question revolves only around the lower boundary of the interval. Because the upper boundary does not overlap with the lower boundary of the next interval, the difference between the two is always > 0 (see 31-12-1999 in the example). The open question is whether d at the lower boundary should be d=0 or d=1.
Appendix: change_on_boundary
Adding to the confusion, the time unit boundary in lubridate, including in ceiling_date(), corresponds to the lower boundary of the covariate interval. This is because the intervals are defined to start from, and include, lubridate's time unit boundary (01-01-1999 in the example).
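As a rough Python analogue (not lubridate itself), the following sketch shows what rounding up to the yearly boundary does with and without changing dates that already sit on the boundary. `ceiling_year` is a hypothetical helper written only for this illustration.

```python
from datetime import date


def ceiling_year(t, change_on_boundary):
    """Round t up to a 01-01 boundary (illustrative analogue, yearly unit only).

    A date already on the boundary (01-01) is moved to the next boundary only
    if change_on_boundary is True; otherwise it is returned unchanged.
    """
    boundary = date(t.year, 1, 1)
    if t == boundary and not change_on_boundary:
        return t
    return date(t.year + 1, 1, 1)


lower = date(1999, 1, 1)    # lower boundary of the 1999 interval
upper = date(1999, 12, 31)  # upper boundary of the 1999 interval

print(ceiling_year(lower, change_on_boundary=False))
print(ceiling_year(lower, change_on_boundary=True))
print(ceiling_year(upper, change_on_boundary=False))
print(ceiling_year(upper, change_on_boundary=True))
```

Only the date on the lower boundary of the covariate interval is affected by the flag. The upper boundary 31-12-1999 is never a time unit boundary in this sense and is always rounded up to 01-01-2000.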