stcrprep
- using `stpm3’ to model cause-specific CIFs
The ideas of Geskus (2011) to expand the data and then the Cox modelling framework to fit a subhazard model can be also applied to parametric models. When fitting a parametric model for the CIF using weighted maximum likelihood, the censoring distribution is a continuous function of time, so rather than using the Kaplan-Meier estimate a (flexible) parametric model is used to obtain the weights.
The likelihood involves a non-tractible integral and so an approximation is used by splitting the time scale into a number of intervals. See out paper on this where we show that these intervals can be fairly wide, which is useful in large datasets (Lambert et al. 2016).
With this approach then, after restructuring the data and calculating the weights, we can use standard parametric survival models to estimate the cause-specific CIF. I now use stpm3
to fit a flexible parametric survival model.
First I load and stset
the data.
use http://www.pclambert.net/data/ebmt1_stata.dta, clear
. by R. )
(Written
stset time, failure(status==1,2) scale(365.25) id(patid)
.
data settings
Survival-time
variable: patid
ID
Failure event: status==1 2_n-1], time]
Observed time interval: (time[on or before: failure
Exit for analysis: time/365.25
Time
--------------------------------------------------------------------------total observations
1,977
0 exclusions
--------------------------------------------------------------------------
1,977 observations remaining, representing
1,977 subjectsin single-failure-per-subject data
1,141 failures total analysis time at risk and under observation
3,796.057
At risk from t = 0
Earliest observed entry t = 0exit t = 8.454483 Last observed
I use stcrprep
as I wantted to fit a Cox model, but now I ask that the time-dependent weights are calculated using stpm2
with 4 d.f. to model the baseline. Note that stcrprep
was written before I released stpm3
, which is why the options is wtstpm2
. The every(0.25)
option requests that the time scale is split every 0.25 years. This means that the weights are updated every quarter of a year in the expanded dataset.
keep(score) trans(1 2) wtstpm2 censdf(4) every(0.25) . stcrprep, events(status)
generate event = status == failcode
.
stset tstop [iw=weight_c], failure(event) enter(tstart) noshow
.
data settings
Survival-time
Failure event: event!=0 & event<.
Observed time interval: (0, tstop]on or after: time tstart
Enter on or before: failure
Exit iweight=weight_c]
Weight: [
--------------------------------------------------------------------------total observations
39,227
0 exclusions
--------------------------------------------------------------------------
39,227 observations remaining, representingin single-record/single-failure data
1,141 failures total analysis time at risk and under observation
16,367.154
At risk from t = 0
Earliest observed entry t = 0exit t = 8.454483
Last observed
score if failcode == 1, scale(lncumhazard) df(4) eform nolog
. stpm3 i.
of obs = 23,673
Number chi2(2) = 9.89
Wald chi2 = 0.0071
Log likelihood = -1678.9025 Prob >
------------------------------------------------------------------------------exp(b) Std. err. z P>|z| [95% conf. interval]
|
-------------+----------------------------------------------------------------xb |
score |
Medium risk | 1.270361 .1592233 1.91 0.056 .9936652 1.624105
High risk | 1.769961 .3219309 3.14 0.002 1.239203 2.528048
-------------+----------------------------------------------------------------
time |
_ns1 | -23.89114 3.783607 -6.31 0.000 -31.30687 -16.4754
_ns2 | 5.77445 2.002546 2.88 0.004 1.849531 9.699369
_ns3 | -.871365 .0644396 -13.52 0.000 -.9976643 -.7450658
_ns4 | -.5940183 .089165 -6.66 0.000 -.7687786 -.4192581_cons | -1.312316 .1156177 -11.35 0.000 -1.538923 -1.08571
------------------------------------------------------------------------------in the first equation. Note: Estimates are transformed only
We can the predict command to obtain estimates of the CIFs.
predict CIF1 CIF2 CIF3, failure timevar(0 8, step(0.1)) ci ///
. replace) ///
> frame(CIFs, score 1) at2(score 2) at3(score 3) ///
> at1(
> contrast(difference) contrastvar(diff2 diff3)in frame - CIFs Predictions are stored
I have predicted the CIFs for each of the 3 risk groups together with 95% confidence intervals. In addition, I have calculated the difference in CIFs (with score group 1 as the reference).
I can plot the baseline CIF with 95% CI.
. frame CIFs {twoway (rarea CIF1_lci CIF1_uci tt, color(%30)) ///
. line CIF1 tt, pstyle(p1line)), ///
> (xtitle(Years since transplantation) ///
> ytitle(CIF) ///
> ylabel(,format(%3.1f)) ///
> legend(off)
> . }
I can plot all three predicted CIFs.
. frame CIFs {twoway (line CIF1 CIF2 CIF3 tt), ///
. xtitle(Years since transplantation) ///
> ytitle(CIF) ///
> ylabel(,format(%3.1f)) ///
> legend(order(1 "Low Risk" ///
> "Medium Risk" ///
> 2 "High Risk") pos(5))
> 3 . }
I can plot the difference in CIFs between the high and low risk group.
. frame CIFs {twoway (rarea diff3_lci diff3_uci tt, color(%30)) ///
. line diff3 tt, pstyle(p1line)), ///
> (xtitle(Years since transplantation) ///
> ytitle(Difference in CIF) ///
> ylabel(,format(%3.1f)) ///
> legend(off)
> . }
References
Geskus, R. B. Cause-specific cumulative incidence estimation and the Fine and Gray model under both left truncation and right censoring. Biometrics 2011; 67:39–49.
Lambert, P.C., S.R. Wilkes, and M.J. Crowther. Flexible parametric modelling of the cause-specific cumulative incidence function. Statistics in Medicine 2016;36:1429-1446.