`stcrprep` - using `stpm3’ to model cause-specific CIFs

The ideas of Geskus (2011) to expand the data and then the Cox modelling framework to fit a subhazard model can be also applied to parametric models. When fitting a parametric model for the CIF using weighted maximum likelihood, the censoring distribution is a continuous function of time, so rather than using the Kaplan-Meier estimate a (flexible) parametric model is used to obtain the weights.

The likelihood involves a non-tractible integral and so an approximation is used by splitting the time scale into a number of intervals. See out paper on this where we show that these intervals can be fairly wide, which is useful in large datasets (Lambert et al. 2016).

With this approach then, after restructuring the data and calculating the weights, we can use standard parametric survival models to estimate the cause-specific CIF. I now use stpm3 to fit a flexible parametric survival model.

First I load and stset the data.

. use http://www.pclambert.net/data/ebmt1_stata.dta, clear
(Written by R.              )

. stset time, failure(status==1,2) scale(365.25) id(patid)

Survival-time data settings

           ID variable: patid
         Failure event: status==1 2
Observed time interval: (time[_n-1], time]
     Exit on or before: failure
     Time for analysis: time/365.25

--------------------------------------------------------------------------
      1,977  total observations
          0  exclusions
--------------------------------------------------------------------------
      1,977  observations remaining, representing
      1,977  subjects
      1,141  failures in single-failure-per-subject data
  3,796.057  total analysis time at risk and under observation
                                                At risk from t =         0
                                     Earliest observed entry t =         0
                                          Last observed exit t =  8.454483

I use stcrprep as I wantted to fit a Cox model, but now I ask that the time-dependent weights are calculated using stpm2 with 4 d.f. to model the baseline. Note that stcrprep was written before I released stpm3, which is why the options is wtstpm2. The every(0.25) option requests that the time scale is split every 0.25 years. This means that the weights are updated every quarter of a year in the expanded dataset.

. stcrprep, events(status) keep(score) trans(1 2) wtstpm2 censdf(4) every(0.25)

. generate event = status == failcode 

. stset tstop [iw=weight_c], failure(event) enter(tstart) noshow

Survival-time data settings

         Failure event: event!=0 & event<.
Observed time interval: (0, tstop]
     Enter on or after: time tstart
     Exit on or before: failure
                Weight: [iweight=weight_c]

--------------------------------------------------------------------------
     39,227  total observations
          0  exclusions
--------------------------------------------------------------------------
     39,227  observations remaining, representing
      1,141  failures in single-record/single-failure data
 16,367.154  total analysis time at risk and under observation
                                                At risk from t =         0
                                     Earliest observed entry t =         0
                                          Last observed exit t =  8.454483

. stpm3 i.score if failcode == 1, scale(lncumhazard) df(4) eform nolog

                                                        Number of obs = 23,673
                                                        Wald chi2(2)  =   9.89
Log likelihood = -1678.9025                             Prob > chi2   = 0.0071

------------------------------------------------------------------------------
             |     exp(b)   Std. err.      z    P>|z|     [95% conf. interval]
-------------+----------------------------------------------------------------
xb           |
       score |
Medium risk  |   1.270361   .1592233     1.91   0.056     .9936652    1.624105
  High risk  |   1.769961   .3219309     3.14   0.002     1.239203    2.528048
-------------+----------------------------------------------------------------
time         |
        _ns1 |  -23.89114   3.783607    -6.31   0.000    -31.30687    -16.4754
        _ns2 |    5.77445   2.002546     2.88   0.004     1.849531    9.699369
        _ns3 |   -.871365   .0644396   -13.52   0.000    -.9976643   -.7450658
        _ns4 |  -.5940183    .089165    -6.66   0.000    -.7687786   -.4192581
       _cons |  -1.312316   .1156177   -11.35   0.000    -1.538923    -1.08571
------------------------------------------------------------------------------
Note: Estimates are transformed only in the first equation.

We can the predict command to obtain estimates of the CIFs.

. predict CIF1 CIF2 CIF3, failure timevar(0 8, step(0.1)) ci            ///
>                         frame(CIFs, replace)                          ///
>                         at1(score 1) at2(score 2) at3(score 3)        ///
>                         contrast(difference) contrastvar(diff2 diff3)
Predictions are stored in frame - CIFs

I have predicted the CIFs for each of the 3 risk groups together with 95% confidence intervals. In addition, I have calculated the difference in CIFs (with score group 1 as the reference).

I can plot the baseline CIF with 95% CI.

. frame CIFs {
.   twoway (rarea CIF1_lci CIF1_uci tt, color(%30)) ///
>          (line CIF1 tt, pstyle(p1line)),          ///
>          xtitle(Years since transplantation)      ///
>          ytitle(CIF)                              ///
>          ylabel(,format(%3.1f))                   ///
>          legend(off)
. }

I can plot all three predicted CIFs.

. frame CIFs {
.   twoway (line CIF1 CIF2 CIF3 tt),            ///
>          xtitle(Years since transplantation)  ///
>          ytitle(CIF)                          ///
>          ylabel(,format(%3.1f))               ///
>          legend(order(1 "Low Risk"            ///
>                       2 "Medium Risk"         ///
>                       3 "High Risk") pos(5))
. }

I can plot the difference in CIFs between the high and low risk group.

. frame CIFs {
.   twoway (rarea diff3_lci diff3_uci tt, color(%30)) ///
>          (line diff3 tt, pstyle(p1line)),           ///
>          xtitle(Years since transplantation)        ///
>          ytitle(Difference in CIF)                  ///
>          ylabel(,format(%3.1f))                     ///
>          legend(off)
. }

References

Geskus, R. B. Cause-specific cumulative incidence estimation and the Fine and Gray model under both left truncation and right censoring. Biometrics 2011; 67:39–49.

Lambert, P.C., S.R. Wilkes, and M.J. Crowther. Flexible parametric modelling of the cause-specific cumulative incidence function. Statistics in Medicine 2016;36:1429-1446.