Predictions

stpm3 makes use of frames for predictions

Predictions are much improved in stpm3.

We first load the Rotterdam 2 breast cancer data and then use stset to declare the survival time and event indicator.

. use https://www.pclambert.net/data/rott2b, clear
(Rotterdam breast cancer data (augmented with cause of death))

. stset os, f(osi==1) scale(12) exit(time 120)

Survival-time data settings

         Failure event: osi==1
Observed time interval: (0, os]
     Exit on or before: time 120
     Time for analysis: time/12

--------------------------------------------------------------------------
      2,982  total observations
          0  exclusions
--------------------------------------------------------------------------
      2,982  observations remaining, representing
      1,171  failures in single-record/single-failure data
 20,002.424  total analysis time at risk and under observation
                                                At risk from t =         0
                                     Earliest observed entry t =         0
                                          Last observed exit t =        10

I will fit a simple proportional hazards model just include hormon and age.

. stpm3 i.hormon age, scale(lncumhazard) df(5) 

Iteration 0:   log likelihood = -2909.4524  
Iteration 1:   log likelihood = -2908.4666  
Iteration 2:   log likelihood = -2908.4635  
Iteration 3:   log likelihood = -2908.4635  

                                                        Number of obs =  2,982
                                                        Wald chi2(2)  =  64.07
Log likelihood = -2908.4635                             Prob > chi2   = 0.0000

------------------------------------------------------------------------------
             | Coefficient  Std. err.      z    P>|z|     [95% conf. interval]
-------------+----------------------------------------------------------------
xb           |
      hormon |
        yes  |   .3229905   .0876157     3.69   0.000     .1512669    .4947141
         age |   .0149851    .002374     6.31   0.000     .0103322    .0196381
-------------+----------------------------------------------------------------
time         |
        _ns1 |  -24.01501   1.921081   -12.50   0.000    -27.78026   -20.24976
        _ns2 |   6.695913   1.028047     6.51   0.000     4.680978    8.710849
        _ns3 |  -1.221354   .0499264   -24.46   0.000    -1.319208   -1.123501
        _ns4 |  -.8156036   .0389592   -20.93   0.000    -.8919622   -.7392449
        _ns5 |  -.5046344   .0422296   -11.95   0.000     -.587403   -.4218658
       _cons |  -1.391522   .1368464   -10.17   0.000    -1.659736   -1.123308
------------------------------------------------------------------------------
Warning: This is a test version of stpm3

We can predict the survival function for a 60 year old women who did not take hormon therapy using,

. predict S, survival ci                 ///
>            at1(age 60 hormon 0)        ///
>            timevar(0 10, step(0.1)) 
Predictions are stored in frame - stpm3_pred

The predict command has requested a survival function with a 95% confidence interval. The values of the covariates to predict at are given in the at1() option. It is possible to have multiple at options. The predictions will be at times between 0 and 10 in steps of 0.1, leading to 101 observations.

We can list or plot the results in frame stpm3_pred.

. frame stpm3_pred: list in 1/21

     +-----------------------------------------+
     |  tt           S       S_lci       S_uci |
     |-----------------------------------------|
  1. |   0           1           1           1 |
  2. |  .1   .99987272   .99948675   .99996844 |
  3. |  .2   .99939144   .99844378   .99976209 |
  4. |  .3   .99849308    .9969857   .99924694 |
  5. |  .4   .99715658   .99513109   .99834016 |
     |-----------------------------------------|
  6. |  .5   .99537771   .99288014   .99700048 |
  7. |  .6   .99316099   .99022634    .9952166 |
  8. |  .7   .99051598   .98716155   .99299707 |
  9. |  .8   .98745529   .98367965   .99036174 |
 10. |  .9   .98399346   .97977945   .98733494 |
     |-----------------------------------------|
 11. |   1   .98014617   .97546655   .98394053 |
 12. | 1.1   .97592979    .9707539   .98019903 |
 13. | 1.2   .97136109   .96566103    .9761267 |
 14. | 1.3   .96645696   .96021236   .97173595 |
 15. | 1.4   .96123425   .95443524   .96703632 |
     |-----------------------------------------|
 16. | 1.5   .95570969   .94835805   .96203579 |
 17. | 1.6   .94989974    .9420086   .95674187 |
 18. | 1.7   .94382061   .93541289   .95116257 |
 19. | 1.8   .93748813   .92859406   .94530721 |
 20. | 1.9   .93091774   .92157156   .93918715 |
     |-----------------------------------------|
 21. |   2   .92412448   .91436051   .93281647 |
     +-----------------------------------------+

stpm3 saves predictions to a new frame by default. The default name for the prediction frame is stpm3_pred, but you can choose a frame name with the frame() option.

. predict S1, survival ci               ///
>             at1(age 60 hormon 0)      ///
>             timevar(0 10, step(0.1))  ///
>             frame(f1)
Predictions are stored in frame - f1

One feature is that you can merge new predictions to an existing frame. When you do this the time variable stored in the existing frame will be used. For example, we could predict for a 70 year old woman and merge with frame f1.

. predict S2, survival ci                 ///
>             at1(age 70 hormon 0)        ///
>             frame(f1, merge)
Predictions are stored in frame - f1

Note that you can specify multiple at() options, which is often easier than having multiple predict commands. Below I use frame(f1, replace) to replace the existing frame f1.

. predict S1 S2, survival ci                 ///
>                at1(age 60 hormon 0)        ///
>                at2(age 70 hormon 0)        ///
>                timevar(0 10, step(0.1))    ///
>                frame(f1, replace)
Predictions are stored in frame - f1

I will cover more details about multiple at() options when I discuss contrasts.

Using frame(..., merge) is useful when you require predictions of different types, for example when requiring predictions of both survival and hazard functions. The predictions for the hazard functions are merged with the survival predictions below.

. predict h1 h2, hazard ci                   ///
>                at1(age 60 hormon 0)        ///
>                at2(age 70 hormon 0)        ///
>                frame(f1, merge)
Predictions are stored in frame - f1

Sometimes you may want to merge the predictions into the active frame rather than create a new frame. You can do this with the merge option. This could be useful if you wanted to predict at each observed value in a dataset, for example at the observed event/censoring times (_t), or at a specific value of time.

The code below, predicts 10 year survival for each observation in the dataset.

. predict S10, survival ci          ///
>              at1(.)               ///
>              timevar(10)          ///
>              merge

Note that only one value of time has been specified in the timevar() optios The default is to predict at this time for all individuals. The at1(.) option means that predictions will be at observed values of covariates rather than values specifed by the user.

We could plot a histogram of the 10 year survival probabilities.

. hist S10
(bin=34, start=.2766368, width=.01245414)

Biostatistician