Relative Survival Models

This is just a draft at present

Relative survival(excess mortality) models

Relative survival(excess mortality) models are used to analyses population-based cancer survival data. As in stpm2 this is done through using the bhazard() option when fitting an stpm3 model.

stpm3 has better predictions than stpm2 through the use of an expsurv() option.

This guide is really aimed at those who are familiar with relative survival models.

The code below loads the example colon cancer data set and merges in the expected mortality rates at the event/censoring times.

. use "https://pclambert.net/data/colon.dta", clear
(Colon carcinoma, diagnosed 1975-94, follow-up to 1995)

. stset surv_mm,f(status=1,2) id(id) scale(12) exit(time 120.5)

Survival-time data settings

           ID variable: id
         Failure event: status==1 2
Observed time interval: (surv_mm[_n-1], surv_mm]
     Exit on or before: time 120.5
     Time for analysis: time/12

--------------------------------------------------------------------------
     15,564  total observations
          0  exclusions
--------------------------------------------------------------------------
     15,564  observations remaining, representing
     15,564  subjects
     10,459  failures in single-failure-per-subject data
 51,685.667  total analysis time at risk and under observation
                                                At risk from t =         0
                                     Earliest observed entry t =         0
                                          Last observed exit t =  10.04167

. gen _age = floor(min(age+_t,99))

. gen _year = floor(yydx +_t)

. gen male = sex==1

. merge m:1 _age _year sex using "https://pclambert.net/data/popmort.dta", keep(match master)

    Result                      Number of obs
    -----------------------------------------
    Not matched                             0
    Matched                            15,564  (_merge==3)
    -----------------------------------------

To illusrate the prediction options I will fit a model with an interaction between age and sex for both main and time-dependent effects, where the effect of age is modelled using natural splines. I will also use winsoring at the 2nd and 98th centiles of age.

. stpm3 i.male##@ns(age,df(3) winsor(2 98)), ///
>       tvc(i.male##@ns(age,df(3) winsor(2 98))) dftvc(2) ///
>       df(5) scale(lncumhazard)  bhazard(rate)

Iteration 0:   log likelihood = -21662.934  
Iteration 1:   log likelihood = -20783.205  
Iteration 2:   log likelihood = -20730.901  
Iteration 3:   log likelihood = -20726.147  
Iteration 4:   log likelihood = -20725.949  
Iteration 5:   log likelihood = -20725.949  

                                                        Number of obs = 15,564
                                                        Wald chi2(7)  =  17.94
Log likelihood = -20725.949                             Prob > chi2   = 0.0123

-----------------------------------------------------------------------------------------------
                              | Coefficient  Std. err.      z    P>|z|     [95% conf. interval]
------------------------------+----------------------------------------------------------------
xb                            |
                       1.male |  -.3529365   .2438788    -1.45   0.148    -.8309303    .1250572
                  _ns_f1_age1 |  -.9225439   .4701276    -1.96   0.050    -1.843977   -.0011107
                  _ns_f1_age2 |   .2828585   .1945014     1.45   0.146    -.0983573    .6640743
                  _ns_f1_age3 |  -.0417451   .2264952    -0.18   0.854    -.4856675    .4021773
                              |
           male#c._ns_f1_age1 |
                           1  |   1.240143   .8476479     1.46   0.143    -.4212162    2.901503
                              |
           male#c._ns_f1_age2 |
                           1  |   .0103134   .2858798     0.04   0.971    -.5500007    .5706275
                              |
           male#c._ns_f1_age3 |
                           1  |   .4861668   .4344473     1.12   0.263    -.3653343    1.337668
------------------------------+----------------------------------------------------------------
time                          |
                         _ns1 |  -7.658599   .4600483   -16.65   0.000    -8.560277    -6.75692
                         _ns2 |   2.685312   .1249449    21.49   0.000     2.440425    2.930199
                         _ns3 |    -.23977   .1295897    -1.85   0.064    -.4937612    .0142211
                         _ns4 |   .0815266   .1095782     0.74   0.457    -.1332427    .2962959
                         _ns5 |   .2678158   .0959583     2.79   0.005      .079741    .4558905
                              |
              male#c._ns_tvc1 |
                           1  |   .9938126   .4540786     2.19   0.029     .1038348     1.88379
                              |
              male#c._ns_tvc2 |
                           1  |    1.11133   .5122449     2.17   0.030     .1073481    2.115311
                              |
     c._ns_f1_age1#c._ns_tvc1 |  -17.30983   2.373232    -7.29   0.000    -21.96128   -12.65838
                              |
     c._ns_f1_age1#c._ns_tvc2 |   -5.08175    1.12152    -4.53   0.000     -7.27989   -2.883611
                              |
     c._ns_f1_age2#c._ns_tvc1 |   3.098712   1.430068     2.17   0.030     .2958306    5.901593
                              |
     c._ns_f1_age2#c._ns_tvc2 |  -.3393348   .5099432    -0.67   0.506    -1.338805    .6601355
                              |
     c._ns_f1_age3#c._ns_tvc1 |  -3.389762    .565698    -5.99   0.000     -4.49851   -2.281014
                              |
     c._ns_f1_age3#c._ns_tvc2 |  -2.109923   .5190539    -4.06   0.000     -3.12725   -1.092596
                              |
male#c._ns_f1_age1#c._ns_tvc1 |
                           1  |   1.102056   3.237954     0.34   0.734    -5.244216    7.448329
                              |
male#c._ns_f1_age1#c._ns_tvc2 |
                           1  |  -2.843998   1.856682    -1.53   0.126    -6.483028    .7950322
                              |
male#c._ns_f1_age2#c._ns_tvc1 |
                           1  |  -2.353452   1.906681    -1.23   0.217    -6.090479    1.383574
                              |
male#c._ns_f1_age2#c._ns_tvc2 |
                           1  |  -.2610571   .7417486    -0.35   0.725    -1.714858    1.192744
                              |
male#c._ns_f1_age3#c._ns_tvc1 |
                           1  |  -.2141605   .9766446    -0.22   0.826    -2.128349    1.700028
                              |
male#c._ns_f1_age3#c._ns_tvc2 |
                           1  |  -2.073725   .9441279    -2.20   0.028    -3.924182   -.2232685
                              |
                        _cons |  -.1063003   .1227919    -0.87   0.387     -.346968    .1343674
-----------------------------------------------------------------------------------------------
Extended functions
 (1) @ns(age, df(3) winsor(2 98))
Warning: This is a test version of stpm3

The use of bhazard(rate) makes this a relative survival model. This means that relative survival will be predicted when using the survival option of the predict command. Similarly, the excess mortality (hazard) rate will be predicted when uing the hazard option of the predict command.

Some predictions require use of the expsurv() option to merge in the expected mortality rates.

The various predictions will be for males and females aged 70.

Relative Survival

First I will predict relative survival and the difference between males and females.

. predict Rm Rf, at1(male 1 age 70)         ///
>                at2(male 0 age 70)         ///
>                surv  ci                   ///
>                timevar(0 10, step(0.1))   ///
>                contrast(difference)       ///
>                contrastvar(Rdiff)         ///
>                frame(f1)
Predictions are stored in frame - f1

The predictions are saved in frame f1, and can be plotted.

. frame f1 {
.   line Rm Rf tt, name(RS, replace)    ///
>        xtitle("Time since diagnosis") ///
>        ytitle(R(t))
.   twoway (rarea Rdiff_lci Rdiff_uci tt, color(red%30)) ///
>          (line Rdiff tt, color(red)),                 ///
>          xtitle("Time since diagnosis")                ///
>          ytitle(Difference in R(t))                    /// 
>          legend(off)                                   ///
>          name(RSdiff, replace)
. }

All-cause survival

Using the expsurv() option allows all-cause survival to be calculated.

$$ S(t|X) = S^*(t|X)R(t|X) $$

. predict Sm Sf, surv  ci ///
>                at1(male 1 age 70)   ///
>                at2(male 0 age 70)   ///
>                contrast(difference) ///
>                contrastvar(Sdiff)   ///               
>                frame(f1, merge)     ///
>                expsurv(using("https://www.pclambert.net/data/popmort") ///  Popmort file
>                        agediag(70)        ///  Age at diagnosis in the dataset
>                        datediag(1990-1-1) ///  Date of diagnosis in the dataset
>                        pmother(sex)       ///  Other variables included in the popmort file
>                        pmrate(rate)       ///  Rate variable in the popmort file  
>                        at1(sex 1)         ///
>                        at2(sex 2)         ///
>                        )                
Predictions are stored in frame - f1

. frame f1 {
.   line Sm Sf tt, name(S, replace)     ///
>        xtitle("Time since diagnosis") ///
>        ytitle(S(t))
.   twoway (rarea Sdiff_lci Sdiff_uci tt, color(red%30)) ///
>          (line Sdiff tt, color(red)),                 ///
>          xtitle("Time since diagnosis")                ///
>          ytitle(Difference in S(t))                    /// 
>          legend(off)                                   ///
>          name(Sdiff, replace)  
. }

Marginal all-cause hazard

This is the same as above, but replace the survival' option with hazard`.

. predict hm hf, hazard  ci ///
>                at1(male 1 age 70)   ///
>                at2(male 0 age 70)   ///
>                contrast(difference) ///
>                contrastvar(hdiff)   ///               
>                frame(f1, merge)     ///
>                expsurv(using("https://www.pclambert.net/data/popmort") ///  Popmort file
>                        agediag(70)        ///  Age at diagnosis in the dataset
>                        datediag(1990-1-1) ///  Date of diagnosis in the dataset
>                        pmother(sex)       ///  Other variables included in the popmort file
>                        pmrate(rate)       ///  Rate variable in the popmort file  
>                        at1(sex 1)         ///
>                        at2(sex 2)         ///
>                        )                
Predictions are stored in frame - f1

. frame f1 {
.   line hm hf tt, name(S, replace)     ///
>        xtitle("Time since diagnosis") ///
>        ytitle(h(t))
.   twoway (rarea hdiff_lci hdiff_uci tt, color(red%30)) ///
>          (line hdiff tt, color(red)),                 ///
>          xtitle("Time since diagnosis")                ///
>          ytitle(Difference in h(t))                    /// 
>          legend(off)                                   ///
>          name(Sdiff, replace)    
.   line hm hf tt, name(h, replace)
.   line hdiff* tt, name(hdiff, replace)
. }

Crude Probabilities

The default is to only give crude probability of death due to cancer. You can use the expvar option to option crude probablities of death due to other causes.

. predict Ccm Ccf, crudeprob ci ///
>                at1(male 1 age 70)  ///
>                at2(male 0 age 70)  ///
>                contrast(difference) ///
>                contrastvar(Ccdiff)   ///               
>                frame(f1, merge)    ///
>                expsurv(using("https://www.pclambert.net/data/popmort") ///  Popmort file
>                        agediag(70)      ///  Age at diagnosis in the dataset
>                        datediag(1990-1-1)      ///  Date of diagnosis in the dataset
>                        pmother(sex)       ///  Other variables included in the popmort file
>                        pmrate(rate)       ///  Rate variable in the popmort file  
>                        at1(sex 1)         ///
>                        at2(sex 2)         ///
>                        expvar(Com Cof) ///
>                        )                      


Predictions are stored in frame - f1

Life expectancy

This is obtained through separate extrpolation of the relative and expected survival.

. gen t80 = 80 in 1
(15,563 missing values generated)

. predict LEf LEm, rmst ci            ///
>                at1(male 1 age 70)   ///
>                at2(male 0 age 70)   ///
>                contrast(difference) ///
>                contrastvar(Ccdiff)  /// 
>                timevar(t80)         ///
>                frame(f2)            ///
>                expsurv(using("https://www.pclambert.net/data/popmort") ///  Popmort file
>                        agediag(70)      ///  Age at diagnosis in the dataset
>                        datediag(1990-1-1)      ///  Date of diagnosis in the dataset
>                        pmother(sex)       ///  Other variables included in the popmort file
>                        pmrate(rate)       ///  Rate variable in the popmort file  
>                        pmmaxyear(2000)    ///
>                        at1(sex 1)         ///
>                        at2(sex 2)         ///
>                        expvar(ELEm ELEf) ///
>                        )




Predictions are stored in frame - f2

.                        
. frame f2: {
.   list LEf* ELEf                       

     +-----------------------------------------------+
     |       LEf     LEf_lci     LEf_uci        ELEf |
     |-----------------------------------------------|
  1. | 5.6608016   5.3998835   5.9343271   14.668102 |
     +-----------------------------------------------+
.   list LEm* ELEm                       

     +-----------------------------------------------+
     |       LEm     LEm_lci     LEm_uci        ELEm |
     |-----------------------------------------------|
  1. | 7.0858031   6.8275937   7.3537775   11.332229 |
     +-----------------------------------------------+
. }                

Life expectancy over a range of ages

The code below creates a frame with a range of ages to predict life expectencty.

. capture frame drop ageLEL

. frame create ageLEL

. frame ageLEL {
.   range age 50 99 50
Number of observations (_N) was 0, now 50.
.   gen male = .
(50 missing values generated)
.   gen sex = .
(50 missing values generated)
.   predict LEm LEf, rmst ci                ///
>                  at1(male 1, obsvalues)   ///
>                  at2(male 0, obsvalues)   ///
>                  timevar(80)              ///
>                  merge                    ///
>                  expsurv(using("https://www.pclambert.net/data/popmort") ///  Popmort file
>                          agediag(age)       ///  Age at diagnosis in the dataset
>                          datediag(1990-1-1) ///  Date of diagnosis in the dataset
>                          pmother(sex)       ///  Other variables included in the popmort file
>                          pmrate(rate)       ///  Rate variable in the popmort file  
>                          pmmaxyear(2000)    ///
>                          at1(sex 1)         ///
>                          at2(sex 2)         ///
>                          expvar(ELEm ELEf)  ///
>                          )




.  gen LELm = ELEm - LEm                         
.  gen LELf = ELEf - LEf                         
. }

The predicted values can now be plotted.

. frame ageLEL {
.   line LEf ELEf LELf age,                             ///
>        xtitle("Age at diagnosis")                 ///
>        ytitle(Life Expectency)                        ///
>        legend(order(1 "Life Expectancy"               ///
>                     2 "Expected Life Expectancy"      ///
>                     3 "Reduction in Life Expectancy")  ///
>               ring(0) cols(1) pos(1))
. }              

Biostatistician