Modelling covariates in marginal relative survival models

Background

I have described how to fit a marginal relative survival moded to give an internally (age) standardized estimate and how this can be extended to give an externally age standardized estimate. This example shows how to incorporate covariates into the marginal model whilst, still age standardizing.

Example

I again use the Melanoma data, restricting to those diagnosed in the later calendar perdiod, 1985-1994, but will compare relative survival between males and females. I restrict follow-up to 10 years after diagnosis using the exit() option.

. use https://pclambert.net/data/melanoma.dta if year8594 == 1 
(Skin melanoma, diagnosed 1975-94, follow-up to 1995)

. stset surv_mm, failure(status=1,2) id(id) exit(time 120.5) scale(12)

                id:  id
     failure event:  status == 1 2
obs. time interval:  (surv_mm[_n-1], surv_mm]
 exit on or before:  time 120.5
    t for analysis:  time/12

------------------------------------------------------------------------------
      4,744  total observations
          0  exclusions
------------------------------------------------------------------------------
      4,744  observations remaining, representing
      4,744  subjects
      1,401  failures in single-failure-per-subject data
 22,003.417  total analysis time at risk and under observation
                                                at risk from t =         0
                                     earliest observed entry t =         0
                                          last observed exit t =  10.04167

. gen female = sex==2

I will first estimate the non-parametric estimate of marginal relative survival using stpp. I will use the by(female) option to estimate separately for males and females. As the age distribution between males and females could potentially be different I need to age standardize. I will use the ICSS weights (Corazziari et al.).

. // change age groups to those defined in ICSS
. drop agegrp

. egen agegrp=cut(age), at(0  45 55 65 75 200) icodes

. replace agegrp = agegrp + 1
(4,744 real changes made)

. label variable agegrp "Age group"

. label define agegrplab 1 "0-44" 2 "45-54" 3 "55-64" 4 "65-74" 5 "75+", replace

. label values agegrp agegrplab

. 
. recode agegrp (1=0.28) (2=0.17) (3=0.21) (4=0.20) (5=0.14), gen(ICSSwt)
(4744 differences between agegrp and ICSSwt)

The relative weights (explained in the example on external age standardization) have to been calculated separately for males and females. This can be done as follows.

. //Proportion within each age group by sex to calculate weights
. bysort female: egen totalsex = total(sex)

. bysort agegrp female: gen a_age_sex = _N/totalsex

. gen double wt_age_sex = ICSSwt/a_age_sex

The non-parametric Pohar estimator can be obtained using stpp.

. stpp R_pp using https://pclambert.net/data/popmort.dta, /// 
>                 agediag(age) datediag(dx) pmother(sex)  ///
>                 by(female)                              ///
>                 indweights(wt_age)

. preserve

. rename _t PP_t

. rename female PP_female

. keep R_pp* PP_t PP_female

. save PP, replace
file PP.dta saved

. restore                

I have saved the Pohar Perme estimates in a dataset, so I can merge them in after using mrsprep.

Using mrsprep with to enable modelling of covariates

As the individual level weights have been calculated all that has to be added to mrsprep is the by(female) option. This will calculate the mean expected mortality rate needed to fit the model separately for males and females. The individual weights are incorporated into both the weighted mean expected mortality rate and the time-dependent weights.

. mrsprep using https://pclambert.net/data/popmort.dta   ///
>               , pmother(sex) agediag(age) datediag(dx) ///
>                 breaks(0(0.2)10)                       ///
>                 indweights(wt_age)                     ///
>                 by(female)

Modelling proceeds as before, but now we can model the effect of sex. A proportional excess hazards marginal model can be fitted as follows

. stset tstop [iweight=wt], enter(tstart) failure(event==1)                                          

     failure event:  event == 1
obs. time interval:  (0, tstop]
 enter on or after:  time tstart
 exit on or before:  failure
            weight:  [iweight=wt]

------------------------------------------------------------------------------
    112,229  total observations
          0  exclusions
------------------------------------------------------------------------------
    112,229  observations remaining, representing
      1,401  failures in single-record/single-failure data
 21,994.417  total analysis time at risk and under observation
                                                at risk from t =         0
                                     earliest observed entry t =         0
                                          last observed exit t =        10

. stpm2 female, scale(hazard) df(5) bhazard(meanhazard_wt) vce(cluster id) eform
note: delayed entry models are being fitted

Iteration 0:   log pseudolikelihood = -7870.2292  
Iteration 1:   log pseudolikelihood = -7733.3899  
Iteration 2:   log pseudolikelihood = -7733.0109  
Iteration 3:   log pseudolikelihood = -7733.0103  

Log pseudolikelihood = -7733.0103               Number of obs     =    112,229

                                 (Std. Err. adjusted for 4,744 clusters in id)
------------------------------------------------------------------------------
             |               Robust
             |     exp(b)   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
xb           |
      female |   .7012648   .0628018    -3.96   0.000     .5883728    .8358174
       _rcs1 |   2.645664   .1160226    22.19   0.000     2.427762    2.883123
       _rcs2 |   1.152299    .027123     6.02   0.000     1.100347    1.206705
       _rcs3 |   1.058372   .0168409     3.57   0.000     1.025874      1.0919
       _rcs4 |   1.021762   .0125207     1.76   0.079     .9975143    1.046599
       _rcs5 |   1.020064   .0094208     2.15   0.031     1.001765    1.038696
       _cons |   .1248128    .008292   -31.32   0.000     .1095744    .1421705
------------------------------------------------------------------------------
Note: Estimates are transformed only in the first equation.

This gives a marginal excess hazard (mortality rate) ratio of 0.70. Note we would expect this be different from a standard (conditional) relative survival model adjusting for age due to the non collapsability of (excess) hazard ratios.

The proportionality assumption can be relaxed by incorporating an interaction between sex and the effect of time from diagnosis.

. stpm2 female, scale(hazard) df(5) bhazard(meanhazard_wt) vce(cluster id) ///
>        tvc(female) dftvc(3)
note: delayed entry models are being fitted

Iteration 0:   log pseudolikelihood = -7871.8435  
Iteration 1:   log pseudolikelihood = -7730.0426  
Iteration 2:   log pseudolikelihood = -7729.2407  
Iteration 3:   log pseudolikelihood =  -7729.234  
Iteration 4:   log pseudolikelihood =  -7729.234  

Log pseudolikelihood =  -7729.234               Number of obs     =    112,229

                                    (Std. Err. adjusted for 4,744 clusters in id)
---------------------------------------------------------------------------------
                |               Robust
                |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
----------------+----------------------------------------------------------------
xb              |
         female |   -.386188   .1031955    -3.74   0.000    -.5884474   -.1839286
          _rcs1 |   .9739669   .0606994    16.05   0.000     .8549982    1.092936
          _rcs2 |   .2062294   .0421271     4.90   0.000     .1236618     .288797
          _rcs3 |   .0319845   .0226708     1.41   0.158    -.0124495    .0764185
          _rcs4 |   .0116879   .0131779     0.89   0.375    -.0141404    .0375161
          _rcs5 |    .019083   .0092165     2.07   0.038      .001019    .0371471
   _rcs_female1 |   .0095673   .0880347     0.11   0.913    -.1629776    .1821121
   _rcs_female2 |  -.0939241    .052938    -1.77   0.076    -.1976806    .0098324
   _rcs_female3 |    .045716   .0310842     1.47   0.141     -.015208      .10664
          _cons |  -2.066789   .0678017   -30.48   0.000    -2.199677     -1.9339
---------------------------------------------------------------------------------

. range tt 0 10 101
(112,128 missing values generated)

. predict s_mrs_male, surv timevar(tt) ci at(female 0)

. predict s_mrs_female, surv timevar(tt) ci at(female 1)

I have predicted marginal relative survival separately for males and females. These can be shown in the plot below.

. merge 1:1 _n using PP,  nogenerate

    Result                           # of obs.
    -----------------------------------------
    not matched                       107,485
        from master                   107,485  
        from using                          0  

    matched                             4,744  
    -----------------------------------------

. twoway (rarea R_pp_lci R_pp_uci PP_t if !PP_female, sort connect(stairstep) color(${C1ci})) ///
>        (rarea R_pp_lci R_pp_uci PP_t if PP_female,  sort connect(stairstep) color(${C2ci})) ///
>        (line R_pp PP_t if !PP_female, sort lpattern(dot) connect(stairstep) color(${C1}))   ///
>        (line R_pp PP_t if PP_female,  sort lpattern(dot) connect(stairstep) color(${C2}))   ///
>        (line s_mrs_male* tt,   sort color(${C1} ${C1} ${C1}) lpattern(solid dash dash))     ///
>        (line s_mrs_female* tt, sort color(${C2} ${C2} ${C2}) lpattern(solid dash dash))     ///
>        , legend(order(5 "Males" 8 "Females")                                                ///
>                 ring(0) cols(1) pos(7))                                                     ///
>          ylabel(0.6(0.1)1, format(%3.1f))                                                   ///
>          ytitle("Marginal relative survival")                                               ///
>          xtitle("Years from diagnosis")                                                     ///
>          name(sex_compare, replace)      

Thus we have obtained externally age standardized estimates of marginal relative survival without the need to stratify or model the effect of age.

References

Lambert PC, Syriopoulou E, Rutherford MJ. Direct modelling of age standardized marginal relative survival through incorporation of time-dependent weights. BMC Medical Research Methodology:

Corazziari I, Quinn M, Capocaccia R. Standard cancer patient population for age standardising survival ratios. European Journalo of Cancer 2004;40:2307-2316

Professor of Biostatistics