External age standardization in marginal relative survival models

Background

I have described how to fit a marginal relative survival moded which gives an internally standardized estimate of marginal relative survival. This example shows how introducing inidvidual level weights canbe used to standardize to an external population.

Example

I again use the Melanoma data, restricting to those diagnosed in the later calendar perdiod, 1985-1994. I restrict follow-up to 10 years after diagnosis using the exit() option.

. use https://pclambert.net/data/melanoma.dta if year8594 == 1 
(Skin melanoma, diagnosed 1975-94, follow-up to 1995)

. stset surv_mm, failure(status=1,2) id(id) exit(time 120.5) scale(12)

                id:  id
     failure event:  status == 1 2
obs. time interval:  (surv_mm[_n-1], surv_mm]
 exit on or before:  time 120.5
    t for analysis:  time/12

------------------------------------------------------------------------------
      4,744  total observations
          0  exclusions
------------------------------------------------------------------------------
      4,744  observations remaining, representing
      4,744  subjects
      1,401  failures in single-failure-per-subject data
 22,003.417  total analysis time at risk and under observation
                                                at risk from t =         0
                                     earliest observed entry t =         0
                                          last observed exit t =  10.04167

I will first estimate the non-parametric estimate of marginal relative survival using stpp, but externally age standardize to the age distribution defined in the International Cancer Survival Standard (ICSS) (Corazziari et al.). For melanoma the ICSS age distribution is

Age Group Proportion
0-44 0.28
45-54 0.17
55-64 0.21
65-74 0.20
75+ 0.14

When we age standardize we estimate what the marginal relative survival would be if the study population had the age distribution seen in the above table.

The following code calculates the ICSS age groups (agegrp) and a variable containing the ICSS weighs (ICSSwt).

. // change age groups to those defined in ICSS
. drop agegrp

. egen agegrp=cut(age), at(0  45 55 65 75 200) icodes

. replace agegrp = agegrp + 1
(4,744 real changes made)

. label variable agegrp "Age group"

. label define agegrplab 1 "0-44" 2 "45-54" 3 "55-64" 4 "65-74" 5 "75+", replace

. label values agegrp agegrplab

. 
. recode agegrp (1=0.28) (2=0.17) (3=0.21) (4=0.20) (5=0.14), gen(ICSSwt)
(4744 differences between agegrp and ICSSwt)

mrsprep calculates time-dependent weights based on the inverse of the expected survival. To standardize to an external population we need to up or down weight individuals relative to the reference population. Let $p^a_i$ be the proportion in the age group to which the $i^{th}$ individual belongs and $p^R_i$ be the corresponding proportion in the reference population. Weights, $w_i^a$ are defined as the ratio between these two proportions.

$$ w_i^a = \frac{p^R_i}{p^a_i} $$ These weights can then be combined with the inverse expected survival weights, $$ w_i(t) = \frac{w_i^a}{S_i(t)} $$

mrsprep has an indweights() option where the indidual level weights are passed to the command. These can be calculated as follows,

. // Proportion within each age group
. local total= _N

. bysort agegrp: gen a_age = _N/`total'

. gen double wt_age = ICSSwt/a_age

These same weights can also be used in the non-parametric Pohar Perme estimator (Rutherford et al 2020 ) as shown below.

. stpp R_pp using https://pclambert.net/data/popmort.dta, /// 
>                 agediag(age) datediag(dx) pmother(sex)  ///
>                 indweights(wt_age)

. preserve

. rename _t PP_t

. keep R_pp* PP_t

. save PP, replace
file PP.dta saved

. restore                

I have saved the Pohar Perme estimates in a dataset, so I can merge them in after using mrsprep.

The same option can be used with mrsprep. The code for mrsprep is shown below.

. mrsprep using https://pclambert.net/data/popmort.dta   ///
>               , pmother(sex) agediag(age) datediag(dx) ///
>                 breaks(0(0.2)10)                       ///
>                 indweights(wt_age)                     

Then a marginal model can be fitted in exactly the same way as when using internal age standardization.

. stset tstop [iweight=wt], enter(tstart) failure(event==1)                                          

     failure event:  event == 1
obs. time interval:  (0, tstop]
 enter on or after:  time tstart
 exit on or before:  failure
            weight:  [iweight=wt]

------------------------------------------------------------------------------
    112,229  total observations
          0  exclusions
------------------------------------------------------------------------------
    112,229  observations remaining, representing
      1,401  failures in single-record/single-failure data
 21,994.417  total analysis time at risk and under observation
                                                at risk from t =         0
                                     earliest observed entry t =         0
                                          last observed exit t =        10

. stpm2, scale(hazard) df(5) bhazard(meanhazard_wt) vce(cluster id)
note: delayed entry models are being fitted

Iteration 0:   log pseudolikelihood = -5408.4497  
Iteration 1:   log pseudolikelihood = -5311.8784  
Iteration 2:   log pseudolikelihood = -5311.5247  
Iteration 3:   log pseudolikelihood = -5311.5239  
Iteration 4:   log pseudolikelihood = -5311.5239  

Log pseudolikelihood = -5311.5239               Number of obs     =    112,229

                                 (Std. Err. adjusted for 4,744 clusters in id)
------------------------------------------------------------------------------
             |               Robust
             |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
xb           |
       _rcs1 |   .9717626   .0412897    23.54   0.000     .8908362    1.052689
       _rcs2 |   .1581555   .0231843     6.82   0.000     .1127152    .2035958
       _rcs3 |   .0570444   .0152915     3.73   0.000     .0270737    .0870152
       _rcs4 |   .0200182   .0117286     1.71   0.088    -.0029694    .0430059
       _rcs5 |   .0182224   .0087545     2.08   0.037     .0010639     .035381
       _cons |  -2.245297   .0500104   -44.90   0.000    -2.343316   -2.147278
------------------------------------------------------------------------------

. range tt 0 10 101
(112,128 missing values generated)

. predict s_mrsprep, surv timevar(tt) ci

After fitting the model, the marginal relative survival has been predicted. This can now be compared to the Pohar Perme non-parametric estimate.

. merge 1:1 _n using PP,  nogenerate

    Result                           # of obs.
    -----------------------------------------
    not matched                       107,485
        from master                   107,485  
        from using                          0  

    matched                             4,744  
    -----------------------------------------

. twoway (rarea R_pp_lci R_pp_uci PP_t, sort connect(stairstep) color(${C1ci}))   ///
>        (line R_pp PP_t, sort connect(stairstep) color(${C1}))                   ///
>        (line s_mrsprep* tt, color(${C2} ${C2} ${C2}) lpattern(solid dash dash)) ///
>        , legend(order(2 "Pohar Perme" 3 "Marginal stpm2 model")                 ///
>                 ring(0) cols(1) pos(7))                                         ///
>          ylabel(0.6(0.1)1, format(%3.1f))                                       ///
>          ytitle("Marginal relative survival")                                   ///
>          xtitle("Years from diagnosis")                                         ///
>          name(int_stand_standsurv, replace)      

There is good agreement between the model based and the non parametric Pohar Perme estimtor.

The estimate here is an externally age standardized estimate, but is actually very similar to the internally age standardized estimate as the age distribution in the study population is similar to the age distribution in the reference population.

References

Lambert PC, Syriopoulou E, Rutherford MJ. Direct modelling of age standardized marginal relative survival through incorporation of time-dependent weights. BMC Medical Research Methodology:

Corazziari I, Quinn M, Capocaccia R. Standard cancer patient population for age standardising survival ratios. European Journalo of Cancer 2004;40:2307-2316

Rutherford, M.J., Dickman, P.W., Coviello, E. & Lambert, P.C. Estimation of age-standardized net survival, even when age-specific data are sparse. Cancer Epidemiology 2020;67:101745.

Professor of Biostatistics