Paul Lambert
  • Home
  • Publications
  • Software & Tutorials
  • Talks
  • Courses
  • Interactive graphs

On this page

  • By Sarah Booth (sarah.booth@le.ac.uk)
  • Background
  • Methods
  • Example (simulated colon cancer data)
    • Prepare data
    • Fitting the model
    • Marginal crude probabilities
    • Standardization using contrasts
    • Specific covariate patterns
  • References

Standardized Crude Probabilities of Death

Author

Sarah Booth

By Sarah Booth (sarah.booth@le.ac.uk)

Download Stata Do file here

You will need to install standsurv to run the example. Details here

Background

The standardized relative survival tutorial introduced the concept of relative survival and illustrated how flexible parametric survival models can be fitted in this framework. Under certain conditions, relative survival can be interpreted as net survival, the survival in a hypothetical world where it is not possible to die from other causes. This measure is often used to make fair comparisons of cancer survival between different countries or populations as only the excess mortality related to the cancer diagnosis is analysed and differences in other cause mortality are ignored.

However, measures of survival in the “real world” can be more useful for clinical decision making as they consider the competing risk of dying from causes other than cancer. This tutorial demonstrates how standsurv can be used to estimate the probabilities of dying from cancer and other causes after flexible parametric survival models are fitted in the relative survival setting. In the relative survival framework, these are known as crude probabilities of death, whereas in a cause-specific setting, they are referred to as cause-specific cumulative incidence functions (CIF). Further details on how to calculate these measures in the cause-specific setting can be found here.

Methods

Fcancer(t|xi) denotes the probability of death due to cancer and can be calculated using the following equation where the relative survival function R(u|x1i) and the excess hazard due to cancer λ(u|x1i) can both be obtained from the relative survival model. S∗(u|x2i) is the expected survival of a similar group of people in the general population without cancer and can be obtained from the population life tables (also known as a “popmort” file). The life tables used in this particular example correspond to the expected survival in England and are stratified by calendar year, age, sex and deprivation group.

x1 is a subset of x and includes the covariates relating to the excess mortality such as age at diagnosis, sex, deprivation group and stage at diagnosis. x2 is a different subset of x and contains the factors that the life tables are stratified by, which in this particular example, are age, calendar year, sex and deprivation group.

Fcancer(t|xi)=∫0tS(u|xi)λ(u|x1i)du=∫0tS∗(u|x2i)R(u|x1i)λ(u|x1i)du

Fother(t|xi) is the probability of dying from causes other than cancer, where h∗(t|x2i) is the expected hazard function and can be obtained from the population life tables.

Fother(t|xi)=∫0tS(u|xi)h∗(u|x2i)du=∫0tS∗(u|x2i)R(u|x1i)h∗(u|x2i)du

The crude probabilities of death due to each cause sum to the all cause probability of death.

Fallcause(t|xi)=Fcancer(t|xi)+Fother(t|xi)

Example (simulated colon cancer data)

Prepare data

This tutorial uses simulated data from a paper by Syriopoulou et al. It is based on colon cancer survival in England and is restricted to only include the most and least deprived quintile of the population.

This dataset contains the following variables: ID number (id), age at diagnosis (agediag, 16-104), stage of tumour at diagnosis (stage, stages 1-4), year of diagnosis (yeardiag, 2011-2013), month of diagnosis (diagmonth), date of diagnosis (datediag), sex (sex, 0 = Male, 1 = Female), survival time in years (t, 0.0027 - 10), survival status (dead, 0 = Alive, 1 = Dead) and deprivation quintile (dep, 1 = Least deprived, 5 = Most Deprived).

As computation is slow I have taken a 20% sample of the original data, which leads to 3,061. individuals.

To prepare the data, I first format the date of diagnosis variable and restrict the analysis to individuals who were diagnosed with colon cancer between the ages of 18 and 99. I also need to recode the variable relating to sex as currently, 0 = Male and 1 = Female, whereas in the life tables (popmort file), 1 = Male and 2 = Female. Recoding this variable means that the life tables will be correctly merged in.

. set seed 128763

. use https://www.pclambert.net/data/colonsim_stage if runiform()<0.2, clear

. // Format datediag to display as a date
. format datediag %td

. // Restrict analysis to patients aged 18-99 at diagnosis
. keep if agediag>=18 & agediag<=99
(0 observations deleted)

. // Recode the sex variable to match the popmort file
. replace sex = sex+1
(3,061 real changes made)

. label define label_sex 1 "Male" 2 "Female" 

. label values sex label_sex 

. gen female = sex==2

stset can then be used to calculate the survival time of each of the 15,627 individuals and to censor any individuals who were still alive 5 years after their diagnosis.

. stset t, failure(dead=1) id(id) exit(time 5)

Survival-time data settings

           ID variable: id
         Failure event: dead==1
Observed time interval: (t[_n-1], t]
     Exit on or before: time 5

--------------------------------------------------------------------------
      3,061  total observations
          0  exclusions
--------------------------------------------------------------------------
      3,061  observations remaining, representing
      3,061  subjects
      1,581  failures in single-failure-per-subject data
  9,926.831  total analysis time at risk and under observation
                                                At risk from t =         0
                                     Earliest observed entry t =         0
                                          Last observed exit t =         5

In order to fit a relative survival model, the expected mortality rate of each individual at their event time is required. To identify the correct expected mortality rates, I first calculate the attained age (age of the individual at their event time) and attained year (calendar year when the event or censoring occurs). I name these variables age and year to match the variable names in the life tables. As the maximum age included in the life tables is 100, I force the maximum attained age to be set as 100. Similarly, as the life tables only go up to 2016, I also make the maximum attained year to be 2016. This makes the assumption that the expected rates in 2018 for each combination of age, sex and deprivation group are the same as they were in 2016. The expected mortality rates can then be merged in by matching for attained age, attained year, sex and deprivation quintile.

. // Attained age
. gen age = min(floor(agediag + _t),100)

. // Attained calendar year
. gen year = min(floor(yeardiag + _t),2016)

. // Merge in life tables
. merge m:1 age year dep sex using https://www.pclambert.net/data/popmort_uk_2017, ///
> keep(match master) keepusing(rate)

    Result                      Number of obs
    -----------------------------------------
    Not matched                             0
    Matched                             3,061  (_merge==3)
    -----------------------------------------

. drop age year

I drop the attained age (age) and attained year (year) variables now that the expected rates have been merged in.

Previously, when using stpm2 various variables needed to be created. This included restricted cubic spline variables for the effect of age, dummy variables for stage and various interactions. For the spline variables, the knot positions and projection matrix for orthogonalization needed to be save. Using stpm3 makes things much easier, as all these can be included in the commands itself.

Fitting the model

Now I fit the flexible parametric survival model used by Syriopoulou et al (2021) that includes age at diagnosis (using natural splines), sex, deprivation group and stage at diagnosis as covariates, along with interaction terms between stage and deprivation group. It also includes time-dependent effects for the main effects of deprivation group, stage and age (using natural splines). It uses 5 degrees of freedom for the baseline and 3 degrees of freedom to model the time-dependent effects.

. stpm3 i.female i.dep i.stage i.stage#dep @ns(agediag,df(3)), scale(lncumhazard) df(5) ///
>                tvc(i.dep i.stage @ns(agediag,df(3))) dftvc(3) bhazard(rate) vsquish

Iteration 0:  Log likelihood = -3104.5329  
Iteration 1:  Log likelihood =  -3019.487  
Iteration 2:  Log likelihood = -3008.7321  
Iteration 3:  Log likelihood = -3007.9984  
Iteration 4:  Log likelihood = -3007.8399  
Iteration 5:  Log likelihood = -3007.7991  
Iteration 6:  Log likelihood = -3007.7894  
Iteration 7:  Log likelihood = -3007.7867  
Iteration 8:  Log likelihood = -3007.7859  
Iteration 9:  Log likelihood = -3007.7857  
Iteration 10: Log likelihood = -3007.7856  
Iteration 11: Log likelihood = -3007.7856  

                                                        Number of obs =  3,061
                                                        Wald chi2(11) = 541.45
Log likelihood = -3007.7856                             Prob > chi2   = 0.0000

----------------------------------------------------------------------------------------------
                             | Coefficient  Std. err.      z    P>|z|     [95% conf. interval]
-----------------------------+----------------------------------------------------------------
xb                           |
                    1.female |   .0258002   .0611239     0.42   0.673    -.0940004    .1456009
                       5.dep |   11.33406   610.7999     0.02   0.985    -1185.812     1208.48
                       stage |
                          2  |   11.77162   610.8012     0.02   0.985    -1185.377     1208.92
                          3  |   13.63744   610.8011     0.02   0.982    -1183.511    1210.786
                          4  |   14.87865   610.8011     0.02   0.981     -1182.27    1212.027
                   stage#dep |
                        2 5  |  -11.34975   610.7999    -0.02   0.985    -1208.496    1185.796
                        3 5  |  -11.19332   610.7999    -0.02   0.985    -1208.339    1185.952
                        4 5  |  -11.26091   610.7999    -0.02   0.985    -1208.407    1185.885
             _ns_f1_agediag1 |   .8547894   1.699503     0.50   0.615    -2.476176    4.185755
             _ns_f1_agediag2 |  -1.255521   .6266172    -2.00   0.045    -2.483668   -.0273739
             _ns_f1_agediag3 |   .1428389   .6831001     0.21   0.834    -1.196013    1.481691
-----------------------------+----------------------------------------------------------------
time                         |
                        _ns1 |  -46.59876   30.73218    -1.52   0.129    -106.8327    13.63522
                        _ns2 |   18.43374   12.93551     1.43   0.154    -6.919398    43.78689
                        _ns3 |  -.1336661   1.018261    -0.13   0.896    -2.129421    1.862089
                        _ns4 |   .2966349   .6233048     0.48   0.634    -.9250201     1.51829
                        _ns5 |   .3442934   .4569246     0.75   0.451    -.5512623    1.239849
              dep#c._ns_tvc1 |
                          5  |   -1.05234   1.780841    -0.59   0.555    -4.542724    2.438045
              dep#c._ns_tvc2 |
                          5  |    1.22833    1.00034     1.23   0.219       -.7323    3.188959
              dep#c._ns_tvc3 |
                          5  |   .3265529   .1202818     2.71   0.007      .090805    .5623008
            stage#c._ns_tvc1 |
                          2  |   15.54094   31.12577     0.50   0.618    -45.46444    76.54632
                          3  |   20.29633   30.75615     0.66   0.509    -39.98462    80.57727
                          4  |   22.62941   30.67119     0.74   0.461    -37.48501    82.74383
            stage#c._ns_tvc2 |
                          2  |  -4.520576   15.57263    -0.29   0.772    -35.04238    26.00122
                          3  |  -9.567492    15.3728    -0.62   0.534    -39.69763    20.56265
                          4  |  -9.400435   15.31143    -0.61   0.539    -39.41028    20.60941
            stage#c._ns_tvc3 |
                          2  |  -.3308294   .7373107    -0.45   0.654    -1.775932    1.114273
                          3  |  -1.137763   .7047464    -1.61   0.106    -2.519041    .2435142
                          4  |   -.448099   .6956624    -0.64   0.519    -1.811572    .9153742
c._ns_f1_agediag1#c._ns_tvc1 |   20.12228   34.46616     0.58   0.559    -47.43015     87.6747
c._ns_f1_agediag1#c._ns_tvc2 |   -30.2422   19.20078    -1.58   0.115    -67.87503    7.390637
c._ns_f1_agediag1#c._ns_tvc3 |  -4.893479   2.803528    -1.75   0.081    -10.38829    .6013342
c._ns_f1_agediag2#c._ns_tvc1 |  -6.719063    18.2678    -0.37   0.713    -42.52329    29.08517
c._ns_f1_agediag2#c._ns_tvc2 |   7.470501   10.16327     0.74   0.462    -12.44915    27.39015
c._ns_f1_agediag2#c._ns_tvc3 |  -.0054526   1.219612    -0.00   0.996    -2.395848    2.384942
c._ns_f1_agediag3#c._ns_tvc1 |   9.465698   9.075787     1.04   0.297    -8.322518    27.25391
c._ns_f1_agediag3#c._ns_tvc2 |  -10.82306   5.100466    -2.12   0.034    -20.81979   -.8263326
c._ns_f1_agediag3#c._ns_tvc3 |  -1.067077   .9595113    -1.11   0.266    -2.947684    .8135306
                       _cons |  -14.02936   610.8006    -0.02   0.982    -1211.176    1183.118
----------------------------------------------------------------------------------------------
Extended functions
 (1) @ns(agediag, df(3))

Marginal crude probabilities

Now that the relative survival model is fitted, the crude probabilities of death can be estimated. The marginal crude probabilities of death are an average measure that can be used to summarise the mortality of the N individuals used to develop the model. Each individual’s predicted probability of death from each cause is calculated at a particular time point and then averaged.

F^M,cancer(t)=1N∑i=1NF^cancer(t|xi)

F^M,other(t)=1N∑i=1NF^other(t|xi)

In standsurv this can be achieved by specifying the crudeprob option. Including at1(.) means that the predictions for each individual will be made based on their observed covariate values in the dataset. Using the timevar() option means that the marginal crude probabilities will be calculated at a particular time point or a series of time points. Here t5 allows these probabilities to be estimated at 51 time points so that a smooth curve across the 5 year follow-up period can be produced.

As this calculation requires the expected mortality rates, the population life tables need to be specified using expsurv() which links the age of the individuals in the dataset to their attained age and calendar year at each time point. As the population life tables are stratified by sex and deprivation these are specified using pmother(). As the maximum age in the life tables is 100 and the maximum year is 2016, I specify these options using pmmaxage() and pmmaxyear() respectively so that any values greater than these will be set to the maximum value. The atvar() option is used to name the variables where the predictions will be stored. By calling this “marg”, by default it will create a variable called marg_disease and another named marg_other to save the crude probabilities of death due to cancer and other causes respectively.

. range t5 0 5 51
(3,010 missing values generated)

. standsurv, crudeprob               ///  Crude probabilities of death
>            timevar(t5)             ///  Time points used for predictions
>            frame(f1, replace)      ///  Frame to save results
>            at1(.)                  ///  Use observed covariate values            
>            atvar(marg)             ///  New variable containing the predictions
>            expsurv(using(https://www.pclambert.net/data/popmort_uk_2017) ///  Popmort file
>              agediag(agediag)      ///  Age at diagnosis in the dataset
>              datediag(datediag)    ///  Date of diagnosis in the dataset
>              pmage(age)            ///  Age variable in the popmort file
>              pmyear(year)          ///  Year variable in the popmort file                     
>              pmother(dep sex)      ///  Other variables included in the popmort file
>              pmrate(rate)          ///  Rate variable in the popmort file
>              pmmaxage(100)         ///  Maximum age in the popmort file
>              pmmaxyear(2016)       ///  Maximum year in the popmort file
>              ) 

We now plot the results.

. frame f1 {
.   twoway (line marg_disease t5),               ///
>          xtitle("Years since Diagnosis")       ///
>          ytitle("Probability of Death")        ///
>          ylabel(,format(%3.1f))                ///
>          title("Cancer")                       /// 
>          name(marg_disease,replace)
.   twoway (line marg_other t5, pstyle(p2line)), /// 
>          xtitle("Years since Diagnosis")       ///
>          ytitle("Probability of Death")        ///
>          ylabel(0(0.1)0.4,format(%3.1f))       ///
>          title("Other Causes")                 ///
>          name(marg_other,replace) 
. 
. graph combine marg_disease marg_other
. }

Here we can see that within this group of people, the probability of dying from cancer is over 3 times larger than dying from other causes by 5 years after diagnosis. An alternative way to present these predictions is to produce a stacked graph by adding the crude probabilities of death from each cause together. The blue line then indicates the all-cause probability of death and each coloured region indicates the probability of being alive, dying from cancer and dying from other causes.

. frame f1 {
.   gen alive = 1
.   gen marg_all = marg_disease + marg_other
.   twoway (area alive t5, fintensity(30))           ///
>          (area marg_all t5, fintensity(30))        ///
>          (area marg_other t5,  fintensity(30)),    ///
>          ytitle("Probability")                     ///
>          xtitle("Time from diagnosis")             ///
>          ylabel(,format(%3.1f))                    /// 
>          legend(order(1 "Alive"                    ///
>                       2 "Death from Cancer"        ///
>                       3 "Death from Other Causes") ///
>                          rows(1) ring(1) pos(6)) 
. }

Standardization using contrasts

The contrast() option can be used to investigate differences between subgroups in the population. As shown in the standardized relative survival tutorial, we might be interested in the effect of deprivation.

We cannot fairly compare these groups using their observed covariate values since this would mean we would be averaging over different covariate patterns within each deprivation group. For example, in the most deprived group there is a greater proportion of individuals diagnosed with Stage 4 tumours so we wouldn’t know whether the differences in the marginal crude probabilities of death were due to the effect of deprivation or other factors such as stage.

To account for this we can first make predictions for all individuals by supposing that they are all in the most deprived group (i.e. setting dep5=1 for everyone regardless of their true deprivation group but keeping the observed values of all other covariates in the dataset). We can then make a second set of predictions where all individuals are assumed to be in the least deprived group (dep5=0). This is called standardization and further examples can be found in the standardized relative survival tutorial.

This approach allows us to investigate the differences in the marginal crude probabilities of death for this hypothetical population. Mathematically, it can be written as the following where Z is the binary covariate dep5 (Z=1 is the most deprived group and Z=0 is the least deprived group).

1N∑i=1NF^cancer(t|Z=1,xi)−1N∑i=1NF^cancer(t|Z=0,xi)

1N∑i=1NF^other(t|Z=1,xi)−1N∑i=1NF^other(t|Z=0,xi)

I specify this in standsurv using the at1() and at2() options to estimate the marginal predictions for the least and most deprived group respectively.

I also use the at1() and at2() within the expsurv() function to ensure that the correct expected mortality rates are used in each calculation. I then use the contrast() option to calculate the absolute difference in the marginal crude probabilities between the deprivation groups.

. standsurv, crudeprob               ///  Crude probabilities of death
>            timevar(t5)             ///  Time points used for predictions
>            frame(f2, replace)      ///  frame to save results to
>            at1(dep 1)              ///  Least deprived
>            at2(dep 5)              ///  Most deprived
>            atvar(dep_1 dep_5)      ///  New variables containing the predictions
>            contrast(difference)    ///  Calculate the difference between groups
>            contrastvar(diff_dep)   ///  New variables containing the difference 
>            ci                      ///  Calculate confidence intervals
>            expsurv(using(https://www.pclambert.net/data/popmort_uk_2017) ///  Popmort file
>              agediag(agediag)      ///  Age at diagnosis in the dataset
>              datediag(datediag)    ///  Date of diagnosis in the dataset
>              pmage(age)            ///  Age variable in the popmort file
>              pmyear(year)          ///  Year variable in the popmort file                    
>              pmother(dep sex)      ///  Other variables included in the popmort file
>              pmrate(rate)          ///  Rate variable in the popmort file
>              pmmaxage(100)         ///  Maximum age in the popmort file
>              pmmaxyear(2016)       ///  Maximum year in the popmort file
>              at1(dep 1)            ///  Use expected rates for least deprived
>              at2(dep 5)            ///  Use expected rates for most deprived
>              )  

We can also plot the differences in the marginal probabilities.

. frame f2 {
.   twoway (rarea diff_dep_disease_lci diff_dep_disease_uci t5, col(%30)) ///
>          (line diff_dep_disease t5, pstyle(p1line)),                    ///
>          legend(off)                                                    ///
>          xtitle("Years since Diagnosis")                                ///
>          ytitle("Difference in Probability of Death")                   ///
>          ylabel(0(0.01)0.1, format(%3.2f))                              ///
>          title("Cancer")                                                ///
>          name(diff_cancer,replace)
.          
.   twoway (rarea diff_dep_other_lci diff_dep_other_uci t5, col(%30))     ///
>          (line diff_dep_other t5, pstyle(p1line)),                      ///
>          legend(off)                                                    ///
>          xtitle("Years since Diagnosis")                                ///
>          ytitle("Difference in Probability of Death")                   ///
>          ylabel(0(0.01)0.1, format(%3.2f))                              ///
>          title("Other Causes")                                          ///
>          name(diff_other,replace)
. 
.   graph combine diff_cancer diff_other, ycommon
. }

Here we can see that the probability of death due to cancer and the probability of death due to other causes are both greater for the most deprived group.

Specific covariate patterns

Although using marginal measures can be useful to summarise the mortality of a group of individuals, we may instead be interested in more personalised predictions for an individual with a particular covariate pattern.

If we are interested in making predictions for a particular individual in the dataset we could do this by using the predict command and specify the values of the covariates we want to predict for.

. predict CPc, crudeprob                ///  Crude probabilities of death
>         timevar(0 5,step(0.1))        ///  Time points used for predictions
>         frame(CP, replace)            /// frame to save predictions
>         at1(agediag 85 stage 2 female 0 dep 1) /// covariate values 
>         expsurv(using(https://www.pclambert.net/data/popmort_uk_2017) ///  Popmort file
>                 agediag(85)           ///  Age at diagnosis in the dataset
>                 datediag(2011-1-1)    ///  Date of diagnosis in the dataset
>                 pmage(age)            ///  Age variable in the popmort file
>                 pmyear(year)          ///  Year variable in the popmort file                       
>                 pmother(dep sex)      ///  Other variables included in the popmort file
>                 pmrate(rate)          ///  Rate variable in the popmort file
>                 pmmaxage(100)         ///  Maximum age in the popmort file
>                 pmmaxyear(2016)       ///  Maximum year in the popmort file
>                 expvar(CPo)           ///
>                 at1(dep 1 sex 2)      ///
>         )             
Predictions are stored in frame - CP
. 
. frame CP {     
.   gen allcause = CPc + CPo
.   gen alive = 1
. 
.   twoway (area alive tt, col(%30))                                              ///
>          (area allcause tt,   col(%30))                                         ///
>          (area CPo tt, col(%30)),                                               ///
>          xtitle("Years since Diagnosis")                                        ///
>          ytitle("Probability")                                                  ///
>          title("Male, Stage 2, Deprivation Group 1, Diagnosed Aged 85 in 2011", ///
>                 size(*0.8))                                                     ///
>          ylabel(,format(%3.1f))                                                 ///
>          legend(order(1 "Alive"                                                 ///
>                       2 "Death from Cancer"                                     ///
>                       3 "Death from Other Causes")                              ///
>                       rows(1) ring(1) pos(6)) 
.     
. }

Effect of age at diagnosis

I now show how you can calculate the crude probabilities of death for individuals with a particular covariate pattern which may not exist in the dataset. Here I make predictions for the following covariate pattern: male, from the least deprived group, diagnosed on 1st January 2011 with a stage 2 tumour and aged either 60, 70, 80 or 90.

This is done in a loop. It could be done with 4 separate at options, but a loop may be preferable with many predictions. Note that I use the mergecreate as a suboption of the frame option. This creates the frame if it does not exist, and otherwise merges with the existing frame.

. foreach age in 60 70 80 90 {             
  2.   predict CPc`age',  crudeprob       ///  Crude probabilities of death
>           at1(agediag `age' female 0 dep 1 stage 2) ///  Specify covariate pattern
>           frame(CP_age, mergecreate) /// frame to save results
>           timevar(0 5, step(0.1))    ///  Time points used for predictions
>           expsurv(using(https://www.pclambert.net/data/popmort_uk_2017) /// Popmort file
>             agediag(`age')          ///  Age at diagnosis variable
>             datediag(2011-1-1)      ///  Temporary date of diagnosis variable
>             pmage(age)              ///  Age variable in the popmort file
>             pmyear(year)            ///  Year variable in the popmort file
>             pmother(dep sex)        ///  Other variables included in the popmort file
>             pmrate(rate)            ///  Rate variable in the popmort file
>             pmmaxage(100)           ///  Maximum age in the popmort file
>             pmmaxyear(2016)         ///  Maximum year in the popmort file
>             at1(dep 1 sex 1)        ///  Use expected rates for least deprived and male
>             expvar(CPo`age')        ///  crude probability (other causes)
>           ) 
  3. }
Predictions are stored in frame - CP_age
Predictions are stored in frame - CP_age
Predictions are stored in frame - CP_age
Predictions are stored in frame - CP_age

A panel plot can be produced using a loop over the 4 selected ages.

// All-cause probability of death

. frame CP_age: gen alive = 1

. frame CP_age {    
.   foreach age in 60 70 80 90 {           
  2.     gen allcause`age' = CPc`age' + CPo`age'
  3.     twoway (area alive tt, col(%30))                    ///
>            (area allcause`age' tt, color(%30))          ///
>            (area CPo`age' tt, color(%30)),              ///
>            ylabel(,format(%3.1f))                       ///
>            legend(order(1 "Alive"                       ///
>                         2 "Death from Cancer"           ///
>                         3 "Death from Other Causes")    ///
>                         rows(1) ring(1) pos(6))         ///
>                         xtitle("Years since Diagnosis") ///
>                         ytitle("Probability")           ///
>                         title("Age `age'")              ///
>            name(age`age',replace) 
  4.   }
. }

. grc1leg age60 age70 age80 age90, title("Male, Stage 2, Deprivation Group 1, Diagnosed in 2011")

Here we see that age at diagnosis has a large impact on the all-cause probability of death. These graphs also show how the all-cause probability of death breaks down into the probability of dying from each cause. For example, a 90 year old with this covariate pattern is much more likely to die from other causes than their cancer. Understanding the most likely cause of death is important as this can help with making treatment decisions.

Effect of stage at diagnosis

Now we look at the effect of stage at diagnosis on the risk predictions of a woman diagnosed aged 75 on 1st January 2011 from the most deprived group. We could also loop over stage here, but I will use 4 at options as an alternative to predictions for selected ages above.

. predict CPc1 CPc2 CPc3 CPc4, crudeprob frame(CP_stage, replace)   ///
>              at1(agediag 75 female 1 dep 5 stage 1)  ///  Stage 1
>              at2(agediag 75 female 1 dep 5 stage 2)  ///  Stage 1
>              at3(agediag 75 female 1 dep 5 stage 3)  ///  Stage 1
>              at4(agediag 75 female 1 dep 5 stage 4)  ///  Stage 1
>              timevar(0 5, step(0.1))             ///  Time points
>              expsurv(using(https://www.pclambert.net/data/popmort_uk_2017) ///  Popmort file
>              agediag(75)                         ///  
>              datediag(2011-1-1)                 ///  
>              pmage(age)                          ///  Age variable in the popmort file
>              pmyear(year)                        ///  Year variable in the popmort file                    
>              pmother(dep sex)                    ///  Other variables in the popmort file
>              pmrate(rate)                        ///  Rate variable in the popmort file
>              pmmaxage(100)                       ///  Maximum age in the popmort file
>              pmmaxyear(2016)                     ///  Maximum year in the popmort file
>              at1(dep 5 sex 2)                    ///  Use expected rates 
>              at2(dep 5 sex 2)                    ///  Use expected rates 
>              at3(dep 5 sex 2)                    ///  Use expected rates 
>              at4(dep 5 sex 2)                    ///  Use expected rates 
>              expvar(CPo1 CPo2 CPo3 CPo4)         /// crud probs (other causes)
>              ) 
Predictions are stored in frame - CP_stage
. frame CP_stage: gen alive = 1

. frame CP_stage {
.   forvalues stage = 1/4 {
  2.     gen allcause`stage' = CPc`stage' + CPo`stage'
  3.     twoway (area alive tt, color(%30))               ///
>            (area allcause`stage' tt, color(%30))   ///
>            (area CPo`stage' tt, color(%30)), ///
>             ylabel(,format(%3.1f))                   ///
>             legend(order(1 "Alive"                   ///
>                          2 "Death from Cancer"         ///
>                          3 "Death from Other Causes") ///
>                          rows(1) ring(1) pos(6))       ///
>                          xtitle("Years since Diagnosis") ///
>                          ytitle("Probability") ///
>                          title("Stage `stage'") ///
>                          name(stage`stage',replace)     
  4.   }
. }

. grc1leg stage1 stage2 stage3 stage4, title("Female, Age 75, Deprivation Group 5, Diagnosed in 2011") 

Here we can see that if an individual with this covariate pattern is diagnosed with an early stage tumour, the all-cause probability of death is very low and mostly due to other causes. In contrast, for an individual diagnosed with advanced stage cancer, the all-cause probability of death by 5 years is very high and cancer is the most likely cause of death.

References

Syriopoulou, E.; Rutherford, M. J. & Lambert P. C. Understanding disparities in cancer prognosis: An extension of mediation analysis to the relative survival framework. Biometrical Journal 2021; 63(2): 341-353

Lambert, P. C.; Dickman, P. W.; Nelson, C. P.; Royston P. Estimating the crude probability of death due to cancer and other causes using relative survival models. Statistics in Medicine 2010; 29(7-8): 885-895