Standardized Crude Probabilities of Death

By Sarah Booth (sarah.booth@le.ac.uk)


You will need to install standsurv to run the example. Details here

Background

The standardized relative survival tutorial introduced the concept of relative survival and illustrated how flexible parametric survival models can be fitted in this framework. Under certain conditions, relative survival can be interpreted as net survival, the survival in a hypothetical world where it is not possible to die from other causes. This measure is often used to make fair comparisons of cancer survival between different countries or populations as only the excess mortality related to the cancer diagnosis is analysed and differences in other cause mortality are ignored.

However, measures of survival in the “real world” can be more useful for clinical decision making as they consider the competing risk of dying from causes other than cancer. This tutorial demonstrates how standsurv can be used to estimate the probabilities of dying from cancer and other causes after flexible parametric survival models are fitted in the relative survival setting. In the relative survival framework, these are known as crude probabilities of death, whereas in a cause-specific setting, they are referred to as cause-specific cumulative incidence functions (CIF). Further details on how to calculate these measures in the cause-specific setting can be found here.

Methods

$F_{cancer}(t|x_i)$ denotes the probability of death due to cancer. It can be calculated using the equation below, where the relative survival function $R(u|x_{1i})$ and the excess hazard due to cancer $\lambda(u|x_{1i})$ are both obtained from the relative survival model, and $S^*(u|x_{2i})$ is the expected survival of a similar group of people in the general population without cancer, obtained from the population life tables (also known as a “popmort” file). The life tables used in this particular example correspond to the expected survival in England and are stratified by calendar year, age, sex and deprivation group.

$x_{1}$ is a subset of $x$ and includes the covariates relating to the excess mortality such as age at diagnosis, sex, deprivation group and stage at diagnosis. $x_{2}$ is a different subset of $x$ and contains the factors that the life tables are stratified by, which in this particular example, are age, calendar year, sex and deprivation group.

$$ F_{cancer}(t|x_i) = \int_0^t S(u|x_i) \lambda(u|x_{1i}) du = \int_0^t S^*(u|x_{2i}) R(u|x_{1i}) \lambda(u|x_{1i}) du $$

$F_{other}(t|x_i)$ is the probability of dying from causes other than cancer, where $h^*(t|x_{2i})$ is the expected hazard function and can be obtained from the population life tables.

$$ F_{other}(t|x_i) = \int_0^t S(u|x_i) h^*(u|x_{2i}) du = \int_0^t S^*(u|x_{2i}) R(u|x_{1i}) h^*(u|x_{2i}) du $$

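The crude probabilities of death due to each cause sum to the all-cause probability of death, which equals $1 - S(t|x_i)$ because the all-cause hazard is $\lambda(u|x_{1i}) + h^*(u|x_{2i})$.

$$ F_{all\ cause}(t|x_i) = F_{cancer}(t|x_i) + F_{other}(t|x_i) = 1 - S(t|x_i) $$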
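As a numerical check of these formulas, the sketch below assumes a single hypothetical individual with a constant excess hazard $\lambda = 0.10$ and a constant expected hazard $h^* = 0.03$ per year (made-up values, not estimates from this tutorial). It integrates $S(u)\lambda$ and $S(u)h^*$ with the trapezoidal rule on a fine time grid. With constant hazards the exact answer is $F_{cancer}(t) = \frac{\lambda}{\lambda + h^*}\{1 - S(t)\}$, and the two crude probabilities should add up to $1 - S(t)$.

```stata
* Hypothetical constant hazards (made-up values, not estimates from this tutorial)
clear
local lambda = 0.10   // excess (cancer) hazard per year
local hstar  = 0.03   // expected (other-cause) hazard per year
set obs 501
gen double t     = (_n - 1)*0.01          // time grid from 0 to 5 years
gen double Sstar = exp(-`hstar'*t)         // expected survival S*(t)
gen double R     = exp(-`lambda'*t)        // relative survival R(t)
gen double S     = Sstar*R                 // all-cause survival S(t)

* Running trapezoidal integrals of S(u)*lambda and S(u)*h*
* (replace works row by row, so F[_n-1] already holds the updated value)
gen double F_cancer = 0
gen double F_other  = 0
replace F_cancer = F_cancer[_n-1] + 0.5*(S[_n-1] + S)*`lambda'*(t - t[_n-1]) if _n > 1
replace F_other  = F_other[_n-1]  + 0.5*(S[_n-1] + S)*`hstar'*(t - t[_n-1])  if _n > 1

* The crude probabilities should add up to the all-cause probability 1 - S(t)
gen double F_all = F_cancer + F_other
list t F_cancer F_other F_all if inlist(_n, 101, 301, 501), noobs
```

In the tutorial itself the hazards are not constant, which is why standsurv evaluates these integrals numerically rather than in closed form.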

Example (simulated colon cancer data)

Prepare data

This tutorial uses simulated data from a paper by Syriopoulou et al. (2021). It is based on colon cancer survival in England and is restricted to the most and least deprived quintiles of the population.

This dataset contains the following variables: ID number (id), age at diagnosis (agediag, 16-104), stage of tumour at diagnosis (stage, stages 1-4), year of diagnosis (yeardiag, 2011-2013), month of diagnosis (diagmonth), date of diagnosis (datediag), sex (sex, 0 = Male, 1 = Female), survival time in years (t, 0.0027 - 10), survival status (dead, 0 = Alive, 1 = Dead) and deprivation quintile (dep, 1 = Least deprived, 5 = Most Deprived).

To prepare the data, I first format the date of diagnosis variable and restrict the analysis to individuals who were diagnosed with colon cancer between the ages of 18 and 99. I also need to recode sex: it is currently coded 0 = Male and 1 = Female, whereas the life tables (popmort file) use 1 = Male and 2 = Female. Recoding this variable ensures that the life tables are merged in correctly.

. use https://www.pclambert.net/data/colonsim_stage, clear

. // Format datediag to display as a date
. format datediag %td

. // Restrict analysis to patients aged 18-99 at diagnosis
. keep if agediag>=18 & agediag<=99
(3 observations deleted)

. // Recode the sex variable to match the popmort file
. replace sex = sex+1
(15,627 real changes made)

. label define label_sex 1 "Male" 2 "Female" 

. label values sex label_sex 

stset can then be used to declare the survival-time data for each of the 15,627 individuals, censoring follow-up at 5 years after diagnosis for anyone still alive at that point.

. stset t, failure(dead=1) id(id) exit(time 5)

Survival-time data settings

           ID variable: id
         Failure event: dead==1
Observed time interval: (t[_n-1], t]
     Exit on or before: time 5

--------------------------------------------------------------------------
     15,627  total observations
          0  exclusions
--------------------------------------------------------------------------
     15,627  observations remaining, representing
     15,627  subjects
      7,927  failures in single-failure-per-subject data
 51,109.303  total analysis time at risk and under observation
                                                At risk from t =         0
                                     Earliest observed entry t =         0
                                          Last observed exit t =         5

In order to fit a relative survival model, the expected mortality rate of each individual at their event time is required. To identify the correct expected mortality rates, I first calculate the attained age (age of the individual at their event or censoring time) and attained year (calendar year when the event or censoring occurs). I name these variables age and year to match the variable names in the life tables. As the maximum age included in the life tables is 100, I cap attained age at 100. Similarly, as the life tables only go up to 2016, I cap attained year at 2016. Since individuals were diagnosed up to 2013 and followed for up to 5 years, this assumes that the expected rates in 2017 and 2018 for each combination of age, sex and deprivation group are the same as they were in 2016. The expected mortality rates can then be merged in by matching on attained age, attained year, sex and deprivation quintile.

. // Attained age
. gen age = min(floor(agediag + _t),100)

. // Attained calendar year
. gen year = min(floor(yeardiag + _t),2016)

. // Merge in life tables
. merge m:1 age year dep sex using https://www.pclambert.net/data/popmort_uk_2017, ///
> keep(match master) keepusing(rate)

    Result                      Number of obs
    -----------------------------------------
    Not matched                             0
    Matched                            15,627  (_merge==3)
    -----------------------------------------
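The effect of the two caps can be seen on three hypothetical patients (made-up values, not from the colon cancer data). The sketch applies the same two gen commands as above, with a hypothetical follow-up time t standing in for _t. The second patient would reach age 102 in 2017 and the third would be followed into 2017, so both are mapped to the last available rows of the life tables.

```stata
* Three hypothetical patients (made-up values, not from the colon cancer data)
clear
input double agediag yeardiag double t
70.4 2011 2.3
98.6 2013 4.1
55.0 2013 4.9
end

* Attained age and year, capped at the last age and year in the life tables
gen age  = min(floor(agediag + t), 100)
gen year = min(floor(yeardiag + t), 2016)
list, noobs
```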

Next, I create any variables I’ll need to fit the model. To allow the effect of age to be non-linear, I use a restricted cubic spline function with 3 degrees of freedom (further information on spline functions can be found here). I also save the knot locations and orthogonalization matrix so that I can produce predictions for different ages later on.

I also create dummy variables relating to stage of tumour, being female, being in the most deprived group and interaction terms between stage and deprivation group.

. // Non-linear function for age
. rcsgen agediag, gen(agercs) df(3) orthog 
Variables agercs1 to agercs3 were created

. global ageknots `r(knots)'

. matrix Rage =r(R)

. // Dummy variables
. tab stage, gen(stage)

      stage |      Freq.     Percent        Cum.
------------+-----------------------------------
          1 |      2,253       14.42       14.42
          2 |      4,578       29.30       43.71
          3 |      4,196       26.85       70.56
          4 |      4,600       29.44      100.00
------------+-----------------------------------
      Total |     15,627      100.00

. gen female = sex == 2 

. gen dep5 = dep == 5      

. // Interaction terms between stage and deprivation
. forvalues i = 2/4 {
  2.         gen stage`i'dep5 = stage`i'*dep5
  3. }

Fitting the model

Now I fit the flexible parametric survival model used by Syriopoulou et al. (2021), which includes age at diagnosis, sex, deprivation group and stage at diagnosis as covariates, along with interaction terms between stage and deprivation group. It also includes time-dependent effects for age at diagnosis and for the main effects of deprivation group and stage. It uses 5 degrees of freedom for the baseline and 3 degrees of freedom for each time-dependent effect. The expected mortality rates are passed to stpm2 through bhazard(rate), which is what makes this a relative survival model.

. stpm2 agercs* female dep5 stage2 stage3 stage4 stage?dep5, scale(hazard) df(5) ///
> tvc(agercs* dep5 stage2 stage3 stage4) dftvc(3) bhazard(rate)

Iteration 0:   log likelihood = -15708.776  
Iteration 1:   log likelihood = -15261.461  
Iteration 2:   log likelihood =  -15218.85  
Iteration 3:   log likelihood = -15216.994  
Iteration 4:   log likelihood = -15216.932  
Iteration 5:   log likelihood = -15216.931  

Log likelihood = -15216.931                             Number of obs = 15,627

---------------------------------------------------------------------------------
                | Coefficient  Std. err.      z    P>|z|     [95% conf. interval]
----------------+----------------------------------------------------------------
xb              |
        agercs1 |   .2496067    .015655    15.94   0.000     .2189234      .28029
        agercs2 |  -.0678942   .0158733    -4.28   0.000    -.0990053   -.0367832
        agercs3 |   .0009404   .0154283     0.06   0.951    -.0292984    .0311793
         female |   .0544283   .0272993     1.99   0.046     .0009227    .1079339
           dep5 |  -.3857208   .5211361    -0.74   0.459    -1.407129    .6356871
         stage2 |   1.010531   .2681356     3.77   0.000      .484995    1.536067
         stage3 |   2.420313   .2559239     9.46   0.000     1.918711    2.921915
         stage4 |    4.04397   .2538504    15.93   0.000     3.546433    4.541508
     stage2dep5 |    .520696   .5385144     0.97   0.334    -.5347727    1.576165
     stage3dep5 |   .6461979   .5240845     1.23   0.218    -.3809888    1.673385
     stage4dep5 |   .4935665   .5221478     0.95   0.345    -.5298243    1.516957
          _rcs1 |   1.224971   .2473644     4.95   0.000      .740146    1.709797
          _rcs2 |    .270678   .1817024     1.49   0.136    -.0854521    .6268081
          _rcs3 |   .0411339   .0588815     0.70   0.485    -.0742717    .1565394
          _rcs4 |   .0220194   .0262256     0.84   0.401    -.0293819    .0734206
          _rcs5 |   .0055588   .0025953     2.14   0.032     .0004721    .0106455
  _rcs_agercs11 |  -.1252562   .0150861    -8.30   0.000    -.1548244    -.095688
  _rcs_agercs12 |   .0453818   .0114159     3.98   0.000      .023007    .0677565
  _rcs_agercs13 |   .0092465   .0056925     1.62   0.104    -.0019106    .0204036
  _rcs_agercs21 |   .0332775   .0158652     2.10   0.036     .0021823    .0643728
  _rcs_agercs22 |  -.0263001   .0122339    -2.15   0.032    -.0502782    -.002322
  _rcs_agercs23 |  -.0090217   .0058512    -1.54   0.123    -.0204898    .0024463
  _rcs_agercs31 |  -.0069708   .0128924    -0.54   0.589    -.0322394    .0182977
  _rcs_agercs32 |  -.0161214   .0098502    -1.64   0.102    -.0354275    .0031847
  _rcs_agercs33 |   .0026671   .0054263     0.49   0.623    -.0079682    .0133024
     _rcs_dep51 |  -.1221882   .0251041    -4.87   0.000    -.1713913   -.0729851
     _rcs_dep52 |   .0302529   .0191343     1.58   0.114    -.0072496    .0677554
     _rcs_dep53 |   .0195214   .0104123     1.87   0.061    -.0008863     .039929
   _rcs_stage21 |  -.2166611    .255327    -0.85   0.396    -.7170929    .2837707
   _rcs_stage22 |  -.0187113   .1863861    -0.10   0.920    -.3840214    .3465989
   _rcs_stage23 |  -.0581805   .0704035    -0.83   0.409    -.1961689    .0798078
   _rcs_stage31 |   .1168377   .2502913     0.47   0.641    -.3737242    .6073996
   _rcs_stage32 |  -.3432881   .1821072    -1.89   0.059    -.7002118    .0136355
   _rcs_stage33 |  -.0514838   .0691152    -0.74   0.456     -.186947    .0839795
   _rcs_stage41 |  -.1312033   .2478076    -0.53   0.596    -.6168973    .3544906
   _rcs_stage42 |  -.1540583   .1799866    -0.86   0.392    -.5068256     .198709
   _rcs_stage43 |  -.0630003   .0676757    -0.93   0.352    -.1956422    .0696416
          _cons |  -3.983653   .2530327   -15.74   0.000    -4.479588   -3.487718
---------------------------------------------------------------------------------

Marginal crude probabilities

Now that the relative survival model is fitted, the crude probabilities of death can be estimated. The marginal crude probabilities of death are an average measure that can be used to summarise the mortality of the $N$ individuals used to develop the model. Each individual’s predicted probability of death from each cause is calculated at a particular time point and then averaged.

$$ \widehat{F}_{M,cancer}(t) = \frac{1}{N} \sum_{i=1}^{N} \widehat{F}_{cancer}(t|x_i) $$

$$ \widehat{F}_{M,other}(t) = \frac{1}{N} \sum_{i=1}^{N} \widehat{F}_{other}(t|x_i) $$
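The averaging step itself is simple. As a minimal sketch, assume three hypothetical individuals with constant excess hazards $\lambda_i$ and expected hazards $h^*_i$ (made-up values), for which $\widehat{F}_{cancer}(t|x_i) = \frac{\lambda_i}{\lambda_i + h^*_i}\{1 - e^{-(\lambda_i + h^*_i)t}\}$. The marginal crude probabilities at $t = 5$ are then just the means of the individual values. In the tutorial, the individual predictions come from the fitted model instead.

```stata
* Three hypothetical individuals with constant hazards (made-up values)
clear
input double lambda double hstar
0.05 0.01
0.20 0.02
0.40 0.08
end

local t = 5
* Individual crude probabilities at t under constant hazards
gen double F_cancer = lambda/(lambda + hstar)*(1 - exp(-(lambda + hstar)*`t'))
gen double F_other  = hstar/(lambda + hstar)*(1 - exp(-(lambda + hstar)*`t'))

* Marginal crude probabilities: simple averages over the N individuals
summarize F_cancer, meanonly
scalar FM_cancer = r(mean)
summarize F_other, meanonly
scalar FM_other = r(mean)
display "F_M,cancer(5) = " %6.4f FM_cancer "   F_M,other(5) = " %6.4f FM_other
```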

In standsurv this can be achieved by specifying the crudeprob option. Including at1(.) means that the predictions for each individual will be made based on their observed covariate values in the dataset. The timevar() option specifies the time point or series of time points at which the marginal crude probabilities are calculated. Here t5, created using range, holds 51 equally spaced time points from 0 to 5 years, so that a smooth curve across the 5-year follow-up period can be produced.

As this calculation requires the expected mortality rates, the population life tables need to be specified using expsurv(), which uses each individual's age and date of diagnosis to determine their attained age and calendar year at each time point. As the population life tables are also stratified by sex and deprivation group, these variables are specified using pmother(). As the maximum age in the life tables is 100 and the maximum year is 2016, I specify these using pmmaxage() and pmmaxyear() respectively, so that any larger values are set to the maximum. The atvar() option names the variables where the predictions will be stored. Naming it marg means that, by default, standsurv creates a variable called marg_disease and another called marg_other, containing the crude probabilities of death due to cancer and other causes respectively.

range t5 0 5 51
standsurv, verbose                 ///  Display output
           at1(.)                  ///  Use observed covariate values
           crudeprob               ///  Crude probabilities of death
           timevar(t5)             ///  Time points used for predictions
           atvar(marg)             ///  Stub for the new variables containing the predictions
           expsurv(using(https://www.pclambert.net/data/popmort_uk_2017) ///  Popmort file
             agediag(agediag)      ///  Age at diagnosis in the dataset
             datediag(datediag)    ///  Date of diagnosis in the dataset
             pmage(age)            ///  Age variable in the popmort file
             pmyear(year)          ///  Year variable in the popmort file
             pmother(dep sex)      ///  Other variables included in the popmort file
             pmrate(rate)          ///  Rate variable in the popmort file
             pmmaxage(100)         ///  Maximum age in the popmort file
             pmmaxyear(2016)       ///  Maximum year in the popmort file
             )

Here we can see that, within this group of people, the probability of dying from cancer by 5 years after diagnosis is more than three times the probability of dying from other causes. An alternative way to present these predictions is a stacked graph, in which the crude probabilities of death from each cause are added together. The blue line then indicates the all-cause probability of death, and the coloured regions indicate the probabilities of being alive, dying from cancer and dying from other causes.
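A minimal sketch of such a stacked graph follows, modelled on the twoway code used later for the age-specific predictions. So that it runs on its own, it generates placeholder curves named marg_disease and marg_other from hypothetical constant hazards (0.10 and 0.03 per year, made-up values). With the real data, drop the placeholder lines (including clear) and use the variables created by atvar(marg) above.

```stata
* Placeholder curves from hypothetical constant hazards; with the real data,
* marg_disease and marg_other come from standsurv's atvar(marg) instead
clear
range t5 0 5 51
gen double marg_disease = 0.10/0.13*(1 - exp(-0.13*t5))
gen double marg_other   = 0.03/0.13*(1 - exp(-0.13*t5))

* Stacked layers: alive (top, fixed at 1), all-cause death, other-cause death
gen alive = 1
gen double marg_all = marg_disease + marg_other
twoway (area alive t5, sort color("120 172 68") fintensity(30))      ///
       (area marg_all t5, sort color("3 144 214") fintensity(30))    ///
       (area marg_other t5, sort color("254 48 11") fintensity(30)), ///
       legend(order(1 "Alive" 2 "Death from Cancer" 3 "Death from Other Causes") ///
              rows(1) ring(1) pos(6))                                ///
       xtitle("Years since Diagnosis") ytitle("Probability")         ///
       ylabel(, format(%3.1f)) name(marg_stack, replace)
```

Drawing the all-cause area over the alive area, and the other-cause area over that, leaves the blue band visible only between the two curves, which is the probability of death from cancer.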

Standardization using contrasts

The contrast() option can be used to investigate differences between subgroups in the population. As shown in the standardized relative survival tutorial, we might be interested in the effect of deprivation.

We cannot fairly compare these groups using their observed covariate values, since each deprivation group would then be averaged over a different distribution of the other covariates. For example, the most deprived group has a greater proportion of individuals diagnosed with stage 4 tumours, so we would not know whether differences in the marginal crude probabilities of death were due to deprivation or to other factors such as stage.

To account for this we can first make predictions for all individuals by supposing that they are all in the most deprived group (i.e. setting dep5=1 for everyone regardless of their true deprivation group but keeping the observed values of all other covariates in the dataset). We can then make a second set of predictions where all individuals are assumed to be in the least deprived group (dep5=0). This is called standardization and further examples can be found in the standardized relative survival tutorial.

This approach allows us to compare the marginal crude probabilities of death between deprivation groups within the same hypothetical population, so that the comparison is not driven by differences in the distribution of the other covariates. Mathematically, the contrasts can be written as follows, where $Z$ is the binary covariate dep5 ($Z=1$ for the most deprived group and $Z=0$ for the least deprived group).

$$ \frac{1}{N}\sum_{i=1}^N{\widehat{F}_{cancer}(t|Z=1,x_i)} - \frac{1}{N}\sum_{i=1}^N{\widehat{F}_{cancer}(t|Z=0,x_i)} $$

$$ \frac{1}{N}\sum_{i=1}^N{\widehat{F}_{other}(t|Z=1,x_i)} - \frac{1}{N}\sum_{i=1}^N{\widehat{F}_{other}(t|Z=0,x_i)} $$
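These contrasts can be illustrated on a small hypothetical population. This sketch assumes constant hazards, a made-up excess hazard model $\lambda = \exp(-3 + 2\,\text{stage4} + 0.3Z)$, and made-up expected hazards h0 and h1 that each person would have if they were least or most deprived (mirroring how at1() and at2() within expsurv() switch the life-table rates in the standsurv call that follows). Each person keeps their own stage; only $Z$ and the expected hazard change, and the two sets of predictions are averaged and differenced.

```stata
* Hypothetical population (made-up values): observed stage, plus the expected
* hazard each person would have if least deprived (h0) or most deprived (h1)
clear
input byte stage4 double h0 double h1
0 0.020 0.026
0 0.030 0.038
1 0.020 0.026
0 0.030 0.038
1 0.020 0.026
1 0.040 0.050
end

local t = 5
foreach z in 0 1 {
    * Hypothetical excess hazard with deprivation set to the same value for everyone
    gen double lam`z' = exp(-3 + 2*stage4 + 0.3*`z')
    * Crude probabilities at t under constant hazards
    gen double Fc`z' = lam`z'/(lam`z' + h`z')*(1 - exp(-(lam`z' + h`z')*`t'))
    gen double Fo`z' = h`z'/(lam`z' + h`z')*(1 - exp(-(lam`z' + h`z')*`t'))
}

* Standardized contrasts: average under Z=1 minus average under Z=0
gen double dFc = Fc1 - Fc0
gen double dFo = Fo1 - Fo0
summarize dFc, meanonly
scalar diff_cancer = r(mean)
summarize dFo, meanonly
scalar diff_other = r(mean)
display "Difference in F_cancer(5): " %6.4f diff_cancer
display "Difference in F_other(5):  " %6.4f diff_other
```

Because the mean of the individual differences equals the difference of the two means, either form of the contrast gives the same result.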

I specify this in standsurv using the at1() and at2() options to estimate the marginal predictions for the least and most deprived groups respectively. As the model includes interactions between deprivation group and stage, these also need to be specified. In the least deprived group they are all set to 0 since dep5 = 0, while in the most deprived group each interaction is set equal to the corresponding stage dummy variable (for example, stage2dep5=stage2) since dep5 = 1.

I also use at1() and at2() within the expsurv() option to ensure that the expected mortality rates for the corresponding deprivation group are used in each calculation. I then use the contrast() option to calculate the absolute difference in the marginal crude probabilities between the deprivation groups, and the ci option to obtain confidence intervals.

standsurv, verbose                 ///  Display output
           at1(dep5 0 stage2dep5 0 stage3dep5 0 stage4dep5 0) ///  Least deprived
           at2(dep5 1 stage2dep5=stage2 stage3dep5=stage3 stage4dep5=stage4) ///  Most deprived
           atvar(dep_1 dep_5)      ///  New variables containing the predictions
           contrast(difference)    ///  Calculate the difference between groups
           contrastvar(diff_dep)   ///  New variables containing the difference
           ci                      ///  Calculate confidence intervals
           crudeprob               ///  Crude probabilities of death
           timevar(t5)             ///  Time points used for predictions
           expsurv(using(https://www.pclambert.net/data/popmort_uk_2017) ///  Popmort file
             agediag(agediag)      ///  Age at diagnosis in the dataset
             datediag(datediag)    ///  Date of diagnosis in the dataset
             pmage(age)            ///  Age variable in the popmort file
             pmyear(year)          ///  Year variable in the popmort file
             pmother(dep sex)      ///  Other variables included in the popmort file
             pmrate(rate)          ///  Rate variable in the popmort file
             pmmaxage(100)         ///  Maximum age in the popmort file
             pmmaxyear(2016)       ///  Maximum year in the popmort file
             at1(dep 1)            ///  Use expected rates for least deprived
             at2(dep 5)            ///  Use expected rates for most deprived
             )

Here we can see that the probability of death due to cancer and the probability of death due to other causes are both greater for the most deprived group.

Specific covariate patterns

Particular individual in the dataset

Although using marginal measures can be useful to summarise the mortality of a group of individuals, we may instead be interested in more personalised predictions for an individual with a particular covariate pattern.

If we are interested in making predictions for a particular individual in the dataset, we can use an if qualifier to restrict the calculation to that one person. The individual whose ID number is 2510 was diagnosed aged 85 in 2011 with a stage 2 tumour, is male and is in the least deprived group.

standsurv if id==2510,         ///  Only include Patient 2510
	   verbose                 ///  Display output
	   at1(.)                  ///  Use observed covariate values
	   crudeprob               ///  Crude probabilities of death
	   timevar(t5)             ///  Time points used for predictions
	   atvar(id2510)           ///  New variable containing the predictions
	   expsurv(using(https://www.pclambert.net/data/popmort_uk_2017) ///  Popmort file
		 agediag(agediag)      ///  Age at diagnosis in the dataset
		 datediag(datediag)    ///  Date of diagnosis in the dataset
		 pmage(age)            ///  Age variable in the popmort file
		 pmyear(year)          ///  Year variable in the popmort file                       
		 pmother(dep sex)      ///  Other variables included in the popmort file
		 pmrate(rate)          ///  Rate variable in the popmort file
		 pmmaxage(100)		   ///  Maximum age in the popmort file
		 pmmaxyear(2016)       ///  Maximum year in the popmort file
	   ) 

Effect of age at diagnosis

I now show how to calculate the crude probabilities of death for a particular covariate pattern, which need not exist in the dataset. Here I make predictions for the following covariate pattern: male, from the least deprived group, diagnosed on 1st January 2011 with a stage 2 tumour and aged either 60, 70, 80 or 90. I use rcsgen, with the knot locations and orthogonalization matrix saved earlier, to compute the values that the restricted cubic spline variables for age take at each of these ages.

To make sure that the correct expected mortality rates are merged in, I create temporary variables for age at diagnosis and date of diagnosis, and I specify sex and deprivation group within expsurv() using at1(). As I am only interested in one covariate pattern, I use if _n==1 to speed up the calculation, so that standsurv only uses the first row. Without it, the same result would be obtained, but the identical prediction would be made for every row of the dataset and then averaged.

gen temp_agediag = .
gen temp_datediag = mdy(1,1,2011)
capture gen alive = 1   // Top layer of the stacked graphs (skipped if it already exists)
foreach age in 60 70 80 90 {
    replace temp_agediag = `age'
    // Spline values for this age using the saved knots and orthogonalization matrix
    rcsgen, scalar(`age') knots($ageknots) rmatrix(Rage) gen(c)
    standsurv if _n==1, verbose        ///  Display output
        at1(agercs1 `=c1' agercs2 `=c2' agercs3 `=c3' ///  Specify covariate pattern
            female 0 dep5 0 stage2 1 stage3 0 stage4 0 ///
            stage2dep5 0 stage3dep5 0 stage4dep5 0)    ///
        crudeprob                      ///  Crude probabilities of death
        timevar(t5)                    ///  Time points used for predictions
        atvar(age`age')                ///  New variables containing the predictions
        expsurv(using(https://www.pclambert.net/data/popmort_uk_2017) ///  Popmort file
            agediag(temp_agediag)      ///  Temporary age at diagnosis variable
            datediag(temp_datediag)    ///  Temporary date of diagnosis variable
            pmage(age)                 ///  Age variable in the popmort file
            pmyear(year)               ///  Year variable in the popmort file
            pmother(dep sex)           ///  Other variables included in the popmort file
            pmrate(rate)               ///  Rate variable in the popmort file
            pmmaxage(100)              ///  Maximum age in the popmort file
            pmmaxyear(2016)            ///  Maximum year in the popmort file
            at1(dep 1 sex 1)           ///  Use expected rates for least deprived and male
            )

    // All-cause probability of death
    gen all_age`age' = age`age'_disease + age`age'_other

    // Stacked graph: alive (top), death from cancer (middle), death from other causes (bottom)
    twoway (area alive t5, sort color("120 172 68") fintensity(30))          ///
           (area all_age`age' t5, sort color("3 144 214") fintensity(30))    ///
           (area age`age'_other t5, sort color("254 48 11") fintensity(30)), ///
           legend(order(1 "Alive" 2 "Death from Cancer" 3 "Death from Other Causes") ///
                  rows(1) ring(1) pos(6))                                    ///
           xtitle("Years since Diagnosis") ytitle("Probability")             ///
           title("Age `age'") ylabel(, format(%3.1f)) name(age`age', replace)
}

Here we see that age at diagnosis has a large impact on the all-cause probability of death. These graphs also show how the all-cause probability of death breaks down into the probability of dying from each cause. For example, a 90-year-old with this covariate pattern is much more likely to die from other causes than from their cancer. Understanding the most likely cause of death is important, as it can help inform treatment decisions.

Effect of stage at diagnosis

Now we look at the effect of stage at diagnosis on the predictions for a woman from the most deprived group who was diagnosed aged 75 on 1st January 2011. As she is in the most deprived group, the interaction term corresponding to her stage (stage2dep5, stage3dep5 or stage4dep5 for stages 2 to 4) is set to 1, whereas all of the interactions were 0 in the previous example.

replace temp_agediag = 75
replace temp_datediag = mdy(1,1,2011)
rcsgen, scalar(75) knots($ageknots) rmatrix(Rage) gen(c)
standsurv if _n==1, verbose ///
   at1(agercs1 `=c1' agercs2 `=c2' agercs3 `=c3' female 1 dep5 1 ///  Stage 1
   stage2 0 stage3 0 stage4 0 stage2dep5 0 stage3dep5 0 stage4dep5 0) ///
   at2(agercs1 `=c1' agercs2 `=c2' agercs3 `=c3' female 1 dep5 1 ///  Stage 2
   stage2 1 stage3 0 stage4 0 stage2dep5 1 stage3dep5 0 stage4dep5 0) ///
   at3(agercs1 `=c1' agercs2 `=c2' agercs3 `=c3' female 1 dep5 1 ///  Stage 3
   stage2 0 stage3 1 stage4 0 stage2dep5 0 stage3dep5 1 stage4dep5 0) ///
   at4(agercs1 `=c1' agercs2 `=c2' agercs3 `=c3' female 1 dep5 1 ///  Stage 4
   stage2 0 stage3 0 stage4 1 stage2dep5 0 stage3dep5 0 stage4dep5 1) ///
   crudeprob                  ///  Crude probabilities of death
   timevar(t5)                ///  Time points used for predictions
   atvar(stage_1 stage_2 stage_3 stage_4) ///  New variables containing the predictions
   expsurv(using(https://www.pclambert.net/data/popmort_uk_2017) ///  Popmort file
	 agediag(temp_agediag)    ///  Temporary age at diagnosis variable
	 datediag(temp_datediag)  ///  Temporary date of diagnosis variable
	 pmage(age)               ///  Age variable in the popmort file
	 pmyear(year)             ///  Year variable in the popmort file                    
	 pmother(dep sex)         ///  Other variables included in the popmort file
	 pmrate(rate)             ///  Rate variable in the popmort file
	 pmmaxage(100)		      ///  Maximum age in the popmort file
	 pmmaxyear(2016)          ///  Maximum year in the popmort file
	 at1(dep 5 sex 2)         ///  Use expected rates for most deprived and female
	 at2(dep 5 sex 2)         ///  Use expected rates for most deprived and female
	 at3(dep 5 sex 2)         ///  Use expected rates for most deprived and female
	 at4(dep 5 sex 2)         ///  Use expected rates for most deprived and female
	 ) 
	 
forvalues stage = 1/4 {
	gen all_stage_`stage' = stage_`stage'_disease + stage_`stage'_other	
}

Here we can see that if an individual with this covariate pattern is diagnosed with an early stage tumour, the all-cause probability of death is very low and mostly due to other causes. In contrast, for an individual diagnosed with advanced stage cancer, the all-cause probability of death by 5 years is very high and cancer is the most likely cause of death.

References

Syriopoulou, E.; Rutherford, M. J.; Lambert, P. C. Understanding disparities in cancer prognosis: An extension of mediation analysis to the relative survival framework. Biometrical Journal 2021; 63(2): 341-353.

Lambert, P. C.; Dickman, P. W.; Nelson, C. P.; Royston, P. Estimating the crude probability of death due to cancer and other causes using relative survival models. Statistics in Medicine 2010; 29(7-8): 885-895.
