# Standardized Crude Probabilities of Death

### By Sarah Booth (sarah.booth@le.ac.uk)

You will need to install `standsurv` to run the example. Details here
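As a rough guide, the user-written commands used in this tutorial (`standsurv`, `stpm2` and `rcsgen`) can usually be installed from the SSC archive. The lines below are a sketch that assumes an internet connection and that SSC hosts the versions you need.

```stata
// Assumed installation route: user-written commands from the SSC archive
ssc install standsurv
ssc install stpm2
ssc install rcsgen
```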

## Background

The standardized relative survival tutorial introduced the concept of relative survival and illustrated how flexible parametric survival models can be fitted in this framework. Under certain conditions, relative survival can be interpreted as net survival, the survival in a hypothetical world where it is not possible to die from other causes. This measure is often used to make fair comparisons of cancer survival between different countries or populations as only the excess mortality related to the cancer diagnosis is analysed and differences in other cause mortality are ignored.

However, measures of survival in the “real world” can be more useful for clinical decision making as they consider the competing risk of dying from causes other than cancer. This tutorial demonstrates how `standsurv` can be used to estimate the probabilities of dying from cancer and other causes after flexible parametric survival models are fitted in the relative survival setting. In the relative survival framework, these are known as crude probabilities of death, whereas in a cause-specific setting, they are referred to as cause-specific cumulative incidence functions (CIF). Further details on how to calculate these measures in the cause-specific setting can be found here.

## Methods

$F_{cancer}(t|x_i)$ denotes the probability of death due to cancer and can be calculated using the following equation where the relative survival function $R(u|x_{1i})$ and the excess hazard due to cancer $\lambda(u|x_{1i})$ can both be obtained from the relative survival model. $S^*(u|x_{2i})$ is the expected survival of a similar group of people in the general population without cancer and can be obtained from the population life tables (also known as a “popmort” file). The life tables used in this particular example correspond to the expected survival in England and are stratified by calendar year, age, sex and deprivation group.

$x_{1}$ is a subset of $x$ and includes the covariates relating to the excess mortality such as age at diagnosis, sex, deprivation group and stage at diagnosis. $x_{2}$ is a different subset of $x$ and contains the factors that the life tables are stratified by, which in this particular example, are age, calendar year, sex and deprivation group.

$$ F_{cancer}(t|x_i) = \int_0^t S(u|x_i) \lambda(u|x_{1i}) du = \int_0^t S^*(u|x_{2i}) R(u|x_{1i}) \lambda(u|x_{1i}) du $$

$F_{other}(t|x_i)$ is the probability of dying from causes other than cancer, where $h^*(t|x_{2i})$ is the expected hazard function and can be obtained from the population life tables.

$$ F_{other}(t|x_i) = \int_0^t S(u|x_i) h^*(u|x_{2i}) du = \int_0^t S^*(u|x_{2i}) R(u|x_{1i}) h^*(u|x_{2i}) du $$

The crude probabilities of death due to each cause sum to the all cause probability of death.

$$ F_{all cause}(t|x_i) = F_{cancer}(t|x_i) + F_{other}(t|x_i) $$
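To make the integrals concrete, the following is a minimal numerical sketch for a single hypothetical individual with constant hazards. The values `lambda = 0.10` and `hstar = 0.05` are assumptions chosen for illustration, not estimates from the model. With constant hazards the integral has the closed form $F_{cancer}(t) = \frac{\lambda}{\lambda + h^*}\left(1 - e^{-(\lambda + h^*)t}\right)$, which the trapezoid rule should reproduce. The block starts with `clear`, so it is best run in a separate Stata session.

```stata
clear
set obs 501
range u 0 5 501                       // integration grid on [0, 5] years
scalar lambda = 0.10                  // assumed constant excess hazard
scalar hstar  = 0.05                  // assumed constant expected hazard

gen double Sstar = exp(-hstar*u)      // expected survival S*(u)
gen double R     = exp(-lambda*u)     // relative survival R(u)
gen double f_cancer = Sstar*R*lambda  // integrand for F_cancer
gen double f_other  = Sstar*R*hstar   // integrand for F_other

// Cumulative trapezoid rule; sum() treats the missing first term as zero
gen double F_cancer = sum((f_cancer + f_cancer[_n-1])/2*(u - u[_n-1]))
gen double F_other  = sum((f_other  + f_other[_n-1])/2*(u - u[_n-1]))
gen double F_all    = F_cancer + F_other

// Closed form for comparison
gen double F_cancer_exact = lambda/(lambda + hstar)*(1 - exp(-(lambda + hstar)*u))
list u F_cancer F_cancer_exact F_other F_all if inlist(_n, 101, 301, 501)
```

In this sketch `F_all` should agree with `1 - Sstar*R`, the all-cause probability of death, up to the error of the numerical integration.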

## Example (simulated colon cancer data)

### Prepare data

This tutorial uses simulated data from a paper by Syriopoulou et al. (2021). It is based on colon cancer survival in England and is restricted to the most and least deprived quintiles of the population.

This dataset contains the following variables: ID number (`id`), age at diagnosis (`agediag`, 16-104), stage of tumour at diagnosis (`stage`, stages 1-4), year of diagnosis (`yeardiag`, 2011-2013), month of diagnosis (`diagmonth`), date of diagnosis (`datediag`), sex (`sex`, 0 = Male, 1 = Female), survival time in years (`t`, 0.0027 - 10), survival status (`dead`, 0 = Alive, 1 = Dead) and deprivation quintile (`dep`, 1 = Least deprived, 5 = Most deprived).

To prepare the data, I first format the date of diagnosis variable and restrict the analysis to individuals who were diagnosed with colon cancer between the ages of 18 and 99. I also need to recode the variable relating to sex as currently, 0 = Male and 1 = Female, whereas in the life tables (`popmort` file), 1 = Male and 2 = Female. Recoding this variable means that the life tables will be correctly merged in.

```
. use https://www.pclambert.net/data/colonsim_stage, clear
. // Format datediag to display as a date
. format datediag %td
. // Restrict analysis to patients aged 18-99 at diagnosis
. keep if agediag>=18 & agediag<=99
(3 observations deleted)
. // Recode the sex variable to match the popmort file
. replace sex = sex+1
(15,627 real changes made)
. label define label_sex 1 "Male" 2 "Female"
. label values sex label_sex
```

`stset` can then be used to calculate the survival time of each of the 15,627 individuals and to censor any individuals who were still alive 5 years after their diagnosis.

```
. stset t, failure(dead=1) id(id) exit(time 5)
Survival-time data settings

           ID variable: id
         Failure event: dead==1
Observed time interval: (t[_n-1], t]
     Exit on or before: time 5

--------------------------------------------------------------------------
     15,627  total observations
          0  exclusions
--------------------------------------------------------------------------
     15,627  observations remaining, representing
     15,627  subjects
      7,927  failures in single-failure-per-subject data
 51,109.303  total analysis time at risk and under observation
                                                At risk from t =         0
                                     Earliest observed entry t =         0
                                          Last observed exit t =         5

In order to fit a relative survival model, the expected mortality rate of each individual at their event time is required. To identify the correct expected mortality rates, I first calculate the attained age (age of the individual at their event time) and attained year (calendar year when the event or censoring occurs). I name these variables `age` and `year` to match the variable names in the life tables. As the maximum age included in the life tables is 100, I cap the attained age at 100. Similarly, as the life tables only go up to 2016, I cap the attained year at 2016. This assumes that the expected rates after 2016 (here, 2017 and 2018) for each combination of age, sex and deprivation group are the same as they were in 2016. The expected mortality rates can then be merged in by matching on attained age, attained year, sex and deprivation quintile.

```
. // Attained age
. gen age = min(floor(agediag + _t),100)
. // Attained calendar year
. gen year = min(floor(yeardiag + _t),2016)
. // Merge in life tables
. merge m:1 age year dep sex using https://www.pclambert.net/data/popmort_uk_2017, ///
> keep(match master) keepusing(rate)
    Result                      Number of obs
    -----------------------------------------
    Not matched                             0
    Matched                            15,627  (_merge==3)
    -----------------------------------------
```
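The capping of attained age and attained year can be checked on a few made-up rows. This is a small sketch in which the three records and the variable `exit_t` (standing in for `_t`) are hypothetical; it starts with `clear`, so run it in a separate Stata session rather than in the middle of the analysis.

```stata
clear
// Hypothetical records: age at diagnosis, year of diagnosis, time of exit in years
input double(agediag yeardiag exit_t)
60.0 2011 5.0
98.5 2013 4.2
45.3 2012 1.5
end

// Same logic as above: truncate to whole years, then cap at the life table limits
gen age  = min(floor(agediag + exit_t), 100)
gen year = min(floor(yeardiag + exit_t), 2016)
list
```

The second record reaches age 102 in 2017, so both values are capped (to 100 and 2016), whereas the third record is unaffected.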

Next, I create any variables I’ll need to fit the model. To allow the effect of age to be non-linear, I use a restricted cubic spline function with 3 degrees of freedom (further information on spline functions can be found here). I also save the knot locations and orthogonalization matrix so that I can produce predictions for different ages later on.

I also create dummy variables relating to stage of tumour, being female, being in the most deprived group and interaction terms between stage and deprivation group.

```
. // Non-linear function for age
. rcsgen agediag, gen(agercs) df(3) orthog
Variables agercs1 to agercs3 were created
. global ageknots `r(knots)'
. matrix Rage =r(R)
. // Dummy variables
. tab stage, gen(stage)
      stage |      Freq.     Percent        Cum.
------------+-----------------------------------
          1 |      2,253       14.42       14.42
          2 |      4,578       29.30       43.71
          3 |      4,196       26.85       70.56
          4 |      4,600       29.44      100.00
------------+-----------------------------------
      Total |     15,627      100.00
. gen female = sex == 2
. gen dep5 = dep == 5
. // Interaction terms between stage and deprivation
. forvalues i = 2/4 {
2. gen stage`i'dep5 = stage`i'*dep5
3. }
```

### Fitting the model

Now I fit the flexible parametric survival model used by Syriopoulou et al. (2021) that includes age at diagnosis, sex, deprivation group and stage at diagnosis as covariates, along with interaction terms between stage and deprivation group. As specified in `tvc()`, it also includes time-dependent effects for age, deprivation group and stage. It uses 5 degrees of freedom for the baseline and 3 degrees of freedom to model the time-dependent effects.

```
. stpm2 agercs* female dep5 stage2 stage3 stage4 stage?dep5, scale(hazard) df(5) ///
> tvc(agercs* dep5 stage2 stage3 stage4) dftvc(3) bhazard(rate)
Iteration 0: log likelihood = -15708.776
Iteration 1: log likelihood = -15261.461
Iteration 2: log likelihood = -15218.85
Iteration 3: log likelihood = -15216.994
Iteration 4: log likelihood = -15216.932
Iteration 5: log likelihood = -15216.931
Log likelihood = -15216.931                             Number of obs = 15,627

---------------------------------------------------------------------------------
                | Coefficient  Std. err.      z    P>|z|     [95% conf. interval]
----------------+----------------------------------------------------------------
xb              |
        agercs1 |   .2496067    .015655    15.94   0.000     .2189234      .28029
        agercs2 |  -.0678942   .0158733    -4.28   0.000    -.0990053   -.0367832
        agercs3 |   .0009404   .0154283     0.06   0.951    -.0292984    .0311793
         female |   .0544283   .0272993     1.99   0.046     .0009227    .1079339
           dep5 |  -.3857208   .5211361    -0.74   0.459    -1.407129    .6356871
         stage2 |   1.010531   .2681356     3.77   0.000      .484995    1.536067
         stage3 |   2.420313   .2559239     9.46   0.000     1.918711    2.921915
         stage4 |    4.04397   .2538504    15.93   0.000     3.546433    4.541508
     stage2dep5 |    .520696   .5385144     0.97   0.334    -.5347727    1.576165
     stage3dep5 |   .6461979   .5240845     1.23   0.218    -.3809888    1.673385
     stage4dep5 |   .4935665   .5221478     0.95   0.345    -.5298243    1.516957
          _rcs1 |   1.224971   .2473644     4.95   0.000      .740146    1.709797
          _rcs2 |    .270678   .1817024     1.49   0.136    -.0854521    .6268081
          _rcs3 |   .0411339   .0588815     0.70   0.485    -.0742717    .1565394
          _rcs4 |   .0220194   .0262256     0.84   0.401    -.0293819    .0734206
          _rcs5 |   .0055588   .0025953     2.14   0.032     .0004721    .0106455
  _rcs_agercs11 |  -.1252562   .0150861    -8.30   0.000    -.1548244    -.095688
  _rcs_agercs12 |   .0453818   .0114159     3.98   0.000      .023007    .0677565
  _rcs_agercs13 |   .0092465   .0056925     1.62   0.104    -.0019106    .0204036
  _rcs_agercs21 |   .0332775   .0158652     2.10   0.036     .0021823    .0643728
  _rcs_agercs22 |  -.0263001   .0122339    -2.15   0.032    -.0502782    -.002322
  _rcs_agercs23 |  -.0090217   .0058512    -1.54   0.123    -.0204898    .0024463
  _rcs_agercs31 |  -.0069708   .0128924    -0.54   0.589    -.0322394    .0182977
  _rcs_agercs32 |  -.0161214   .0098502    -1.64   0.102    -.0354275    .0031847
  _rcs_agercs33 |   .0026671   .0054263     0.49   0.623    -.0079682    .0133024
     _rcs_dep51 |  -.1221882   .0251041    -4.87   0.000    -.1713913   -.0729851
     _rcs_dep52 |   .0302529   .0191343     1.58   0.114    -.0072496    .0677554
     _rcs_dep53 |   .0195214   .0104123     1.87   0.061    -.0008863     .039929
   _rcs_stage21 |  -.2166611    .255327    -0.85   0.396    -.7170929    .2837707
   _rcs_stage22 |  -.0187113   .1863861    -0.10   0.920    -.3840214    .3465989
   _rcs_stage23 |  -.0581805   .0704035    -0.83   0.409    -.1961689    .0798078
   _rcs_stage31 |   .1168377   .2502913     0.47   0.641    -.3737242    .6073996
   _rcs_stage32 |  -.3432881   .1821072    -1.89   0.059    -.7002118    .0136355
   _rcs_stage33 |  -.0514838   .0691152    -0.74   0.456     -.186947    .0839795
   _rcs_stage41 |  -.1312033   .2478076    -0.53   0.596    -.6168973    .3544906
   _rcs_stage42 |  -.1540583   .1799866    -0.86   0.392    -.5068256     .198709
   _rcs_stage43 |  -.0630003   .0676757    -0.93   0.352    -.1956422    .0696416
          _cons |  -3.983653   .2530327   -15.74   0.000    -4.479588   -3.487718
---------------------------------------------------------------------------------
```

### Marginal crude probabilities

Now that the relative survival model is fitted, the crude probabilities of death can be estimated. The marginal crude probabilities of death are an average measure that can be used to summarise the mortality of the $N$ individuals used to develop the model. Each individual’s predicted probability of death from each cause is calculated at a particular time point and then averaged.

$$ \widehat{F}_{M,cancer}(t) = \frac{1}{N} \sum_{i=1}^{N} {\widehat{F}_{cancer}(t|x_i)} $$

$$ \widehat{F}_{M,other}(t) = \frac{1}{N} \sum_{i=1}^{N} {\widehat{F}_{other}(t|x_i)} $$

In `standsurv` this can be achieved by specifying the `crudeprob` option. Including `at1(.)` means that the predictions for each individual will be made based on their observed covariate values in the dataset. Using the `timevar()` option means that the marginal crude probabilities will be calculated at a particular time point or a series of time points. Here `t5` allows these probabilities to be estimated at 51 time points so that a smooth curve across the 5 year follow-up period can be produced.

As this calculation requires the expected mortality rates, the population life tables need to be specified using `expsurv()`, which links the age and date of diagnosis of the individuals in the dataset to their attained age and calendar year at each time point. As the population life tables are stratified by sex and deprivation, these are specified using `pmother()`. As the maximum age in the life tables is 100 and the maximum year is 2016, I specify these using `pmmaxage()` and `pmmaxyear()` respectively so that any values greater than these will be set to the maximum value. The `atvar()` option is used to name the variables where the predictions will be stored. Calling this “marg” means that, by default, a variable called `marg_disease` and another named `marg_other` are created to store the crude probabilities of death due to cancer and other causes respectively.

```
range t5 0 5 51
standsurv, verbose /// Display output
at1(.) /// Use observed covariate values
crudeprob /// Crude probabilities of death
timevar(t5) /// Time points used for predictions
atvar(marg) /// New variable containing the predictions
expsurv(using(https://www.pclambert.net/data/popmort_uk_2017) /// Popmort file
agediag(agediag) /// Age at diagnosis in the dataset
datediag(datediag) /// Date of diagnosis in the dataset
pmage(age) /// Age variable in the popmort file
pmyear(year) /// Year variable in the popmort file
pmother(dep sex) /// Other variables included in the popmort file
pmrate(rate) /// Rate variable in the popmort file
pmmaxage(100) /// Maximum age in the popmort file
pmmaxyear(2016) /// Maximum year in the popmort file
)
```

Here we can see that within this group of people, the probability of dying from cancer is over 3 times larger than dying from other causes by 5 years after diagnosis. An alternative way to present these predictions is to produce a stacked graph by adding the crude probabilities of death from each cause together. The blue line then indicates the all-cause probability of death and each coloured region indicates the probability of being alive, dying from cancer and dying from other causes.
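A stacked graph of this kind could be drawn along the following lines. This is a self-contained sketch: the curves are synthetic stand-ins built from assumed constant hazards (`lambda` and `hstar` are illustrative values, not model estimates), and the new variables `marg_all` and `alive` are names chosen here. To apply it to the real predictions, skip the block that creates the synthetic data and use the `marg_disease` and `marg_other` variables produced by `standsurv`. The sketch starts with `clear`, so run it in a separate Stata session.

```stata
// Synthetic stand-ins for the standsurv output (NOT estimates from the model)
clear
set obs 51
range t5 0 5 51                                    // same time grid as the tutorial
scalar lambda = 0.25                               // assumed constant excess hazard
scalar hstar  = 0.07                               // assumed constant expected hazard
gen double marg_disease = lambda/(lambda + hstar)*(1 - exp(-(lambda + hstar)*t5))
gen double marg_other   = hstar/(lambda + hstar)*(1 - exp(-(lambda + hstar)*t5))

// Stack the curves: other causes at the bottom, cancer on top of it
gen double marg_all = marg_disease + marg_other    // all-cause probability of death
gen byte alive = 1                                 // upper boundary of the plot

// Later areas are drawn over earlier ones, so plot the tallest region first
twoway (area alive t5, sort fintensity(30) color("120 172 68"))      ///
       (area marg_all t5, sort fintensity(30) color("3 144 214"))    ///
       (area marg_other t5, sort fintensity(30) color("254 48 11")), ///
       legend(order(1 "Alive" 2 "Death from Cancer" 3 "Death from Other Causes") ///
              rows(1) ring(1) pos(6))                                ///
       xtitle("Years since Diagnosis") ytitle("Probability")         ///
       ylabel(, format(%3.1f))
```

The upper edge of the second area is the all-cause probability of death; the band between it and the third area is the probability of death due to cancer.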

### Standardization using contrasts

The `contrast()` option can be used to investigate differences between subgroups in the population. As shown in the standardized relative survival tutorial, we might be interested in the effect of deprivation.

We cannot fairly compare these groups using their observed covariate values since this would mean we would be averaging over different covariate patterns within each deprivation group. For example, in the most deprived group there is a greater proportion of individuals diagnosed with Stage 4 tumours so we wouldn’t know whether the differences in the marginal crude probabilities of death were due to the effect of deprivation or other factors such as stage.

To account for this we can first make predictions for all individuals by supposing that they are all in the most deprived group (i.e. setting `dep5=1` for everyone regardless of their true deprivation group but keeping the observed values of all other covariates in the dataset). We can then make a second set of predictions where all individuals are assumed to be in the least deprived group (`dep5=0`). This is called standardization and further examples can be found in the standardized relative survival tutorial.

This approach allows us to investigate the differences in the marginal crude probabilities of death for this hypothetical population. Mathematically, it can be written as the following where $Z$ is the binary covariate `dep5` ($Z=1$ is the most deprived group and $Z=0$ is the least deprived group).

$$ \frac{1}{N}\sum_{i=1}^N{\widehat{F}_{cancer}(t|Z=1,x_i)} - \frac{1}{N}\sum_{i=1}^N{\widehat{F}_{cancer}(t|Z=0,x_i)} $$

$$ \frac{1}{N}\sum_{i=1}^N{\widehat{F}_{other}(t|Z=1,x_i)} - \frac{1}{N}\sum_{i=1}^N{\widehat{F}_{other}(t|Z=0,x_i)} $$

I specify this in `standsurv` using the `at1()` and `at2()` options to estimate the marginal predictions for the least and most deprived group respectively. As there are interactions between deprivation group and stage, these also need to be specified. In the least deprived group they can all be set to 0 as `dep5 = 0` and for the most deprived group, we can set them equal to the corresponding stage variables.

I also use `at1()` and `at2()` within the `expsurv()` option to ensure that the correct expected mortality rates are used in each calculation. I then use the `contrast()` option to calculate the absolute difference in the marginal crude probabilities between the deprivation groups.

```
standsurv, verbose /// Display output
at1(dep5 0 stage2dep5 0 stage3dep5 0 stage4dep5 0 ) /// Least deprived
at2(dep5 1 stage2dep5=stage2 stage3dep5=stage3 stage4dep5=stage4) /// Most deprived
atvar(dep_1 dep_5) /// New variables containing the predictions
contrast(difference) /// Calculate the difference between groups
contrastvar(diff_dep) /// New variables containing the difference
ci /// Calculate confidence intervals
crudeprob /// Crude probabilities of death
timevar(t5) /// Time points used for predictions
expsurv(using(https://www.pclambert.net/data/popmort_uk_2017) /// Popmort file
agediag(agediag) /// Age at diagnosis in the dataset
datediag(datediag) /// Date of diagnosis in the dataset
pmage(age) /// Age variable in the popmort file
pmyear(year) /// Year variable in the popmort file
pmother(dep sex) /// Other variables included in the popmort file
pmrate(rate) /// Rate variable in the popmort file
pmmaxage(100) /// Maximum age in the popmort file
pmmaxyear(2016) /// Maximum year in the popmort file
at1(dep 1) /// Use expected rates for least deprived
at2(dep 5) /// Use expected rates for most deprived
)
```

Here we can see that the probability of death due to cancer and the probability of death due to other causes are both greater for the most deprived group.

### Specific covariate patterns

#### Particular individual in the dataset

Although using marginal measures can be useful to summarise the mortality of a group of individuals, we may instead be interested in more personalised predictions for an individual with a particular covariate pattern.

If we are interested in making predictions for a particular individual in the dataset we could do this by using an `if` statement to restrict the calculation to this one person. The individual whose ID number is 2510 was diagnosed aged 85 in 2011 with a stage 2 tumour, is male and in the least deprived group.

```
standsurv if id==2510, /// Only include Patient 2510
verbose /// Display output
at1(.) /// Use observed covariate values
crudeprob /// Crude probabilities of death
timevar(t5) /// Time points used for predictions
atvar(id2510) /// New variable containing the predictions
expsurv(using(https://www.pclambert.net/data/popmort_uk_2017) /// Popmort file
agediag(agediag) /// Age at diagnosis in the dataset
datediag(datediag) /// Date of diagnosis in the dataset
pmage(age) /// Age variable in the popmort file
pmyear(year) /// Year variable in the popmort file
pmother(dep sex) /// Other variables included in the popmort file
pmrate(rate) /// Rate variable in the popmort file
pmmaxage(100) /// Maximum age in the popmort file
pmmaxyear(2016) /// Maximum year in the popmort file
)
```

#### Effect of age at diagnosis

I now show how you can calculate the crude probabilities of death for individuals with a particular covariate pattern which may not exist in the dataset. Here I make predictions for the following covariate pattern: male, from the least deprived group, diagnosed on 1st January 2011 with a stage 2 tumour and aged either 60, 70, 80 or 90. I use `rcsgen` to determine the value that each of the restricted cubic spline function parameters take for a given age.

To make sure that the correct expected hazard rates are merged in, I create temporary variables for age at diagnosis and date of diagnosis and I specify sex and deprivation group within `expsurv()` using `at1()`. As I am just interested in one particular covariate pattern I use `if _n==1` to speed up the calculation as this means that `standsurv` will only use the first row. If this were not specified, the same result would be obtained but it would mean that the same prediction would be made for every row of the dataset and then averaged.

```
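gen temp_agediag = .
gen temp_datediag = mdy(1,1,2011)
// Upper boundary (probability of 1) for the stacked graphs;
// capture avoids an error if alive has already been created
capture gen alive = 1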
foreach age in 60 70 80 90 {
replace temp_agediag = `age'
rcsgen, scalar(`age') knots($ageknots) rmatrix(Rage) gen(c)
standsurv if _n==1, verbose /// Display output
at1(agercs1 `=c1' agercs2 `=c2' agercs3 `=c3' /// Specify covariate pattern
female 0 dep5 0 stage2 1 stage3 0 stage4 0 ///
stage2dep5 0 stage3dep5 0 stage4dep5 0) ///
crudeprob /// Crude probabilities of death
timevar(t5) /// Time points used for predictions
atvar(age`age') /// New variable containing the predictions
expsurv(using(https://www.pclambert.net/data/popmort_uk_2017) /// Popmort file
agediag(temp_agediag) /// Temporary age at diagnosis variable
datediag(temp_datediag) /// Temporary date of diagnosis variable
pmage(age) /// Age variable in the popmort file
pmyear(year) /// Year variable in the popmort file
pmother(dep sex) /// Other variables included in the popmort file
pmrate(rate) /// Rate variable in the popmort file
pmmaxage(100) /// Maximum age in the popmort file
pmmaxyear(2016) /// Maximum year in the popmort file
at1(dep 1 sex 1) /// Use expected rates for least deprived and male
)
// All-cause probability of death
gen all_age`age' = age`age'_disease + age`age'_other
twoway(area alive t5, sort col("120 172 68") fintensity(30)) ///
(area all_age`age' t5, sort fintensity(30) col("3 144 214")) ///
(area age`age'_other t5, sort fintensity(30) col("254 48 11") ///
legend(order(1 "Alive" 2 "Death from Cancer" 3 "Death from Other Causes") ///
rows(1) ring(1) pos(6))xtitle("Years since Diagnosis") ytitle("Probability") ///
title("Age `age'") ylabel(,format(%3.1f)) name(age`age',replace))
}
```

Here we see that age at diagnosis has a large impact on the all-cause probability of death. These graphs also show how the all-cause probability of death breaks down into the probability of dying from each cause. For example, a 90 year old with this covariate pattern is much more likely to die from other causes than from their cancer. Understanding the most likely cause of death is important as this can help with making treatment decisions.

#### Effect of stage at diagnosis

Now we look at the effect of stage at diagnosis on the risk predictions of a woman diagnosed aged 75 on 1st January 2011 from the most deprived group. As this individual is in the most deprived group, the interactions between stage and deprivation will no longer all be 0 as they were in the previous example.

```
replace temp_agediag = 75
replace temp_datediag = mdy(1,1,2011)
rcsgen, scalar(75) knots($ageknots) rmatrix(Rage) gen(c)
standsurv if _n==1, verbose ///
at1(agercs1 `=c1' agercs2 `=c2' agercs3 `=c3' female 1 dep5 1 /// Stage 1
stage2 0 stage3 0 stage4 0 stage2dep5 0 stage3dep5 0 stage4dep5 0) ///
at2(agercs1 `=c1' agercs2 `=c2' agercs3 `=c3' female 1 dep5 1 /// Stage 2
stage2 1 stage3 0 stage4 0 stage2dep5 1 stage3dep5 0 stage4dep5 0) ///
at3(agercs1 `=c1' agercs2 `=c2' agercs3 `=c3' female 1 dep5 1 /// Stage 3
stage2 0 stage3 1 stage4 0 stage2dep5 0 stage3dep5 1 stage4dep5 0) ///
at4(agercs1 `=c1' agercs2 `=c2' agercs3 `=c3' female 1 dep5 1 /// Stage 4
stage2 0 stage3 0 stage4 1 stage2dep5 0 stage3dep5 0 stage4dep5 1) ///
crudeprob /// Crude probabilities of death
timevar(t5) /// Time points used for predictions
atvar(stage_1 stage_2 stage_3 stage_4) /// New variables containing the predictions
expsurv(using(https://www.pclambert.net/data/popmort_uk_2017) /// Popmort file
agediag(temp_agediag) /// Temporary age at diagnosis variable
datediag(temp_datediag) /// Temporary date of diagnosis variable
pmage(age) /// Age variable in the popmort file
pmyear(year) /// Year variable in the popmort file
pmother(dep sex) /// Other variables included in the popmort file
pmrate(rate) /// Rate variable in the popmort file
pmmaxage(100) /// Maximum age in the popmort file
pmmaxyear(2016) /// Maximum year in the popmort file
at1(dep 5 sex 2) /// Use expected rates for most deprived and female
at2(dep 5 sex 2) /// Use expected rates for most deprived and female
at3(dep 5 sex 2) /// Use expected rates for most deprived and female
at4(dep 5 sex 2) /// Use expected rates for most deprived and female
)
forvalues stage = 1/4 {
gen all_stage_`stage' = stage_`stage'_disease + stage_`stage'_other
}
```

Here we can see that if an individual with this covariate pattern is diagnosed with an early stage tumour, the all-cause probability of death is very low and mostly due to other causes. In contrast, for an individual diagnosed with advanced stage cancer, the all-cause probability of death by 5 years is very high and cancer is the most likely cause of death.

## References

Syriopoulou, E.; Rutherford, M. J.; Lambert, P. C. Understanding disparities in cancer prognosis: An extension of mediation analysis to the relative survival framework. *Biometrical Journal* 2021; **63**(2): 341-353.

Lambert, P. C.; Dickman, P. W.; Nelson, C. P.; Royston, P. Estimating the crude probability of death due to cancer and other causes using relative survival models. *Statistics in Medicine* 2010; **29**(7-8): 885-895.