Standardized Crude Probabilities of Death
By Sarah Booth (sarah.booth@le.ac.uk)
You will need to install
standsurv
to run the example. Details here
Background
The standardized relative survival tutorial introduced the concept of relative survival and illustrated how flexible parametric survival models can be fitted in this framework. Under certain conditions, relative survival can be interpreted as net survival, the survival in a hypothetical world where it is not possible to die from other causes. This measure is often used to make fair comparisons of cancer survival between different countries or populations as only the excess mortality related to the cancer diagnosis is analysed and differences in other cause mortality are ignored.
However, measures of survival in the “real world” can be more useful for clinical decision making as they consider the competing risk of dying from causes other than cancer. This tutorial demonstrates how standsurv
can be used to estimate the probabilities of dying from cancer and other causes after flexible parametric survival models are fitted in the relative survival setting. In the relative survival framework, these are known as crude probabilities of death, whereas in a cause-specific setting, they are referred to as cause-specific cumulative incidence functions (CIF). Further details on how to calculate these measures in the cause-specific setting can be found here.
Methods
\(F_{cancer}(t|x_i)\) denotes the probability of death due to cancer and can be calculated using the following equation where the relative survival function \(R(u|x_{1i})\) and the excess hazard due to cancer \(\lambda(u|x_{1i})\) can both be obtained from the relative survival model. \(S^*(u|x_{2i})\) is the expected survival of a similar group of people in the general population without cancer and can be obtained from the population life tables (also known as a “popmort” file). The life tables used in this particular example correspond to the expected survival in England and are stratified by calendar year, age, sex and deprivation group.
\(x_{1}\) is a subset of \(x\) and includes the covariates relating to the excess mortality such as age at diagnosis, sex, deprivation group and stage at diagnosis. \(x_{2}\) is a different subset of \(x\) and contains the factors that the life tables are stratified by, which in this particular example, are age, calendar year, sex and deprivation group.
\[ F_{cancer}(t|x_i) = \int_0^t S(u|x_i) \lambda(u|x_{1i}) du = \int_0^t S^*(u|x_{2i}) R(u|x_{1i}) \lambda(u|x_{1i}) du \]
\(F_{other}(t|x_i)\) is the probability of dying from causes other than cancer, where \(h^*(t|x_{2i})\) is the expected hazard function and can be obtained from the population life tables.
\[ F_{other}(t|x_i) = \int_0^t S(u|x_i) h^*(u|x_{2i}) du = \int_0^t S^*(u|x_{2i}) R(u|x_{1i}) h^*(u|x_{2i}) du \]
The crude probabilities of death due to each cause sum to the all cause probability of death.
\[ F_{all cause}(t|x_i) = F_{cancer}(t|x_i) + F_{other}(t|x_i) \]
Example (simulated colon cancer data)
Prepare data
This tutorial uses simulated data from a paper by Syriopoulou et al. It is based on colon cancer survival in England and is restricted to only include the most and least deprived quintile of the population.
This dataset contains the following variables: ID number (id
), age at diagnosis (agediag
, 16-104), stage of tumour at diagnosis (stage
, stages 1-4), year of diagnosis (yeardiag
, 2011-2013), month of diagnosis (diagmonth
), date of diagnosis (datediag
), sex (sex
, 0 = Male, 1 = Female), survival time in years (t
, 0.0027 - 10), survival status (dead
, 0 = Alive, 1 = Dead) and deprivation quintile (dep
, 1 = Least deprived, 5 = Most Deprived).
As computation is slow I have taken a 20% sample of the original data, which leads to 3,061. individuals.
To prepare the data, I first format the date of diagnosis variable and restrict the analysis to individuals who were diagnosed with colon cancer between the ages of 18 and 99. I also need to recode the variable relating to sex as currently, 0 = Male and 1 = Female, whereas in the life tables (popmort
file), 1 = Male and 2 = Female. Recoding this variable means that the life tables will be correctly merged in.
set seed 128763
.
use https://www.pclambert.net/data/colonsim_stage if runiform()<0.2, clear
.
// Format datediag to display as a date
. format datediag %td
.
// Restrict analysis to patients aged 18-99 at diagnosis
. keep if agediag>=18 & agediag<=99
.
(0 observations deleted)
// Recode the sex variable to match the popmort file
. replace sex = sex+1
. real changes made)
(3,061
label define label_sex 1 "Male" 2 "Female"
.
label values sex label_sex
.
gen female = sex==2 .
stset
can then be used to calculate the survival time of each of the 15,627 individuals and to censor any individuals who were still alive 5 years after their diagnosis.
stset t, failure(dead=1) id(id) exit(time 5)
.
data settings
Survival-time
variable: id
ID
Failure event: dead==1_n-1], t]
Observed time interval: (t[on or before: time 5
Exit
--------------------------------------------------------------------------total observations
3,061
0 exclusions
--------------------------------------------------------------------------
3,061 observations remaining, representing
3,061 subjectsin single-failure-per-subject data
1,581 failures total analysis time at risk and under observation
9,926.831
At risk from t = 0
Earliest observed entry t = 0exit t = 5 Last observed
In order to fit a relative survival model, the expected mortality rate of each individual at their event time is required. To identify the correct expected mortality rates, I first calculate the attained age (age of the individual at their event time) and attained year (calendar year when the event or censoring occurs). I name these variables age
and year
to match the variable names in the life tables. As the maximum age included in the life tables is 100, I force the maximum attained age to be set as 100. Similarly, as the life tables only go up to 2016, I also make the maximum attained year to be 2016. This makes the assumption that the expected rates in 2018 for each combination of age, sex and deprivation group are the same as they were in 2016. The expected mortality rates can then be merged in by matching for attained age, attained year, sex and deprivation quintile.
// Attained age
. gen age = min(floor(agediag + _t),100)
.
// Attained calendar year
. gen year = min(floor(yeardiag + _t),2016)
.
// Merge in life tables
. merge m:1 age year dep sex using https://www.pclambert.net/data/popmort_uk_2017, ///
. keep(match master) keepusing(rate)
>
of obs
Result Number
-----------------------------------------
Not matched 0_merge==3)
Matched 3,061 (
-----------------------------------------
drop age year .
I drop the attained age (age
) and attained year (year
) variables now that the expected rates have been merged in.
Previously, when using stpm2
various variables needed to be created. This included restricted cubic spline variables for the effect of age, dummy variables for stage and various interactions. For the spline variables, the knot positions and projection matrix for orthogonalization needed to be save. Using stpm3
makes things much easier, as all these can be included in the commands itself.
Fitting the model
Now I fit the flexible parametric survival model used by Syriopoulou et al (2021) that includes age at diagnosis (using natural splines), sex, deprivation group and stage at diagnosis as covariates, along with interaction terms between stage and deprivation group. It also includes time-dependent effects for the main effects of deprivation group, stage and age (using natural splines). It uses 5 degrees of freedom for the baseline and 3 degrees of freedom to model the time-dependent effects.
scale(lncumhazard) df(5) ///
. stpm3 i.female i.dep i.stage i.stage#dep @ns(agediag,df(3)),
> tvc(i.dep i.stage @ns(agediag,df(3))) dftvc(3) bhazard(rate) vsquish
Iteration 0: Log likelihood = -3104.5329
Iteration 1: Log likelihood = -3019.487
Iteration 2: Log likelihood = -3008.7321
Iteration 3: Log likelihood = -3007.9984
Iteration 4: Log likelihood = -3007.8399
Iteration 5: Log likelihood = -3007.7991
Iteration 6: Log likelihood = -3007.7894
Iteration 7: Log likelihood = -3007.7867
Iteration 8: Log likelihood = -3007.7859
Iteration 9: Log likelihood = -3007.7857
Iteration 10: Log likelihood = -3007.7856
Iteration 11: Log likelihood = -3007.7856
of obs = 3,061
Number chi2(11) = 541.45
Wald chi2 = 0.0000
Log likelihood = -3007.7856 Prob >
----------------------------------------------------------------------------------------------
| Coefficient Std. err. z P>|z| [95% conf. interval]
-----------------------------+----------------------------------------------------------------xb |
1.female | .0258002 .0611239 0.42 0.673 -.0940004 .1456009
5.dep | 11.33406 610.7999 0.02 0.985 -1185.812 1208.48
stage |
2 | 11.77162 610.8012 0.02 0.985 -1185.377 1208.92
3 | 13.63744 610.8011 0.02 0.982 -1183.511 1210.786
4 | 14.87865 610.8011 0.02 0.981 -1182.27 1212.027
stage#dep |
2 5 | -11.34975 610.7999 -0.02 0.985 -1208.496 1185.796
3 5 | -11.19332 610.7999 -0.02 0.985 -1208.339 1185.952
4 5 | -11.26091 610.7999 -0.02 0.985 -1208.407 1185.885
_ns_f1_agediag1 | .8547894 1.699503 0.50 0.615 -2.476176 4.185755
_ns_f1_agediag2 | -1.255521 .6266172 -2.00 0.045 -2.483668 -.0273739
_ns_f1_agediag3 | .1428389 .6831001 0.21 0.834 -1.196013 1.481691
-----------------------------+----------------------------------------------------------------
time |
_ns1 | -46.59876 30.73218 -1.52 0.129 -106.8327 13.63522
_ns2 | 18.43374 12.93551 1.43 0.154 -6.919398 43.78689
_ns3 | -.1336661 1.018261 -0.13 0.896 -2.129421 1.862089
_ns4 | .2966349 .6233048 0.48 0.634 -.9250201 1.51829
_ns5 | .3442934 .4569246 0.75 0.451 -.5512623 1.239849
dep#c._ns_tvc1 |
5 | -1.05234 1.780841 -0.59 0.555 -4.542724 2.438045
dep#c._ns_tvc2 |
5 | 1.22833 1.00034 1.23 0.219 -.7323 3.188959
dep#c._ns_tvc3 |
5 | .3265529 .1202818 2.71 0.007 .090805 .5623008
stage#c._ns_tvc1 |
2 | 15.54094 31.12577 0.50 0.618 -45.46444 76.54632
3 | 20.29633 30.75615 0.66 0.509 -39.98462 80.57727
4 | 22.62941 30.67119 0.74 0.461 -37.48501 82.74383
stage#c._ns_tvc2 |
2 | -4.520576 15.57263 -0.29 0.772 -35.04238 26.00122
3 | -9.567492 15.3728 -0.62 0.534 -39.69763 20.56265
4 | -9.400435 15.31143 -0.61 0.539 -39.41028 20.60941
stage#c._ns_tvc3 |
2 | -.3308294 .7373107 -0.45 0.654 -1.775932 1.114273
3 | -1.137763 .7047464 -1.61 0.106 -2.519041 .2435142
4 | -.448099 .6956624 -0.64 0.519 -1.811572 .9153742
c._ns_f1_agediag1#c._ns_tvc1 | 20.12228 34.46616 0.58 0.559 -47.43015 87.6747
c._ns_f1_agediag1#c._ns_tvc2 | -30.2422 19.20078 -1.58 0.115 -67.87503 7.390637
c._ns_f1_agediag1#c._ns_tvc3 | -4.893479 2.803528 -1.75 0.081 -10.38829 .6013342
c._ns_f1_agediag2#c._ns_tvc1 | -6.719063 18.2678 -0.37 0.713 -42.52329 29.08517
c._ns_f1_agediag2#c._ns_tvc2 | 7.470501 10.16327 0.74 0.462 -12.44915 27.39015
c._ns_f1_agediag2#c._ns_tvc3 | -.0054526 1.219612 -0.00 0.996 -2.395848 2.384942
c._ns_f1_agediag3#c._ns_tvc1 | 9.465698 9.075787 1.04 0.297 -8.322518 27.25391
c._ns_f1_agediag3#c._ns_tvc2 | -10.82306 5.100466 -2.12 0.034 -20.81979 -.8263326
c._ns_f1_agediag3#c._ns_tvc3 | -1.067077 .9595113 -1.11 0.266 -2.947684 .8135306_cons | -14.02936 610.8006 -0.02 0.982 -1211.176 1183.118
----------------------------------------------------------------------------------------------
Extended functions (1) @ns(agediag, df(3))
Marginal crude probabilities
Now that the relative survival model is fitted, the crude probabilities of death can be estimated. The marginal crude probabilities of death are an average measure that can be used to summarise the mortality of the \(N\) individuals used to develop the model. Each individual’s predicted probability of death from each cause is calculated at a particular time point and then averaged.
\[ \widehat{F}_{M,cancer}(t) = \frac{1}{N} \sum{i=1}^{N} {\widehat{F}_{cancer}(t|x_i)} \]
\[ \widehat{F}_{M,other}(t) = \frac{1}{N} \sum{i=1}^{N} {\widehat{F}_{other}(t|x_i)} \]
In standsurv
this can be achieved by specifying the crudeprob
option. Including at1(.)
means that the predictions for each individual will be made based on their observed covariate values in the dataset. Using the timevar()
option means that the marginal crude probabilities will be calculated at a particular time point or a series of time points. Here t5
allows these probabilities to be estimated at 51 time points so that a smooth curve across the 5 year follow-up period can be produced.
As this calculation requires the expected mortality rates, the population life tables need to be specified using expsurv()
which links the age of the individuals in the dataset to their attained age and calendar year at each time point. As the population life tables are stratified by sex and deprivation these are specified using pmother()
. As the maximum age in the life tables is 100 and the maximum year is 2016, I specify these options using pmmaxage()
and pmmaxyear()
respectively so that any values greater than these will be set to the maximum value. The atvar()
option is used to name the variables where the predictions will be stored. By calling this “marg”, by default it will create a variable called marg_disease
and another named marg_other
to save the crude probabilities of death due to cancer and other causes respectively.
range t5 0 5 51
. missing values generated)
(3,010
/// Crude probabilities of death
. standsurv, crudeprob /// Time points used for predictions
> timevar(t5) replace) /// Frame to save results
> frame(f1, /// Use observed covariate values
> at1(.) /// New variable containing the predictions
> atvar(marg) using(https://www.pclambert.net/data/popmort_uk_2017) /// Popmort file
> expsurv(/// Age at diagnosis in the dataset
> agediag(agediag) /// Date of diagnosis in the dataset
> datediag(datediag) /// Age variable in the popmort file
> pmage(age) year) /// Year variable in the popmort file
> pmyear(/// Other variables included in the popmort file
> pmother(dep sex) /// Rate variable in the popmort file
> pmrate(rate) /// Maximum age in the popmort file
> pmmaxage(100) /// Maximum year in the popmort file
> pmmaxyear(2016) > )
We now plot the results.
. frame f1 {twoway (line marg_disease t5), ///
. xtitle("Years since Diagnosis") ///
> ytitle("Probability of Death") ///
> ylabel(,format(%3.1f)) ///
> title("Cancer") ///
> name(marg_disease,replace)
> twoway (line marg_other t5, pstyle(p2line)), ///
. xtitle("Years since Diagnosis") ///
> ytitle("Probability of Death") ///
> ylabel(0(0.1)0.4,format(%3.1f)) ///
> title("Other Causes") ///
> name(marg_other,replace)
>
. graph combine marg_disease marg_other
. . }
Here we can see that within this group of people, the probability of dying from cancer is over 3 times larger than dying from other causes by 5 years after diagnosis. An alternative way to present these predictions is to produce a stacked graph by adding the crude probabilities of death from each cause together. The blue line then indicates the all-cause probability of death and each coloured region indicates the probability of being alive, dying from cancer and dying from other causes.
. frame f1 {gen alive = 1
. gen marg_all = marg_disease + marg_other
. twoway (area alive t5, fintensity(30)) ///
. ///
> (area marg_all t5, fintensity(30)) ///
> (area marg_other t5, fintensity(30)), ytitle("Probability") ///
> xtitle("Time from diagnosis") ///
> ylabel(,format(%3.1f)) ///
> legend(order(1 "Alive" ///
> "Death from Cancer" ///
> 2 "Death from Other Causes") ///
> 3 rows(1) ring(1) pos(6))
> . }
Standardization using contrasts
The contrast()
option can be used to investigate differences between subgroups in the population. As shown in the standardized relative survival tutorial, we might be interested in the effect of deprivation.
We cannot fairly compare these groups using their observed covariate values since this would mean we would be averaging over different covariate patterns within each deprivation group. For example, in the most deprived group there is a greater proportion of individuals diagnosed with Stage 4 tumours so we wouldn’t know whether the differences in the marginal crude probabilities of death were due to the effect of deprivation or other factors such as stage.
To account for this we can first make predictions for all individuals by supposing that they are all in the most deprived group (i.e. setting dep5=1
for everyone regardless of their true deprivation group but keeping the observed values of all other covariates in the dataset). We can then make a second set of predictions where all individuals are assumed to be in the least deprived group (dep5=0
). This is called standardization and further examples can be found in the standardized relative survival tutorial.
This approach allows us to investigate the differences in the marginal crude probabilities of death for this hypothetical population. Mathematically, it can be written as the following where \(Z\) is the binary covariate dep5
(\(Z=1\) is the most deprived group and \(Z=0\) is the least deprived group).
\[ \frac{1}{N}\sum_{i=1}^N{\widehat{F}_{cancer}(t|Z=1,x_i)} - \frac{1}{N}\sum_{i=1}^N{\widehat{F}_{cancer}(t|Z=0,x_i)} \]
\[ \frac{1}{N}\sum_{i=1}^N{\widehat{F}_{other}(t|Z=1,x_i)} - \frac{1}{N}\sum_{i=1}^N{\widehat{F}_{other}(t|Z=0,x_i)} \]
I specify this in standsurv
using the at1()
and at2()
options to estimate the marginal predictions for the least and most deprived group respectively.
I also use the at1()
and at2()
within the expsurv()
function to ensure that the correct expected mortality rates are used in each calculation. I then use the contrast()
option to calculate the absolute difference in the marginal crude probabilities between the deprivation groups.
/// Crude probabilities of death
. standsurv, crudeprob /// Time points used for predictions
> timevar(t5) replace) /// frame to save results to
> frame(f2, /// Least deprived
> at1(dep 1) /// Most deprived
> at2(dep 5) /// New variables containing the predictions
> atvar(dep_1 dep_5) /// Calculate the difference between groups
> contrast(difference) /// New variables containing the difference
> contrastvar(diff_dep) ci /// Calculate confidence intervals
> using(https://www.pclambert.net/data/popmort_uk_2017) /// Popmort file
> expsurv(/// Age at diagnosis in the dataset
> agediag(agediag) /// Date of diagnosis in the dataset
> datediag(datediag) /// Age variable in the popmort file
> pmage(age) year) /// Year variable in the popmort file
> pmyear(/// Other variables included in the popmort file
> pmother(dep sex) /// Rate variable in the popmort file
> pmrate(rate) /// Maximum age in the popmort file
> pmmaxage(100) /// Maximum year in the popmort file
> pmmaxyear(2016) /// Use expected rates for least deprived
> at1(dep 1) /// Use expected rates for most deprived
> at2(dep 5) > )
We can also plot the differences in the marginal probabilities.
. frame f2 {twoway (rarea diff_dep_disease_lci diff_dep_disease_uci t5, col(%30)) ///
. line diff_dep_disease t5, pstyle(p1line)), ///
> (legend(off) ///
> xtitle("Years since Diagnosis") ///
> ytitle("Difference in Probability of Death") ///
> ylabel(0(0.01)0.1, format(%3.2f)) ///
> title("Cancer") ///
> name(diff_cancer,replace)
>
. twoway (rarea diff_dep_other_lci diff_dep_other_uci t5, col(%30)) ///
. line diff_dep_other t5, pstyle(p1line)), ///
> (legend(off) ///
> xtitle("Years since Diagnosis") ///
> ytitle("Difference in Probability of Death") ///
> ylabel(0(0.01)0.1, format(%3.2f)) ///
> title("Other Causes") ///
> name(diff_other,replace)
>
. graph combine diff_cancer diff_other, ycommon
. . }
Here we can see that the probability of death due to cancer and the probability of death due to other causes are both greater for the most deprived group.
Specific covariate patterns
Although using marginal measures can be useful to summarise the mortality of a group of individuals, we may instead be interested in more personalised predictions for an individual with a particular covariate pattern.
If we are interested in making predictions for a particular individual in the dataset we could do this by using the predict
command and specify the values of the covariates we want to predict
for.
predict CPc, crudeprob /// Crude probabilities of death
. /// Time points used for predictions
> timevar(0 5,step(0.1)) replace) /// frame to save predictions
> frame(CP, /// covariate values
> at1(agediag 85 stage 2 female 0 dep 1) using(https://www.pclambert.net/data/popmort_uk_2017) /// Popmort file
> expsurv(/// Age at diagnosis in the dataset
> agediag(85) /// Date of diagnosis in the dataset
> datediag(2011-1-1) /// Age variable in the popmort file
> pmage(age) year) /// Year variable in the popmort file
> pmyear(/// Other variables included in the popmort file
> pmother(dep sex) /// Rate variable in the popmort file
> pmrate(rate) /// Maximum age in the popmort file
> pmmaxage(100) /// Maximum year in the popmort file
> pmmaxyear(2016) ///
> expvar(CPo) ///
> at1(dep 1 sex 2)
> ) in frame - CP Predictions are stored
.
. frame CP { gen allcause = CPc + CPo
. gen alive = 1
.
. twoway (area alive tt, col(%30)) ///
. ///
> (area allcause tt, col(%30)) ///
> (area CPo tt, col(%30)), xtitle("Years since Diagnosis") ///
> ytitle("Probability") ///
> title("Male, Stage 2, Deprivation Group 1, Diagnosed Aged 85 in 2011", ///
> size(*0.8)) ///
> ylabel(,format(%3.1f)) ///
> legend(order(1 "Alive" ///
> "Death from Cancer" ///
> 2 "Death from Other Causes") ///
> 3 rows(1) ring(1) pos(6))
>
. . }
Effect of age at diagnosis
I now show how you can calculate the crude probabilities of death for individuals with a particular covariate pattern which may not exist in the dataset. Here I make predictions for the following covariate pattern: male, from the least deprived group, diagnosed on 1st January 2011 with a stage 2 tumour and aged either 60, 70, 80 or 90.
This is done in a loop. It could be done with 4 separate at
options, but a loop may be preferable with many predictions. Note that I use the mergecreate
as a suboption of the frame
option. This creates the frame if it does not exist, and otherwise merges with the existing frame.
foreach age in 60 70 80 90 {
. predict CPc`age', crudeprob /// Crude probabilities of death
2. `age' female 0 dep 1 stage 2) /// Specify covariate pattern
> at1(agediag /// frame to save results
> frame(CP_age, mergecreate) /// Time points used for predictions
> timevar(0 5, step(0.1)) using(https://www.pclambert.net/data/popmort_uk_2017) /// Popmort file
> expsurv(`age') /// Age at diagnosis variable
> agediag(/// Temporary date of diagnosis variable
> datediag(2011-1-1) /// Age variable in the popmort file
> pmage(age) year) /// Year variable in the popmort file
> pmyear(/// Other variables included in the popmort file
> pmother(dep sex) /// Rate variable in the popmort file
> pmrate(rate) /// Maximum age in the popmort file
> pmmaxage(100) /// Maximum year in the popmort file
> pmmaxyear(2016) /// Use expected rates for least deprived and male
> at1(dep 1 sex 1) `age') /// crude probability (other causes)
> expvar(CPo
> )
3. }in frame - CP_age
Predictions are stored in frame - CP_age
Predictions are stored in frame - CP_age
Predictions are stored in frame - CP_age Predictions are stored
A panel plot can be produced using a loop over the 4 selected ages.
// All-cause probability of death
gen alive = 1
. frame CP_age:
. frame CP_age { foreach age in 60 70 80 90 {
. gen allcause`age' = CPc`age' + CPo`age'
2. twoway (area alive tt, col(%30)) ///
3. `age' tt, color(%30)) ///
> (area allcause`age' tt, color(%30)), ///
> (area CPoylabel(,format(%3.1f)) ///
> legend(order(1 "Alive" ///
> "Death from Cancer" ///
> 2 "Death from Other Causes") ///
> 3 rows(1) ring(1) pos(6)) ///
> xtitle("Years since Diagnosis") ///
> ytitle("Probability") ///
> title("Age `age'") ///
> name(age`age',replace)
>
4. }
. }
title("Male, Stage 2, Deprivation Group 1, Diagnosed in 2011") . grc1leg age60 age70 age80 age90,
Here we see that age at diagnosis has a large impact on the all-cause probability of death. These graphs also show how the all-cause probability of death breaks down into the probability of dying from each cause. For example, a 90 year old with this covariate pattern is much more likely to die from other causes than their cancer. Understanding the most likely cause of death is important as this can help with making treatment decisions.
Effect of stage at diagnosis
Now we look at the effect of stage at diagnosis on the risk predictions of a woman diagnosed aged 75 on 1st January 2011 from the most deprived group. We could also loop over stage here, but I will use 4 at
options as an alternative to predictions for selected ages above.
predict CPc1 CPc2 CPc3 CPc4, crudeprob frame(CP_stage, replace) ///
. /// Stage 1
> at1(agediag 75 female 1 dep 5 stage 1) /// Stage 1
> at2(agediag 75 female 1 dep 5 stage 2) /// Stage 1
> at3(agediag 75 female 1 dep 5 stage 3) /// Stage 1
> at4(agediag 75 female 1 dep 5 stage 4) /// Time points
> timevar(0 5, step(0.1)) using(https://www.pclambert.net/data/popmort_uk_2017) /// Popmort file
> expsurv(///
> agediag(75) ///
> datediag(2011-1-1) /// Age variable in the popmort file
> pmage(age) year) /// Year variable in the popmort file
> pmyear(/// Other variables in the popmort file
> pmother(dep sex) /// Rate variable in the popmort file
> pmrate(rate) /// Maximum age in the popmort file
> pmmaxage(100) /// Maximum year in the popmort file
> pmmaxyear(2016) /// Use expected rates
> at1(dep 5 sex 2) /// Use expected rates
> at2(dep 5 sex 2) /// Use expected rates
> at3(dep 5 sex 2) /// Use expected rates
> at4(dep 5 sex 2) /// crud probs (other causes)
> expvar(CPo1 CPo2 CPo3 CPo4)
> ) in frame - CP_stage Predictions are stored
gen alive = 1
. frame CP_stage:
. frame CP_stage {forvalues stage = 1/4 {
. gen allcause`stage' = CPc`stage' + CPo`stage'
2. twoway (area alive tt, color(%30)) ///
3. `stage' tt, color(%30)) ///
> (area allcause`stage' tt, color(%30)), ///
> (area CPoylabel(,format(%3.1f)) ///
> legend(order(1 "Alive" ///
> "Death from Cancer" ///
> 2 "Death from Other Causes") ///
> 3 rows(1) ring(1) pos(6)) ///
> xtitle("Years since Diagnosis") ///
> ytitle("Probability") ///
> title("Stage `stage'") ///
> name(stage`stage',replace)
>
4. }
. }
title("Female, Age 75, Deprivation Group 5, Diagnosed in 2011") . grc1leg stage1 stage2 stage3 stage4,
Here we can see that if an individual with this covariate pattern is diagnosed with an early stage tumour, the all-cause probability of death is very low and mostly due to other causes. In contrast, for an individual diagnosed with advanced stage cancer, the all-cause probability of death by 5 years is very high and cancer is the most likely cause of death.
References
Syriopoulou, E.; Rutherford, M. J. & Lambert P. C. Understanding disparities in cancer prognosis: An extension of mediation analysis to the relative survival framework. Biometrical Journal 2021; 63(2): 341-353
Lambert, P. C.; Dickman, P. W.; Nelson, C. P.; Royston P. Estimating the crude probability of death due to cancer and other causes using relative survival models. Statistics in Medicine 2010; 29(7-8): 885-895