Centiles of Standardized Survival Functions
Background
In a previous tutorial I used standsurv to obtain standardized survival functions. In this tutorial I show the first of a number of different summary measures of the standardized survival function: its centiles.
As a reminder, a centile of a survival function can be obtained by solving $S(t) = \alpha$ for $t$. For example, for the median survival time we set $\alpha = 0.5$, i.e. the 50th (per)centile. For simple parametric distributions, such as the Weibull, we can solve for $t$ analytically, but for more complex models the centile has to be obtained using iterative root-finding techniques. In stpm2 I have used Brent's root finder when evaluating centiles.
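To make the distinction concrete, here is a minimal Python sketch (not the stpm2 implementation) for a Weibull survival function parameterised as $S(t)=\exp(-\lambda t^{\gamma})$, whose centile has the closed form $t = (-\ln \alpha / \lambda)^{1/\gamma}$. The same centile is then found numerically. I use simple bisection rather than Brent's method to keep the code short, and the parameter values are hypothetical.

```python
import math

def weibull_survival(t, lam, gamma):
    """Weibull survival S(t) = exp(-lam * t**gamma) (one common parameterisation)."""
    return math.exp(-lam * t ** gamma)

def weibull_centile_analytic(alpha, lam, gamma):
    """Solve S(t) = alpha in closed form: t = (-ln(alpha) / lam) ** (1 / gamma)."""
    return (-math.log(alpha) / lam) ** (1.0 / gamma)

def centile_by_bisection(surv, alpha, lo=0.0, hi=100.0, tol=1e-10):
    """Solve surv(t) - alpha = 0 by bisection.

    Assumes surv is decreasing with surv(lo) > alpha > surv(hi).
    """
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if surv(mid) > alpha:
            lo = mid  # survival still above alpha, so the root lies to the right
        else:
            hi = mid
    return 0.5 * (lo + hi)

lam, gamma = 0.3, 1.2  # hypothetical Weibull parameters
median_exact = weibull_centile_analytic(0.5, lam, gamma)
median_numeric = centile_by_bisection(lambda t: weibull_survival(t, lam, gamma), 0.5)
print(median_exact, median_numeric)
```

For the Weibull both routes agree to within the bisection tolerance; for a flexible parametric model only the numerical route is available.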
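The centile of a standardized survival function is obtained by solving the following equation for $t$, where $X$ denotes the covariate(s) set to a fixed value $x$ and the expectation is taken over the distribution of the remaining covariates $Z$:
$$ E\left[S(t \mid X=x, Z)\right] = \alpha $$
In practice the expectation is estimated by averaging over the $N$ subjects in the study population, and the resulting equation is solved with Brent's root finder:
$$ \frac{1}{N}\sum_{i=1}^N S(t \mid X=x, Z=z_i) - \alpha = 0 $$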
Variances can be obtained using M-estimation (Stefanski and Boos, 2002).
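The averaging-then-root-finding step can be sketched in Python. This assumes a proportional hazards model with a hypothetical baseline cumulative hazard $H_0(t)$, so that $S(t \mid X=x, Z=z_i) = \exp\{-H_0(t)\exp(\eta_i)\}$, where $\eta_i$ is subject $i$'s linear predictor after $X$ has been set to $x$ for everyone. The linear predictors are made up, and bisection again stands in for Brent's method. Variance estimation is not shown.

```python
import math

def standardized_survival(t, etas, cumhaz0):
    """(1/N) * sum_i exp(-H0(t) * exp(eta_i)): average of the individual survival curves."""
    h0 = cumhaz0(t)
    return sum(math.exp(-h0 * math.exp(eta)) for eta in etas) / len(etas)

def standardized_centile(alpha, etas, cumhaz0, lo=0.0, hi=50.0, tol=1e-10):
    """Solve (1/N) * sum_i S(t | X=x, Z=z_i) - alpha = 0 for t by bisection."""
    f = lambda t: standardized_survival(t, etas, cumhaz0) - alpha
    if not (f(lo) > 0 > f(hi)):
        raise ValueError("root is not bracketed by [lo, hi]")
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if f(mid) > 0:
            lo = mid  # average survival still above alpha
        else:
            hi = mid
    return 0.5 * (lo + hi)

def cumhaz0(t):
    """Hypothetical baseline cumulative hazard."""
    return 0.3 * t ** 1.2

etas = [-0.5, 0.0, 0.4, 1.1]  # hypothetical linear predictors with X = x for everyone
t50 = standardized_centile(0.5, etas, cumhaz0)
print(round(t50, 3), standardized_survival(t50, etas, cumhaz0))
```

Note that the survival curves are averaged first and the equation is then solved; in general this is not the same as averaging the subjects' individual medians.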
Example
I use a colon cancer example. I first load and stset the data.
. //use https://www.pclambert.net/data/colon, clear
. use c:/cansurv/data/colon, clear
(Colon carcinoma, diagnosed 1975-94, follow-up to 1995)
. drop if stage==0
(2,356 observations deleted)
. stset surv_mm, f(status=1,2) scale(12) exit(time 120)
failure event: status == 1 2
obs. time interval: (0, surv_mm]
exit on or before: time 120
t for analysis: time/12
------------------------------------------------------------------------------
13,208 total observations
0 exclusions
------------------------------------------------------------------------------
13,208 observations remaining, representing
8,866 failures in single-record/single-failure data
43,950.667 total analysis time at risk and under observation
at risk from t = 0
earliest observed entry t = 0
last observed exit t = 10
I drop those with missing stage information (stage == 0). I am investigating all-cause survival (status=1,2).
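For readers less familiar with stset, the following Python sketch shows roughly what the commands above do to each record, under my reading of the options. Follow-up is truncated at 120 months (exit(time 120)) and rescaled to years (scale(12)), and status values 1 and 2 count as events only if they occur within that window. The records and the helper name stset_sketch are hypothetical, and the many consistency checks stset performs are ignored.

```python
def stset_sketch(surv_mm, status, exit_mm=120, scale=12):
    """Return (analysis time in years, event indicator) for one record.

    Rough analogue of: stset surv_mm, f(status=1,2) scale(12) exit(time 120)
    """
    # Deaths from any cause (status 1 or 2) count only if they occur by 120 months
    event = status in (1, 2) and surv_mm <= exit_mm
    # Follow-up stops at 120 months and is rescaled from months to years
    t = min(surv_mm, exit_mm) / scale
    return t, int(event)

# Hypothetical records: (surv_mm, status, stage); stage 0 = missing stage information
records = [(30, 1, 3), (150, 2, 1), (60, 0, 2), (18, 2, 0)]

# drop if stage==0, then apply the stset rules
analysed = [stset_sketch(mm, st) for mm, st, stage in records if stage != 0]
print(analysed)  # [(2.5, 1), (10.0, 0), (5.0, 0)]
```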
I fit a model that includes stage, sex and age (modelled using restricted cubic splines). I assume proportional hazards, but if I relaxed this assumption the syntax for standsurv would be identical. Stage is classified as localised, regional and distant and is modelled using two dummy covariates with localised as the reference category.
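A minimal Python sketch of a restricted cubic spline basis in truncated-power form is given here. With four knots it produces three variables, which is consistent with the three agercs variables created by df(3). The knot positions are hypothetical, and I have not checked that rcsgen uses exactly this scaling or its default knot placement. Centring subtracts the basis evaluated at age 60, which is my assumption of what center(60) does.

```python
def rcs_basis(x, knots):
    """Restricted cubic spline basis [x, v_1(x), ..., v_{K-2}(x)] for K knots.

    For each interior knot k_j:
        v_j(x) = (x - k_j)_+^3 - lam_j (x - k_min)_+^3 - (1 - lam_j)(x - k_max)_+^3,
        lam_j = (k_max - k_j) / (k_max - k_min),
    which makes the fitted function linear below k_min and above k_max.
    """
    k_min, k_max = knots[0], knots[-1]
    pos3 = lambda u: max(u, 0.0) ** 3  # truncated cubic (u)_+^3
    basis = [float(x)]
    for k in knots[1:-1]:
        lam = (k_max - k) / (k_max - k_min)
        basis.append(pos3(x - k) - lam * pos3(x - k_min) - (1 - lam) * pos3(x - k_max))
    return basis

def centred_rcs_basis(x, knots, centre):
    """Subtract the basis at `centre`, so every spline variable is 0 at x = centre."""
    return [b - c for b, c in zip(rcs_basis(x, knots), rcs_basis(centre, knots))]

knots = [40.0, 60.0, 72.0, 85.0]  # hypothetical knot positions (ages in years)
print(centred_rcs_basis(75.0, knots, centre=60.0))
```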
. tab stage, gen(stage)
Clinical |
stage at |
diagnosis | Freq. Percent Cum.
------------+-----------------------------------
Localised | 6,274 47.50 47.50
Regional | 1,787 13.53 61.03
Distant | 5,147 38.97 100.00
------------+-----------------------------------
Total | 13,208 100.00
. gen female = sex==2
. rcsgen age, df(3) gen(agercs) center(60)
Variables agercs1 to agercs3 were created
. stpm2 stage2 stage3 female agercs*, scale(hazard) df(4) nolog eform
Log likelihood = -19665.932 Number of obs = 13,208
------------------------------------------------------------------------------
| exp(b) Std. Err. z P>|z| [95% Conf. Interval]
-------------+----------------------------------------------------------------
xb |
stage2 | 1.758926 .0617187 16.09 0.000 1.642026 1.884149
stage3 | 5.656322 .1390555 70.48 0.000 5.39024 5.935539
female | .8634681 .01891 -6.70 0.000 .8271894 .901338
agercs1 | .9958187 .0053355 -0.78 0.434 .9854161 1.006331
agercs2 | 1.000015 .0000127 1.22 0.223 .9999906 1.00004
agercs3 | .9999654 .0000155 -2.24 0.025 .9999351 .9999957
_rcs1 | 3.484444 .0392205 110.90 0.000 3.408415 3.562169
_rcs2 | 1.1908 .0101285 20.53 0.000 1.171113 1.210817
_rcs3 | .9620673 .0050498 -7.37 0.000 .9522207 .9720158
_rcs4 | 1.015331 .003491 4.43 0.000 1.008512 1.022197
_cons | .1615451 .0046341 -63.55 0.000 .1527132 .1708879
------------------------------------------------------------------------------
Note: Estimates are transformed only in the first equation.
There is a clear effect of stage, with a hazard ratio of 5.66 for distant versus localised stage. Remember that I am modelling all-cause survival; one would expect a cause-specific hazard ratio to be higher. The all-cause mortality rate for females is 14% lower than for males.
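The "14% lower" figure is simple arithmetic on the hazard ratio for female. This Python check uses the estimate and confidence limits copied from the output above and also converts the confidence limits into percentages.

```python
# Female row of the stpm2 output: exp(b) and its 95% confidence limits
hr, hr_lci, hr_uci = 0.8634681, 0.8271894, 0.901338

pct_lower = 100 * (1 - hr)
# 1 - HR reverses the ordering: the upper HR limit gives the lower percentage limit
pct_lci, pct_uci = 100 * (1 - hr_uci), 100 * (1 - hr_lci)
print(f"{pct_lower:.1f}% lower (95% CI {pct_lci:.1f}% to {pct_uci:.1f}%)")
# prints: 13.7% lower (95% CI 9.9% to 17.3%)
```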
I will now predict two standardized survival functions, one where I force all subjects to be male and one where I force everyone to be female.
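Conceptually, each standardized curve sets female to the same value for everyone, keeps every subject's other covariates, and averages the predicted survival at each time point. The Python sketch assumes a hypothetical proportional hazards model (not the stpm2 estimates above) with made-up linear predictors for the other covariates. It uses a grid of 100 time points from 0 to 10 years, as created by range tt 0 10 100.

```python
import math

# Hypothetical fitted model: S(t | female, z) = exp(-H0(t) * exp(beta_female * female + z))
beta_female = math.log(0.86)  # illustrative log hazard ratio for female

def H0(t):
    """Illustrative baseline cumulative hazard."""
    return 0.35 * t ** 0.9

# Hypothetical per-subject linear predictors from the other covariates (stage, age, ...)
z_other = [-0.8, -0.2, 0.3, 1.2, 1.5]

def standardized_curve(times, female_value):
    """Set female to female_value for everyone, then average S_i(t) over subjects."""
    curve = []
    for t in times:
        s = [math.exp(-H0(t) * math.exp(beta_female * female_value + z)) for z in z_other]
        curve.append(sum(s) / len(s))
    return curve

tt = [10 * i / 99 for i in range(100)]  # 100 equally spaced points from 0 to 10
ms_male = standardized_curve(tt, 0)     # analogous to at1(female 0)
ms_female = standardized_curve(tt, 1)   # analogous to at2(female 1)
print(ms_male[-1], ms_female[-1])
```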
. range tt 0 10 100
(13,108 missing values generated)
. standsurv, at1(female 0) at2(female 1) timevar(tt) atvar(ms_male ms_female) ci
. twoway (line ms_male ms_female tt, sort) ///
> , yline(0.5, lpattern(dash) lcolor(black)) ///
> xtitle("Years since diagnosis") ///
> ytitle("S(t)", angle(h)) ///
> ylabel(0(0.2)1, format(%3.1f) angle(h)) ///
> legend(order(1 "Male" 2 "Female") ring(0) pos(1) cols(1))
The graph of the two standardised survival functions can be seen below.
As expected (given the hazard ratio), females have better survival than males. I have added a horizontal reference line at $S(t)=0.5$; the time at which this line crosses each survival curve is the median survival time. Reading from the graph, this is just under 2 years for males and just under 2.5 years for females. The centile option of standsurv estimates these values more accurately and gives 95% confidence intervals. We are also interested in contrasts of the centiles, and the contrast option calculates either a difference or a ratio of the median survival times, with a 95% confidence interval.
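Reading a median off a plotted curve amounts to interpolating between the grid points on either side of the crossing, whereas root finding solves $S(t)=0.5$ directly. The Python sketch compares the two on a hypothetical standardized curve (an average of two exponential survival functions) evaluated on the same kind of 100-point grid. Bisection stands in for Brent's method, and the helper names are illustrative.

```python
import math

def surv(t):
    """Hypothetical standardized survival curve: average of two exponential curves."""
    return 0.5 * math.exp(-0.2 * t) + 0.5 * math.exp(-0.6 * t)

def crossing_from_grid(times, values, alpha):
    """Linearly interpolate between the two grid points where the curve crosses alpha."""
    for (t0, s0), (t1, s1) in zip(zip(times, values), zip(times[1:], values[1:])):
        if s0 >= alpha > s1:
            return t0 + (s0 - alpha) * (t1 - t0) / (s0 - s1)
    return None  # the curve never drops below alpha on this grid

def crossing_by_bisection(f, alpha, lo, hi, tol=1e-10):
    """Solve f(t) = alpha for a decreasing f with f(lo) > alpha > f(hi)."""
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if f(mid) > alpha:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

grid = [10 * i / 99 for i in range(100)]
approx = crossing_from_grid(grid, [surv(t) for t in grid], 0.5)
exact = crossing_by_bisection(surv, 0.5, 0.0, 10.0)
print(approx, exact)
```

On a fine grid the two are close, but only the root-finding approach comes with a natural route to standard errors.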
. standsurv, at1(female 0) at2(female 1) centile(50) ///
> atvar(med_male med_female) contrast(difference) ci
. list med_male* in 1, ab(15)
+-----------------------------------------+
| med_male med_male_lci med_male_uci |
|-----------------------------------------|
1. | 1.9801987 1.8875488 2.0773963 |
+-----------------------------------------+
. list med_female* in 1, ab(15)
+----------------------------------------------+
| med_female med_female_lci med_female_uci |
|----------------------------------------------|
1. | 2.4249751 2.3197847 2.5349353 |
+----------------------------------------------+
. list _contrast* in 1, ab(18)
+----------------------------------------------------+
| _contrast2_1 _contrast2_1_lci _contrast2_1_uci |
|----------------------------------------------------|
1. | .44477636 .31389716 .57565556 |
+----------------------------------------------------+
The median survival time is 1.98 years for males (95% CI 1.89 to 2.08) and 2.42 years for females (95% CI 2.32 to 2.53). As I used the contrast option, I also get the difference in the medians of the standardized survival curves with a 95% CI. Thus the time at which 50% of females have died is 0.44 years longer than the time at which 50% of males have died (95% CI 0.31 to 0.58).
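The point estimate of the contrast is just the difference between the two listed medians, as this Python check shows; the ratio, the other contrast mentioned above, is included for comparison. The confidence interval is a different matter. Both medians are estimated from the same fitted model and are therefore correlated, so the interval for the difference cannot simply be assembled from the two separate intervals.

```python
# Point estimates listed above (years)
med_male, med_female = 1.9801987, 2.4249751

difference = med_female - med_male  # cf. _contrast2_1 = .44477636
ratio = med_female / med_male       # ratio of medians, the alternative contrast
print(round(difference, 4), round(ratio, 4))
```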
It is possible to predict multiple centiles by passing a numlist to the centile option. For example, the code below calculates the 10th to 60th centiles in steps of 10.
. standsurv, at1(female 0) at2(female 1) centile(10(10)60) ///
> atvar(cen_males cen_females) contrast(difference) ci ///
> centvar(centiles) contrastvar(cendiff)
. list centile cen_males cen_females cendiff in 1/6, sep(0) noobs
+----------------------------------------------+
| centiles cen_males cen_fem~s cendiff |
|----------------------------------------------|
| 10 .14919254 .16940399 .02021145 |
| 20 .3236508 .38459929 .06094849 |
| 30 .63821705 .78393263 .14571558 |
| 40 1.1648032 1.4192444 .25444123 |
| 50 1.9801987 2.4249751 .44477636 |
| 60 3.4239223 4.277573 .85365069 |
+----------------------------------------------+
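Judging from the table (the 10th centile for males is reached after about 0.15 years, and the 50th is the median), the pth centile is the time at which the standardized survival function falls to $1 - p/100$, i.e. the time by which p% of the population have died. A minimal Python sketch of evaluating several centiles under that reading, using a hypothetical survival curve and bisection in place of Brent's method:

```python
import math

def surv(t):
    """Hypothetical standardized survival curve (average of two exponentials)."""
    return 0.5 * math.exp(-0.3 * t) + 0.5 * math.exp(-1.5 * t)

def centile(p, lo=0.0, hi=100.0, tol=1e-10):
    """Time by which p% have died, i.e. the solution of S(t) = 1 - p/100."""
    alpha = 1 - p / 100
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if surv(mid) > alpha:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

centiles = list(range(10, 70, 10))  # like the numlist 10(10)60
times = [centile(p) for p in centiles]
for p, t in zip(centiles, times):
    print(p, round(t, 4))
```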
We can then plot the difference between females and males for each of these centiles, together with its 95% confidence interval.
. twoway (rarea cendiff_lci cendiff_uci centile, sort color(red%30)) ///
> (line cendiff centile, color(black)) ///
> , xtitle(centile) xlabel(,format(%3.0f)) ///
> ytitle("Difference in centile") ylabel(0(0.2)1.2,format(%3.1f) angle(h)) ///
> legend(off)
There are probably more innovative ways of presenting such data.
Acknowledgement
I would like to acknowledge David Drukker of StataCorp, with whom I discussed these ideas at two Nordic Stata User Group meetings. David has written a command that estimates centiles of standardized distributions using a two-parameter gamma distribution, which is available here.
References
Stefanski, L. & Boos, D. The Calculus of M-Estimation. The American Statistician 2002;56:29-38