Centiles of Standardized Survival Functions
Background
In a previous tutorial I used standsurv to obtain standardized survival functions. In this tutorial I show the first of a number of different measures of the standardized survival function where I obtain centiles of the standardized survival function.
As a reminder a centile of a survival function can be obtained by solving \(S(t) = \alpha\) for \(t\). For example, for the median survival time we set \(\alpha = 0.5\), i.e. the 50th (per)centile. For simple parametric distributions, such as the Weibull, we can solve for \(t\) analytically, but for more complex models the centile is obtained through iterative root finding techniques. In stpm2 I have used Brent’s root finder when evaluating centiles.
The centile of a standardized survival function is obtained by solving the following equation for t.
\[ E\left(S(t | X=x,Z\right) = \alpha \]
This is done through root finding (using Brent’s root finder) by solving,
\[ \frac{1}{N}\sum_{i=1}^N {S(t | X=x,Z)} - \alpha = 0 \]
Variances can be obtained using M-estimation .
Example
I use a colon cancer example. I first load and stset the data
. use https://www.pclambert.net/data/colon if stage!=0, clear
(Colon carcinoma, diagnosed 1975-94, follow-up to 1995)
. stset surv_mm, f(status=1,2) scale(12) exit(time 120)
Survival-time data settings
Failure event: status==1 2
Observed time interval: (0, surv_mm]
Exit on or before: time 120
Time for analysis: time/12
--------------------------------------------------------------------------
13,208 total observations
0 exclusions
--------------------------------------------------------------------------
13,208 observations remaining, representing
8,866 failures in single-record/single-failure data
43,950.667 total analysis time at risk and under observation
At risk from t = 0
Earliest observed entry t = 0
Last observed exit t = 10I drop those with missing stage information (stage == 0). I am investigating all cause survival (status=1,2).
I fit a model that includes stage, sex and age (using a natural spline). I assume proportional hazards, but if I relax this assusmption the syntax for standsurv would be identical. Stage is classified as localised, regional and distant and is modelled as a factor variable with localised as the reference category.
. gen female = sex==2
. stpm3 i.stage i.female @ns(age,df(3)), scale(lncumhazard) df(4) nolog eform
Number of obs = 13,208
Wald chi2(6) = 6155.27
Log likelihood = -19665.932 Prob > chi2 = 0.0000
------------------------------------------------------------------------------
| exp(b) Std. err. z P>|z| [95% conf. interval]
-------------+----------------------------------------------------------------
xb |
stage |
Regional | 1.758926 .0617187 16.09 0.000 1.642026 1.884149
Distant | 5.656322 .1390555 70.48 0.000 5.39024 5.935539
|
1.female | .8634681 .01891 -6.70 0.000 .8271894 .901338
_ns_f1_age1 | .0019791 .0012531 -9.83 0.000 .0005722 .0068455
_ns_f1_age2 | .7897475 .2383663 -0.78 0.434 .4370927 1.426931
_ns_f1_age3 | .1506021 .025846 -11.03 0.000 .1075845 .2108204
-------------+----------------------------------------------------------------
time |
_ns1 | -14.64251 .1616785 -90.57 0.000 -14.95939 -14.32563
_ns2 | 3.188902 .0786506 40.55 0.000 3.034749 3.343054
_ns3 | -1.66889 .0227393 -73.39 0.000 -1.713459 -1.624322
_ns4 | -1.036622 .0267331 -38.78 0.000 -1.089018 -.984226
_cons | 1.441705 .0983651 14.66 0.000 1.248912 1.634497
------------------------------------------------------------------------------
Note: Estimates are transformed only in the first equation.
Extended functions
(1) @ns(age, df(3))There is a clear effect of stage with a hazard ratio of 5.66 for distant stage versus localised stage. Remember that I am modelling all cause survival and one would expect a cause-specific hazard ratio to be higher. The all-cause mortality rate for females is 14% lower than males.
I will now predict two standardized survival functions, one where I force all subjects to be male and one where I force everyone to be female.
. range tt 0 10 100
(13,108 missing values generated)
. standsurv, surv timevar(tt) ci frame(surv, replace) ///
> at1(female 0) at2(female 1) ///
> atvar(ms_male ms_female)
. frame surv {
. twoway (line ms_male ms_female tt, sort) ///
> , yline(0.5, lpattern(dash) lcolor(black)) ///
> yline(0.5, lpattern(dash) lcolor(black)) ///
> xtitle("Years since diagnosis") ///
> ytitle("S(t)", angle(h)) ///
> ylabel(0(0.2)1, format(%3.1f) angle(h)) ///
> legend(order(1 "Male" 2 "Female") pos(1) )
. } The graph of the two standardised survival functions can be seen below.
As expected (given the hazard ratio) females have better survival than males. I have added a horizontal reference line at \(S(t)=0.5\). Where this line crosses the survival curves gives the median survival time. Reading from the graph, this is just under 2 years for the males and just under 2.5 years for females. Using the centile option of standsurv will estimate these values more accurately with 95% confidence intervals. We are also interested in contrasts of the centiles, so use of the contrast option will calculate either a difference or ratio of the median survival times with a 95% confidence interval.
. standsurv, centile(50) ci frame(median) ///
> at1(female 0) at2(female 1) ///
> atvar(med_male med_female) ///
> contrast(difference) contrastvar(med_diff)
. frame median {
. list med_male* med_female*, ab(15) noobs
+----------------------------------------------------------------------------------------+
| med_male med_male_lci med_male_uci med_female med_female_lci med_female_uci |
|----------------------------------------------------------------------------------------|
| 1.9801987 1.8875488 2.0773963 2.4249751 2.3197847 2.5349353 |
+----------------------------------------------------------------------------------------+
. list med_diff* in 1, ab(18) noobs
+-----------------------------------------+
| med_diff med_diff_lci med_diff_uci |
|-----------------------------------------|
| .44477636 .31389716 .57565556 |
+-----------------------------------------+
. }The median survival time is 1.98 years for males with a 95% CI (1.87 to 2.09). The median for females is 2.42 years (95% CI, 2.30 to 2.56). As I used the contrast option I also get the difference in the median of the standardised survival curves with a 95% CI. Thus the time at which 50% of females have died is 0.44 years more than the time at which 50% of males have died, 95% CI (0.30 to 0.59).
It is possible to predict for multiple centiles by passing a numlist to the centiles option. For example, the code below calculates centiles between 10 and 60 at 10 unit intervals.
. standsurv, centile(10(10)60) ci frame(cenrange) centvar(centiles) ///
> at1(female 0) at2(female 1) ///
> atvar(cen_males cen_females) ///
> contrast(difference) contrastvar(cendiff)
. frame cenrange {
. list centiles cen_males cen_females cendiff, sep(0) noobs ab(12)
+------------------------------------------------+
| centiles cen_males cen_females cendiff |
|------------------------------------------------|
| 10 .14919254 .16940399 .02021145 |
| 20 .32365079 .38459929 .0609485 |
| 30 .63821705 .78393263 .14571558 |
| 40 1.1648032 1.4192444 .25444124 |
| 50 1.9801987 2.4249751 .44477636 |
| 60 3.4239223 4.277573 .85365069 |
+------------------------------------------------+
. }We can then plot the difference in these various centiles.
. frame cenrange {
. twoway (rarea cendiff_lci cendiff_uci centile, sort color(%30)) ///
> (line cendiff centile, pstyle(p1line)) ///
> , xtitle(centile) xlabel(,format(%3.0f)) ///
> ytitle("Difference in centile") ///
> ylabel(0(0.2)1.2,format(%3.1f) angle(h)) ///
> legend(off)
. }There are probably more innovative ways of presenting such data.
Acknowledgement
I would like to acknowledge David Druker of StataCorp who I discussed these ideas with at two Nordic Stata User group meetings. David wrote a command that estimates centiles of standardized distributions using a two parameter gamma distribution which is available here.
References
Stefanski, L. & Boos, D. The Calculus of M-Estimation. The American Statistician 2002;56:29-38