Centiles of Standardized Survival Functions
Background
In a previous tutorial I used standsurv to obtain standardized survival functions. In this tutorial I show the first of a number of different measures of the standardized survival function where I obtain centiles of the standardized survival function.
As a reminder a centile of a survival function can be obtained by solving stpm2 I have used Brent’s root finder when evaluating centiles.
The centile of a standardized survival function is obtained by solving the following equation for t.
This is done through root finding (using Brent’s root finder) by solving,
Variances can be obtained using M-estimation .
Example
I use a colon cancer example. I first load and stset the data
. use https://www.pclambert.net/data/colon if stage!=0, clear
(Colon carcinoma, diagnosed 1975-94, follow-up to 1995)
. stset surv_mm, f(status=1,2) scale(12) exit(time 120)
Survival-time data settings
Failure event: status==1 2
Observed time interval: (0, surv_mm]
Exit on or before: time 120
Time for analysis: time/12
--------------------------------------------------------------------------
13,208 total observations
0 exclusions
--------------------------------------------------------------------------
13,208 observations remaining, representing
8,866 failures in single-record/single-failure data
43,950.667 total analysis time at risk and under observation
At risk from t = 0
Earliest observed entry t = 0
Last observed exit t = 10I drop those with missing stage information (stage == 0). I am investigating all cause survival (status=1,2).
I fit a model that includes stage, sex and age (using a natural spline). I assume proportional hazards, but if I relax this assusmption the syntax for standsurv would be identical. Stage is classified as localised, regional and distant and is modelled as a factor variable with localised as the reference category.
. gen female = sex==2
. stpm3 i.stage i.female @ns(age,df(3)), scale(lncumhazard) df(4) nolog eform
Number of obs = 13,208
Wald chi2(6) = 6155.27
Log likelihood = -19665.932 Prob > chi2 = 0.0000
------------------------------------------------------------------------------
| exp(b) Std. err. z P>|z| [95% conf. interval]
-------------+----------------------------------------------------------------
xb |
stage |
Regional | 1.758926 .0617187 16.09 0.000 1.642026 1.884149
Distant | 5.656322 .1390555 70.48 0.000 5.39024 5.935539
|
1.female | .8634681 .01891 -6.70 0.000 .8271894 .901338
_ns_f1_age1 | .0019791 .0012531 -9.83 0.000 .0005722 .0068455
_ns_f1_age2 | .7897475 .2383663 -0.78 0.434 .4370927 1.426931
_ns_f1_age3 | .1506021 .025846 -11.03 0.000 .1075845 .2108204
-------------+----------------------------------------------------------------
time |
_ns1 | -14.64251 .1616785 -90.57 0.000 -14.95939 -14.32563
_ns2 | 3.188902 .0786506 40.55 0.000 3.034749 3.343054
_ns3 | -1.66889 .0227393 -73.39 0.000 -1.713459 -1.624322
_ns4 | -1.036622 .0267331 -38.78 0.000 -1.089018 -.984226
_cons | 1.441705 .0983651 14.66 0.000 1.248912 1.634497
------------------------------------------------------------------------------
Note: Estimates are transformed only in the first equation.
Extended functions
(1) @ns(age, df(3))There is a clear effect of stage with a hazard ratio of 5.66 for distant stage versus localised stage. Remember that I am modelling all cause survival and one would expect a cause-specific hazard ratio to be higher. The all-cause mortality rate for females is 14% lower than males.
I will now predict two standardized survival functions, one where I force all subjects to be male and one where I force everyone to be female.
. range tt 0 10 100
(13,108 missing values generated)
. standsurv, surv timevar(tt) ci frame(surv, replace) ///
> at1(female 0) at2(female 1) ///
> atvar(ms_male ms_female)
. frame surv {
. twoway (line ms_male ms_female tt, sort) ///
> , yline(0.5, lpattern(dash) lcolor(black)) ///
> yline(0.5, lpattern(dash) lcolor(black)) ///
> xtitle("Years since diagnosis") ///
> ytitle("S(t)", angle(h)) ///
> ylabel(0(0.2)1, format(%3.1f) angle(h)) ///
> legend(order(1 "Male" 2 "Female") pos(1) )
. } The graph of the two standardised survival functions can be seen below.
As expected (given the hazard ratio) females have better survival than males. I have added a horizontal reference line at centile option of standsurv will estimate these values more accurately with 95% confidence intervals. We are also interested in contrasts of the centiles, so use of the contrast option will calculate either a difference or ratio of the median survival times with a 95% confidence interval.
. standsurv, centile(50) ci frame(median) ///
> at1(female 0) at2(female 1) ///
> atvar(med_male med_female) ///
> contrast(difference) contrastvar(med_diff)
. frame median {
. list med_male* med_female*, ab(15) noobs
+----------------------------------------------------------------------------------------+
| med_male med_male_lci med_male_uci med_female med_female_lci med_female_uci |
|----------------------------------------------------------------------------------------|
| 1.9801987 1.8875488 2.0773963 2.4249751 2.3197847 2.5349353 |
+----------------------------------------------------------------------------------------+
. list med_diff* in 1, ab(18) noobs
+-----------------------------------------+
| med_diff med_diff_lci med_diff_uci |
|-----------------------------------------|
| .44477636 .31389716 .57565556 |
+-----------------------------------------+
. }The median survival time is 1.98 years for males with a 95% CI (1.87 to 2.09). The median for females is 2.42 years (95% CI, 2.30 to 2.56). As I used the contrast option I also get the difference in the median of the standardised survival curves with a 95% CI. Thus the time at which 50% of females have died is 0.44 years more than the time at which 50% of males have died, 95% CI (0.30 to 0.59).
It is possible to predict for multiple centiles by passing a numlist to the centiles option. For example, the code below calculates centiles between 10 and 60 at 10 unit intervals.
. standsurv, centile(10(10)60) ci frame(cenrange) centvar(centiles) ///
> at1(female 0) at2(female 1) ///
> atvar(cen_males cen_females) ///
> contrast(difference) contrastvar(cendiff)
. frame cenrange {
. list centiles cen_males cen_females cendiff, sep(0) noobs ab(12)
+------------------------------------------------+
| centiles cen_males cen_females cendiff |
|------------------------------------------------|
| 10 .14919254 .16940399 .02021145 |
| 20 .32365079 .38459929 .0609485 |
| 30 .63821705 .78393263 .14571558 |
| 40 1.1648032 1.4192444 .25444124 |
| 50 1.9801987 2.4249751 .44477636 |
| 60 3.4239223 4.277573 .85365069 |
+------------------------------------------------+
. }We can then plot the difference in these various centiles.
. frame cenrange {
. twoway (rarea cendiff_lci cendiff_uci centile, sort color(%30)) ///
> (line cendiff centile, pstyle(p1line)) ///
> , xtitle(centile) xlabel(,format(%3.0f)) ///
> ytitle("Difference in centile") ///
> ylabel(0(0.2)1.2,format(%3.1f) angle(h)) ///
> legend(off)
. }There are probably more innovative ways of presenting such data.
Acknowledgement
I would like to acknowledge David Druker of StataCorp who I discussed these ideas with at two Nordic Stata User group meetings. David wrote a command that estimates centiles of standardized distributions using a two parameter gamma distribution which is available here.
References
Stefanski, L. & Boos, D. The Calculus of M-Estimation. The American Statistician 2002;56:29-38