Centiles of Standardized Survival Functions
Background
In a previous tutorial I used standsurv
to obtain standardized survival functions. In this tutorial I show the first of a number of different measures of the standardized survival function where I obtain centiles of the standardized survival function.
As a reminder a centile of a survival function can be obtained by solving stpm2
I have used Brent’s root finder when evaluating centiles.
The centile of a standardized survival function is obtained by solving the following equation for t.
This is done through root finding (using Brent’s root finder) by solving,
Variances can be obtained using M-estimation .
Example
I use a colon cancer example. I first load and stset
the data
use https://www.pclambert.net/data/colon if stage!=0, clear
.
(Colon carcinoma, diagnosed 1975-94, follow-up to 1995)
stset surv_mm, f(status=1,2) scale(12) exit(time 120)
.
data settings
Survival-time
Failure event: status==1 2
Observed time interval: (0, surv_mm]on or before: time 120
Exit for analysis: time/12
Time
--------------------------------------------------------------------------total observations
13,208
0 exclusions
--------------------------------------------------------------------------
13,208 observations remaining, representingin single-record/single-failure data
8,866 failures total analysis time at risk and under observation
43,950.667
At risk from t = 0
Earliest observed entry t = 0exit t = 10 Last observed
I drop those with missing stage information (stage == 0
). I am investigating all cause survival (status=1,2
).
I fit a model that includes stage, sex and age (using a natural spline). I assume proportional hazards, but if I relax this assusmption the syntax for standsurv
would be identical. Stage is classified as localised, regional and distant and is modelled as a factor variable with localised as the reference category.
gen female = sex==2
.
scale(lncumhazard) df(4) nolog eform
. stpm3 i.stage i.female @ns(age,df(3)),
of obs = 13,208
Number chi2(6) = 6155.27
Wald chi2 = 0.0000
Log likelihood = -19665.932 Prob >
------------------------------------------------------------------------------exp(b) Std. err. z P>|z| [95% conf. interval]
|
-------------+----------------------------------------------------------------xb |
stage |
Regional | 1.758926 .0617187 16.09 0.000 1.642026 1.884149
Distant | 5.656322 .1390555 70.48 0.000 5.39024 5.935539
|
1.female | .8634681 .01891 -6.70 0.000 .8271894 .901338
_ns_f1_age1 | .0019791 .0012531 -9.83 0.000 .0005722 .0068455
_ns_f1_age2 | .7897475 .2383663 -0.78 0.434 .4370927 1.426931
_ns_f1_age3 | .1506021 .025846 -11.03 0.000 .1075845 .2108204
-------------+----------------------------------------------------------------
time |
_ns1 | -14.64251 .1616785 -90.57 0.000 -14.95939 -14.32563
_ns2 | 3.188902 .0786506 40.55 0.000 3.034749 3.343054
_ns3 | -1.66889 .0227393 -73.39 0.000 -1.713459 -1.624322
_ns4 | -1.036622 .0267331 -38.78 0.000 -1.089018 -.984226_cons | 1.441705 .0983651 14.66 0.000 1.248912 1.634497
------------------------------------------------------------------------------in the first equation.
Note: Estimates are transformed only
Extended functions (1) @ns(age, df(3))
There is a clear effect of stage with a hazard ratio of 5.66 for distant stage versus localised stage. Remember that I am modelling all cause survival and one would expect a cause-specific hazard ratio to be higher. The all-cause mortality rate for females is 14% lower than males.
I will now predict two standardized survival functions, one where I force all subjects to be male and one where I force everyone to be female.
range tt 0 10 100
. missing values generated)
(13,108
ci frame(surv, replace) ///
. standsurv, surv timevar(tt) ///
> at1(female 0) at2(female 1)
> atvar(ms_male ms_female)
. frame surv { twoway (line ms_male ms_female tt, sort) ///
. yline(0.5, lpattern(dash) lcolor(black)) ///
> , yline(0.5, lpattern(dash) lcolor(black)) ///
> xtitle("Years since diagnosis") ///
> ytitle("S(t)", angle(h)) ///
> ylabel(0(0.2)1, format(%3.1f) angle(h)) ///
> legend(order(1 "Male" 2 "Female") pos(1) )
> . }
The graph of the two standardised survival functions can be seen below.
As expected (given the hazard ratio) females have better survival than males. I have added a horizontal reference line at centile
option of standsurv
will estimate these values more accurately with 95% confidence intervals. We are also interested in contrasts of the centiles, so use of the contrast
option will calculate either a difference or ratio of the median survival times with a 95% confidence interval.
centile(50) ci frame(median) ///
. standsurv, ///
> at1(female 0) at2(female 1) ///
> atvar(med_male med_female)
> contrast(difference) contrastvar(med_diff)
median {
. frame list med_male* med_female*, ab(15) noobs
.
+----------------------------------------------------------------------------------------+
| med_male med_male_lci med_male_uci med_female med_female_lci med_female_uci |
|----------------------------------------------------------------------------------------|
| 1.9801987 1.8875488 2.0773963 2.4249751 2.3197847 2.5349353 |
+----------------------------------------------------------------------------------------+list med_diff* in 1, ab(18) noobs
.
+-----------------------------------------+
| med_diff med_diff_lci med_diff_uci |
|-----------------------------------------|
| .44477636 .31389716 .57565556 |
+-----------------------------------------+ . }
The median survival time is 1.98 years for males with a 95% CI (1.87 to 2.09). The median for females is 2.42 years (95% CI, 2.30 to 2.56). As I used the contrast
option I also get the difference in the median of the standardised survival curves with a 95% CI. Thus the time at which 50% of females have died is 0.44 years more than the time at which 50% of males have died, 95% CI (0.30 to 0.59).
It is possible to predict for multiple centiles by passing a numlist to the centiles
option. For example, the code below calculates centiles between 10 and 60 at 10 unit intervals.
centile(10(10)60) ci frame(cenrange) centvar(centiles) ///
. standsurv, ///
> at1(female 0) at2(female 1) ///
> atvar(cen_males cen_females)
> contrast(difference) contrastvar(cendiff)
. frame cenrange {list centiles cen_males cen_females cendiff, sep(0) noobs ab(12)
.
+------------------------------------------------+
| centiles cen_males cen_females cendiff |
|------------------------------------------------|
| 10 .14919254 .16940399 .02021145 |
| 20 .32365079 .38459929 .0609485 |
| 30 .63821705 .78393263 .14571558 |
| 40 1.1648032 1.4192444 .25444124 |
| 50 1.9801987 2.4249751 .44477636 |
| 60 3.4239223 4.277573 .85365069 |
+------------------------------------------------+ . }
We can then plot the difference in these various centiles.
. frame cenrange {twoway (rarea cendiff_lci cendiff_uci centile, sort color(%30)) ///
. line cendiff centile, pstyle(p1line)) ///
> (xtitle(centile) xlabel(,format(%3.0f)) ///
> , ytitle("Difference in centile") ///
> ylabel(0(0.2)1.2,format(%3.1f) angle(h)) ///
> legend(off)
> . }
There are probably more innovative ways of presenting such data.
Acknowledgement
I would like to acknowledge David Druker of StataCorp who I discussed these ideas with at two Nordic Stata User group meetings. David wrote a command that estimates centiles of standardized distributions using a two parameter gamma distribution which is available here.
References
Stefanski, L. & Boos, D. The Calculus of M-Estimation. The American Statistician 2002;56:29-38