Centiles of Standardized Survival Functions
Background
In a previous tutorial I used standsurv
to obtain standardized survival functions. In this tutorial I show the first of a number of different measures of the standardized survival function where I obtain centiles of the standardized survival function.
As a reminder a centile of a survival function can be obtained by solving \(S(t) = \alpha\) for \(t\). For example, for the median survival time we set \(\alpha = 0.5\), i.e. the 50th (per)centile. For simple parametric distributions, such as the Weibull, we can solve for \(t\) analytically, but for more complex models the centile is obtained through iterative root finding techniques. In stpm2
I have used Brent’s root finder when evaluating centiles.
The centile of a standardized survival function is obtained by solving the following equation for t.
\[ E\left(S(t | X=x,Z\right) = \alpha \]
This is done through root finding (using Brent’s root finder) by solving,
\[ \frac{1}{N}\sum_{i=1}^N {S(t | X=x,Z)} - \alpha = 0 \]
Variances can be obtained using M-estimation .
Example
I use a colon cancer example. I first load and stset
the data
use https://www.pclambert.net/data/colon if stage!=0, clear
.
(Colon carcinoma, diagnosed 1975-94, follow-up to 1995)
stset surv_mm, f(status=1,2) scale(12) exit(time 120)
.
data settings
Survival-time
Failure event: status==1 2
Observed time interval: (0, surv_mm]on or before: time 120
Exit for analysis: time/12
Time
--------------------------------------------------------------------------total observations
13,208
0 exclusions
--------------------------------------------------------------------------
13,208 observations remaining, representingin single-record/single-failure data
8,866 failures total analysis time at risk and under observation
43,950.667
At risk from t = 0
Earliest observed entry t = 0exit t = 10 Last observed
I drop those with missing stage information (stage == 0
). I am investigating all cause survival (status=1,2
).
I fit a model that includes stage, sex and age (using a natural spline). I assume proportional hazards, but if I relax this assusmption the syntax for standsurv
would be identical. Stage is classified as localised, regional and distant and is modelled as a factor variable with localised as the reference category.
gen female = sex==2
.
scale(lncumhazard) df(4) nolog eform
. stpm3 i.stage i.female @ns(age,df(3)),
of obs = 13,208
Number chi2(6) = 6155.27
Wald chi2 = 0.0000
Log likelihood = -19665.932 Prob >
------------------------------------------------------------------------------exp(b) Std. err. z P>|z| [95% conf. interval]
|
-------------+----------------------------------------------------------------xb |
stage |
Regional | 1.758926 .0617187 16.09 0.000 1.642026 1.884149
Distant | 5.656322 .1390555 70.48 0.000 5.39024 5.935539
|
1.female | .8634681 .01891 -6.70 0.000 .8271894 .901338
_ns_f1_age1 | .0019791 .0012531 -9.83 0.000 .0005722 .0068455
_ns_f1_age2 | .7897475 .2383663 -0.78 0.434 .4370927 1.426931
_ns_f1_age3 | .1506021 .025846 -11.03 0.000 .1075845 .2108204
-------------+----------------------------------------------------------------
time |
_ns1 | -14.64251 .1616785 -90.57 0.000 -14.95939 -14.32563
_ns2 | 3.188902 .0786506 40.55 0.000 3.034749 3.343054
_ns3 | -1.66889 .0227393 -73.39 0.000 -1.713459 -1.624322
_ns4 | -1.036622 .0267331 -38.78 0.000 -1.089018 -.984226_cons | 1.441705 .0983651 14.66 0.000 1.248912 1.634497
------------------------------------------------------------------------------in the first equation.
Note: Estimates are transformed only
Extended functions (1) @ns(age, df(3))
There is a clear effect of stage with a hazard ratio of 5.66 for distant stage versus localised stage. Remember that I am modelling all cause survival and one would expect a cause-specific hazard ratio to be higher. The all-cause mortality rate for females is 14% lower than males.
I will now predict two standardized survival functions, one where I force all subjects to be male and one where I force everyone to be female.
range tt 0 10 100
. missing values generated)
(13,108
ci frame(surv, replace) ///
. standsurv, surv timevar(tt) ///
> at1(female 0) at2(female 1)
> atvar(ms_male ms_female)
. frame surv { twoway (line ms_male ms_female tt, sort) ///
. yline(0.5, lpattern(dash) lcolor(black)) ///
> , yline(0.5, lpattern(dash) lcolor(black)) ///
> xtitle("Years since diagnosis") ///
> ytitle("S(t)", angle(h)) ///
> ylabel(0(0.2)1, format(%3.1f) angle(h)) ///
> legend(order(1 "Male" 2 "Female") pos(1) )
> . }
The graph of the two standardised survival functions can be seen below.
As expected (given the hazard ratio) females have better survival than males. I have added a horizontal reference line at \(S(t)=0.5\). Where this line crosses the survival curves gives the median survival time. Reading from the graph, this is just under 2 years for the males and just under 2.5 years for females. Using the centile
option of standsurv
will estimate these values more accurately with 95% confidence intervals. We are also interested in contrasts of the centiles, so use of the contrast
option will calculate either a difference or ratio of the median survival times with a 95% confidence interval.
centile(50) ci frame(median) ///
. standsurv, ///
> at1(female 0) at2(female 1) ///
> atvar(med_male med_female)
> contrast(difference) contrastvar(med_diff)
median {
. frame list med_male* med_female*, ab(15) noobs
.
+----------------------------------------------------------------------------------------+
| med_male med_male_lci med_male_uci med_female med_female_lci med_female_uci |
|----------------------------------------------------------------------------------------|
| 1.9801987 1.8875488 2.0773963 2.4249751 2.3197847 2.5349353 |
+----------------------------------------------------------------------------------------+list med_diff* in 1, ab(18) noobs
.
+-----------------------------------------+
| med_diff med_diff_lci med_diff_uci |
|-----------------------------------------|
| .44477636 .31389716 .57565556 |
+-----------------------------------------+ . }
The median survival time is 1.98 years for males with a 95% CI (1.87 to 2.09). The median for females is 2.42 years (95% CI, 2.30 to 2.56). As I used the contrast
option I also get the difference in the median of the standardised survival curves with a 95% CI. Thus the time at which 50% of females have died is 0.44 years more than the time at which 50% of males have died, 95% CI (0.30 to 0.59).
It is possible to predict for multiple centiles by passing a numlist to the centiles
option. For example, the code below calculates centiles between 10 and 60 at 10 unit intervals.
centile(10(10)60) ci frame(cenrange) centvar(centiles) ///
. standsurv, ///
> at1(female 0) at2(female 1) ///
> atvar(cen_males cen_females)
> contrast(difference) contrastvar(cendiff)
. frame cenrange {list centiles cen_males cen_females cendiff, sep(0) noobs ab(12)
.
+------------------------------------------------+
| centiles cen_males cen_females cendiff |
|------------------------------------------------|
| 10 .14919254 .16940399 .02021145 |
| 20 .32365079 .38459929 .0609485 |
| 30 .63821705 .78393263 .14571558 |
| 40 1.1648032 1.4192444 .25444124 |
| 50 1.9801987 2.4249751 .44477636 |
| 60 3.4239223 4.277573 .85365069 |
+------------------------------------------------+ . }
We can then plot the difference in these various centiles.
. frame cenrange {twoway (rarea cendiff_lci cendiff_uci centile, sort color(%30)) ///
. line cendiff centile, pstyle(p1line)) ///
> (xtitle(centile) xlabel(,format(%3.0f)) ///
> , ytitle("Difference in centile") ///
> ylabel(0(0.2)1.2,format(%3.1f) angle(h)) ///
> legend(off)
> . }
There are probably more innovative ways of presenting such data.
Acknowledgement
I would like to acknowledge David Druker of StataCorp who I discussed these ideas with at two Nordic Stata User group meetings. David wrote a command that estimates centiles of standardized distributions using a two parameter gamma distribution which is available here.
References
Stefanski, L. & Boos, D. The Calculus of M-Estimation. The American Statistician 2002;56:29-38