Paul Lambert
  • Home
  • Publications
  • Software & Tutorials
  • Talks
  • Courses
  • Interactive graphs

On this page

  • Background
  • Example
  • Acknowledgement
  • References

Centiles of Standardized Survival Functions

Background

In a previous tutorial I used standsurv to obtain standardized survival functions. In this tutorial I show the first of a number of different measures of the standardized survival function where I obtain centiles of the standardized survival function.

As a reminder a centile of a survival function can be obtained by solving \(S(t) = \alpha\) for \(t\). For example, for the median survival time we set \(\alpha = 0.5\), i.e. the 50th (per)centile. For simple parametric distributions, such as the Weibull, we can solve for \(t\) analytically, but for more complex models the centile is obtained through iterative root finding techniques. In stpm2 I have used Brent’s root finder when evaluating centiles.

The centile of a standardized survival function is obtained by solving the following equation for t.

\[ E\left(S(t | X=x,Z\right) = \alpha \]

This is done through root finding (using Brent’s root finder) by solving,

\[ \frac{1}{N}\sum_{i=1}^N {S(t | X=x,Z)} - \alpha = 0 \]

Variances can be obtained using M-estimation .

Example

I use a colon cancer example. I first load and stset the data

. use https://www.pclambert.net/data/colon if stage!=0, clear
(Colon carcinoma, diagnosed 1975-94, follow-up to 1995)

. stset surv_mm, f(status=1,2) scale(12) exit(time 120)

Survival-time data settings

         Failure event: status==1 2
Observed time interval: (0, surv_mm]
     Exit on or before: time 120
     Time for analysis: time/12

--------------------------------------------------------------------------
     13,208  total observations
          0  exclusions
--------------------------------------------------------------------------
     13,208  observations remaining, representing
      8,866  failures in single-record/single-failure data
 43,950.667  total analysis time at risk and under observation
                                                At risk from t =         0
                                     Earliest observed entry t =         0
                                          Last observed exit t =        10

I drop those with missing stage information (stage == 0). I am investigating all cause survival (status=1,2).

I fit a model that includes stage, sex and age (using a natural spline). I assume proportional hazards, but if I relax this assusmption the syntax for standsurv would be identical. Stage is classified as localised, regional and distant and is modelled as a factor variable with localised as the reference category.

. gen female = sex==2

. stpm3 i.stage i.female @ns(age,df(3)), scale(lncumhazard) df(4) nolog eform

                                                       Number of obs =  13,208
                                                       Wald chi2(6)  = 6155.27
Log likelihood = -19665.932                            Prob > chi2   =  0.0000

------------------------------------------------------------------------------
             |     exp(b)   Std. err.      z    P>|z|     [95% conf. interval]
-------------+----------------------------------------------------------------
xb           |
       stage |
   Regional  |   1.758926   .0617187    16.09   0.000     1.642026    1.884149
    Distant  |   5.656322   .1390555    70.48   0.000      5.39024    5.935539
             |
    1.female |   .8634681     .01891    -6.70   0.000     .8271894     .901338
 _ns_f1_age1 |   .0019791   .0012531    -9.83   0.000     .0005722    .0068455
 _ns_f1_age2 |   .7897475   .2383663    -0.78   0.434     .4370927    1.426931
 _ns_f1_age3 |   .1506021    .025846   -11.03   0.000     .1075845    .2108204
-------------+----------------------------------------------------------------
time         |
        _ns1 |  -14.64251   .1616785   -90.57   0.000    -14.95939   -14.32563
        _ns2 |   3.188902   .0786506    40.55   0.000     3.034749    3.343054
        _ns3 |   -1.66889   .0227393   -73.39   0.000    -1.713459   -1.624322
        _ns4 |  -1.036622   .0267331   -38.78   0.000    -1.089018    -.984226
       _cons |   1.441705   .0983651    14.66   0.000     1.248912    1.634497
------------------------------------------------------------------------------
Note: Estimates are transformed only in the first equation.
Extended functions
 (1) @ns(age, df(3))

There is a clear effect of stage with a hazard ratio of 5.66 for distant stage versus localised stage. Remember that I am modelling all cause survival and one would expect a cause-specific hazard ratio to be higher. The all-cause mortality rate for females is 14% lower than males.

I will now predict two standardized survival functions, one where I force all subjects to be male and one where I force everyone to be female.

. range tt 0 10 100
(13,108 missing values generated)

. standsurv, surv timevar(tt) ci frame(surv, replace)  ///
>            at1(female 0) at2(female 1)               ///
>            atvar(ms_male ms_female)      

. frame surv {           
.   twoway (line ms_male ms_female tt, sort)           ///
>          , yline(0.5, lpattern(dash) lcolor(black))  ///
>          yline(0.5, lpattern(dash) lcolor(black))    ///
>          xtitle("Years since diagnosis")             ///
>          ytitle("S(t)", angle(h))                    ///
>          ylabel(0(0.2)1, format(%3.1f) angle(h))     ///
>          legend(order(1 "Male" 2 "Female") pos(1) )
. }         

The graph of the two standardised survival functions can be seen below.

As expected (given the hazard ratio) females have better survival than males. I have added a horizontal reference line at \(S(t)=0.5\). Where this line crosses the survival curves gives the median survival time. Reading from the graph, this is just under 2 years for the males and just under 2.5 years for females. Using the centile option of standsurv will estimate these values more accurately with 95% confidence intervals. We are also interested in contrasts of the centiles, so use of the contrast option will calculate either a difference or ratio of the median survival times with a 95% confidence interval.

. standsurv, centile(50) ci frame(median)              ///
>           at1(female 0) at2(female 1)                ///
>                       atvar(med_male med_female)                 ///
>           contrast(difference) contrastvar(med_diff)

. frame median {
.   list med_male* med_female*,  ab(15) noobs     

  +----------------------------------------------------------------------------------------+
  |  med_male   med_male_lci   med_male_uci   med_female   med_female_lci   med_female_uci |
  |----------------------------------------------------------------------------------------|
  | 1.9801987      1.8875488      2.0773963    2.4249751        2.3197847        2.5349353 |
  +----------------------------------------------------------------------------------------+
.   list med_diff* in 1, ab(18) noobs

  +-----------------------------------------+
  |  med_diff   med_diff_lci   med_diff_uci |
  |-----------------------------------------|
  | .44477636      .31389716      .57565556 |
  +-----------------------------------------+
. }

The median survival time is 1.98 years for males with a 95% CI (1.87 to 2.09). The median for females is 2.42 years (95% CI, 2.30 to 2.56). As I used the contrast option I also get the difference in the median of the standardised survival curves with a 95% CI. Thus the time at which 50% of females have died is 0.44 years more than the time at which 50% of males have died, 95% CI (0.30 to 0.59).

It is possible to predict for multiple centiles by passing a numlist to the centiles option. For example, the code below calculates centiles between 10 and 60 at 10 unit intervals.

. standsurv, centile(10(10)60) ci frame(cenrange) centvar(centiles)   ///
>            at1(female 0) at2(female 1)                              ///
>                        atvar(cen_males cen_females)                             ///
>            contrast(difference) contrastvar(cendiff) 

. frame cenrange {
.   list centiles cen_males cen_females cendiff, sep(0) noobs ab(12)

  +------------------------------------------------+
  | centiles   cen_males   cen_females     cendiff |
  |------------------------------------------------|
  |       10   .14919254     .16940399   .02021145 |
  |       20   .32365079     .38459929    .0609485 |
  |       30   .63821705     .78393263   .14571558 |
  |       40   1.1648032     1.4192444   .25444124 |
  |       50   1.9801987     2.4249751   .44477636 |
  |       60   3.4239223      4.277573   .85365069 |
  +------------------------------------------------+
. }

We can then plot the difference in these various centiles.

. frame cenrange {
.   twoway (rarea cendiff_lci cendiff_uci centile, sort color(%30))  ///
>          (line cendiff centile, pstyle(p1line))                    ///
>              , xtitle(centile) xlabel(,format(%3.0f))                  ///
>              ytitle("Difference in centile")                           ///
>          ylabel(0(0.2)1.2,format(%3.1f) angle(h))                  ///
>              legend(off)
. }

There are probably more innovative ways of presenting such data.

Acknowledgement

I would like to acknowledge David Druker of StataCorp who I discussed these ideas with at two Nordic Stata User group meetings. David wrote a command that estimates centiles of standardized distributions using a two parameter gamma distribution which is available here.

References

Stefanski, L. & Boos, D. The Calculus of M-Estimation. The American Statistician 2002;56:29-38