Centiles of Standardized Survival Functions

Background

In a previous tutorial I used standsurv to obtain standardized survival functions. In this tutorial I show the first of a number of different measures of the standardized survival function where I obtain centiles of the standardized survival function.

As a reminder a centile of a survival function can be obtained by solving $S (t) = α$ for $t$ . For example, for the median survival time we set $α = 0.5$ , i.e. the 50th (per)centile. For simple parametric distributions, such as the Weibull, we can solve for $t$ analytically, but for more complex models the centile is obtained through iterative root finding techniques. In stpm2 I have used Brent’s root finder when evaluating centiles.

The centile of a standardized survival function is obtained by solving the following equation for t.

$E (S (t | X = x, Z) = α$

This is done through root finding (using Brent’s root finder) by solving,

$\frac{1}{N} \sum_{i = 1}^{N} S (t | X = x, Z) - α = 0$

Variances can be obtained using M-estimation .

Example

I use a colon cancer example. I first load and stset the data

. use https://www.pclambert.net/data/colon if stage!=0, clear
(Colon carcinoma, diagnosed 1975-94, follow-up to 1995)

. stset surv_mm, f(status=1,2) scale(12) exit(time 120)

Survival-time data settings

         Failure event: status==1 2
Observed time interval: (0, surv_mm]
     Exit on or before: time 120
     Time for analysis: time/12

--------------------------------------------------------------------------
     13,208  total observations
          0  exclusions
--------------------------------------------------------------------------
     13,208  observations remaining, representing
      8,866  failures in single-record/single-failure data
 43,950.667  total analysis time at risk and under observation
                                                At risk from t =         0
                                     Earliest observed entry t =         0
                                          Last observed exit t =        10

I drop those with missing stage information (stage == 0). I am investigating all cause survival (status=1,2).

I fit a model that includes stage, sex and age (using a natural spline). I assume proportional hazards, but if I relax this assusmption the syntax for standsurv would be identical. Stage is classified as localised, regional and distant and is modelled as a factor variable with localised as the reference category.

. gen female = sex==2

. stpm3 i.stage i.female @ns(age,df(3)), scale(lncumhazard) df(4) nolog eform

                                                       Number of obs =  13,208
                                                       Wald chi2(6)  = 6155.27
Log likelihood = -19665.932                            Prob > chi2   =  0.0000

------------------------------------------------------------------------------
             |     exp(b)   Std. err.      z    P>|z|     [95% conf. interval]
-------------+----------------------------------------------------------------
xb           |
       stage |
   Regional  |   1.758926   .0617187    16.09   0.000     1.642026    1.884149
    Distant  |   5.656322   .1390555    70.48   0.000      5.39024    5.935539
             |
    1.female |   .8634681     .01891    -6.70   0.000     .8271894     .901338
 _ns_f1_age1 |   .0019791   .0012531    -9.83   0.000     .0005722    .0068455
 _ns_f1_age2 |   .7897475   .2383663    -0.78   0.434     .4370927    1.426931
 _ns_f1_age3 |   .1506021    .025846   -11.03   0.000     .1075845    .2108204
-------------+----------------------------------------------------------------
time         |
        _ns1 |  -14.64251   .1616785   -90.57   0.000    -14.95939   -14.32563
        _ns2 |   3.188902   .0786506    40.55   0.000     3.034749    3.343054
        _ns3 |   -1.66889   .0227393   -73.39   0.000    -1.713459   -1.624322
        _ns4 |  -1.036622   .0267331   -38.78   0.000    -1.089018    -.984226
       _cons |   1.441705   .0983651    14.66   0.000     1.248912    1.634497
------------------------------------------------------------------------------
Note: Estimates are transformed only in the first equation.
Extended functions
 (1) @ns(age, df(3))

There is a clear effect of stage with a hazard ratio of 5.66 for distant stage versus localised stage. Remember that I am modelling all cause survival and one would expect a cause-specific hazard ratio to be higher. The all-cause mortality rate for females is 14% lower than males.

I will now predict two standardized survival functions, one where I force all subjects to be male and one where I force everyone to be female.

. range tt 0 10 100
(13,108 missing values generated)

. standsurv, surv timevar(tt) ci frame(surv, replace)  ///
>            at1(female 0) at2(female 1)               ///
>            atvar(ms_male ms_female)      

. frame surv {           
.   twoway (line ms_male ms_female tt, sort)           ///
>          , yline(0.5, lpattern(dash) lcolor(black))  ///
>          yline(0.5, lpattern(dash) lcolor(black))    ///
>          xtitle("Years since diagnosis")             ///
>          ytitle("S(t)", angle(h))                    ///
>          ylabel(0(0.2)1, format(%3.1f) angle(h))     ///
>          legend(order(1 "Male" 2 "Female") pos(1) )
. }

The graph of the two standardised survival functions can be seen below.

As expected (given the hazard ratio) females have better survival than males. I have added a horizontal reference line at $S (t) = 0.5$ . Where this line crosses the survival curves gives the median survival time. Reading from the graph, this is just under 2 years for the males and just under 2.5 years for females. Using the centile option of standsurv will estimate these values more accurately with 95% confidence intervals. We are also interested in contrasts of the centiles, so use of the contrast option will calculate either a difference or ratio of the median survival times with a 95% confidence interval.

. standsurv, centile(50) ci frame(median)              ///
>           at1(female 0) at2(female 1)                ///
>                       atvar(med_male med_female)                 ///
>           contrast(difference) contrastvar(med_diff)

. frame median {
.   list med_male* med_female*,  ab(15) noobs     

  +----------------------------------------------------------------------------------------+
  |  med_male   med_male_lci   med_male_uci   med_female   med_female_lci   med_female_uci |
  |----------------------------------------------------------------------------------------|
  | 1.9801987      1.8875488      2.0773963    2.4249751        2.3197847        2.5349353 |
  +----------------------------------------------------------------------------------------+
.   list med_diff* in 1, ab(18) noobs

  +-----------------------------------------+
  |  med_diff   med_diff_lci   med_diff_uci |
  |-----------------------------------------|
  | .44477636      .31389716      .57565556 |
  +-----------------------------------------+
. }

The median survival time is 1.98 years for males with a 95% CI (1.87 to 2.09). The median for females is 2.42 years (95% CI, 2.30 to 2.56). As I used the contrast option I also get the difference in the median of the standardised survival curves with a 95% CI. Thus the time at which 50% of females have died is 0.44 years more than the time at which 50% of males have died, 95% CI (0.30 to 0.59).

It is possible to predict for multiple centiles by passing a numlist to the centiles option. For example, the code below calculates centiles between 10 and 60 at 10 unit intervals.

. standsurv, centile(10(10)60) ci frame(cenrange) centvar(centiles)   ///
>            at1(female 0) at2(female 1)                              ///
>                        atvar(cen_males cen_females)                             ///
>            contrast(difference) contrastvar(cendiff) 

. frame cenrange {
.   list centiles cen_males cen_females cendiff, sep(0) noobs ab(12)

  +------------------------------------------------+
  | centiles   cen_males   cen_females     cendiff |
  |------------------------------------------------|
  |       10   .14919254     .16940399   .02021145 |
  |       20   .32365079     .38459929    .0609485 |
  |       30   .63821705     .78393263   .14571558 |
  |       40   1.1648032     1.4192444   .25444124 |
  |       50   1.9801987     2.4249751   .44477636 |
  |       60   3.4239223      4.277573   .85365069 |
  +------------------------------------------------+
. }

We can then plot the difference in these various centiles.

. frame cenrange {
.   twoway (rarea cendiff_lci cendiff_uci centile, sort color(%30))  ///
>          (line cendiff centile, pstyle(p1line))                    ///
>              , xtitle(centile) xlabel(,format(%3.0f))                  ///
>              ytitle("Difference in centile")                           ///
>          ylabel(0(0.2)1.2,format(%3.1f) angle(h))                  ///
>              legend(off)
. }

There are probably more innovative ways of presenting such data.

Acknowledgement

I would like to acknowledge David Druker of StataCorp who I discussed these ideas with at two Nordic Stata User group meetings. David wrote a command that estimates centiles of standardized distributions using a two parameter gamma distribution which is available here.

References

Stefanski, L. & Boos, D. The Calculus of M-Estimation. The American Statistician 2002;56:29-38