Hazard of Standardized Survival Functions
This will be a short tutorial as the ideas are very simple. I have previously discussed standardized survival functions. In survival analysis we know that there is a simple mathematical transformation from hazard to survival function and vice versa. The idea here is to transform to a hazard function from the standardized survival function. Recall that a standardized survival funnction; $S_s(t|X=x,Z)$ is estimated by
$$ S_s(t|X=x,Z) = \frac{1}{N}\sum_{i=1}^{N}S(t|X=x,Z=z_i) $$
If we apply the usual transformation from survival to hazard to function ($h(t) = \frac{-d}{dt}\log[S(t)]$) we get
$$ h_s(t|X=x,Z) = \frac{\sum_{i=1}^{N}S(t|X=x,Z=z_i)h(t|X=x,Z=z_i)}{\sum_{i=1}^{N}S(t|X=x,Z=z_i)} $$
This is a weighted average of the $N$ individual hazard functions with weights equal to $S(t|X=x,Z=z_i)$, i.e. the predicted survival function for individual $i$ when forced to take a specific value of the exposure variable, $X$, but their observed values of confounding variables, $Z$.
This is implemented in standsurv
using the hazard
option.
Example
I will use the Rotterdam Breast cancer data. The code below loads and stset
’s the data and then fits a model using stpm2
.
. use https://www.pclambert.net/data/rott2b, clear
(Rotterdam breast cancer data (augmented with cause of death))
. stset os, f(osi==1) scale(12) exit(time 120)
Survival-time data settings
Failure event: osi==1
Observed time interval: (0, os]
Exit on or before: time 120
Time for analysis: time/12
--------------------------------------------------------------------------
2,982 total observations
0 exclusions
--------------------------------------------------------------------------
2,982 observations remaining, representing
1,171 failures in single-record/single-failure data
20,002.424 total analysis time at risk and under observation
At risk from t = 0
Earliest observed entry t = 0
Last observed exit t = 10
. stpm2 hormon age enodes pr_1, scale(hazard) df(4) eform nolog tvc(hormon) dftvc(3)
Log likelihood = -2666.5999 Number of obs = 2,982
--------------------------------------------------------------------------------
| exp(b) Std. err. z P>|z| [95% conf. interval]
---------------+----------------------------------------------------------------
xb |
hormon | .8019893 .0741703 -2.39 0.017 .6690322 .961369
age | 1.013249 .0024115 5.53 0.000 1.008534 1.017987
enodes | .1132406 .011008 -22.41 0.000 .0935961 .1370082
pr_1 | .9061179 .0119267 -7.49 0.000 .883041 .9297979
_rcs1 | 2.644573 .0814503 31.58 0.000 2.489656 2.809129
_rcs2 | 1.209479 .0379393 6.06 0.000 1.13736 1.286172
_rcs3 | 1.014 .0162037 0.87 0.384 .9827339 1.046262
_rcs4 | .9961807 .0072731 -0.52 0.600 .9820273 1.010538
_rcs_hormon1 | 1.003465 .0756175 0.05 0.963 .8656822 1.163176
_rcs_hormon2 | .891054 .056664 -1.81 0.070 .786637 1.009331
_rcs_hormon3 | 1.025052 .0390804 0.65 0.516 .9512477 1.104583
_cons | 1.103353 .1771893 0.61 0.540 .8054133 1.511508
--------------------------------------------------------------------------------
Note: Estimates are transformed only in the first equation.
I have made the effect of our exposure, hormon
, time-dependent using the tvc
option.
I first calculate the standardized survival curves where everyone is forced to be exposed and then unexposed.
. range timevar 0 10 100
(2,882 missing values generated)
. standsurv, at1(hormon 0) at2(hormon 1) timevar(timevar) ci contrast(difference)
.
. twoway (rarea _at1_lci _at1_uci timevar, color(red%25)) ///
> (rarea _at2_lci _at2_uci timevar, color(blue%25)) ///
> (line _at1 timevar, sort lcolor(red)) ///
> (line _at2 timevar, sort lcolor(blue)) ///
> , legend(order(1 "No hormonal treatment" 2 "Hormonal treatment") ring(0) cols(1) pos(1))
> ///
> ylabel(0.5(0.1)1,angle(h) format(%3.1f)) ///
> ytitle("S(t)") ///
> xtitle("Years from surgery")
If I run standsurv
again with the hazard
option I get the corresponding hazard functions of the standardized curves. This is the marginal hazard ratio (as a function of time).
. capture drop _at* _contrast*
. standsurv, at1(hormon 0) at2(hormon 1) timevar(timevar) hazard ci contrast(ratio) per(1000)
Plot the standardized hazard functions.
. twoway (rarea _at1_lci _at1_uci timevar, color(red%30)) ///
> (rarea _at2_lci _at2_uci timevar, color(blue%30)) ///
> (line _at1 timevar, color(red)) ///
> (line _at2 timevar, color(blue)) ///
> , legend(off) ///
> ylabel(,angle(h) format(%3.1f)) ///
> xtitle("Years from surgery")
I can’t explain the lower and then higher hazard for those on hormon therapy. Perhaps better adjustment for confounders would change this.
I can also plot the ratio of these two hazard functions with a 95% confidence interval.
. twoway (rarea _contrast2_1_lci _contrast2_1_uci timevar, color(red%30)) ///
> (line _contrast2_1 timevar, color(red)) ///
> if timevar>0, yscale(log) ///
> ylabel(0.5 1 2 4 8 20 40, angle(h) format(%3.1f)) ///
> xtitle("Years from surgery") ///
> legend(off) ///
> yscale(log)
If I had used the difference
argument of the contrast()
option I would have obtained the absolute difference in the standardized hazard functions.
I am still thinking about the usefulness of this - in general I prefer the idea of standardized survival functions rather than the corresponding hazard function. However, it is harder to see how the risk of events changes over follow-up time with a cumulative measure (i.e. standardized survival).