Hazard of Standardized Survival Functions
This will be a short tutorial as the ideas are very simple. I have previously discussed standardized survival functions. In survival analysis we know that there is a simple mathematical transformation from hazard to survival function and vice versa. The idea here is to transform to a hazard function from the standardized survival function. Recall that a standardized survival funnction; \(S_s(t|X=x,Z)\) is estimated by
\[ S_s(t|X=x,Z) = \frac{1}{N}\sum_{i=1}^{N}S(t|X=x,Z=z_i) \]
If we apply the usual transformation from survival to hazard to function (\(h(t) = \frac{-d}{dt}\log[S(t)]\)) we get
\[ h_s(t|X=x,Z) = \frac{\sum_{i=1}^{N}S(t|X=x,Z=z_i)h(t|X=x,Z=z_i)}{\sum_{i=1}^{N}S(t|X=x,Z=z_i)} \]
This is a weighted average of the \(N\) individual hazard functions with weights equal to \(S(t|X=x,Z=z_i)\), i.e. the predicted survival function for individual \(i\) when forced to take a specific value of the exposure variable, \(X\), but their observed values of confounding variables, \(Z\).
This is implemented in standsurv
using the hazard
option.
Example
I will use the Rotterdam Breast cancer data. The code below loads and stset
’s the data and then fits a model using stpm3
.
use https://www.pclambert.net/data/rott3, clear
. data (augmented with cause of death))
(Rotterdam breast cancer
stset os, f(osi==1) scale(12) exit(time 120)
.
data settings
Survival-time
Failure event: osi==1
Observed time interval: (0, os]on or before: time 120
Exit for analysis: time/12
Time
--------------------------------------------------------------------------total observations
2,982
0 exclusions
--------------------------------------------------------------------------
2,982 observations remaining, representingin single-record/single-failure data
1,171 failures total analysis time at risk and under observation
20,002.424
At risk from t = 0
Earliest observed entry t = 0exit t = 10
Last observed
scale(lncumhazard) df(4) eform nolog tvc(hormon) dftvc(3)
. stpm3 hormon age enodes pr_1,
of obs = 2,982
Number chi2(4) = 615.95
Wald chi2 = 0.0000
Log likelihood = -2666.5968 Prob >
-------------------------------------------------------------------------------------exp(b) Std. err. z P>|z| [95% conf. interval]
|
--------------------+----------------------------------------------------------------xb |
hormon | .8499802 .0963501 -1.43 0.152 .6806444 1.061445
age | 1.013249 .0024115 5.53 0.000 1.008534 1.017987
enodes | .1132408 .011008 -22.41 0.000 .0935963 .1370084
pr_1 | .9061179 .0119267 -7.49 0.000 .8830411 .9297979
--------------------+----------------------------------------------------------------
time |
_ns1 | -27.09524 2.109681 -12.84 0.000 -31.23014 -22.96034
_ns2 | 8.647725 1.122097 7.71 0.000 6.448455 10.84699
_ns3 | -1.072205 .0477674 -22.45 0.000 -1.165827 -.9785823
_ns4 | -.6930019 .0518048 -13.38 0.000 -.7945373 -.5914664
|
c.hormon#c._ns_tvc1 | 5.425507 3.92237 1.38 0.167 -2.262197 13.11321
|
c.hormon#c._ns_tvc2 | -3.309698 2.096769 -1.58 0.114 -7.419291 .7998943
|
c.hormon#c._ns_tvc3 | -.1256484 .195217 -0.64 0.520 -.5082667 .25697
|_cons | .7984459 .1615956 4.94 0.000 .4817244 1.115168
-------------------------------------------------------------------------------------in the first equation. Note: Estimates are transformed only
I have made the effect of our exposure, hormon
, time-dependent using the tvc
option.
I first calculate the standardized survival curves where everyone is forced to be exposed and then unexposed.
range timevar 0 10 100
. missing values generated)
(2,882
ci frame(surv, replace) ///
. standsurv, surv timevar(timevar) ///
> at1(hormon 0) at2(hormon 1) atvar(S0 S1)
> contrast(difference) contrastvar(Sdiff)
.
. frame surv {twoway (rarea S0_lci S0_uci timevar, color(red%25)) ///
. color(blue%25)) ///
> (rarea S1_lci S1_uci timevar, line S0 timevar, sort lcolor(red)) ///
> (line S1 timevar, sort lcolor(blue)) ///
> (legend(order(1 "No hormonal treatment" 2 "Hormonal treatment") ///
> , ring(0) cols(1) pos(1)) ///
> ylabel(0.5(0.1)1,angle(h) format(%3.1f)) ///
> ytitle("S(t)") ///
> xtitle("Years from surgery")
> . }
If I run standsurv
again with the hazard
option I get the corresponding hazard functions of the standardized curves. This is the marginal hazard ratio (as a function of time).
ci per(1000) frame(hazard, replace) ///
. standsurv, hazard timevar(timevar) ///
> at1(hormon 0) at2(hormon 1) atvar(h0 h1) ratio) contrastvar(hr) > contrast(
Plot the standardized hazard functions.
. frame hazard {twoway (rarea h0_lci h0_uci timevar, color(red%30)) ///
. color(blue%30)) ///
> (rarea h1_lci h1_uci timevar, line h0 timevar, color(red)) ///
> (line h1 timevar, color(blue)) ///
> (legend(off) ///
> , ylabel(,angle(h) format(%3.1f)) ///
> xtitle("Years from surgery")
> . }
I can’t explain the lower and then higher hazard for those on hormon therapy. Perhaps better adjustment for confounders would change this.
I can also plot the ratio of these two hazard functions with a 95% confidence interval.
. frame hazard {twoway (rarea hr_lci hr_uci timevar, color(red%30)) ///
. line hr timevar, color(red)) ///
> (if timevar>0, yscale(log) ///
> ylabel(0.5 1 2 4 8 20 40, angle(h) format(%3.1f)) ///
> xtitle("Years from surgery") ///
> legend(off) ///
> yscale(log)
> . }
If I had used the difference
argument of the contrast()
option I would have obtained the absolute difference in the standardized hazard functions.
I am still thinking about the usefulness of this - in general I prefer the idea of standardized survival functions rather than the corresponding hazard function. However, it is harder to see how the risk of events changes over follow-up time with a cumulative measure (i.e. standardized survival).