Attributable Fraction from Standardized Survival Functions

This example will demonstrate how the attributable fraction (\(AF\)) can be obtained for survival data. It will also demonstrate the flexibility to calculate various function of standardized estimates through use of the userfunction() option.

The is defined in epidemiology as the proportion of preventable outcomes if all subjects had not been exposed to a particular exposure. i.e.

\[ AF = \frac{P(D=1) - P(D=1|X=0)}{P(D=1)} \]

where \(P(D)\) is proportion diseased in the whole population, and \(P(D|X=0)\) is the probability of being diseased in the exposed. In observation studies there will be confounding and thus we need to consider potential confounders, \(Z\).

\[ AF = \frac{E(D=1|Z) - E(D=1|X=0,Z)}{P(D|Z)} \]

In survival studies the probability of being diseased is a function of time, so we define the \(AF\) using the failure function, \(F(t) = 1 - S(t)\), so \(AF(t)\) is defined as

\[ AF(t) = \frac{E[F(t|Z)] - E[F(t|X=0,Z)]}{E[F(t|Z)]} = 1 - \frac{E[F(t|X=0,Z)]}{E[F(t|Z)]} \]

\(E[F(t|Z)]\) is the standardized failure function over covariate distribution, \(Z\), and \(E[F(t|X=0,Z)]\) is the standardized failure function over covariate distribution, \(Z\) where all subjects forced to be unexposed. See Samualson (2008) for some background.

Example

I will use the Rotterdam Breast cancer data. The code below loads and stset’s the data and then fits a model using stpm3.

. clear all

. use https://www.pclambert.net/data/rott3, 
(Rotterdam breast cancer data (augmented with cause of death))

. stset os, f(osi==1) scale(12) exit(time 120)

Survival-time data settings

         Failure event: osi==1
Observed time interval: (0, os]
     Exit on or before: time 120
     Time for analysis: time/12

--------------------------------------------------------------------------
      2,982  total observations
          0  exclusions
--------------------------------------------------------------------------
      2,982  observations remaining, representing
      1,171  failures in single-record/single-failure data
 20,002.424  total analysis time at risk and under observation
                                                At risk from t =         0
                                     Earliest observed entry t =         0
                                          Last observed exit t =        10

. stpm3 i.hormon age enodes pr_1, scale(lncumhazard) df(4) eform nolog 

                                                        Number of obs =  2,982
                                                        Wald chi2(4)  = 619.62
Log likelihood = -2668.4925                             Prob > chi2   = 0.0000

------------------------------------------------------------------------------
             |     exp(b)   Std. err.      z    P>|z|     [95% conf. interval]
-------------+----------------------------------------------------------------
xb           |
      hormon |
        yes  |   .7906432   .0715077    -2.60   0.009       .66221    .9439854
         age |   1.013244   .0024119     5.53   0.000     1.008528    1.017983
      enodes |   .1132534   .0110135   -22.40   0.000     .0935998    .1370337
        pr_1 |   .9064855   .0119282    -7.46   0.000     .8834055    .9301685
-------------+----------------------------------------------------------------
time         |
        _ns1 |  -25.90082   1.871965   -13.84   0.000     -29.5698   -22.23184
        _ns2 |   7.980587   1.003724     7.95   0.000     6.013324    9.947851
        _ns3 |  -1.091126   .0461407   -23.65   0.000     -1.18156   -1.000691
        _ns4 |    -.70103   .0504635   -13.89   0.000    -.7999366   -.6021234
       _cons |    .801967    .161537     4.96   0.000     .4853603    1.118574
------------------------------------------------------------------------------
Note: Estimates are transformed only in the first equation.

It is worthwhile commenting what we mean be “exposed” here. Those on hormal treatment will be consided unexposed and those not taking the treatment will be exposed, i.e our unexposed group is when hormon=1.

I will first use the failure option to calculate the standardized failure probabilities in both groups. I also predict the failure probability in the population as a whole. I do this using . within an at option, i.e. using at3(.) in the example below.

. range timevar 0 10 101
(2,881 missing values generated)

. standsurv, failure timevar(timevar) ci frame(Ft, replace)             ///
>            at1(hormon 0) at2(hormon 1) at3(.) atvar(F_hormon0 F_hormon1 F_all)

. frame Ft {
.   twoway (line F_hormon0 timevar)                                       ///
>          (line F_hormon1 timevar)                                       ///
>          (line F_all timevar),                                          ///
>           legend(order(1 "No treatment" 2 "Treatment" 3 "All") pos(11)) /// 
>           ylabel(, format(%3.1f))                                       ///
>           ytitle("S(t)")                                                ///
>           xtitle("Years from surgery") 
. }

These are just 1 - the standardized survival functions. There are more untreated women (88.6%) which is why the “No Treatment” function is closer to the combined function. The attributable fraction could be calculated using

. frame Ft {
.   gen AF_tmp = 1 - F_hormon1/F_all
(1 missing value generated)
.   list timevar F_hormon1 F_all AF_tmp if inlist(timevar,1,5,10), noobs

  +---------------------------------------------+
  | timevar   F_hormon1       F_all      AF_tmp |
  |---------------------------------------------|
  |       1   .01685169   .02035349   .17204904 |
  |       5   .22362896   .26167585   .14539701 |
  |      10   .39250923   .44808119   .12402208 |
  +---------------------------------------------+
. }

I have listed the \(AF\) at 1, 5 and 10 years. If I just wanted a point estimate I could stop here. However, generally we will want to calculate confidence intervals. This is where the userfunction() option comes in. We can calculate a transformation of our standardized estimates with standard errors estimated using the delta method where derivatives are calculated numerically (similar to nlcom and predictnl). I “borrowed” the idea of a userfunction() from Arvid Sjölander’s stdReg R package (Sjölander 2018).

The user function needs to be written in Mata. The function should receive one argument at, which refer to the various at options and can be indexed by at[1], at[2] etc. The code below calculates the AF assuming that at1 is the standardized failure function in the population as a whole and at2 is the standardized failure function assuming everyone is unexposed (takes hormonal treatment). We need to be careful to specify the at() options is this order.

. mata
------------------------------------------------- mata (type end to exit) -----------------------------------------------------------------------
: function calcAF(at) {
>     // at2 is F(t|unexposed,Z)
>     // at1 is F(t,Z)
>     return(1 - at[2]/at[1])
> }

: end
-------------------------------------------------------------------------------------------------------------------------------------------------

Having defined the Mata function I just pass this to standsurv using the userfunction() option.

. standsurv, failure timevar(timevar) ci frame(Ft2, replace)   ///
>            at1(.) at2(hormon 0) atvar(F_hormon1 F_all)       ///    
>            userfunction(calcAF) userfunctionvar(AF)

I have specified the userfunctionvar(AF) option so that the new variable is called AF. Without this option the default is _userfunc. I can now plot the AF as a function of follow-up time.

. frame Ft2 {
.   twoway (rarea AF_lci AF_uci timevar, color(red%30)) ///
>          (line AF timevar, lcolor(red))               ///
>          , legend(off)                                /// 
>          ylabel(0(0.05)0.3, format(%4.2f))            ///
>          ytitle("AF")                                 ///
>          xtitle("Years from surgery") 
. }

I purposely chose for the effect of hormonal treatment to be proportional as this example is illustrative. When I relaxed this assumption, the AF was negative for the first few months.

Samualson (2008) defines alternative based on the hazard function. I am less keen on this than the use of the survival function, but show how this can be estimated using standsurv for completeness.

Samualson defines this is the attributable hazard fraction. The equation is similar to the AF defined above, but we replace the failure function with the hazard function.

\[ AHF(t) = \frac{E[\lambda(t|Z)] - E[\lambda(t|X=0,Z)]}{E[\lambda(t|Z)]} = 1 - \frac{E[\lambda(t|X=0,Z)]}{E[\lambda(t|Z)]} \]

This give the proportion of preventable events at time \(t\) rather than by time \(t\).

See the page of The hazard function of the standardized survival curve. for a description of standardized hazard functions.

As I just have to replace the failure probability with the hazard function, I can just use the same Mata function. This means that I just have the change the option failure to hazard in standsurv.

. standsurv, hazard timevar(timevar) ci frame(haz, replace)   ///
>            at1(.) at2(hormon 0) atvar(F_hormon1 F_all)      ///    
>            userfunction(calcAF) userfunctionvar(AHF)

I can now plot the results.

. frame haz {
.   twoway (rarea AHF_lci AHF_uci timevar, color(red%30)) ///
>          (line AHF timevar, lcolor(red))                ///
>          , legend(off)                                  /// 
>          ylabel(0(0.05)0.3, format(%4.2f))              ///
>          ytitle("AHF")                                  ///
>          xtitle("Years from surgery") 
. }

References

Samuelsen S.O., Eide G.E. Attributable fractions with survival data. Statistics in Medicine 2008;27:1447–1467

Sjölander A. Estimation of causal effect measures with the R-package stdReg.European Journal of Epidemiology 2018