# Attributable Fraction from Standardized Survival Functions

This example will demonstrate how the attributable fraction ($AF$) can be obtained for survival data. It will also demonstrate the flexibility to calculate various function of standardized estimates through use of the `userfunction()' option.

The is defined in epidemiology as the proportion of preventable outcomes if all subjects had not been exposed to a particular exposure. i.e.

$$ AF = \frac{P(D=1) - P(D=1|X=0)}{P(D=1)} $$

where $P(D)$ is proportion diseased in the whole population, and $P(D|X=0)$ is the probability of being diseased in the exposed. In observation studies there will be confounding and we need to consider potential confounders, $Z$.

$$ AF = \frac{E(D=1|Z) - E(D=1|X=0,Z)}{P(D|Z)} $$

In survival studies the probability of being diseased is a function of time, so we define the $AF$ using the failure function, $F(t) = 1 - S(t)$, so $AF(t)$ is defined as

$$ AF(t) = \frac{E[F(t|Z)] - E[F(t|X=0,Z)]}{E[F(t|Z)]} = 1 - \frac{E[F(t|X=0,Z)]}{E[F(t|Z)]} $$

$E[F(t|Z)]$ is the standardized failure function over covariate distribution, $Z$, and $E[F(t|X=0,Z)]$ is the standardized failure function over covariate distribution, $Z$ where all subjects forced to be unexposed. See Samualson (2008) for some background.

## Example

I will use the Rotterdam Breast cancer data. The code below loads and `stset`

’s the data and then fits a model using `stpm2`

.

```
. clear all
. use https://www.pclambert.net/data/rott2b,
(Rotterdam breast cancer data (augmented with cause of death))
. stset os, f(osi==1) scale(12) exit(time 120)
failure event: osi == 1
obs. time interval: (0, os]
exit on or before: time 120
t for analysis: time/12
------------------------------------------------------------------------------
2,982 total observations
0 exclusions
------------------------------------------------------------------------------
2,982 observations remaining, representing
1,171 failures in single-record/single-failure data
20,002.424 total analysis time at risk and under observation
at risk from t = 0
earliest observed entry t = 0
last observed exit t = 10
. stpm2 hormon age enodes pr_1, scale(hazard) df(4) eform nolog
Log likelihood = -2668.4925 Number of obs = 2,982
------------------------------------------------------------------------------
| exp(b) Std. Err. z P>|z| [95% Conf. Interval]
-------------+----------------------------------------------------------------
xb |
hormon | .7906432 .0715077 -2.60 0.009 .66221 .9439854
age | 1.013244 .0024119 5.53 0.000 1.008528 1.017983
enodes | .1132534 .0110135 -22.40 0.000 .0935998 .1370337
pr_1 | .9064855 .0119282 -7.46 0.000 .8834055 .9301685
_rcs1 | 2.632579 .073494 34.67 0.000 2.492403 2.780638
_rcs2 | 1.184191 .0329234 6.08 0.000 1.121389 1.25051
_rcs3 | 1.020234 .0150787 1.36 0.175 .9911046 1.05022
_rcs4 | .996572 .0073038 -0.47 0.639 .9823591 1.010991
_cons | 1.101826 .17688 0.60 0.546 .80439 1.509244
------------------------------------------------------------------------------
Note: Estimates are transformed only in the first equation.
```

It is worthwhile commenting what we mean be “exposed” here. Those on hormal treatment will be consided unexposed and those **not** taking the treatment will be unexposed, i.e our unepxosed group is when `hormon=1`

.

I will first use the `failure`

option to calculate the standardized failure probabilities in both groups. I also predict the failure probability in the population as a whole. I do this using `.`

within an `at()`

option, i.e. using `at3(.)`

in the example below.

```
. range timevar 0 10 101
(2,881 missing values generated)
. stpm2_standsurv, at1(hormon 0) at2(hormon 1) at3(.) timevar(timevar) ci atvar(F_hormon0 F_hormon1 F_all) failure
.
. twoway (line F_hormon0 timevar) ///
> (line F_hormon1 timevar) ///
> (line F_all timevar) ///
> , legend(order(1 "No treatment" 2 "Treatment" 3 "All") cols(1) pos(11)) ///
> ylabel(, format(%3.1f)) ///
> ytitle("S(t)") ///
> xtitle("Years from surgery")
```

These are just 1 - the standardized survival functions. There are more untreated women (88.6%) which is why the “No Treatment” function is closer to the combined function. The attributable fraction could be calculated using

```
. gen AF_tmp = 1 - F_hormon1/F_all
(2,882 missing values generated)
. list timevar F_hormon1 F_all AF_tmp if inlist(timevar,1,5,10), noobs
+--------------------------------------------+
| timevar F_hormon1 F_all AF_tmp |
|--------------------------------------------|
| 1 .01685169 .02035349 .172049 |
| 5 .22362896 .26167585 .145397 |
| 10 .39250923 .44808119 .1240221 |
+--------------------------------------------+
```

I have listed the $AF$ at 1, 5 and 10 years. If I just wanted a point estimate I could stop here. However, generally we will want to calculate confidence intervals. This is where the `userfunction()`

option comes in. We can calculate a transformation of our standardized estimates with standard errors estimated using the delta method where derivatives are calculated numerically (similar to `nlcom`

and `predictnl`

). I “borrowed” the idea of a `userfunction()`

from Arvid Sjölander’s `stdReg`

R package (Sjölander 2018).

The user function needs to be written in Mata. The function should receive one argument `at`

, which refer to the various `at`

options and can be indexed by `at[1]`

, `at[2]`

etc. The code below calculates the AF assuming that `at1`

is the standardized failure function in the population as a whole and `at2`

is the standardized failure function assuming everyone is unexposed (takes hormonal treatment). We need to be careful to specify the `at()`

options is this order.

```
. mata
------------------------------------------------- mata (type end to exit) ------------------------------------------------------------------------------------------------
: function calcAF(at) {
> // at2 is F(t|unexposed,Z)
> // at1 is F(t,Z)
> return(1 - at[2]/at[1])
> }
: end
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------
```

Having defined the Mata function I just pass this to `stpm2_standsurv`

using the `userfunction()`

option.

```
. stpm2_standsurv, at1(.) at2(hormon 1) ci timevar(timevar) failure ///
> userfunction(calcAF) userfunctionvar(AF)
```

I have specified the `userfunctionvar(AF)`

option so that the new variable is called `AF`

. Without this option
the default is `_userfunc`

. I can now plot the AF as a function of follow-up time.

```
. twoway (rarea AF_lci AF_uci timevar, color(red%30)) ///
> (line AF timevar, lcolor(red)) ///
> , legend(off) ///
> ylabel(0(0.05)0.3, format(%4.2f)) ///
> ytitle("AF") ///
> xtitle("Years from surgery")
```

I purposely chose for the effect of hormonal treatment to be proportional as this example is illustrative. When I relaxed this assumption, the AF was negative for the first few months.

Samualson (2008) defines alternative based on the hazard function. I am less keen on this than the use of the survival function, but show how this can be
estimated using `stpm2_standsurv`

for completeness.

Samualson defines this is the attributable hazard fraction. The equation is similar to the AF defined above, but we replace the failure function with the hazard function.

$$ AHF(t) = \frac{E[\lambda(t|Z)] - E[\lambda(t|X=0,Z)]}{E[\lambda(t|Z)]} = 1 - \frac{E[\lambda(t|X=0,Z)]}{E[\lambda(t|Z)]} $$

This give the proportion of preventable events **at** time $t$ rather than **by** time $t$.

See the page of The hazard function of the standardized survival curve. for a description of standardized hazard functions.

As I just have to replace the failure probability with the hazard function, I can just use the same Mata function. This means that I just have the change the option `failure`

to `hazard`

in `stpm2_standsurv`

.

```
. drop _at*
. stpm2_standsurv, at1(.) at2(hormon 1) ci timevar(timevar) hazard ///
> userfunction(calcAF) userfunctionvar(AHF)
```

I can now plot the results.

```
. twoway (rarea AHF_lci AHF_uci timevar, color(red%30)) ///
> (line AHF timevar, lcolor(red)) ///
> , legend(off) ///
> ylabel(0(0.05)0.3, format(%4.2f)) ///
> ytitle("AHF") ///
> xtitle("Years from surgery")
```

## References

Samuelsen S.O., Eide G.E. Attributable fractions with survival data. *Statistics in Medicine* 2008;**27**:1447–1467

Sjölander A. Estimation of causal effect measures with the R-package stdReg.*European Journal of Epidemiology* 2018