# stcrprep - computational benefits

When using `stcrprep`

there are some computational benefits when compared to using Stata’s inbuilt `stcrreg`

. One reason for this is that everytime you fit a model using `stcrreg`

you the probability of censoring weights are calculated and the data must be expanded (in the background) when maximising the likelihood. When using `stcrprep`

you only need to do this once.

I have run some timings. If I fit a simple model to the embt1 data with risk score as the only covariate (2 dummy variables) then these are the timings no my current work laptop (Intel i5 - running Stata 15 MP2).

First I load and `stset`

the data.

```
. use https://www.pclambert.net/data/ebmt1_stata.dta, clear
. stset time, failure(status==1) scale(365.25) id(patid) noshow
```

Now, `stcrreg`

can be used

```
. timer clear
. timer on 1
. stcrreg i.score, compete(status==2) nolog noshow
Competing-risks regression No. of obs = 1,977
No. of subjects = 1,977
Failure event : status == 1 No. failed = 456
Competing event: status == 2 No. competing = 685
No. censored = 836
Wald chi2(2) = 9.87
Log pseudolikelihood = -3333.3217 Prob > chi2 = 0.0072
(Std. Err. adjusted for 1,977 clusters in patid)
------------------------------------------------------------------------------
| Robust
_t | SHR Std. Err. z P>|z| [95% Conf. Interval]
-------------+----------------------------------------------------------------
score |
Medium risk | 1.271221 .1554323 1.96 0.050 1.000333 1.615465
High risk | 1.769853 .3238535 3.12 0.002 1.236465 2.533337
------------------------------------------------------------------------------
. timer off 1
. timer list
1: 23.32 / 1 = 23.3150
```

This takes 23.3 seconds to fit.

I now reload and `stset`

the data, but this time declaring both `status=1`

and `status=2`

as events.

```
. use https://www.pclambert.net/data/ebmt1_stata.dta, clear
. stset time, failure(status==1,2) scale(365.25) id(patid)
```

We can now run `stcrprep`

.

```
. timer on 2
. stcrprep, events(status) keep(score) trans(1)
. timer off 2
. timer list 2
2: 6.22 / 1 = 6.2240
```

This takes 6.2 seconds to run. However, this only restructures the data and calculates the weights. To fit the model, we first generate the event indicator and use `stset`

.

```
. gen event = status == failcode
. stset tstop [iw=weight_c], failure(event) enter(tstart)
```

We use `stcox`

to fit the model.

```
. timer on 3
. stcox i.score
failure _d: event
analysis time _t: tstop
enter on or after: time tstart
weight: [iweight=weight_c]
Iteration 0: log likelihood = -3338.1244
Iteration 1: log likelihood = -3333.4173
Iteration 2: log likelihood = -3333.3113
Iteration 3: log likelihood = -3333.3112
Refining estimates:
Iteration 0: log likelihood = -3333.3112
Cox regression -- Breslow method for ties
No. of subjects = 72,880 Number of obs = 72,880
No. of failures = 456
Time at risk = 6026.27434
LR chi2(2) = 9.63
Log likelihood = -3333.3112 Prob > chi2 = 0.0081
------------------------------------------------------------------------------
_t | Haz. Ratio Std. Err. z P>|z| [95% Conf. Interval]
-------------+----------------------------------------------------------------
score |
Medium risk | 1.271235 .1593392 1.91 0.056 .9943389 1.625238
High risk | 1.769899 .3219273 3.14 0.002 1.239148 2.52798
------------------------------------------------------------------------------
. timer off 3
. timer list
1: 23.32 / 1 = 23.3150
2: 6.22 / 1 = 6.2240
3: 1.41 / 1 = 1.4060
```

This takes 1.4 seconds to run giving a combined total of 7.6 seconds. What is important is that if we want to fit other models (including other covariates etc), then we do not need to run `stcrprep`

again.

To assess the time on larger data I have expanded the data by 20 times and added a small random number to each time, so that there are no ties. I used the following code.

```
expand 20
replace time = time + runiform()*0.0001
replace patid = _n
```

This leads to 19,770 indviduals in the analysis. The fact that there are no ties is perhaps a little unrealistic in a dataset this size, but this is still a usefull assessment of computational speed. The same analysis as above on this larger dataset gave the following times.

command | Time |
---|---|

`stcrreg` |
2066.3 seconds |

`stcrprep` |
890.2 seconds |

`stcox` |
46.1 seconds |

I think this really highlights the benfits of restructuring the data and using `stcox`

in terms of computational time. Unless there is need to recalculate the probability of censoring weights, there is no need to do this every time you fit a model. Thus, in this case an `stcrreg`

model takes almost 35 minutes, whilst the same model using `stcox`

after using `stcrprep`

takes only 46 seconds.

It is worthwhile noting that Stata’s implementation of Fine and Grays proportional subhazards model using `stcrreg`

seems particularly slow. If I fit the model in R using `crr`

the model fitted to the expanded data it only takes 370 seconds compared to 2066 in Stata.

There are other benefits with using `stcox`

to fit the subhazards model, mainly because we can now use many of the other commands and extensions associated with `stcox`

. I will discuss these in other tutorials.