Extended Functions

Include non-linear function in a varlist

A new addition to stpm3 is the use of extended functions within a model varlist. This allows you to specify various spline, polynomials, fractional polynomials functions directly when specifying the model. This makes fitting the model slightly easier, but the main advantage is the ease at which you can get predictions from complex models.

We first load the Rotterdam 2 breast cancer data and then use stset to declare the survival time and event indicator.

. use https://www.pclambert.net/data/rott2b, clear
(Rotterdam breast cancer data (augmented with cause of death))

. stset os, f(osi==1) scale(12) exit(time 60)

Survival-time data settings

Failure event: osi==1
Observed time interval: (0, os]
Exit on or before: time 60
Time for analysis: time/12

--------------------------------------------------------------------------
2,982  total observations
0  exclusions
--------------------------------------------------------------------------
2,982  observations remaining, representing
753  failures in single-record/single-failure data
13,038.968  total analysis time at risk and under observation
At risk from t =         0
Earliest observed entry t =         0
Last observed exit t =         5



If you want to include a non-linear effect for a covariate in a model, the usual approach would be to generate some new derived variables and then include these in the model. For example, generate a quadratic and cubic term or generate spline basis functions using mkspline or rcsgen.

The code below shows how to include non-linear effects using restricted cubic splines with rcsgen with 4 knots, which equates to 3 spline variables.

. rcsgen age, gen(agercs) df(3)
Variables agercs1 to agercs3 were created

. global ageknots r(knots)'

. stpm3 i.hormon agercs1-agercs3, scale(lncumhazard) df(5)

Iteration 0:   log likelihood = -2120.3313
Iteration 1:   log likelihood = -2120.0705
Iteration 2:   log likelihood = -2120.0694
Iteration 3:   log likelihood = -2120.0694

Number of obs =  2,982
Wald chi2(4)  =  69.03
Log likelihood = -2120.0694                             Prob > chi2   = 0.0000

------------------------------------------------------------------------------
| Coefficient  Std. err.      z    P>|z|     [95% conf. interval]
-------------+----------------------------------------------------------------
xb           |
hormon |
yes  |   .3966145   .1041656     3.81   0.000     .1924536    .6007754
agercs1 |  -.0430389   .0160545    -2.68   0.007    -.0745051   -.0115727
agercs2 |  -.0000314    .000038    -0.82   0.410    -.0001059    .0000432
agercs3 |  -1.70e-07   .0000357    -0.00   0.996    -.0000701    .0000697
-------------+----------------------------------------------------------------
time         |
_ns1 |  -22.10362   2.190797   -10.09   0.000    -26.39751   -17.80974
_ns2 |   6.252487    1.18838     5.26   0.000     3.923304     8.58167
_ns3 |  -1.093169   .0583025   -18.75   0.000    -1.207439   -.9788977
_ns4 |  -.5425182   .0376756   -14.40   0.000    -.6163609   -.4686754
_ns5 |   -.376672   .0361107   -10.43   0.000    -.4474477   -.3058964
_cons |   .3065984    .606017     0.51   0.613    -.8811731     1.49437
------------------------------------------------------------------------------

. estimates store stpm3_rcsgen



I have stored the knot locations as these will be needed for certain predictions.

The same model can be fitted by using @rcs(age, df(3)) within an stpm3 varlist.

. stpm3 i.hormon @rcs(age, df(3)), scale(lncumhazard) df(5)

Iteration 0:   log likelihood = -2120.3313
Iteration 1:   log likelihood = -2120.0705
Iteration 2:   log likelihood = -2120.0694
Iteration 3:   log likelihood = -2120.0694

Number of obs =  2,982
Wald chi2(4)  =  69.03
Log likelihood = -2120.0694                             Prob > chi2   = 0.0000

------------------------------------------------------------------------------
| Coefficient  Std. err.      z    P>|z|     [95% conf. interval]
-------------+----------------------------------------------------------------
xb           |
hormon |
yes  |   .3966145   .1041656     3.81   0.000     .1924536    .6007754
_rcs_f1_age1 |  -.0430389   .0160545    -2.68   0.007    -.0745051   -.0115727
_rcs_f1_age2 |  -.0000314    .000038    -0.82   0.410    -.0001059    .0000432
_rcs_f1_age3 |  -1.70e-07   .0000357    -0.00   0.996    -.0000701    .0000697
-------------+----------------------------------------------------------------
time         |
_ns1 |  -22.10362   2.190797   -10.09   0.000    -26.39751   -17.80974
_ns2 |   6.252487    1.18838     5.26   0.000     3.923304     8.58167
_ns3 |  -1.093169   .0583025   -18.75   0.000    -1.207439   -.9788977
_ns4 |  -.5425182   .0376756   -14.40   0.000    -.6163609   -.4686754
_ns5 |   -.376672   .0361107   -10.43   0.000    -.4474477   -.3058964
_cons |   .3065984    .606017     0.51   0.613    -.8811731     1.49437
------------------------------------------------------------------------------
Extended functions
(1) @rcs(age, df(3))



If you compare the model coefficients and log-likelhoods you will see they are identical.

From here on I will use natural splines - these will give the same predicted values as when using restricted cubic splines, but have some useful additional properties.

. stpm3 i.hormon @ns(age, df(3)), scale(lncumhazard) df(5)

Iteration 0:   log likelihood = -2120.3313
Iteration 1:   log likelihood = -2120.0705
Iteration 2:   log likelihood = -2120.0694
Iteration 3:   log likelihood = -2120.0694

Number of obs =  2,982
Wald chi2(4)  =  69.03
Log likelihood = -2120.0694                             Prob > chi2   = 0.0000

------------------------------------------------------------------------------
| Coefficient  Std. err.      z    P>|z|     [95% conf. interval]
-------------+----------------------------------------------------------------
xb           |
hormon |
yes  |   .3966145   .1041656     3.81   0.000     .1924536    .6007754
_ns_f1_age1 |  -1.769774   1.089136    -1.62   0.104    -3.904441    .3648924
_ns_f1_age2 |  -1.233782   .4602279    -2.68   0.007    -2.135812   -.3317519
_ns_f1_age3 |  -1.709216   .4899396    -3.49   0.000     -2.66948   -.7489521
-------------+----------------------------------------------------------------
time         |
_ns1 |  -22.10362   2.190797   -10.09   0.000    -26.39751   -17.80974
_ns2 |   6.252487    1.18838     5.26   0.000     3.923304     8.58167
_ns3 |  -1.093169   .0583025   -18.75   0.000    -1.207439   -.9788977
_ns4 |  -.5425182   .0376756   -14.40   0.000    -.6163609   -.4686754
_ns5 |   -.376672   .0361107   -10.43   0.000    -.4474477   -.3058964
_cons |  -.1364105   .2354592    -0.58   0.562     -.597902     .325081
------------------------------------------------------------------------------
Extended functions
(1) @ns(age, df(3))

. estimates store stpm3_ns



The coefficients for the spline variables are now different, but if you predict for a specified covariate pattern, the predictions will be identical.

Predictions with extended functions

If we generated the spline variables for age at diagnosis and then included these in the model, we would have to work out the values of the spline variables at the age at diagnosis of interest. For example, using the scalar option of rcsgen to predict survival for a 70 year old on hormonal treatment the code below can be used.

. estimates restore stpm3_rcsgen
(results stpm3_rcsgen are active now)

. rcsgen, scalar(70) gen(c) knots(\${ageknots})
Scalars c1 to c3 were created

. predict S70a, survival ci                                             ///
>               at1(agercs1 =c1' agercs2 =c2' agercs3 =c3' hormon 1) ///
>               timevalues(0 5, step(0.1))                             ///
>               frame(f1)
Predictions are stored in frame - f1



By storing the knots and passing these to rcsgen combined with the scalar option, the values of the restricted cubic spline variables for a 70 year old are obtained. These can then be passed to the predict command.

If you use an extended function to fit an equivalent model this simplifies the predictions. For example predicting for a 70 year old woman on hormonal treatment we can use the following predict command.

. estimates restore stpm3_ns
(results stpm3_ns are active now)

. predict S70b, survival ci          ///
>               at1(age 70 hormon 1) ///
>               frame(f1, merge)
Predictions are stored in frame - f1



These predictions have been merged into frame f1, so we can compare the predictions.

. frame f1: list in 1/21

+-----------------------------------------------------------------------------+
|  tt        S70a    S70a_lci    S70a_uci        S70b    S70b_lci    S70b_uci |
|-----------------------------------------------------------------------------|
1. |   0           1           1           1           1           1           1 |
2. |  .1   .99980531   .99901994   .99996134   .99980531   .99901994   .99996134 |
3. |  .2    .9990375   .99724993   .99966333    .9990375   .99724993   .99966333 |
4. |  .3   .99757461   .99484041   .99886071   .99757461   .99484041   .99886071 |
5. |  .4   .99537568   .99175595   .99740816   .99537568   .99175595   .99740816 |
|-----------------------------------------------------------------------------|
6. |  .5   .99243419   .98793852   .99525821   .99243419   .98793852   .99525821 |
7. |  .6   .98876195   .98333349   .99242911   .98876195   .98333349   .99242911 |
8. |  .7   .98438145   .97790805    .9889687   .98438145   .97790805    .9889687 |
9. |  .8   .97932165   .97166223   .98492686   .97932165   .97166223   .98492686 |
10. |  .9   .97361549   .96462911   .98034204   .97361549   .96462911   .98034204 |
|-----------------------------------------------------------------------------|
11. |   1   .96729824   .95686644   .97523978   .96729824   .95686644   .97523978 |
12. | 1.1   .96040644   .94844522     .969637   .96040644   .94844522     .969637 |
13. | 1.2   .95297712   .93943933   .96354746   .95297712   .93943933   .96354746 |
14. | 1.3   .94504732   .92991782   .95698631   .94504732   .92991782   .95698631 |
15. | 1.4   .93665358   .91993953    .9499734   .93665358   .91993953    .9499734 |
|-----------------------------------------------------------------------------|
16. | 1.5   .92783183   .90954967   .94253584   .92783183   .90954967   .94253584 |
17. | 1.6   .91861703   .89877754   .93470981   .91861703   .89877754   .93470981 |
18. | 1.7   .90904198   .88763238   .92654197   .90904198   .88763238   .92654197 |
19. | 1.8   .89913147   .87609834   .91808407   .89913147   .87609834   .91808407 |
20. | 1.9   .88890629   .86417228   .90937397   .88890629   .86417228   .90937397 |
|-----------------------------------------------------------------------------|
21. |   2   .87838596   .85187732   .90042978   .87838596   .85187732   .90042978 |
+-----------------------------------------------------------------------------+



The predictions are identical, but obtaining the predictions using extended functions was much easier. If the model includes interactions then I would say it is much, much easier and less prone to coding errors.

The current extended functions are

@bs() B-splines
@fp() fractional polynomials
@ns() natural cubic splines
@poly() polynomials
@rcs() restricted cubic splines

See the help file for options for each of these functions. For example, with the spline functions you specify knots positions or the degree for B-splines. For fractional polynomials there are scale and center options. For all extended functions, except fractional polynomials, it is possible to perform winsorising before the derived variables are generated using the winsor option.

Predictions in more complex models

If the model get complex with numerous interactions and non-linear effects and interactions with time, the predictions remain simple.

Consider the model below with main effects and an interaction between hormon and a natural spline function for age. stpm3 stores the details of the extended function and so when predictions are made the appropriate values of the derived variables can be obtained.

In the code below, I obtain predictions for 70 year old women and without hormonal treatment.

. stpm3 i.hormon##@ns(age, df(3)), scale(lncumhazard) df(5)

Iteration 0:   log likelihood = -2117.3748
Iteration 1:   log likelihood = -2117.1157
Iteration 2:   log likelihood = -2117.1148
Iteration 3:   log likelihood = -2117.1148

Number of obs =  2,982
Wald chi2(7)  =  77.72
Log likelihood = -2117.1148                             Prob > chi2   = 0.0000

--------------------------------------------------------------------------------------
| Coefficient  Std. err.      z    P>|z|     [95% conf. interval]
---------------------+----------------------------------------------------------------
xb                   |
hormon |
yes  |   1.295954   .5457098     2.37   0.018      .226382    2.365525
_ns_f1_age1 |  -1.006332    1.17959    -0.85   0.394    -3.318285    1.305621
_ns_f1_age2 |  -1.519938    .473222    -3.21   0.001    -2.447436   -.5924397
_ns_f1_age3 |  -1.322119   .5553445    -2.38   0.017    -2.410574   -.2336639
|
hormon#c._ns_f1_age1 |
yes  |  -6.312222    4.38403    -1.44   0.150    -14.90476    2.280319
|
hormon#c._ns_f1_age2 |
yes  |   3.192456   2.152641     1.48   0.138    -1.026642    7.411555
|
hormon#c._ns_f1_age3 |
yes  |  -2.436599   1.186322    -2.05   0.040    -4.761746   -.1114511
---------------------+----------------------------------------------------------------
time                 |
_ns1 |  -22.10808   2.190478   -10.09   0.000    -26.40134   -17.81483
_ns2 |   6.252367   1.188247     5.26   0.000     3.923447    8.581288
_ns3 |  -1.094601   .0583504   -18.76   0.000    -1.208966   -.9802364
_ns4 |  -.5429931   .0377085   -14.40   0.000    -.6169004   -.4690859
_ns5 |  -.3770147   .0361414   -10.43   0.000    -.4478506   -.3061788
_cons |  -.2889299   .2699311    -1.07   0.284    -.8179851    .2401253
--------------------------------------------------------------------------------------
Extended functions
(1) @ns(age, df(3))

.
. predict S70h0 S70h1, survival ci                  ///
>                      at1(age 70 hormon 0)         ///
>                      at2(age 70 hormon 1)         ///
>                      timevalues(0 5, step(0.1))  ///
>                      frame(f2)
Predictions are stored in frame - f2



The predict command would be identical if there was no interaction.

Even if there are time-dependent effects, the predict statement does not change. The model below adds an interaction for the time-dependent effects for hormon and age.

. stpm3 i.hormon##@ns(age, df(3)),                 ///
>       tvc(i.hormon##@ns(age, df(2))) dftvc(2)    ///
>       scale(lncumhazard) df(5)

Iteration 0:   log likelihood = -2116.6285
Iteration 1:   log likelihood = -2107.5574
Iteration 2:   log likelihood = -2103.0052
Iteration 3:   log likelihood = -2102.7883
Iteration 4:   log likelihood = -2102.7869
Iteration 5:   log likelihood = -2102.7869

Number of obs =  2,982
Wald chi2(7)  =  74.61
Log likelihood = -2102.7869                             Prob > chi2   = 0.0000

-------------------------------------------------------------------------------------------------
| Coefficient  Std. err.      z    P>|z|     [95% conf. interval]
--------------------------------+----------------------------------------------------------------
xb                              |
hormon |
yes  |     1.2372    .559703     2.21   0.027     .1402021    2.334198
_ns_f1_age1 |  -.9347474    1.18498    -0.79   0.430    -3.257267    1.387772
_ns_f1_age2 |  -1.538365   .4738744    -3.25   0.001    -2.467142   -.6095881
_ns_f1_age3 |  -1.308583   .5588214    -2.34   0.019    -2.403853   -.2133128
|
hormon#c._ns_f1_age1 |
yes  |  -5.991194   4.385156    -1.37   0.172    -14.58594    2.603554
|
hormon#c._ns_f1_age2 |
yes  |   3.188885   2.145601     1.49   0.137    -1.016415    7.394185
|
hormon#c._ns_f1_age3 |
yes  |  -2.318151   1.205093    -1.92   0.054    -4.680089    .0437866
--------------------------------+----------------------------------------------------------------
time                            |
_ns1 |  -19.18669   7.212764    -2.66   0.008    -33.32345   -5.049932
_ns2 |   5.607982   3.176181     1.77   0.077    -.6172182    11.83318
_ns3 |   -1.05599   .2202421    -4.79   0.000    -1.487656    -.624323
_ns4 |  -.5553481   .1523431    -3.65   0.000    -.8539351   -.2567611
_ns5 |  -.3960238   .1116303    -3.55   0.000    -.6148151   -.1772324
|
hormon#c._ns_tvc1 |
yes  |   4.903816   4.856667     1.01   0.313    -4.615075    14.42271
|
hormon#c._ns_tvc2 |
yes  |   .0799753   1.642765     0.05   0.961    -3.139786    3.299736
|
c._ns_f2_age1#c._ns_tvc1 |  -29.15335   8.875221    -3.28   0.001    -46.54847   -11.75824
|
c._ns_f2_age1#c._ns_tvc2 |   .6654049   1.325155     0.50   0.616     -1.93185     3.26266
|
c._ns_f2_age2#c._ns_tvc1 |  -.8954604   11.34532    -0.08   0.937    -23.13188    21.34096
|
c._ns_f2_age2#c._ns_tvc2 |   .3600924   2.127002     0.17   0.866    -3.808755     4.52894
|
hormon#c._ns_f2_age1#c._ns_tvc1 |
yes  |  -117.1348   72.85753    -1.61   0.108    -259.9329    25.66332
|
hormon#c._ns_f2_age1#c._ns_tvc2 |
yes  |  -.4836296   6.116531    -0.08   0.937    -12.47181    11.50455
|
hormon#c._ns_f2_age2#c._ns_tvc1 |
yes  |  -5.302671   20.04585    -0.26   0.791    -44.59182    33.98648
|
hormon#c._ns_f2_age2#c._ns_tvc2 |
yes  |  -1.511553   5.134032    -0.29   0.768    -11.57407    8.550965
|
_cons |  -.3015535   .2724136    -1.11   0.268    -.8354743    .2323674
-------------------------------------------------------------------------------------------------
Extended functions
(1) @ns(age, df(3))
(2) @ns(age, df(2))

.
. predict S70h0 S70h1, survival ci                ///
>                      at1(age 70 hormon 0)       ///
>                      at2(age 70 hormon 1)       ///
>                      timevalues(0 5, step(0.1)) ///
>                      frame(f3)
Predictions are stored in frame - f3



The key point is that the predict command stays simple. Note that I use a different function for the time-dependent effect.

Finally I will fit a model with a spline function that winsorises the variables before the spline basis functions are generated. This can improve model stability in the tails in complex models.

. stpm3 i.hormon##@bs(age, df(3) degree(2) winsor(2 98)), ///
>       scale(lncumhazard) df(5)

Iteration 0:   log likelihood = -2117.8274
Iteration 1:   log likelihood = -2117.5679
Iteration 2:   log likelihood = -2117.5669
Iteration 3:   log likelihood = -2117.5669

Number of obs =  2,982
Wald chi2(7)  =  75.27
Log likelihood = -2117.5669                             Prob > chi2   = 0.0000

--------------------------------------------------------------------------------------
| Coefficient  Std. err.      z    P>|z|     [95% conf. interval]
---------------------+----------------------------------------------------------------
xb                   |
hormon |
yes  |  -.2013625   .8479376    -0.24   0.812     -1.86329    1.460565
_bs_f1_age1 |   -.789629   .2399149    -3.29   0.001    -1.259854   -.3194044
_bs_f1_age2 |  -.3251852   .1782746    -1.82   0.068     -.674597    .0242265
_bs_f1_age3 |   .2358417   .2106851     1.12   0.263    -.1770935     .648777
|
hormon#c._bs_f1_age1 |
yes  |    1.71135   1.096385     1.56   0.119    -.4375242    3.860225
|
hormon#c._bs_f1_age2 |
yes  |  -.2149046   .8373336    -0.26   0.797    -1.856048    1.426239
|
hormon#c._bs_f1_age3 |
yes  |   .9610499   .9192863     1.05   0.296    -.8407181    2.762818
---------------------+----------------------------------------------------------------
time                 |
_ns1 |  -22.10814   2.191013   -10.09   0.000    -26.40245   -17.81384
_ns2 |   6.253144   1.188507     5.26   0.000     3.923714    8.582574
_ns3 |  -1.093972   .0583289   -18.76   0.000    -1.208295   -.9796495
_ns4 |  -.5428561   .0376962   -14.40   0.000    -.6167393   -.4689729
_ns5 |  -.3769876   .0361375   -10.43   0.000    -.4478158   -.3061593
_cons |  -.9075211   .1468767    -6.18   0.000    -1.195394   -.6196481
--------------------------------------------------------------------------------------
Extended functions
(1) @bs(age, df(3) degree(2) winsor(2 98))

.
. predict S70h0 S70h1, survival ci                 ///
>                      at1(age 70 hormon 0)        ///
>                      at2(age 70 hormon 1)        ///
>                      timevalues(0 5, step(0.1)) ///
>                      frame(f4)
Predictions are stored in frame - f4



The winsor(2 98) option replaces values less than the 2nd centile with the value at the 2nd centile and values greater than the 98th centile with the values at the 98th centile. There is a values suboption where you can directly specifying values for the cutoffs rather than centiles.