# Standardization

## Using standsurv

I will not go into details of standsurv here. There are various tutorials here.

There main advantage of using stpm3 rather than stpm2 with standsurv is the support for factor variables. I will ilustrate this with an example.

The code below loads the example colon cancer data set.

. use "https://pclambert.net/data/colon.dta", clear
(Colon carcinoma, diagnosed 1975-94, follow-up to 1995)

. stset surv_mm,f(status=1,2) id(id) scale(12) exit(time 120.5)

Survival-time data settings

ID variable: id
Failure event: status==1 2
Observed time interval: (surv_mm[_n-1], surv_mm]
Exit on or before: time 120.5
Time for analysis: time/12

--------------------------------------------------------------------------
15,564  total observations
0  exclusions
--------------------------------------------------------------------------
15,564  observations remaining, representing
15,564  subjects
10,459  failures in single-failure-per-subject data
51,685.667  total analysis time at risk and under observation
At risk from t =         0
Earliest observed entry t =         0
Last observed exit t =  10.04167

. gen male = sex==1



To illust rate the prediction options I will fit a model with an interaction between age and sex for both main and time-dependent effects, where the effect of age is modelled using natural splines.

. stpm3 i.male##@ns(age,df(3)),                 ///
>       tvc(i.male##@ns(age,df(3))) dftvc(2)    ///
>       df(5) scale(lncumhazard)

Iteration 0:   log likelihood =   -26505.4
Iteration 1:   log likelihood = -26040.023
Iteration 2:   log likelihood = -26005.258
Iteration 3:   log likelihood = -26004.878
Iteration 4:   log likelihood = -26004.877

Number of obs =  15,564
Wald chi2(7)  = 1029.65
Log likelihood = -26004.877                            Prob > chi2   =  0.0000

-----------------------------------------------------------------------------------------------
| Coefficient  Std. err.      z    P>|z|     [95% conf. interval]
------------------------------+----------------------------------------------------------------
xb                            |
1.male |   -.241306   .2648958    -0.91   0.362    -.7604922    .2778802
_ns_f1_age1 |  -6.206268   .9371473    -6.62   0.000    -8.043043   -4.369493
_ns_f1_age2 |  -.4202363   .4447267    -0.94   0.345    -1.291885    .4514121
_ns_f1_age3 |  -1.728845   .2531885    -6.83   0.000    -2.225086   -1.232605
|
male#c._ns_f1_age1 |
1  |    .684687   1.445988     0.47   0.636    -2.149398    3.518772
|
male#c._ns_f1_age2 |
1  |   .2789823   .6314241     0.44   0.659    -.9585862    1.516551
|
male#c._ns_f1_age3 |
1  |   .6956899   .4554634     1.53   0.127     -.197002    1.588382
------------------------------+----------------------------------------------------------------
time                          |
_ns1 |  -10.15155   .8167647   -12.43   0.000    -11.75238   -8.550724
_ns2 |   2.443526   .2234638    10.93   0.000     2.005545    2.881507
_ns3 |  -1.284021   .1792894    -7.16   0.000    -1.635422     -.93262
_ns4 |   -.836973   .1507723    -5.55   0.000    -1.132481   -.5414646
_ns5 |  -.6772498   .1283478    -5.28   0.000    -.9288069   -.4256928
|
male#c._ns_tvc1 |
1  |    .408095   .7632089     0.53   0.593    -1.087767    1.903957
|
male#c._ns_tvc2 |
1  |   1.358495   .6838178     1.99   0.047     .0182364    2.698753
|
c._ns_f1_age1#c._ns_tvc1 |  -19.04826   6.611939    -2.88   0.004    -32.00742     -6.0891
|
c._ns_f1_age1#c._ns_tvc2 |   2.555648   2.733907     0.93   0.350    -2.802712    7.914008
|
c._ns_f1_age2#c._ns_tvc1 |   3.701191   3.555068     1.04   0.298    -3.266614      10.669
|
c._ns_f1_age2#c._ns_tvc2 |  -.0607427   1.341542    -0.05   0.964    -2.690116    2.568631
|
c._ns_f1_age3#c._ns_tvc1 |  -2.033875   .8956267    -2.27   0.023    -3.789271    -.278479
|
c._ns_f1_age3#c._ns_tvc2 |  -.9184183    .694715    -1.32   0.186    -2.280035    .4431981
|
male#c._ns_f1_age1#c._ns_tvc1 |
1  |   7.104182   8.667079     0.82   0.412    -9.882981    24.09134
|
male#c._ns_f1_age1#c._ns_tvc2 |
1  |   -2.53056   3.925283    -0.64   0.519    -10.22397    5.162853
|
male#c._ns_f1_age2#c._ns_tvc1 |
1  |  -4.308122   4.611764    -0.93   0.350    -13.34701     4.73077
|
male#c._ns_f1_age2#c._ns_tvc2 |
1  |  -1.032837   1.804608    -0.57   0.567    -4.569803    2.504129
|
male#c._ns_f1_age3#c._ns_tvc1 |
1  |   .0752164   1.472709     0.05   0.959     -2.81124    2.961673
|
male#c._ns_f1_age3#c._ns_tvc2 |
1  |  -2.959469   1.198583    -2.47   0.014    -5.308649   -.6102891
|
_cons |   1.835956   .1428585    12.85   0.000     1.555958    2.115953
-----------------------------------------------------------------------------------------------
Extended functions
(1) @ns(age, df(3))
Warning: This is a test version of stpm3



To obtain the marginal survival for males and females which is standardized over the combined covariate distribution (just age in this case) we can use standsurv.

. range tt 0 10 101
(15,463 missing values generated)

. standsurv, at1(male 1) at2(male 0)    ///
>            atvar(Sm Sf)               ///
>            survival ci                ///
>            timevar(tt)                ///
>            contrast(difference)       ///
>            contrastvar(Sdiff)



If this was an stpm2 model then the spline variables would need to be calculated and then the interactions with age formed and this information passed to standsurv. To demonstrate the advantages of using stpm3 with factor variables and extended functions I will now fit the same model without using them.

. gensplines age, gen(agens) df(3) type(ns)

. forvalues i = 1/3 {
2.   gen m_agensi' = agensi' * male
3. }

. stpm3 male agens1 agens2 agens3 m_agens1 m_agens2 m_agens3,                 ///
>       tvc(male agens1 agens2 agens3 m_agens1 m_agens2 m_agens3) dftvc(2)    ///
>       df(5) scale(lncumhazard)

Iteration 0:   log likelihood =   -26505.4
Iteration 1:   log likelihood = -26040.023
Iteration 2:   log likelihood = -26005.258
Iteration 3:   log likelihood = -26004.878
Iteration 4:   log likelihood = -26004.877

Number of obs =  15,564
Wald chi2(7)  = 1029.65
Log likelihood = -26004.877                            Prob > chi2   =  0.0000

---------------------------------------------------------------------------------------
| Coefficient  Std. err.      z    P>|z|     [95% conf. interval]
----------------------+----------------------------------------------------------------
xb                    |
male |   -.241306   .2648958    -0.91   0.362    -.7604922    .2778802
agens1 |  -6.206268   .9371473    -6.62   0.000    -8.043043   -4.369493
agens2 |  -.4202364   .4447267    -0.94   0.345    -1.291885     .451412
agens3 |  -1.728845   .2531885    -6.83   0.000    -2.225086   -1.232605
m_agens1 |    .684687   1.445988     0.47   0.636    -2.149398    3.518772
m_agens2 |   .2789823   .6314241     0.44   0.659    -.9585861    1.516551
m_agens3 |     .69569   .4554634     1.53   0.127     -.197002    1.588382
----------------------+----------------------------------------------------------------
time                  |
_ns1 |  -10.15155   .8167647   -12.43   0.000    -11.75238   -8.550724
_ns2 |   2.443526   .2234638    10.93   0.000     2.005545    2.881507
_ns3 |  -1.284021   .1792894    -7.16   0.000    -1.635422     -.93262
_ns4 |   -.836973   .1507723    -5.55   0.000    -1.132481   -.5414646
_ns5 |  -.6772499   .1283478    -5.28   0.000    -.9288069   -.4256928
|
c.male#c._ns_tvc1 |   .4080952   .7632089     0.53   0.593    -1.087767    1.903957
|
c.male#c._ns_tvc2 |   1.358495   .6838178     1.99   0.047     .0182364    2.698753
|
c.agens1#c._ns_tvc1 |  -19.04826   6.611939    -2.88   0.004    -32.00742   -6.089098
|
c.agens1#c._ns_tvc2 |   2.555648   2.733907     0.93   0.350    -2.802712    7.914008
|
c.agens2#c._ns_tvc1 |    3.70119   3.555068     1.04   0.298    -3.266615    10.66899
|
c.agens2#c._ns_tvc2 |  -.0607429   1.341542    -0.05   0.964    -2.690116    2.568631
|
c.agens3#c._ns_tvc1 |  -2.033875   .8956267    -2.27   0.023    -3.789271   -.2784789
|
c.agens3#c._ns_tvc2 |  -.9184183    .694715    -1.32   0.186    -2.280035    .4431981
|
c.m_agens1#c._ns_tvc1 |   7.104179   8.667079     0.82   0.412    -9.882983    24.09134
|
c.m_agens1#c._ns_tvc2 |   -2.53056   3.925283    -0.64   0.519    -10.22397    5.162853
|
c.m_agens2#c._ns_tvc1 |  -4.308121   4.611764    -0.93   0.350    -13.34701    4.730771
|
c.m_agens2#c._ns_tvc2 |  -1.032837   1.804608    -0.57   0.567    -4.569803    2.504129
|
c.m_agens3#c._ns_tvc1 |    .075216   1.472709     0.05   0.959    -2.811241    2.961673
|
c.m_agens3#c._ns_tvc2 |  -2.959469   1.198583    -2.47   0.014    -5.308649   -.6102891
|
_cons |   1.835956   .1428585    12.85   0.000     1.555958    2.115953
---------------------------------------------------------------------------------------
Warning: This is a test version of stpm3



It is necessary to incorporate the interactions into the standsurv call.

. standsurv, at1(male 1 m_agens1 = agens1 m_agens2 = agens2 m_agens3 = agens3) ///
>            at2(male 0 m_agens1 0 m_agens2 0 m_agens3 0)                      ///
>            atvar(Sm2 Sf2)                                                    ///
>            survival ci                                                       ///
>            timevar(tt)                                                       ///
>            contrast(difference)                                              ///
>            contrastvar(Sdiff2)



The standardized estimates are identical, but using factor variables combined with extended functions makes life much easier.

. list Sm Sm2 Sf Sf2 Sdiff Sdiff2 in 1/21

+-----------------------------------------------------------------------+
|        Sm         Sm2          Sf         Sf2       Sdiff      Sdiff2 |
|-----------------------------------------------------------------------|
1. |         1           1           1           1           0           0 |
2. | .92529545   .92529545   .94122671   .94122671   .01593126   .01593126 |
3. | .84974545   .84974545   .87134443   .87134443   .02159898   .02159898 |
4. | .80156903   .80156903   .82310036   .82310036   .02153133   .02153133 |
5. | .76810442   .76810442   .78841691   .78841691   .02031248   .02031248 |
|-----------------------------------------------------------------------|
6. |  .7411819    .7411819   .76034214   .76034214   .01916024   .01916024 |
7. | .71729707   .71729707   .73561503   .73561503   .01831796   .01831796 |
8. | .69488063   .69488063   .71268414   .71268414   .01780351   .01780351 |
9. | .67372376   .67372376   .69128997   .69128997    .0175662    .0175662 |
10. | .65393981   .65393981   .67149717   .67149717   .01755737   .01755737 |
|-----------------------------------------------------------------------|
11. | .63554187   .63554187   .65327779   .65327779   .01773593   .01773593 |
12. | .61848448   .61848448    .6365447    .6365447   .01806021   .01806021 |
13. | .60270023   .60270023   .62119776   .62119776   .01849753   .01849753 |
14. | .58811288   .58811288   .60713539   .60713539   .01902251   .01902251 |
15. | .57464343   .57464343   .59425861   .59425861   .01961518   .01961518 |
|-----------------------------------------------------------------------|
16. | .56218753   .56218753   .58244777   .58244777   .02026024   .02026024 |
17. | .55060507   .55060507    .5715515    .5715515   .02094643   .02094643 |
18. | .53977606   .53977606   .56144057   .56144057   .02166451   .02166451 |
19. | .52960184   .52960184   .55200868   .55200868   .02240684   .02240684 |
20. | .52000052   .52000052   .54316761   .54316761   .02316709   .02316709 |
|-----------------------------------------------------------------------|
21. | .51090369   .51090369   .53484371   .53484371   .02394001   .02394001 |
+-----------------------------------------------------------------------+



The marginal estimates can be plotted

. line Sm Sf tt, xtitle("Time since diagnosis")      ///
>                ytitle(S(t))                        ///
>                legend(order(1 "Males" 2 "Females") ///
>                       ring(0) pos(1) cols(1))      ///
>                name(Marginal, replace)

. twoway (rarea Sdiff_lci Sdiff_uci tt, color(red%30)) ///
>          (line Sdiff tt, color(red)),                ///
>          xtitle("Time since diagnosis")              ///
>          ytitle(Difference in marginal survival)     ///
>          legend(off)                                 ///
>          name(Marginal_diff, replace)