Factor Variables
stpm3
fully supports factor variables
In stpm3
there is now full support for factor variables including for time-dependent effects. This makes predictions much easier. In addition, standsurv
has been updated to be compatible with stpm3
making marginal predictions also much easier.
We first load the Rotterdam breast cancer data and then use stset
to declare the survival time and event indicator.
use https://www.pclambert.net/data/rott3, clear
. data (augmented with cause of death))
(Rotterdam breast cancer
stset os, f(osi==1) scale(12) exit(time 120)
.
data settings
Survival-time
Failure event: osi==1
Observed time interval: (0, os]on or before: time 120
Exit for analysis: time/12
Time
--------------------------------------------------------------------------total observations
2,982
0 exclusions
--------------------------------------------------------------------------
2,982 observations remaining, representingin single-record/single-failure data
1,171 failures total analysis time at risk and under observation
20,002.424
At risk from t = 0
Earliest observed entry t = 0exit t = 10 Last observed
The scale(12)
option converts the times recorded in months to years.
To fit an stpm3
model with a binary covariate we could use,
scale(lncumhazard) df(5)
. stpm3 hormon,
Iteration 0: Log likelihood = -2929.2995
Iteration 1: Log likelihood = -2928.2998
Iteration 2: Log likelihood = -2928.2966
Iteration 3: Log likelihood = -2928.2966
of obs = 2,982
Number chi2(1) = 25.19
Wald chi2 = 0.0000
Log likelihood = -2928.2966 Prob >
------------------------------------------------------------------------------
| Coefficient Std. err. z P>|z| [95% conf. interval]
-------------+----------------------------------------------------------------xb |
hormon | .4321954 .0861189 5.02 0.000 .2634054 .6009854
-------------+----------------------------------------------------------------
time |
_ns1 | -23.9834 1.92113 -12.48 0.000 -27.74874 -20.21805
_ns2 | 6.695765 1.027919 6.51 0.000 4.681082 8.710449
_ns3 | -1.214676 .0497438 -24.42 0.000 -1.312172 -1.11718
_ns4 | -.8095755 .0387379 -20.90 0.000 -.8855004 -.7336505
_ns5 | -.4994385 .0418591 -11.93 0.000 -.5814808 -.4173963_cons | -.5713643 .0332128 -17.20 0.000 -.6364603 -.5062684
------------------------------------------------------------------------------
The equivalent model using factor variables is,
scale(lncumhazard) df(5)
. stpm3 i.hormon,
Iteration 0: Log likelihood = -2929.2995
Iteration 1: Log likelihood = -2928.2998
Iteration 2: Log likelihood = -2928.2966
Iteration 3: Log likelihood = -2928.2966
of obs = 2,982
Number chi2(1) = 25.19
Wald chi2 = 0.0000
Log likelihood = -2928.2966 Prob >
------------------------------------------------------------------------------
| Coefficient Std. err. z P>|z| [95% conf. interval]
-------------+----------------------------------------------------------------xb |
hormon |
yes | .4321954 .0861189 5.02 0.000 .2634054 .6009854
-------------+----------------------------------------------------------------
time |
_ns1 | -23.9834 1.92113 -12.48 0.000 -27.74874 -20.21805
_ns2 | 6.695765 1.027919 6.51 0.000 4.681082 8.710449
_ns3 | -1.214676 .0497438 -24.42 0.000 -1.312172 -1.11718
_ns4 | -.8095755 .0387379 -20.90 0.000 -.8855004 -.7336505
_ns5 | -.4994385 .0418591 -11.93 0.000 -.5814808 -.4173963_cons | -.5713643 .0332128 -17.20 0.000 -.6364603 -.5062684
------------------------------------------------------------------------------
You can also include factor variables as a time-dependent effect.
scale(lncumhazard) df(5) ///
. stpm3 i.hormon,
> tvc(i.hormon) dftvc(3)
Iteration 0: Log likelihood = -2928.8322
Iteration 1: Log likelihood = -2926.8607
Iteration 2: Log likelihood = -2926.8409
Iteration 3: Log likelihood = -2926.8409
of obs = 2,982
Number chi2(1) = 20.88
Wald chi2 = 0.0000
Log likelihood = -2926.8409 Prob >
-----------------------------------------------------------------------------------
| Coefficient Std. err. z P>|z| [95% conf. interval]
------------------+----------------------------------------------------------------xb |
hormon |
yes | .4766542 .1043107 4.57 0.000 .2722089 .6810995
------------------+----------------------------------------------------------------
time |
_ns1 | -25.11479 2.156221 -11.65 0.000 -29.3409 -20.88867
_ns2 | 7.297905 1.13577 6.43 0.000 5.071836 9.523974
_ns3 | -1.195295 .0516705 -23.13 0.000 -1.296567 -1.094023
_ns4 | -.7997134 .0402802 -19.85 0.000 -.8786612 -.7207657
_ns5 | -.4954146 .0427173 -11.60 0.000 -.5791388 -.4116903
|
hormon#c._ns_tvc1 |
yes | 4.943174 3.778547 1.31 0.191 -2.462642 12.34899
|
hormon#c._ns_tvc2 |
yes | -2.952286 1.995084 -1.48 0.139 -6.862579 .9580062
|
hormon#c._ns_tvc3 |
yes | -.0773513 .1778044 -0.44 0.664 -.4258415 .2711389
|_cons | -.5740705 .0333738 -17.20 0.000 -.6394819 -.5086591
-----------------------------------------------------------------------------------
You can incorporate interactions into both the main effect and interactions with time using tvc()
.
scale(lncumhazard) df(5) ///
. stpm3 i.hormon##i.grade,
> tvc(i.hormon##i.grade) dftvc(3) baselevels
Iteration 0: Log likelihood = -2901.3265
Iteration 1: Log likelihood = -2895.1769
Iteration 2: Log likelihood = -2895.0066
Iteration 3: Log likelihood = -2895.0059
Iteration 4: Log likelihood = -2895.0059
of obs = 2,982
Number chi2(3) = 60.61
Wald chi2 = 0.0000
Log likelihood = -2895.0059 Prob >
-----------------------------------------------------------------------------------------
| Coefficient Std. err. z P>|z| [95% conf. interval]
------------------------+----------------------------------------------------------------xb |
hormon |
no | 0 (base)
yes | .4135919 .2951852 1.40 0.161 -.1649605 .9921443
|
grade |
2 | 0 (base)
3 | .483797 .0805037 6.01 0.000 .3260126 .6415814
|
hormon#grade |
yes#3 | .0284316 .315761 0.09 0.928 -.5904485 .6473117
------------------------+----------------------------------------------------------------
time |
_ns1 | -27.62422 5.730659 -4.82 0.000 -38.85611 -16.39234
_ns2 | 7.583979 2.721378 2.79 0.005 2.250176 12.91778
_ns3 | -1.385614 .1121938 -12.35 0.000 -1.60551 -1.165718
_ns4 | -.8489422 .0796837 -10.65 0.000 -1.005119 -.6927651
_ns5 | -.4786468 .0735986 -6.50 0.000 -.6228974 -.3343963
|
hormon#c._ns_tvc1 |
yes | -2.771489 16.45089 -0.17 0.866 -35.01463 29.47165
|
hormon#c._ns_tvc2 |
yes | 2.425308 8.628728 0.28 0.779 -14.48669 19.3373
|
hormon#c._ns_tvc3 |
yes | -.7069334 .573263 -1.23 0.218 -1.830508 .4166415
|
grade#c._ns_tvc1 |
3 | 2.757887 5.847325 0.47 0.637 -8.702659 14.21843
|
grade#c._ns_tvc2 |
3 | -.285027 3.030253 -0.09 0.925 -6.224213 5.654159
|
grade#c._ns_tvc3 |
3 | -.0467859 .1288623 -0.36 0.717 -.2993512 .2057795
|
hormon#grade#c._ns_tvc1 |
yes#3 | 8.162556 16.89039 0.48 0.629 -24.942 41.26711
|
hormon#grade#c._ns_tvc2 |
yes#3 | -5.885373 8.860398 -0.66 0.507 -23.25143 11.48069
|
hormon#grade#c._ns_tvc3 |
yes#3 | .7162528 .6030515 1.19 0.235 -.4657065 1.898212
|_cons | -.9320134 .0710511 -13.12 0.000 -1.071271 -.7927559
-----------------------------------------------------------------------------------------
I strongly recommend using factor variables. When the model becomes complex, with interactions, time-dependent effects etc, then predictions usinff predict
or standsurv
become much simpler.