Sensitivity analysis to the location of knots (proportional hazards)
When using stpm2 with the df() option, the locations of the knots for the restricted cubic splines are selected using the defaults. These are based on the centiles of $\ln(t)$ for the events (i.e. the non-censored observations). The boundary knots are placed at the minimum and maximum log event times. For example, with 5 knots there will be knots placed at the $0^{th}$, $25^{th}$, $50^{th}$, $75^{th}$, and $100^{th}$ centiles of the log event times. The location of the internal knots can be changed using the knots() option and the location of the boundary knots can be changed using the bknots() option.
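As a minimal sketch (assuming the data have already been stset; the centiles in the second call are arbitrary illustrative values), the two ways of specifying knot locations look like this:

* 4 df: 3 internal knots at the default centiles of the log event times
stpm2, scale(hazard) df(4)
* the same number of knots, but at user-chosen centiles
stpm2, scale(hazard) knots(20 50 80) bknots(2 98) knscale(centile)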
I was asked recently by Enzo Coviello why we use these knot locations and why not the knot locations suggested by Frank Harrell when using restricted cubic splines in his excellent book Regression Modeling Strategies: With Applications to Linear Models, Logistic Regression, and Survival Analysis. The table below shows the knot locations suggested by Harrell and those we use in stpm2.
No. of knots | Percentiles (Harrell) | Percentiles (stpm2) |
---|---|---|
3 | 10 50 90 | 0 50 100 |
4 | 5 35 65 95 | 0 33 67 100 |
5 | 5 27.5 50 72.5 95 | 0 25 50 75 100 |
6 | 5 23 41 59 77 95 | 0 20 40 60 80 100 |
7 | 2.5 18.33 34.17 50 65.83 81.67 97.5 | 0 17 33 50 67 83 100 |
We have performed a number of sensitivity analyses to internal knot location, i.e. still keeping the boundary knots at the minimum and maximum log event times, and have found the predicted hazard and survival functions to be very robust to these changes. However, we have rarely changed the boundary knots; the only time I can remember doing so is when fitting cure models (Andersson et al. 2011).
In my reply to Enzo I explained that our choice of knots was motivated by the fact that it is better not to impose linearity assumptions within the range of the data, while the linearity restriction outside the range of the data adds some stability to the function at the extremes. I also ran a very quick simulation study based on the same scenarios as in Mark Rutherford’s simulation paper (Rutherford et al. 2015). I extend that simulation study here.
I will simulate the same 4 scenarios as in Mark’s paper, but will not simulate any covariate effects as I am only really interested in how well the restricted cubic spline function performs. Each of the scenarios is simulated from a mixture Weibull distribution,
$$ S(t) = \pi \exp(-\lambda_1 t^{\gamma_1}) + (1-\pi)\exp(-\lambda_2 t^{\gamma_2}) $$
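The corresponding hazard function, which the plotting program below uses, follows from $h(t) = f(t)/S(t)$,
$$ h(t) = \frac{\pi \lambda_1 \gamma_1 t^{\gamma_1 - 1} \exp(-\lambda_1 t^{\gamma_1}) + (1-\pi) \lambda_2 \gamma_2 t^{\gamma_2 - 1} \exp(-\lambda_2 t^{\gamma_2})}{\pi \exp(-\lambda_1 t^{\gamma_1}) + (1-\pi) \exp(-\lambda_2 t^{\gamma_2})} $$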
The following parameters are used for each scenario,
Scenario | $\lambda_1$ | $\lambda_2$ | $\gamma_1$ | $\gamma_2$ | $\pi$ |
---|---|---|---|---|---|
1 | 0.6 | - | 0.8 | - | 1 |
2 | 0.2 | 1.6 | 0.8 | 1.0 | 0.2 |
3 | 1 | 1 | 1.5 | 0.5 | 0.5 |
4 | 0.03 | 0.3 | 1.9 | 2.5 | 0.7 |
The true survival and hazard functions can be plotted for each scenario. Below is a program I use to do this. I first declare some local macros to define the Weibull mixture parameters for each scenario; these will also be used when running the simulations.
. local scenario1 lambda1(0.6) lambda2(0.6) gamma1(0.8) gamma2(0.8) pi(1) maxt(5)
. local scenario2 lambda1(0.2) lambda2(1.6) gamma1(0.8) gamma2(1) pi(0.2) maxt(5)
. local scenario3 lambda1(1) lambda2(1) gamma1(1.5) gamma2(0.5) pi(0.5) maxt(5)
. local scenario4 lambda1(0.03) lambda2(0.3) gamma1(1.9) gamma2(2.5) pi(0.7) maxt(5)
I can then declare and run the program to plot the true survival and hazard functions.
. capture pr drop weibmixplot
. program define weibmixplot
1. syntax [, OBS(integer 1000) lambda1(real 1) lambda2(real 1) ///
> gamma1(real 1) gamma2(real 1) pi(real 0.5) maxt(real 5) scenario(integer 1)]
2. local S1 exp(-`lambda1'*x^(`gamma1'))
3. local S2 exp(-`lambda2'*x^(`gamma2'))
4. local h1 `lambda1'*`gamma1'*x^(`gamma1' - 1)
5. local h2 `lambda2'*`gamma2'*x^(`gamma2' - 1)
6. twoway function y = `pi'*`S1' + (1-`pi')*`S2' ///
> , range(0 `maxt') name(s`scenario',replace) ///
> xtitle("Time (years)") ///
> ytitle("S(t)") ///
> ylabel(,angle(h) format(%3.1f)) ///
> title("Scenario `scenario'")
7. twoway function y = (`pi'*`h1'*`S1' +(1-`pi')*`h2'*`S2') / ///
> (`pi'*`S1' + (1-`pi')*`S2') ///
> , range(0 `maxt') name(h`scenario',replace) ///
> xtitle("Time (years)") ///
> ytitle("h(t)") ///
> ylabel(,angle(h) format(%3.1f)) ///
> title("Scenario `scenario'")
8. end
.
. forvalues i = 1/4 {
2. weibmixplot , `scenario`i'' scenario(`i')
3. }
. graph combine s1 s2 s3 s4, nocopies name(true_s, replace) title("Survival functions")
. graph combine h1 h2 h3 h4, nocopies name(true_h, replace) title("Hazard functions")
The true survival function for each scenario is shown below.
And here are the true hazard functions.
For more details on the choice of these functions, see Rutherford et al. (2015).
Simulation program
In order to perform a simulation study I will write a program that does three jobs. It will (i) simulate the data, (ii) analyse the data (perhaps using different methods/models) and (iii) store the results. Once I have written the program I can use Stata’s simulate command to run my program many times (e.g. 1000). In my program I will fit models with 4, 5 and 6 df (5, 6 and 7 knots), using both stpm2’s default knot positions and the knot positions given by Harrell. I will then store the AIC and BIC so that these can be compared. The full program is shown below and I will then explain some of the lines of code.
clear all
program define enzosim, rclass
syntax [, OBS(integer 1000) lambda1(real 1) lambda2(real 1) ///
gamma1(real 1) gamma2(real 1) pi(real 0.5) maxt(real 5)]
clear
set obs `obs'
survsim t d, mixture lambda(`lambda1' `lambda2') gamma(`gamma1' `gamma2') ///
pmix(`pi') maxt(`maxt')
replace t = ceil(t*365.24)/365.24
stset t, f(d==1)
local harrell4 27.5 50 72.5
local harrell4b 5 95
local harrell5 23 41 59 77
local harrell5b 5 95
local harrell6 18.33 34.17 50 65.83 81.67
local harrell6b 2.5 97.5
foreach i in 4 5 6 {
stpm2, df(`i') scale(hazard)
return scalar AIC1_df`i' = e(AIC)
return scalar BIC1_df`i' = e(BIC)
stpm2, knots(`harrell`i'') knscale(centile) scale(hazard) bknots(`harrell`i'b')
return scalar AIC2_df`i' = e(AIC)
return scalar BIC2_df`i' = e(BIC)
}
ereturn clear
end
I first drop the program as I need to create a new version whilst I am editing it (fixing bugs etc.). I name the program enzosim and make it an rclass program as I want it to return some results. I use the syntax command to allow my program to take options. The options include the number of observations in each simulated data set, the parameters of the mixture Weibull distribution and the length of follow-up. Each of these is given a default value.
The next five commands are as follows,
clear
set obs `obs'
survsim t d, mixture lambda(`lambda1' `lambda2') gamma(`gamma1' `gamma2') ///
pmix(`pi') maxt(`maxt')
replace t = ceil(t*365.24)/365.24
stset t, f(d==1)
I first clear any data in memory and set the number of observations to whatever was specified in the obs() option (or use the default of 1000 if not specified). I then use the survsim command to simulate from the mixture Weibull model (Crowther and Lambert 2012). This will create two new variables, t (the survival time) and d (the event indicator). The maxt() option means that any simulated time after 5 years will be censored at 5 years. Note that survsim uses the parameters I pass to my program for the mixture Weibull distribution. After generating data in years, I transform to days, round up to the nearest integer and then transform back to years. The reason for this is that some very small survival times can lead to numerical problems. It also better reflects real data, where survival is often measured to the nearest day. I then stset the data so I can now fit some models.
I then declare some local macros to define the knot positions given by Harrell,
local harrell4 27.5 50 72.5
local harrell4b 5 95
local harrell5 23 41 59 77
local harrell5b 5 95
local harrell6 18.33 34.17 50 65.83 81.67
local harrell6b 2.5 97.5
I have to give the internal knots and the boundary knots separately.
I then write a small loop that loops over different degrees of freedom (4, 5 and 6).
foreach i in 4 5 6 {
stpm2, df(`i') scale(hazard)
return scalar AIC1_df`i' = e(AIC)
return scalar BIC1_df`i' = e(BIC)
stpm2, knots(`harrell`i'') knscale(centile) scale(hazard) bknots(`harrell`i'b')
return scalar AIC2_df`i' = e(AIC)
return scalar BIC2_df`i' = e(BIC)
}
For each df an stpm2 model is fitted using the default knot placement and then using the knot positions recommended by Harrell. Note the use of the knots() option for the internal knots, the bknots() option for the boundary knots and the knscale(centile) option so that I can specify the knots as centiles rather than specific points in time (the default). After fitting each model I use return to store both the AIC and BIC.
The final line of code,
ereturn clear
is just a bit of laziness on my part. If you do not specify anything to monitor when using the simulate command, it will monitor the coefficients of the model in memory. If no model is stored in memory, it will instead monitor anything stored in r(), which is what I want. Therefore, I use ereturn clear to remove the last model from memory, and then I do not have to give a long list of the things I want to monitor.
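For reference, the explicit alternative would be to name each returned scalar in the simulate call. A sketch showing just the 4 df results (the variable names on the left-hand side are arbitrary):

simulate AIC1_df4 = r(AIC1_df4) BIC1_df4 = r(BIC1_df4) ///
         AIC2_df4 = r(AIC2_df4) BIC2_df4 = r(BIC2_df4), ///
         reps(1000): enzosim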
Testing the simulation program
When I am developing a simulation program I will run it once. This allows me to check any variables that have been created, spot any potential bugs, make sure any analysis I am performing is correct, and make sure the results I want to store are actually stored. If I just type enzosim then it will run my program using the default values specified in the syntax statement of the program. This gives the following results,
. enzosim,
number of observations (_N) was 0, now 1,000
Warning: 8 survival times were above the upper limit of 5
They have been set to 5 and can be considered censored
You can identify them by _survsim_rc = 3
failure event: d == 1
obs. time interval: (0, t]
exit on or before: failure
------------------------------------------------------------------------------
1,000 total observations
0 exclusions
------------------------------------------------------------------------------
1,000 observations remaining, representing
992 failures in single-record/single-failure data
1,006.047 total analysis time at risk and under observation
at risk from t = 0
earliest observed entry t = 0
last observed exit t = 5
Iteration 0: log likelihood = -1615.2795
Iteration 1: log likelihood = -1615.0025
Iteration 2: log likelihood = -1615.0024
Log likelihood = -1615.0024 Number of obs = 1,000
------------------------------------------------------------------------------
| Coef. Std. Err. z P>|z| [95% Conf. Interval]
-------------+----------------------------------------------------------------
xb |
_rcs1 | 1.275779 .0422624 30.19 0.000 1.192946 1.358612
_rcs2 | -.0492945 .0335147 -1.47 0.141 -.1149821 .0163931
_rcs3 | .0072627 .019518 0.37 0.710 -.0309919 .0455174
_rcs4 | .0009645 .0117572 0.08 0.935 -.0220792 .0240082
_cons | -.5789954 .0405764 -14.27 0.000 -.6585236 -.4994671
------------------------------------------------------------------------------
..... remaining output has been omitted.
The program runs without error and fits the models I intend. I can check that everything I want stored is actually stored using return list.
. return list
scalars:
r(BIC2_df6) = 3275.609609295175
r(AIC2_df6) = 3241.325674693275
r(BIC1_df6) = 3275.271719530712
r(AIC1_df6) = 3240.987784928811
r(BIC2_df5) = 3273.020999377406
r(AIC2_df5) = 3243.634769718634
r(BIC1_df5) = 3273.049041974547
r(AIC1_df5) = 3243.662812315775
r(BIC2_df4) = 3267.19499496639
r(AIC2_df4) = 3242.706470250746
r(BIC1_df4) = 3266.905797538291
r(AIC1_df4) = 3242.417272822648
I can see that all the AIC and BIC values have been returned.
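Before committing to the full 1000 replications, it can also be worth a tiny dry run through simulate itself, just to check that the returned scalars end up as variables in the results data set (the reps(2) here is arbitrary):

simulate, reps(2): enzosim
describe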
Running the simulations
Now I am ready to simulate 1000 data sets for each scenario using the simulate command. I can loop over the 4 scenarios, making use of the local macros already declared for each scenario.
set seed 78126378
forvalues i = 1/4 {
simulate , reps(1000) saving(sim_scenaro`i', replace double): enzosim, `scenario`i''
}
I pass the relevant local macro to supply the options for each scenario. The results are saved using the saving() option. Each of the created data sets will contain 1000 observations, one for each simulated data set. I then go and make a cup of coffee while I wait for the results…
Summarising the simulations
Once the simulations have run I can start looking at the results. I will first plot the data, comparing the AIC between the default knot placement and Harrell’s knot placement for each of the 4, 5 and 6 df models.
. forvalues s =1/4 {
2. quietly {
3. use sim_scenaro`s', replace
4. forvalues df = 4/6 {
5. gen AICdiff_df`df' = AIC2_df`df' - AIC1_df`df'
6. hist AICdiff_df`df', name(AIC`df', replace) ylabel(none) ///
> ytitle("") xline(0) ///
> xtitle("Difference in AIC") ///
> title("`df' df", ring(0) pos(1) size(*0.8))
7. }
8. }
9. graph combine AIC4 AIC5 AIC6, cols(3) nocopies name(scenario`s', replace) ///
> ycommon xcommon title("Scenario `s'", size(*0.8))
10. }
. graph combine scenario1 scenario2 scenario3 scenario4, nocopies cols(1) imargin(0 0 0 0)
This code calculates the difference in the AIC between Harrell’s knot locations and stpm2’s default knot locations. A positive value indicates a lower AIC for the default knot locations. Note that there is no point in also calculating the difference in the BIC, as it is identical to the difference in the AIC because the number of parameters is the same in the models being compared; the identity below shows why. The resulting plot can be seen below.
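With $k$ parameters and $n$ observations, $\text{AIC} = -2\ln L + 2k$ and $\text{BIC} = -2\ln L + k\ln(n)$, so for two models with the same $k$ fitted to the same data,
$$ \text{AIC}_{\text{Harrell}} - \text{AIC}_{\text{default}} = -2\left(\ln L_{\text{Harrell}} - \ln L_{\text{default}}\right) = \text{BIC}_{\text{Harrell}} - \text{BIC}_{\text{default}} $$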
This plot shows that for all scenarios there tends to be a lower AIC for the default knot locations. This is particularly so for scenarios 2 and 3, where the differences in AIC are also much larger.
I will next calculate the percentage of simulated data sets in which the AIC is lower for the default knot locations.
. forvalues s =1/4 {
2. quietly use sim_scenaro`s', replace
3. display _newline "Scenario `s'"
4. display "------------"
5. forvalues df = 4/6 {
6. quietly count if AIC2_df`df' > AIC1_df`df'
7. di "Default knot locations had lower AIC for `df' df:" %4.1f 100*`r(N)'/_N "%"
8. }
9. }
Scenario 1
------------
Default knot locations had lower AIC for 4 df:72.8%
Default knot locations had lower AIC for 5 df:73.4%
Default knot locations had lower AIC for 6 df:74.2%
Scenario 2
------------
Default knot locations had lower AIC for 4 df:99.0%
Default knot locations had lower AIC for 5 df:97.5%
Default knot locations had lower AIC for 6 df:85.4%
Scenario 3
------------
Default knot locations had lower AIC for 4 df:98.9%
Default knot locations had lower AIC for 5 df:98.8%
Default knot locations had lower AIC for 6 df:90.7%
Scenario 4
------------
Default knot locations had lower AIC for 4 df:68.3%
Default knot locations had lower AIC for 5 df:57.5%
Default knot locations had lower AIC for 6 df:53.1%
Again we can see the dominance of the default knot locations, particularly for scenarios 2 and 3.
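A supplementary summary is the mean difference in AIC for each scenario and df. This is only a sketch; the AICdiff variables have to be regenerated because they were created after the simulation files were saved.

forvalues s = 1/4 {
    quietly use sim_scenaro`s', replace
    display _newline "Scenario `s'"
    forvalues df = 4/6 {
        * difference in AIC: Harrell knot locations minus default knot locations
        quietly generate double AICdiff_df`df' = AIC2_df`df' - AIC1_df`df'
        quietly summarize AICdiff_df`df'
        display "Mean AIC difference for `df' df: " %6.2f r(mean)
    }
}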
Another question is which of the models fitted to each simulated data set gives the lowest AIC, and whether this differs between the default knot locations and Harrell’s knot locations. The code below finds the df with the lowest AIC and BIC.
. forvalues s =1/4 {
2. quietly use sim_scenaro`s', replace
3. egen double minAIC1 = rowmin(AIC1_df?)
4. egen double minAIC2 = rowmin(AIC2_df?)
5. gen AICmin1 = 4*(minAIC1==AIC1_df4) + 5*(minAIC1==AIC1_df5)+6*(minAIC1==AIC1_df6)
6. gen AICmin2 = 4*(minAIC2==AIC2_df4) + 5*(minAIC2==AIC2_df5)+6*(minAIC2==AIC2_df6)
7. egen double minBIC1 = rowmin(BIC1_df?)
8. egen double minBIC2 = rowmin(BIC2_df?)
9. gen BICmin1 = 4*(minBIC1==BIC1_df4) + 5*(minBIC1==BIC1_df5)+6*(minBIC1==BIC1_df6)
10. gen BICmin2 = 4*(minBIC2==BIC2_df4) + 5*(minBIC2==BIC2_df5)+6*(minBIC2==BIC2_df6)
11. di _newline "Scenario `s'"
12. di "AIC"
13. tab AICmin1 AICmin2
14. di "BIC"
15. tab BICmin1 BICmin2
16. }
Scenario 1
AIC
| AICmin2
AICmin1 | 4 5 6 | Total
-----------+---------------------------------+----------
4 | 694 26 8 | 728
5 | 23 106 25 | 154
6 | 28 0 90 | 118
-----------+---------------------------------+----------
Total | 745 132 123 | 1,000
BIC
| BICmin2
BICmin1 | 4 5 | Total
-----------+----------------------+----------
4 | 975 6 | 981
5 | 7 10 | 17
6 | 2 0 | 2
-----------+----------------------+----------
Total | 984 16 | 1,000
Scenario 2
AIC
| AICmin2
AICmin1 | 4 5 6 | Total
-----------+---------------------------------+----------
4 | 59 20 379 | 458
5 | 1 19 240 | 260
6 | 2 0 280 | 282
-----------+---------------------------------+----------
Total | 62 39 899 | 1,000
BIC
| BICmin2
BICmin1 | 4 5 6 | Total
-----------+---------------------------------+----------
4 | 602 53 282 | 937
5 | 1 5 40 | 46
6 | 0 0 17 | 17
-----------+---------------------------------+----------
Total | 603 58 339 | 1,000
Scenario 3
AIC
| AICmin2
AICmin1 | 4 5 6 | Total
-----------+---------------------------------+----------
4 | 8 10 52 | 70
5 | 0 12 162 | 174
6 | 2 0 754 | 756
-----------+---------------------------------+----------
Total | 10 22 968 | 1,000
BIC
| BICmin2
BICmin1 | 4 5 6 | Total
-----------+---------------------------------+----------
4 | 357 22 318 | 697
5 | 4 8 119 | 131
6 | 0 0 172 | 172
-----------+---------------------------------+----------
Total | 361 30 609 | 1,000
Scenario 4
AIC
| AICmin2
AICmin1 | 4 5 6 | Total
-----------+---------------------------------+----------
4 | 528 83 14 | 625
5 | 31 131 66 | 228
6 | 20 4 123 | 147
-----------+---------------------------------+----------
Total | 579 218 203 | 1,000
BIC
| BICmin2
BICmin1 | 4 5 6 | Total
-----------+---------------------------------+----------
4 | 935 24 5 | 964
5 | 9 19 3 | 31
6 | 1 0 4 | 5
-----------+---------------------------------+----------
Total | 945 43 12 | 1,000
What I find interesting is that there is a tendency for the AIC to select fewer knots with the default knot locations. As above, this is especially so for scenarios 2 and 3. This is not the case for the simpler scenario 1, where the truth is a Weibull distribution and so all models are overfitting compared with the truth.
I don’t think the differences we see here are that great, and of course we are only looking at a few scenarios. However, it is reassuring to me that our default knot locations seem sensible. A more detailed analysis would compare the estimated hazard and survival functions with the true functions. When we use splines, I don’t really think that they represent the true model, but they should give a very good approximation to it. This is of crucial importance as, with real data, we never know the true model.
References
Andersson, T.M.-L., Dickman, P.W., Eloranta, S., Lambert, P.C. Estimating and modelling cure in population-based cancer studies within the framework of flexible parametric survival models. BMC Med Res Methodol 2011;11:96
Crowther, M.J., Lambert, P.C. Simulating complex survival data. The Stata Journal 2012;12:674-687.
Harrell, F.E. Regression Modeling Strategies: With Applications to Linear Models, Logistic Regression, and Survival Analysis. Springer, 2001
Rutherford, M.J., Crowther, M.J., Lambert, P.C. The use of restricted cubic splines to approximate complex hazard functions in the analysis of time-to-event data: a simulation study. Journal of Statistical Computation and Simulation 2015;85:777-793