Derivatives and integrals of spline functions.

One advantage of using spline functions is that we get analytical expressions for the derivative and integral of the spline function.

I will illustrate this using B-splines. I first generate some data and plot it.

. clear

. set obs 1000
Number of observations (_N) was 0, now 1,000.

. range x 0 5

. gen y = 2 + sin(x) + rnormal(0,0.1)

. scatter y x, color(%30) ylabel(0(0.5)3.5, format(%3.1f))

We use B-splines to fit a non-linear function to this data and then obtain the derivative and integral of this function.

If our spline function estimate $f (x)$ , we can generate the derivative, $f^{'} (x)$ by obtaining the derivative of each spline variable w.r.t. $x$ .

In gensplines this is done by including the dgen() option.

. gensplines x, type(bs) df(4) gen(_bs) dgen(_dbs)

. global knots `r(knots)'

. regress y _bs*

      Source |       SS           df       MS      Number of obs   =     1,000
-------------+----------------------------------   F(4, 995)       =  12637.11
       Model |  501.188667         4  125.297167   Prob > F        =    0.0000
    Residual |  9.86543954       995  .009915015   R-squared       =    0.9807
-------------+----------------------------------   Adj R-squared   =    0.9806
       Total |  511.054106       999  .511565672   Root MSE        =    .09957

------------------------------------------------------------------------------
           y | Coefficient  Std. err.      t    P>|t|     [95% conf. interval]
-------------+----------------------------------------------------------------
        _bs1 |   1.179382   .0307769    38.32   0.000     1.118987    1.239777
        _bs2 |   1.422978   .0239687    59.37   0.000     1.375943    1.470013
        _bs3 |  -1.309876   .0273988   -47.81   0.000    -1.363642    -1.25611
        _bs4 |  -.8256583   .0198255   -41.65   0.000     -.864563   -.7867537
       _cons |    1.92711   .0150208   128.30   0.000     1.897634    1.956586
------------------------------------------------------------------------------

. predict mu
(option xb assumed; fitted values)

Below I add the fitted regression line to the scatter plot

. twoway (scatter y x, color(%30))         ///
>        (line mu x, sort),                ///
>        ylabel(0(0.5)3.5, format(%3.1f))  ///
>        legend(off)

To generate the dervative of the fitted function we can multipy the coefficients by the variables created by the dgen() option, the derivative of each spline variable w.r.t. $x$ ..

. gen double deriv1 = _b[_bs1]*_dbs1 + _b[_bs2]*_dbs2 + _b[_bs3]*_dbs3 + _b[_bs4]*_dbs4

As a comparison I calculate the derivatives numerically using the dydx command.

. dydx mu x, gen(deriv2) double

. compare deriv1 deriv2

                                        ---------- Difference ----------
                            Count       Minimum      Average     Maximum
------------------------------------------------------------------------
deriv1<deriv2                 514     -2.23e-06    -6.57e-09   -1.11e-16
deriv1=deriv2                   1
deriv1>deriv2                 485      1.39e-17     2.21e-08    8.32e-06
                       ----------
Jointly defined              1000     -2.23e-06     7.35e-09    8.32e-06
                       ----------
Total                        1000

We can plot the function and can see that the derivative is zero at the turning points of the original function.

. line deriv2 x

In order to calculate the integral of the function we need to generate the integral of each of the spline variables. This can be done by using the type(ibs) option of gensplines. If we use df(4) as above, then the knots will be in exactly the same location. Alternatively, we could have saved the knots when we generated the original spline variables and used the allknots() option.

. gensplines x, type(ibs) df(4) gen(_ibs)

We need to integrate the intercept and then incude the integrated spline functions and multiply each by the associated coefficient.

. gen double integ1 = _b[_cons]*x + _b[_bs1]*_ibs1 + _b[_bs2]*_ibs2 + _b[_bs3]*_ibs3 + _b[_bs
> 4]*_ibs4

We can compare to the numerically integrated function.

. integ mu x, gen(integ2) double

number of points = 1000

integral         = 10.73512

. compare integ1 integ2

                                        ---------- Difference ----------
                            Count       Minimum      Average     Maximum
------------------------------------------------------------------------
integ1<integ2                   1     -1.38e-11    -1.38e-11   -1.38e-11
integ1=integ2                   1
integ1>integ2                 998      1.05e-12     2.82e-12    8.18e-12
                       ----------
Jointly defined              1000     -1.38e-11     2.80e-12    8.18e-12
                       ----------
Total                        1000

And plot the integral at each value of x.

. line integ1 x

If we are just interested in the integral at one value of x, we just pass this value to gensplines. For example, the integral at x=5 is,

. gensplines 5, type(ibs) allknots(${knots}) gen(_k) 

. lincom _b[_cons]*5 + _b[_bs1]*_k1 + _b[_bs2]*_k2 + _b[_bs3]*_k3 + _b[_bs4]*_k4

 ( 1)  1.25*_bs1 + 1.25*_bs2 + 1.25*_bs3 + .625*_bs4 + 5*_cons = 0

------------------------------------------------------------------------------
           y | Coefficient  Std. err.      t    P>|t|     [95% conf. interval]
-------------+----------------------------------------------------------------
         (1) |   10.73512   .0157442   681.85   0.000     10.70422    10.76602
------------------------------------------------------------------------------

Note we also get a confidence interval for the integral as I used lincom.

Adding the integral of the intercept can be avoided by generating an extra spline variable and then using the nonconstant option when fitting the model. Below I do this and then obtain the integral in a frame. By naming the integrated B-spline variables the same as the spline variables, I can use the predict command to obtain the integrated function.

. drop _*

. gensplines x, type(bs) df(4) gen(_bs) intercept

. regress y _bs*, nocons

      Source |       SS           df       MS      Number of obs   =     1,000
-------------+----------------------------------   F(5, 995)       >  99999.00
       Model |  5108.18252         5   1021.6365   Prob > F        =    0.0000
    Residual |  9.86543954       995  .009915015   R-squared       =    0.9981
-------------+----------------------------------   Adj R-squared   =    0.9981
       Total |  5118.04796     1,000  5.11804796   Root MSE        =    .09957

------------------------------------------------------------------------------
           y | Coefficient  Std. err.      t    P>|t|     [95% conf. interval]
-------------+----------------------------------------------------------------
        _bs1 |    1.92711   .0150208   128.30   0.000     1.897634    1.956586
        _bs2 |   3.106492   .0191994   161.80   0.000     3.068816    3.144168
        _bs3 |   3.350088   .0261145   128.28   0.000     3.298842    3.401334
        _bs4 |   .6172346   .0191994    32.15   0.000     .5795586    .6549105
        _bs5 |   1.101452   .0150208    73.33   0.000     1.071976    1.130928
------------------------------------------------------------------------------

.  
. cap frame drop integral

. frame put y x, into(integral)

. frame integral {
.   gensplines x, type(ibs) df(4) gen(_bs) intercept
.   predict mu,
(option xb assumed; fitted values)
.   line mu x, ytitle(Integral of fitted values)
. }