Derivatives and integrals of spline functions.
One advantage of using spline functions is that we get analytical expressions for the derivative and integral of the spline function.
I will illustrate this using B-splines. I first generate some data and plot it.
clear
.
set obs 1000
. of observations (_N) was 0, now 1,000.
Number
range x 0 5
.
gen y = 2 + sin(x) + rnormal(0,0.1)
.
scatter y x, color(%30) ylabel(0(0.5)3.5, format(%3.1f)) .
We use B-splines to fit a non-linear function to this data and then obtain the derivative and integral of this function.
If our spline function estimate \(f(x)\), we can generate the derivative, \(f'(x)\) by obtaining the derivative of each spline variable w.r.t. \(x\).
In gensplines
this is done by including the dgen() option.
type(bs) df(4) gen(_bs) dgen(_dbs)
. gensplines x,
global knots `r(knots)'
.
regress y _bs*
.
of obs = 1,000
Source | SS df MS Number F(4, 995) = 12637.11
-------------+---------------------------------- F = 0.0000
Model | 501.188667 4 125.297167 Prob >
Residual | 9.86543954 995 .009915015 R-squared = 0.9807
-------------+---------------------------------- Adj R-squared = 0.9806
Total | 511.054106 999 .511565672 Root MSE = .09957
------------------------------------------------------------------------------y | Coefficient Std. err. t P>|t| [95% conf. interval]
-------------+----------------------------------------------------------------
_bs1 | 1.179382 .0307769 38.32 0.000 1.118987 1.239777
_bs2 | 1.422978 .0239687 59.37 0.000 1.375943 1.470013
_bs3 | -1.309876 .0273988 -47.81 0.000 -1.363642 -1.25611
_bs4 | -.8256583 .0198255 -41.65 0.000 -.864563 -.7867537_cons | 1.92711 .0150208 128.30 0.000 1.897634 1.956586
------------------------------------------------------------------------------
predict mu
. xb assumed; fitted values) (option
Below I add the fitted regression line to the scatter plot
twoway (scatter y x, color(%30)) ///
. line mu x, sort), ///
> (ylabel(0(0.5)3.5, format(%3.1f)) ///
> legend(off) >
To generate the dervative of the fitted function we can multipy the coefficients by the variables created by the dgen() option, the derivative of each spline variable w.r.t. \(x\)..
gen double deriv1 = _b[_bs1]*_dbs1 + _b[_bs2]*_dbs2 + _b[_bs3]*_dbs3 + _b[_bs4]*_dbs4 .
As a comparison I calculate the derivatives numerically using the dydx
command.
dydx mu x, gen(deriv2) double
.
compare deriv1 deriv2
.
---------- Difference ----------
Count Minimum Average Maximum
------------------------------------------------------------------------
deriv1<deriv2 514 -2.23e-06 -6.57e-09 -1.11e-16
deriv1=deriv2 1
deriv1>deriv2 485 1.39e-17 2.21e-08 8.32e-06
----------
Jointly defined 1000 -2.23e-06 7.35e-09 8.32e-06
---------- Total 1000
We can plot the function and can see that the derivative is zero at the turning points of the original function.
line deriv2 x .
In order to calculate the integral of the function we need to generate the integral of each of the spline variables. This can be done by using the type(ibs)
option of gensplines
. If we use df(4)
as above, then the knots will be in exactly the same location. Alternatively, we could have saved the knots when we generated the original spline variables and used the allknots()
option.
type(ibs) df(4) gen(_ibs) . gensplines x,
We need to integrate the intercept and then incude the integrated spline functions and multiply each by the associated coefficient.
gen double integ1 = _b[_cons]*x + _b[_bs1]*_ibs1 + _b[_bs2]*_ibs2 + _b[_bs3]*_ibs3 + _b[_bs
. > 4]*_ibs4
We can compare to the numerically integrated function.
integ mu x, gen(integ2) double
.
of points = 1000
number
integral = 10.73512
compare integ1 integ2
.
---------- Difference ----------
Count Minimum Average Maximum
------------------------------------------------------------------------
integ1<integ2 1 -1.38e-11 -1.38e-11 -1.38e-11
integ1=integ2 1
integ1>integ2 998 1.05e-12 2.82e-12 8.18e-12
----------
Jointly defined 1000 -1.38e-11 2.80e-12 8.18e-12
---------- Total 1000
And plot the integral at each value of x
.
line integ1 x .
If we are just interested in the integral at one value of x, we just pass this value to gensplines
. For example, the integral at x=5
is,
type(ibs) allknots(${knots}) gen(_k)
. gensplines 5,
lincom _b[_cons]*5 + _b[_bs1]*_k1 + _b[_bs2]*_k2 + _b[_bs3]*_k3 + _b[_bs4]*_k4
.
_cons = 0
( 1) 1.25*_bs1 + 1.25*_bs2 + 1.25*_bs3 + .625*_bs4 + 5*
------------------------------------------------------------------------------y | Coefficient Std. err. t P>|t| [95% conf. interval]
-------------+----------------------------------------------------------------
(1) | 10.73512 .0157442 681.85 0.000 10.70422 10.76602 ------------------------------------------------------------------------------
Note we also get a confidence interval for the integral as I used lincom
.
Adding the integral of the intercept can be avoided by generating an extra spline variable and then using the nonconstant
option when fitting the model. Below I do this and then obtain the integral in a frame. By naming the integrated B-spline variables the same as the spline variables, I can use the predict
command to obtain the integrated function.
drop _*
.
type(bs) df(4) gen(_bs) intercept
. gensplines x,
regress y _bs*, nocons
.
of obs = 1,000
Source | SS df MS Number F(5, 995) > 99999.00
-------------+---------------------------------- F = 0.0000
Model | 5108.18252 5 1021.6365 Prob >
Residual | 9.86543954 995 .009915015 R-squared = 0.9981
-------------+---------------------------------- Adj R-squared = 0.9981
Total | 5118.04796 1,000 5.11804796 Root MSE = .09957
------------------------------------------------------------------------------y | Coefficient Std. err. t P>|t| [95% conf. interval]
-------------+----------------------------------------------------------------
_bs1 | 1.92711 .0150208 128.30 0.000 1.897634 1.956586
_bs2 | 3.106492 .0191994 161.80 0.000 3.068816 3.144168
_bs3 | 3.350088 .0261145 128.28 0.000 3.298842 3.401334
_bs4 | .6172346 .0191994 32.15 0.000 .5795586 .6549105
_bs5 | 1.101452 .0150208 73.33 0.000 1.071976 1.130928
------------------------------------------------------------------------------
. drop integral
. cap frame
y x, into(integral)
. frame put
. frame integral {type(ibs) df(4) gen(_bs) intercept
. gensplines x, predict mu,
. xb assumed; fitted values)
(option line mu x, ytitle(Integral of fitted values)
. . }