Assume you are given continuous predictor x and continuous predicted variable y. We want to estimate continuous piecewise linear regression with fixed knots stored in variable knots using standard lm procedure.
The key to a solution is proper definition of regression formula. In order to introduce possibility of change of slope in knot k we have to add a so called hinge term to the model max(0, x-k).
In the code given below function piece.formula automatically generates a proper right hand side of the regression formula given variable name and list of required knots. It is next tested on a simple function.
N <- 40 # number of sampled points K <- 5 # number of knots piece.formula <- function(var.name, knots) { formula.sign <- rep(" - ", length(knots)) formula.sign[knots < 0] <- " + " paste(var.name, "+", paste("I(pmax(", var.name, formula.sign, abs(knots), ", 0))", collapse = " + ", sep="")) } f <- function(x) { 2 * sin(6 * x) } set.seed(1) x <- seq(-1, 1, len = N) y <- f(x) + rnorm(length(x)) knots <- seq(min(x), max(x), len = K + 2)[-c(1, K + 2)] model <- lm(formula(paste("y ~", piece.formula("x", knots)))) par(mar = c(4, 4, 1, 1)) plot(x, y) lines(x, f(x)) new.x <- seq(min(x), max(x) ,len = 10000) points(new.x, predict(model, newdata = data.frame(x = new.x)), col = "red", pch = ".") points(knots, predict(model, newdata = data.frame(x = knots)), col = "red", pch = 18)Below we can see the graph of estimation result. Red line is the desired continuous piecewise linear regression with fixed knots given by red diamonds. Notice that the plot uses points procedure to plot the red line to highlight that the generated predictions have the required properties.
An additional value of the presented solution that we do not do any preprocessing of predictor variable if we want to make a prediction - all the calculations are made within the formula.
Of course this simple example can be easily extended to obtain a simple smoother. For example we can set K to be large and use some regularized regression like ridge or lasso.
Hi, thank you for the code, i find my knots much easier using it, but i still don't understand how to find the models, if i have 2 knots, then shouldn't i have 3 models? Can you show me how to find the models using your code? Thank you very much
ReplyDeleteI have other approach. There is only one model:
ReplyDeletemodel <- lm(formula(paste("y ~", piece.formula("x", knots))))
but it has hinge variables at knots which you can check when you execute:
paste("y ~", piece.formula("x", knots))
to get the formula:
"y ~ x + I(pmax(x + 0.666666666666667, 0)) + I(pmax(x + 0.333333333333333, 0)) + I(pmax(x - 0, 0)) + I(pmax(x - 0.333333333333333, 0)) + I(pmax(x - 0.666666666666667, 0))"
and the final model is:
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 6.559 2.197 2.986 0.00530 **
x 5.865 2.690 2.180 0.03647 *
I(pmax(x + 0.666666666666667, 0)) -22.189 4.520 -4.909 2.41e-05 ***
I(pmax(x + 0.333333333333333, 0)) 27.239 4.179 6.518 2.12e-07 ***
I(pmax(x - 0, 0)) -7.775 4.164 -1.867 0.07080 .
I(pmax(x - 0.333333333333333, 0)) -14.905 4.179 -3.567 0.00113 **
I(pmax(x - 0.666666666666667, 0)) 15.860 4.520 3.509 0.00132 **
Holy blog, THANK YOU so much, have a pleasant day :D
ReplyDelete