Tuesday, April 2, 2013

Estimating continuous piecewise linear regression

When talking about smoothing splines a simple point to start with is a continuous piecewise linear regression with fixed knots. I did not find any simple example showing how to estimate the it in GNU R so I have created a little snippet that does the job.

Assume you are given continuous predictor x and continuous predicted variable y. We want to estimate continuous piecewise linear regression with fixed knots stored in variable knots using standard lm procedure.

The key to a solution is proper definition of regression formula. In order to introduce possibility of change of slope in knot k we have to add a so called hinge term to the model max(0, x-k).

In the code given below function piece.formula automatically generates a proper right hand side of the regression formula given variable name and list of required knots. It is next tested on a simple function.
N <- 40 # number of sampled points
K <- 5  # number of knots

piece.formula <- function(var.name, knots) {
  formula.sign <- rep(" - ", length(knots))
  formula.sign[knots < 0] <- " + "
  paste(var.name, "+",
        paste("I(pmax(", var.name, formula.sign, abs(knots), ", 0))",
              collapse = " + ", sep=""))
}

f <- function(x) {
    2 * sin(6 * x)
}

set.seed(1)
x <- seq(-1, 1, len = N)
y <- f(x) + rnorm(length(x))

knots <- seq(min(x), max(x), len = K + 2)[-c(1, K + 2)]
model <- lm(formula(paste("y ~", piece.formula("x", knots))))

par(mar = c(4, 4, 1, 1))
plot(x, y)
lines(x, f(x))
new.x <- seq(min(x), max(x) ,len = 10000)
points(new.x, predict(model, newdata = data.frame(x = new.x)),
      col = "red", pch = ".")
points(knots, predict(model, newdata = data.frame(x = knots)),
       col = "red", pch = 18)
Below we can see the graph of estimation result. Red line is the desired continuous piecewise linear regression with fixed knots given by red diamonds. Notice that the plot uses points procedure to plot the red line to highlight that the generated predictions have the required properties.


An additional value of the presented solution that we do not do any preprocessing of predictor variable if we want to make a prediction - all the calculations are made within the formula.

Of course this simple example can be easily extended to obtain a simple smoother. For example we can set K to be large and use some regularized regression like ridge or lasso.

No comments:

Post a Comment