set.seed(4321)
x <- runif(200, min = 0, max = 4*pi)
x <- sort(x)
y <- sin(x) + rnorm(200, sd = 0.3)
plot(x, y, pch = 20)
29 Local regression
Estimating the effects of a covariate \(x\) on a response \(y\) non-parametrically, letting the data suggest the appropriate functional form.
29.1 Non-linear relationship between covariate and response
Consider a regression model with a single predictor
\[y = f(x) + \epsilon\]
We want to estimate \(f\), the trend or smooth. Assume data are ordered so \(x_1 < x_2 < \ldots < x_n\).
fit <- lm(y ~ x)
plot(x, y, pch = 20)
abline(fit, lwd = 4, col = "red")
The linear regression (red line) does not capture the underlying structure: a straight line can only describe a linear association between the two variables. In this dataset there is a clear relationship between \(x\) and \(y\), but it is non-linear, so the linear fit performs poorly.
29.2 Running mean or moving average
We estimate the smooth at \(x_i\) by averaging the \(y\)’s corresponding to \(x\)’s in a neighborhood of \(x_i\):
\[S(x_i) = \sum_{j \in N(x_i)} y_j/n_i\]
for a neighborhood \(N(x_i)\) with \(n_i\) observations. A common choice is to take a symmetric neighborhood consisting of the nearest \(2k+1\) points:
\[N(x_i)=\left\{ \max(i-k,1), \ldots, i-1, i, i+1, \ldots, \min(i+k,n) \right\}\]
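Before turning to a packaged implementation, the definition can be coded directly. Here is a minimal sketch (the helper name fnrunmean is ours, not from any package) with \(k = 1\), i.e. windows of at most three points:
# running mean over the symmetric neighborhood N(x_i), truncated at the ends
fnrunmean <- function(y, k = 1){
  n <- length(y)
  sapply(1:n, function(i){
    idx <- max(i - k, 1):min(i + k, n)  # neighborhood indices
    mean(y[idx])                        # average the y's in the neighborhood
  })
}
plot(x, y, pch = 20)
lines(x, fnrunmean(y, k = 1), lwd = 4, col = "purple")
The rollmean() function from the zoo package gives a similar result, except that it pads the endpoints (here with NA) instead of truncating the window: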
library(zoo)
# k: integer width of the rolling window
# fill: vector with filling values at the left/within/to the right of the data range
plot(x, y, pch = 20)
lines(x, rollmean(y, k = 3, fill = c(NA, NULL, NA)), lwd = 4, col = "purple")
The resulting curve is quite wiggly, since each average is taken over only three points.
29.3 Kernel smoothers
An alternative approach is to use a weighted running mean, with weights that decline as one moves away from the target value
\[S(x_i) = \sum_{j} w_{ij} y_j\]
To calculate \(S(x_i)\), the \(j\)-th point receives a weight computed from a kernel function:
\[w_{ij} = \frac{c_i}{\lambda}K\left( \frac{|x_i-x_j|}{\lambda} \right)\] where \(K(\cdot)\) is a function symmetric about the \(y\) axis, i.e. \(K(-x) = K(x)\), \(\lambda\) is a tuning constant called the window width or bandwidth, and \(c_i\) is a normalizing constant so the weights add up to one for each \(x_i\).
The bandwidth \(\lambda\) determines the amount of smoothing applied in estimating \(f(x)\).
A kernel is a non-negative real-valued integrable function \(K\). For most applications, it is desirable to define the function to satisfy two additional requirements:
- Normalization: \(\int_{-\infty}^{+\infty}K(u)du=1\)
- Symmetry: \(K(-u)=K(u)\ \forall u\)
Popular choices of function \(K(\cdot)\) are:
- Gaussian: \(K(t)=\frac{1}{\sqrt{2\pi}}e^{-\frac{1}{2} t^2}\)
- Epanechnikov: \(K(t) = \frac{3}{4}(1-t^2)\) for \(|t|<1\), and 0 otherwise.
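As a quick check of the normalization requirement, the Epanechnikov kernel integrates to one: \(\int_{-1}^{1}\frac{3}{4}(1-t^2)\,dt = \frac{3}{4}\left[t - \frac{t^3}{3}\right]_{-1}^{1} = 1\).
To make the weight formula concrete, here is a minimal hand-rolled kernel smoother with a Gaussian kernel (the function name fnkernreg and the value of lambda are ours for illustration; note that lambda here is not on the same scale as the bandwidth argument of ksmooth(), which rescales the kernel internally):
# weighted running mean S(x_i) = sum_j w_ij * y_j with Gaussian kernel weights
fnkernreg <- function(x, y, x0, lambda){
  sapply(x0, function(xi){
    w <- dnorm(abs(xi - x) / lambda)  # K(|x_i - x_j| / lambda)
    w <- w / sum(w)                   # normalize so the weights add up to one
    sum(w * y)                        # weighted average of the y's
  })
}
plot(x, y, pch = 20)
lines(x, fnkernreg(x, y, x0 = x, lambda = 0.3), lwd = 4, col = "purple")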
When we set kernel = "normal" in ksmooth(), we are using the Gaussian function as the kernel. The argument bandwidth controls the amount of smoothing.
kreg <- ksmooth(x, y, kernel = "normal", bandwidth = 0.9)
plot(x, y, pch = 20)
lines(kreg, lwd = 4, col = "purple")
Here are examples of changing the value of bandwidth:
kreg1 <- ksmooth(x, y, kernel = "normal", bandwidth = 0.1)
kreg2 <- ksmooth(x, y, kernel = "normal", bandwidth = 0.9)
kreg3 <- ksmooth(x, y, kernel = "normal", bandwidth = 3.0)
plot(x, y, pch = 20)
lines(kreg1, lwd = 4, col = "orange")
lines(kreg2, lwd = 4, col = "purple")
lines(kreg3, lwd = 4, col = "limegreen")
legend("topright", c("h = 0.1", "h = 0.9", "h = 3.0"), lwd = 6, col = c("orange", "purple", "limegreen"))
29.4 Choosing bandwidth in kernel regression using cross-validation
Fit kernel regression and choose the bandwidth using cross-validation.
set.seed(4321)
x <- runif(200, min = 0, max = 4*pi)
y <- sin(x) + rnorm(200, sd = 0.3)
plot(x, y, pch = 20)
We can choose the bandwidth using leave-one-out cross-validation (LOOCV).
As an example, take bandwidth = 0.9. The prediction error is the average squared leave-one-out error, mean(CV_err), computed by the function below.
fnloocv <- function(x, y, h){
  # n: sample size
  n <- length(x)
  CV_err <- rep(NA, n)
  for(i in 1:n){
    # validation set
    x_val <- x[i]
    y_val <- y[i]
    # training set
    x_tr <- x[-i]
    y_tr <- y[-i]
    # we measure the error as the squared difference
    y_val_predict <- ksmooth(x = x_tr, y = y_tr, kernel = "normal", bandwidth = h, x.points = x_val)
    CV_err[i] <- (y_val - y_val_predict$y)^2
  }
  return(mean(CV_err))
}
fnloocv(x, y, 0.9)
[1] 0.1027413
Calculate prediction error for different values of \(h\) using LOOCV.
h_seq <- seq(from = 0.2, to = 2.0, by = 0.1)
# smoothing bandwidths we are using
CV_err_h <- rep(NA, length(h_seq))
for(j in 1:length(h_seq)){
  CV_err_h[j] <- fnloocv(x, y, h_seq[j])
}
plot(h_seq, CV_err_h, type = "b", lwd = 3,
xlab = "Smoothing bandwidth", ylab = "LOOCV prediction error")
Choose the smoothing bandwidth that minimizes this LOOCV error:
h_seq[which.min(CV_err_h)]
[1] 0.9
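As a sanity check, we could refit the kernel regression with the selected bandwidth and overlay the resulting curve on the data (a minimal sketch; the names h_best and kreg_best are ours):
# refit with the LOOCV-selected bandwidth and plot the fitted curve
h_best <- h_seq[which.min(CV_err_h)]
kreg_best <- ksmooth(x, y, kernel = "normal", bandwidth = h_best)
plot(x, y, pch = 20)
lines(kreg_best, lwd = 4, col = "purple")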
29.5 Exercise
- Simulate 500 pairs of \((x, y)\) such that
\[Y_i = \exp(-0.1 X_i) \cos(X_i) + \epsilon_i,\ \ X_i \sim \mbox{Unif}[0, 6\pi],\ \ \epsilon_i \sim N(0, 0.1^2)\]
Show a scatter plot of \(X, Y\) and the fit of a kernel regression with smoothing bandwidth \(h = 0.5\) and Gaussian kernel (kernel = "normal").
- Consider three smoothing bandwidths \(h = 0.1, 0.5, 2.0\). Show the fit of the kernel regression for each of them. Which smoothing bandwidth seems to be the best one?
- Use leave-one-out cross-validation to show the smoothing bandwidth versus CV-error plot for the kernel regression. Which smoothing bandwidth gives you the minimal CV error? (You could search in the range \(h \in [0.1, 2.0]\).)
set.seed(4321)
x <- sort(runif(500, 0, 6*pi))
y <- exp(-.1 * x) * cos(x) + rnorm(length(x), 0, sd = 0.1)
plot(x, y)
kreg1 <- ksmooth(x, y, kernel = "normal", bandwidth = 0.1)
kreg2 <- ksmooth(x, y, kernel = "normal", bandwidth = 0.5)
kreg3 <- ksmooth(x, y, kernel = "normal", bandwidth = 2.0)
plot(x, y, pch = 20)
lines(kreg1, lwd = 4, col = "orange")
lines(kreg2, lwd = 4, col = "purple")
lines(kreg3, lwd = 4, col = "limegreen")
legend("topright", c("h = 0.1", "h = 0.5", "h = 2.0"), lwd = 6, col = c("orange", "purple", "limegreen"))