14  Local Regression


14.1 Local Averages

Scatterplots are a great and simplest plot for bivariate data that simply plots each observation. There are many extensions and similar tools. The example below helps understand how both the central tendency and dispersion change.

Code
# Local relationship: wages and education
library(wooldridge)

## Plot 1
plot(wage~educ, data=wage1, pch=16, col=grey(0,.1))
educ_means <- aggregate(wage~educ, data=wage1, mean)
points(wage~educ, data=educ_means, pch=17, col='blue', type='b')
title("Grouped Means and Scatterplot", font.main=1)

Code

## Plot 2 (Alternative for big datasets)
# boxplot(wage~educ, data=wage1, 
#    pch=16, col=grey(0,.1), varwidth=T)
#title("Boxplots", font.main=1)

## Plot 3 (Less informative!)
#barplot(wage~educ, data=educ_means)
#title("Bar Plot of Grouped Means", font.main=1)

Examine the local relationship between ‘Murder’ and ‘Urbanization’ in the USArrests dataset. Use custom bins.

Code
xy <- USArrests[,c('Murder','UrbanPop')]
xy[,'bins'] <- cut(USArrests[,'UrbanPop'], 6) 

To move beyond descriptive statistics, we will use models. We previously covered linear models, but you could be analyzing data with nonlinear relationships. So now we consider a general relationship: \[\begin{eqnarray} \hat{Y}_{i} = m(\hat{X}_{i}) + e_{i}, \end{eqnarray}\] where \(m\) is an unknown function and \(e_{i}\) is white noise, which we estimate via different models. After minimizing the sum of squared errors in a the regressions below, our estimated models \(\hat{m}\) will also predict local averages.

14.2 Regressograms

Just as we use histograms to describe the distribution of a random variable, we can use a regressogram for conditional relationships. Specifically, we can use dummies for exclusive intervals or “bins” to estimate the average value of \(\hat{Y}_{i}\) for data inside the bin.

Code
# Data
plot(wage~educ, data=wage1, pch=16, col=grey(0,.1))
dat <- wage1[order(wage1[,'educ']), c('wage','educ')]

## Simple Regression
reg  <- lm(wage~educ, data=dat) ## OLS

# Regressogram: Course Age Bins
dat[,'xcc'] <- cut(dat[,'educ'], 2)
rgram_c  <- lm(wage~xcc, data=dat)

# Regressogram: Fine Age Bins
dat[,'xcf']   <- cut(dat[,'educ'], 3)
rgram_f  <- lm(wage~xcf, data=dat)

# Regressogram (Means for each level)
dat[,'xd']   <- as.factor(dat[,'educ'])
rgram_d  <- lm(wage~xd, data=dat)

## Compare Models (Only 2 for simplicity)
lines( dat[,'educ'], predict(reg), lwd=2, col=2)
lines( dat[,'educ'], predict(rgram_f), lwd=2, col=4, type='s')
legend('topleft',
    legend=c('Linear Regression','Regressogram (3)'),
    col=c(2,4),
    lty=1, cex=.8)

To conduct a regressogram, first divide \(X\) into \(1,...L\) exclusive bins of half-width \(h\). Each bin has a midpoint, \(x\), and each observation has an associated dummy variable \(\hat{D}_{i}(x,h) = \mathbf{1}\left(\hat{X}_{i} \in \left(x-h,x+h\right] \right)\). Then we conduct a regression with this model: \[\begin{eqnarray} \hat{Y}_{i} &=& \sum_{x \in \{x_{1}, ..., x_{L} \}} b_{0}(x,h) \hat{D}_{i}(x,h) + e_{i}, \end{eqnarray}\] where each bin has a coefficient \(b_{0}(x,h)\).

Local Averages.

When minimizing the sum of squared errors, the optimal coefficients are denoted as \(\hat{b}_{0}(x,h)\) and we can find them analytically. To do so, notice that each bin has \(n(x,h) = \sum_{i}^{n}\hat{D}_{i}(x,h)\) observations. This means we can split the dataset into parts associated with each bin \[\begin{eqnarray} \label{eqn:regressogram1} \sum_{i}^{n}\left[e_{i}\right]^2 &=& \sum_{i}^{n}\left[\hat{Y}_{i}- \sum_{x \in \{x_{1}, ..., x_{L} \}} b_{0}(x,h) \hat{D}_{i}(x,h) \right]^2 \\ &=& \sum_{i}^{n(x_{1},h)}\left[\hat{Y}_{i}- \sum_{x \in \{x_{1}, ..., x_{L} \}} b_{0}(x,h) \hat{D}_{i}(x,h) \right]^2 + ... \sum_{i}^{n(x_{L},h)}\left[\hat{Y}_{i}- \sum_{x \in \{x_{1}, ..., x_{L} \}} b_{0}(x,h) \hat{D}_{i}(x,h) \right]^2 \\ &=& \sum_{i}^{n(x_{1},h)}\left[\hat{Y}_{i}- b_{0}\left(x_1,h\right) \right]^2 + ... \sum_{i}^{n(x_L,h)}\left[\hat{Y}_{i}-b_{0}\left(x_L,h\right) \right]^2 % +~ (N-1)\sum_{i}\hat{Y}_{i}. \end{eqnarray}\] This separation allows us to analytically optimize for each bin separately \[\begin{eqnarray} \label{eqn:regressogram2} \min_{ \left\{ b_{0}(x,h) \right\} } \sum_{i}^{n}\left[e_{i}\right]^2 &=& \min_{ \left\{ b_{0}(x,h) \right\} } \sum_{i}^{n(x,h)}\left[\hat{Y}_{i}- b_{0}\left(x,h\right) \right]^2, \end{eqnarray}\] In any case, minimizing yields the optimal coefficient as follows \[\begin{eqnarray} 0 &=& -2 \sum_{i}^{n(x,h)}\left[ \hat{Y}_{i} - b_{0}(x,h) \right] \\ \hat{b}_{0}(x,h) &=& \frac{\sum_{i}^{n(x,h)} \hat{Y}_{i}}{ n(x,h) } = \hat{M}_{Y}(x,h). \end{eqnarray}\] As such, the OLS regression yields coefficients that are interpreted as the conditional mean: \(\hat{M}_{Y}(x,h)\). We can directly compute the same statistic directly by simply taking the average value of \(\hat{Y}_{i}\) for all \(i\) observations in a particular bin.

In any case, the values predicted by the model are found as \[\begin{eqnarray} \hat{y}_{i} = \sum_{x} \hat{b}_{0}(x,h) \hat{D}_{i}(x,h) = \sum_{x} \hat{M}_{Y}(x,h) \hat{D}_{i}(x,h). \end{eqnarray}\] I.e., the predicted value of observation \(i\) is the the average value for its bin.

Consider this two-bin regressogram example of how age affects wage for people with \(\leq 9\) years of school complete vs \(> 9\). \[\begin{eqnarray} \text{Wage}_{i} &=& b_{0}(x=4.5, h=9/2) \mathbf{1}\left(\text{Educ}_{i} \in (0,9]\right) + b_{0}(x=13.5, h=9/2) \mathbf{1}\left(\text{Age}_{i} \in (9,18] \right) + e_{i}. \end{eqnarray}\]

Here is a simple example with three data points \((\hat{X}_{i}, \hat{Y}_{i}) \in \{ (1,2), (4,4), (12,3) \}\), we can easily compute \[\begin{eqnarray} n(x=4.5,h=9/2) &=& 2\\ \hat{b}_{0}(x=4.5, h=9/2) &=& [2 + 4] / 2 \\ n(x=13.5,h=9/2) &=& 1\\ \hat{b}_{0}(x=13.5, h=9/2) &=& 3 / 1 \end{eqnarray}\]

Here is a simple example with data

Code
pred_c <- predict(rgram_c)
pred_dat <- data.frame(xcc=dat$xcc, pred=round(pred_c,6))
table(pred_dat)
##             pred
## xcc          4.107544 6.113475
##   (-0.018,9]       57        0
##   (9,18]            0      469

## Compare to simple aggregation 
aggregate(wage~xcc, data=dat, mean)
##          xcc     wage
## 1 (-0.018,9] 4.107544
## 2     (9,18] 6.113475

14.3 Piecewise Regression

The regressogram depicts locally constant relationships. We can also included slope terms within each bin to allow for locally linear relationships. This is often called segmented/piecewise regression, which runs a separate regression for different subsets of the data.

Code
# Data
plot(wage~educ, data=wage1, pch=16, col=grey(0,.1))
dat <- wage1[order(wage1[,'educ']), c('wage','educ')]

# Piecewise: Course Age Bins
dat[,'xcc'] <- cut(dat[,'educ'], 2)
preg_c  <- lm(wage~xcc*educ, data=dat)

# Piecewise: Fine Age Bins
dat[,'xcf']   <- cut(dat[,'educ'], 3) 
preg_f  <- lm(wage~xcf*educ, data=dat)

## Compare Models
lines( dat[,'educ'], predict(preg_c), lwd=2, col=5)
lines( dat[,'educ'], predict(preg_f), lwd=2, col=6)
legend('topleft',
    legend=c('2 bins','3 bins'),
    lty=1, col=5:6, cex=.8)
title('Piecewise Regressions', font.main=1)

The model is \[\begin{eqnarray} \hat{Y}_{i} &=& \sum_{x} \left[b_{0}(x,h) + b_{1}(x,h)\hat{X}_{i} \right] \hat{D}_{i}(x,h) + e_{i}. \end{eqnarray}\] This same separation as above allows us to analytically optimize for each bin separately. I.e. we run separate regressions on the split samples. From the previous chapter on simple linear regression, we know the solutions are \[\begin{eqnarray} \hat{b}_{0}(x,h) &=& \hat{M}_{Y}(x,h)-\hat{b}_{1}\hat{M}_{X}(x,h)\\ \hat{b}_{1}(x,h) &=& \frac{\sum_{i}^{n(x,h)}(\hat{X}_{i}-\hat{M}_{X}(x,h))(\hat{Y}_{i}-\hat{M}_{Y}(x,h))}{\sum_{i}^{n(x,h)}(\hat{X}_{i}-\hat{M}_{X}(x,h))^2} = \frac{\hat{C}_{XY}(x,h)}{\hat{V}_{X}(x,h)}, \end{eqnarray}\]

Code
# See that two methods give the same predictions
# Piecewise: Course Age Bins
dat[,'xcc'] <- cut(dat[,'educ'], 2)
preg_c  <- lm(wage~xcc*educ, data=dat)
pred2 <- predict(preg_c)

## Split Sample Regressions
dat2 <- split( dat, dat[,'xcc'])
pred2B <- vector("list", length(dat2))
names(pred2B) <- names(dat2)
for (i in seq_along(dat2)) {
    reg2 <- lm(wage~educ, dat2[[i]])
    pred2B[[i]] <- predict(preg_c)
}

# Any differences?
all( pred2 - unlist(pred2B) < 1e-10 )
## [1] TRUE

Compare a simple regression to a regressogram and a piecewise regression. Examine the relationship between ‘Murder’ and ‘Urbanization’ in the USArrests dataset.

Code
xy <- USArrests[,c('Murder','UrbanPop')]
colnames(xy) <- c('y','x')

# Globally Linear
reg <- lm(y~x, data=xy)

Try to do so on your own before looking at the code below, which compares two piecewise regressions to a simple linear regression.

Code
# Diagnose Fit
#plot( fitted(reg), resid(reg), pch=16, col=grey(0,.5))
#plot( xy[,'x'], resid(reg), pch=16, col=grey(0,.5))

# Linear in 2 Pieces (subsets)
xcut2 <- cut(xy[,'x'],2)
xy_list2 <- split(xy, xcut2)
regs2 <- vector("list", length(xy_list2))
names(regs2) <- names(xy_list2)
for (i in seq_along(xy_list2)) {
    regs2[[i]] <- lm(y~x, data=xy_list2[[i]])
}

# Linear in 3 Pieces (subsets or bins)
xcut3 <- cut(xy[,'x'], seq(32,92,by=20)) # Finer Bins
xy_list3 <- split(xy, xcut3)
regs3 <- vector("list", length(xy_list3))
names(regs3) <- names(xy_list3)
for (i in seq_along(xy_list3)) {
    regs3[[i]] <- lm(y~x, data=xy_list3[[i]])
}

## Make Predictions
pred1 <- data.frame(yhat=predict(reg),
                    x=reg[['model']][,'x'])
pred1 <- pred1[order(pred1[,'x']),]

pred2 <- vector("list", length(regs2))
for (i in seq_along(regs2)) {
    pred2[[i]] <- data.frame(yhat=predict(regs2[[i]]),
               x=regs2[[i]][['model']][,'x'])
}
pred2 <- do.call(rbind,pred2)
pred2 <- pred2[order(pred2[,'x']),]

pred3 <- vector("list", length(regs3))
for (i in seq_along(regs3)) {
    pred3[[i]] <- data.frame(yhat=predict(regs3[[i]]),
               x=regs3[[i]][['model']][,'x'])
}
pred3 <- do.call(rbind,pred3)
pred3 <- pred3[order(pred3[,'x']),]

# Compare Predictions
plot(y ~ x, pch=16, col=grey(0,.5), dat=xy)
lines(yhat~x, pred1, lwd=2, col=2)
lines(yhat~x, pred2, lwd=2, col=4)
lines(yhat~x, pred3, lwd=2, col=3)
legend('topleft',
    legend=c('Globally Linear',
             'Piecewise Linear (2)',
             'Piecewise Linear (3)'),
    lty=1, col=c(2,4,3), cex=.8)

For many things, a simple linear regression, regressograms, or piecewise regression is “good enough”. Simple linear regressions struggle with nonlinear relationships but are very easy to run with a computer. Regressograms and piecewise regressions are intuitive ways to capture nonlinear relationships that are computationally efficient but have obvious problems where the bins change. Sometimes we want smoother predictions or to estimate derivatives (gradients). To cover more advanced regression methods that do those things, we will need to first learn about kernel density estimation.

Weighted Regression.

Interestingly, we can obtain the same statistics from weighted least squares regression. For some specific design point, \(x\), we can find \(\hat{b}(x, h)\) by minimizing \[\begin{eqnarray} \sum_{i}^{n}\left[ e_{i} \right]^2 \hat{D}_{i}(x,h) &=& \sum_{i}^{n}\left[ \hat{Y}_{i}- b_{0}(x,h) - b_{1}(x,h) \hat{X}_{i} \right]^2 \hat{D}_{i}(x,h) \\ &=& \sum_{i}^{n(x_{1},h)}\left[ \hat{Y}_{i}- b_{0}(x_{1},h) - b_{1}(x_{1},h) \hat{X}_{i} \right]^2 \hat{D}_{i}(x_{1},h) + ... \sum_{i}^{n(x_{L},h)}\left[ \hat{Y}_{i}- b_{0}(x_{L},h) - b_{1}(x_{L},h) \hat{X}_{i} \right]^2 \hat{D}_{i}(x_{L},h) \\ &=& \sum_{i}^{n(x,h)}\left[\hat{Y}_{i}- b_{0}\left(x,h\right) - b_{1}(x,h) \hat{X}_{i} \right]^2 \end{eqnarray}\]

We get nearly identical results if we instead use “uniform weights” with half-width \(h\), \[\begin{eqnarray} k_{U}\left( \hat{X}_{i}, x, h \right) &=& \frac{\mathbf{1}\left(\frac{|\hat{X}_{i}-x|}{h} \leq 1\right)}{2} = \frac{\mathbf{1}\left( \hat{X}_{i} \in \left[ x-h, x + h\right] \right) }{2} \\ \end{eqnarray}\] with is nearly identical to \(\hat{D}_{i}(x,h)/ 2\), but uses intervals \([]\) instead of \((]\). As such we can see that \[\begin{eqnarray} \sum_{i}^{n}\left[ e_{i} \right]^2 k_{U}\left( \hat{X}_{i}, x, h \right) &\approx& \sum_{i}^{n}\left[ e_{i} \right]^2 \hat{D}_{i}(x,h) / 2 \\ &=& \sum_{i}^{n(x,h)}\left[\hat{Y}_{i}- b_{0}\left(x,h\right) - b_{1}(x,h) \hat{X}_{i} \right]^2 / 2, \end{eqnarray}\] The constant term \(1/2\) is irrelevant to finding the optimal solution (you can check the math yourself).

Here is an example going into the details of the weights.

Code
## Generate Sample Data
x <- 1:5
y <- round( rnorm(length(x)) , 2)
y
## [1] -0.22 -1.44 -1.60 -1.51 -0.35
## plot(x,y)

## Manually Compute Estimate at x=3
h <- 1
k3_weights <- dunif( abs(x-3)/h, -1, 1) #(x >= 2)*(x <= 4)
k3_weights
## [1] 0.0 0.5 0.5 0.5 0.0

yhat_3 <- sum(k3_weights*y)
yhat_3
## [1] -2.275
Code
#library(wooldridge) # Data from before
#dat <- wage1[order(wage1[,'educ']), c('wage','educ')]

## Weighted Regression
x <- 4.5
h <- 9.036/2 #makes bin large enough to include 0: (-0.018, 9]
k <- abs(dat[,'educ']-x) /h
k_weights <- dunif(k) / (2*h)
preg_k  <- lm(wage~educ, data=dat, weights=k_weights)
predict(preg_k, newdata=data.frame(educ=x))
##        1 
## 3.911792

# Compare to Piecewise
dat[,'xcc'] <- cut(dat[,'educ'], 2)
preg_c  <- lm(wage~xcc*educ, data=dat)
predict(preg_c, newdata=data.frame(educ=x, xcc='(-0.018,9]'))
##        1 
## 3.911792

14.4 Locally Constant Regression

Consider a point \(x\) and model where values are constant around \(x\); \(\hat{Y}_{i} = b_{0}(x, h) + e_{i}\). Then notice a weighted OLS estimator with uniform kernel weights yields \[\begin{eqnarray} \label{eqn:lcls} & & \min_{b_{0}(x,h)}~ \sum_{i}^{n}\left[e_{i} \right]^2 k_{U}\left( \hat{X}_{i}, x, h \right) \\ \Rightarrow & & -2 \sum_{i}^{n}\left[\hat{Y}_{i}- b_{0}(x, h) \right] k_{U}\left(\hat{X}_{i}, x, h\right) = 0\\ \label{eqn:lcls1} \Rightarrow & & \hat{b}_{0}(x) = \frac{\sum_{i} \hat{Y}_{i} k_{U} \left( \hat{X}_{i}, x, h \right) }{ \sum_{i} k_{U}\left( \hat{X}_{i}, x, h \right) } = \sum_{i} \hat{Y}_{i} \left[ \frac{ k_{U} \left( \hat{X}_{i}, x, h \right) }{ \sum_{i} k_{U}\left( \hat{X}_{i}, x, h \right)} \right] = \sum_{i} \hat{Y}_{i} w_{i}(x, h), \end{eqnarray}\] where weights are determined from \[\begin{eqnarray} \sum_{i}^{n} k_{U} \left( \hat{X}_{i}, x, h \right) &=& \sum_{i}^{n(x,h)} k_{U} \left( \hat{X}_{i}, x, h \right) + \sum_{i}^{n - n(x,h)} 0 = \frac{n(x, h)}{2}\\ w_{i}(x, h) &=& \left[ \frac{ k_{U} \left( \hat{X}_{i}, x, h \right) }{ \sum_{i} k_{U}\left( \hat{X}_{i}, x, h \right)} \right] = \frac{\mathbf{1}\left( |\hat{X}_{i} - x| \leq h \right)}{n(x, h)} \end{eqnarray}\] So locally constant kernel regression recovers the weighted mean of \(Y_{i}\) around design point \(x\). If we use exclusive bins, then we are essentially running a regressogram. As such, the regressogram is more crude but can be estimated more easily.

See how the results are nearly identical below. Then consider different design points.

Code
#library(wooldridge) # Data from before
#dat <- wage1[order(wage1[,'educ']), c('wage','educ')]

# Piecewise Regression
dat[,'xcc'] <- cut(dat[,'educ'], 2)
preg_c  <- lm(wage~xcc*educ, data=dat)
predict(preg_c, newdata=data.frame(educ=x, xcc='(-0.018,9]'))
##        1 
## 3.911792

## Weighted Regression
x <- 4.5 # regressogram bin1 midpoint
h <- 9.036/2 #regressogram half-width
k_weights <- dunif( abs(dat[,'educ']-x)/h, -1, 1)
preg_k  <- lm(wage~educ, data=dat, weights=k_weights)
predict(preg_k, newdata=data.frame(educ=x))
##        1 
## 3.911792

Notes.

When \(n\) is small, \(\hat{b}_{U}(x, h)\) is typically estimated for each unique observed value: \(x \in \{ x_{1},...x_{n} \}\). For large datasets, you can select a subset or evenly spaced values of \(x\) for which to make predictions.

If \(\hat{X}_{i}\) represents time, then the local constant regressions is also called a moving average. We can create a weighted moving average by using a different distribution. I.e. using the Normal dnorm instead of the Uniform dunif.

The basic idea also generalizes other kernels. As such, a kernel regression using uniform weights is often called a “naive kernel regression”. Typically, kernel regressions use kernels that weight nearby observations more heavily.

Bias

The regressogram has a bias at the edges of the bins, which is addressed by the local constant regression. Still, the local constant regression has a bias near the edges of the dataset. (The kernel divides by \(2h\) assuming there is data on both sides of the design point). This can be addressed by renormalizing weights.

Here is an example going into the details of the weights.

Code
## Generate Sample Data
X <- 1:5
Y <- rnorm(length(X))
Y
## [1]  0.5356849 -0.4189607  0.7724587 -0.8031262 -0.5470494

## Manually Compute Estimate at x=5
h <- 2
k5_weights <- dunif( abs(X-5)/h,-1,1) #(X >= 3)*(X =< 7)/2
k5_weights 
## [1] 0.0 0.0 0.5 0.5 0.5

yhat_5 <- sum(k5_weights*Y)
yhat_5
## [1] -0.2888584

## Edge Correction
k5_weights_ec <- k5_weights/sum(k5_weights) 
yhat_5_ec <- sum(k5_weights_ec*Y)
yhat_5_ec
## [1] -0.1925723

We can also add a slope term to address the bias.

14.5 Local Linear Regression

A less simple case is a local linear regression which conducts a linear regression for each data point using a subsample of data around it. Consider a point \(x\) and model \(\hat{Y}_{i} = b_{0}(x,h) + b_{1}(x) \hat{X}_{i} + e_{i}\) for data near \(x\). The weighted OLS estimator with kernel weights is \[\begin{eqnarray} & & \min_{b_{0}(x, h),b_{1}(x, h)}~ \sum_{i}^{n}\left[\hat{Y}_{i}- b_{0}(x, h) - b_{1}(x,h) \hat{X}_{i} \right]^2 k_{U}\left( \hat{X}_{i}, x, h\right) \end{eqnarray}\] Deriving the optimal values \(\hat{b}_{0}(x, h)\) and \(\hat{b}_{1}(x,h)\) for \(k_{U}\) is left as a homework exercise.

Code
# ``Naive" Smoother
pred_fun <- function(x0, h){
    # Assign equal weight to observations within h distance to x0
    # 0 weight for all other observations
    ki <- abs(dat[,'educ']-x0)/h
    ki <- dunif(ki)/(2*h) ## Could change, e.g. dnorm(ki)/h
    wi <- ki/sum(ki, na.rm=T) # always sum to 1 (for edge-effects)
    # run regression with weighted data
    llls_i <- lm(wage~educ, data=dat, weights=wi)
    yhat_i <- predict(llls_i, newdata=data.frame(educ=x0))
}

X0 <- seq(0,18, by=0.5) # Design points
# Fine Bins
pred_lo1 <- vector(length=length(X0))
for(i in seq_along(X0)){
    pred_lo1[i] <- pred_fun(x0=X0[i], h=2)
}
# Course Bins
pred_lo2 <- vector(length=length(X0))
for(i in seq_along(X0)){
    pred_lo2[i] <- pred_fun(x0=X0[i], h=6)
}

# Plot
plot(wage~educ, pch=16, data=wage1, col=grey(0,.1),
    ylab='Murder Rate', xlab='Population Density')
cols <- c(rgb(.8,0,0,.5), rgb(0,0,.8,.5))
lines(X0, pred_lo1, col=cols[1], lwd=1, type='o')
lines(X0, pred_lo2, col=cols[2], lwd=1, type='o')
legend('topleft', title='Locally Linear',
    legend=c('h=2', 'h=6'),
    lty=1, col=cols, cex=.8)

Compare local linear and local constant regressions using https://shinyserv.es/shiny/kreg/, with degree \(0\) and \(1\). Also try different kernels and different datasets.

Examine the local relationship between ‘Murder’ and ‘Urbanization’ in the USArrests dataset using LLLS.

Code
xy <- USArrests[,c('Murder','UrbanPop')]
X0 <- sort(unique(xy[,'UrbanPop']))
plot(y~x, pch=16, data=xy, col=grey(0,.5),
    ylab='Murder Rate', xlab='Population Density')

pred_lo1 <- numeric(length(X0))
for (i in seq_along(X0)) {
    pred_lo1[i] <- pred_fun(X0[i], h=2, xy=xy)
}

pred_lo2 <- numeric(length(X0))
for (i in seq_along(X0)) {
    pred_lo2[i] <- pred_fun(X0[i], h=20, xy=xy)
}

cols <- c(rgb(.8,0,0,.5), rgb(0,0,.8,.5))
lines(X0, pred_lo1, col=cols[1], lwd=1, type='o')
lines(X0, pred_lo2, col=cols[2], lwd=1, type='o')
legend('topleft', title='Locally Linear',
    legend=c('h=2 ', 'h=20'),
    lty=1, col=cols, cex=.8)

LOESS

An important extension of locally linear regressions is called loess, which uses adaptive bandwidths in order to have a similar number of data points around each design point. This is especially useful when \(\hat{X}_{i}\) is not uniform.

Code
# Adaptive-width subsamples with non-uniform weights
dat <- wage1[order(wage1[,'educ']), c('wage','educ')]
plot(wage~educ, pch=16, col=grey(0,.1), data=dat)

reg_lo4 <- loess(wage~educ, data=dat, span=.4)
reg_lo8 <- loess(wage~educ, data=dat, span=.8)

cols <- hcl.colors(3,alpha=.75)[-3]
lines(dat[,'educ'], predict(reg_lo4),
    col=cols[1], type='o', pch=2)
lines(dat[,'educ'], predict(reg_lo8),
    col=cols[2], type='o', pch=2)

legend('topleft', title='Loess',
    legend=c('span=.4 ', 'span=.8'),
    lty=1, col=cols, cex=.8)