7  Confidence Intervals


7.1 Confidence Intervals

The sampling distribution describes how the statistic varies across samples. The confidence interval is a way to turn knowledge about that sampling distribution into a statement about the unknown parameter. A \(Z\%\) confidence interval for the mean implies that \(Z\%\) of the intervals we generate will contain the population mean, \(\mu\).

Note that a \(Z\%\) confidence interval does not imply a \(Z\%\) probability that the true parameter lies within a particular calculated interval. The interval you computed either contains the true mean or it does not.

In practice, people often interpret confidence intervals informally as “showing the uncertainty around our estimate”: wider intervals correspond to higher sampling variability and less precise information about \(\mu\). Just as with standard errors, we can estimate confidence intervals using theory-driven or data-driven approaches. We will focus on data-driven approaches first.

Computation.

For example, consider the sample mean. We simulate the sampling distribution of the sample mean and construct a \(90\%\) confidence interval by taking the \(5^{th}\) and \(95^{th}\) percentiles of the sampling distribution. We then expect that approximately \(90\%\) of the confidence intervals constructed this way contain the theoretical population mean.

For example, consider the mean of a uniform random sample with a sample size of \(n=1000\).

Code
# Create 300 samples, each with 1000 random uniform variables
x_samples <- matrix(nrow=300, ncol=1000)
for(i in seq(1,nrow(x_samples))){
    x_samples[i,] <- runif(1000)
}
sample_means <- apply(x_samples, 1, mean) # mean for each sample (row)

# Middle 90%
mq <- quantile(sample_means, probs=c(.05,.95))
paste0('we are 90% confident that the mean is between ', 
    round(mq[1],2), ' and ', round(mq[2],2) )
## [1] "we are 90% confident that the mean is between 0.48 and 0.51"

hist(sample_means,
    breaks=seq(.4,.6, by=.001), 
    border=NA, freq=F,
    col=rgb(0,0,0,.25), font.main=1,
    main='90% Confidence Interval for the Mean')
abline(v=mq)

For another example, consider the median. We now repeat the above process to estimate the median for each sample, instead of the mean.

Code
## Sample Quantiles (medians)
sample_quants <- apply(x_samples, 1, quantile, probs=0.5) #quantile for each sample (row)

# Middle 90% of estimates
mq <- quantile(sample_quants, probs=c(.05,.95))
paste0('we are 90% confident that the median is between ', 
    round(mq[1],2), ' and ', round(mq[2],2) )
## [1] "we are 90% confident that the median is between 0.48 and 0.52"

hist(sample_quants,
    breaks=seq(.4,.6, by=.001),
    border=NA, freq=F,
    col=rgb(0,0,0,.25), font.main=1,
    main='90% Confidence Interval for the Median')
abline(v=mq)

The \(5^{th}\) and \(95^{th}\) percentiles are called the “critical values” for the \(90\%\) confidence interval. The \(2.5^{th}\) and \(97.5^{th}\) percentiles are the critical values for the \(95\%\) confidence interval.
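For example, we can compute the \(95\%\) critical values from the same simulated sampling distribution (a quick sketch reusing sample_means from above):

Code
# Critical values for a 95% CI: the 2.5th and 97.5th percentiles
quantile(sample_means, probs=c(.025, .975))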

Interval Size.

Confidence intervals shrink with more data, as averaging washes out random fluctuations. Here is the intuition for estimating the weight of an apple:

  • With \(n=1\) apple, your estimate depends entirely on that one draw. If it happens to be unusually large or small, your estimate can be far off.
  • With \(n=2\) apples, the estimate averages out their idiosyncrasies. An unusually heavy apple can be balanced by a lighter one, reducing how far off you can be. You are less likely to get two extreme values than just one.
  • With \(n=100\) apples, individual apples barely move the needle. The average becomes stable.
Code
# Create 300 samples, each of size n
par(mfrow=c(1,3))
for(n in c(25, 100, 250)){
    x_samples <- matrix(nrow=300, ncol=n)
    for(i in seq(1,nrow(x_samples))){
        x_samples[i,] <- runif(n)
    }
    # Compute means for each row (for each sample)
    sample_means <- apply(x_samples, 1, mean)
    # 90% Confidence Interval
    mq <- quantile(sample_means, probs=c(.05,.95))
    # print() is needed for output inside a loop
    print( paste0('we are 90% confident that the mean is between ', 
        round(mq[1],2), ' and ', round(mq[2],2) ) )
    hist(sample_means,
        breaks=seq(.1,.9, by=.005), 
        border=NA, freq=F, 
        col=rgb(0,0,0,.25), font.main=1,
        main=paste0('n=',n))
    abline(v=mq)
}

Here is an intuitive example from a small discrete population. Notice that the extreme sample means become less likely as the sample size grows.

Code
X <- c(18,20,22,24) #student ages (population values)
# six possible samples of size 2
m1 <- mean( X[c(1,2)] ) #{1,2}
m2 <- mean( X[c(1,3)] ) #{1,3}
m3 <- mean( X[c(1,4)] ) #{1,4}
m4 <- mean( X[c(2,3)] ) #{2,3}
m5 <- mean( X[c(2,4)] ) #{2,4}
m6 <- mean( X[c(3,4)] ) #{3,4}
means_2 <- c(m1, m2, m3, m4, m5, m6)
sort(means_2)
## [1] 19 20 21 21 22 23

# four possible samples of size 3
m1 <- mean( X[c(1,2,3)] ) 
m2 <- mean( X[c(1,2,4)] ) 
m3 <- mean( X[c(1,3,4)] ) 
m4 <- mean( X[c(2,3,4)] ) 
means_3 <- c(m1, m2, m3, m4)
sort(means_3)
## [1] 20.00000 20.66667 21.33333 22.00000
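
The same enumeration can also be done programmatically with base R’s combn (a small sketch, equivalent to the explicit calculations above):

Code
# Enumerate all samples of a given size and compute each mean
means_2 <- combn(X, 2, mean)
means_3 <- combn(X, 3, mean)
sort(means_2)
sort(means_3)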

For a fixed sample size \(n\), there is a trade-off between precision (the width of a confidence interval) and accuracy (the probability that the interval contains the theoretical value).
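
To see this trade-off directly, here is a quick sketch that computes intervals at several coverage levels from the same simulated sample_means: higher levels give wider, less precise intervals.

Code
# Higher coverage levels produce wider (less precise) intervals
for(p in c(.50, .90, .99)){
    a <- (1-p)/2
    mq <- quantile(sample_means, probs=c(a, 1-a))
    print( round(c(level=p, mq, width=unname(diff(mq))), 3) )
}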

Resampling Intervals.

Often, we have only one sample. In practice, we can use resampling procedures to estimate a confidence interval. E.g., we repeatedly resample data and construct a bootstrap or jackknife sampling distribution. Then we compute the confidence intervals using the upper and lower quantiles of the sampling distribution.

Code
sample_dat <- USArrests[,'Murder']
sample_mean <- mean(sample_dat)
sample_mean
## [1] 7.788

# Jackknife Distribution
n <- length(sample_dat)
jackknife_means <- vector(length=n)
for(i in seq_along(jackknife_means)){
    dat_noti <- sample_dat[-i]
    mean_noti <- mean(dat_noti)
    jackknife_means[i] <- mean_noti
}

# Bootstrap Distribution
set.seed(1) # to be replicable
bootstrap_means <- vector(length=9999)
for(b in seq_along(bootstrap_means)){
    dat_id <- seq(1,n)
    boot_id <- sample(dat_id , replace=T)
    dat_b  <- sample_dat[boot_id] # c.f. jackknife
    mean_b <- mean(dat_b)
    bootstrap_means[b] <- mean_b
}

# Jack CI
jack_ci <- quantile(jackknife_means, probs=c(.025, .975))
jack_ci
##     2.5%    97.5% 
## 7.621582 7.904082

# Boot CI
boot_ci <- quantile(bootstrap_means, probs=c(.025, .975))
boot_ci
##   2.5%  97.5% 
## 6.6039 8.9420

# more conservative estimate
ci <- boot_ci
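
The jackknife interval is much narrower because each leave-one-out mean differs only slightly from the full-sample mean. A quick comparison of the two spreads (not part of the original code) makes this visible:

Code
# The jackknife means barely vary (each drops just one observation),
# so raw jackknife percentiles understate sampling variability
sd(jackknife_means)
sd(bootstrap_means)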

7.2 Hypothesis Testing

In this section, we test hypotheses using data-driven methods that assume much less about the data generating process. There are two main ways to conduct such a hypothesis test: inverting a confidence interval and imposing the null. The first works with the distribution of estimates directly; the second explicitly enforces the null hypothesis to evaluate how unusual the observed statistic is. Both approaches rely on the bootstrap: resampling the data to approximate sampling variability. The most typical case is hypothesizing about the mean.

Invert a CI.

One main way to conduct hypothesis tests is to examine whether a confidence interval contains a hypothesized value. We then use this decision rule:

  • reject the null if the value falls outside of the interval
  • fail to reject the null if the value falls inside of the interval

We typically use a \(95\%\) confidence interval to create a rejection region: the area that falls outside of the interval.

For example, suppose you hypothesize the mean is \(9\). You then construct a bootstrap distribution with a \(95\%\) confidence interval, and find your hypothesized value falls outside of the confidence interval. Then, after accounting for sampling variability (which you estimate), it still seems extremely unlikely that the theoretical mean actually equals \(9\), so you reject that hypothesis. (If the hypothesized value landed in the interval, you would “fail to reject” that the theoretical mean equals \(9\).)

Code
hist(bootstrap_means, breaks=25,
    border=NA,
    freq=F,
    main='',
    xlab='Bootstrap Samples')
# CI
ci_95 <- quantile(bootstrap_means, probs=c(.025, .975))
abline(v=ci_95, lwd=2)
# H0: mean=9
abline(v=9, col=2, lwd=2)

The above procedure also generalizes to many other statistics. Perhaps the most informative additional statistics are those for spread or shape. E.g., you can conduct hypothesis tests for the standard deviation and IQR, or for skew and kurtosis.

Code
# Bootstrap Distribution for SD
sd_obs <- sd(sample_dat)
bootstrap_sd <- vector(length=999)
for(b in seq_along(bootstrap_sd)){
    x_b <- sample(sample_dat, replace=T)
    sd_b <- sd(x_b)
    bootstrap_sd[b] <- sd_b
}

# Test for SD Differences (Invert CI)
sd_null <- 3.6
hist(bootstrap_sd, freq=F,
    border=NA, xlab='Bootstrap', font.main=1,
    main='Standard Deviations (Invert CI)')
sd_ci <- quantile(bootstrap_sd, probs=c(0.025,.975) )
abline(v=sd_ci, lwd=2)
abline(v=sd_null, lwd=2, col=2)

To better your understanding, try redoing the above for any function, such as IQR(x_b)/median(x_b).
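
For instance, here is a minimal sketch for that suggested ratio, reusing sample_dat from above:

Code
# Bootstrap distribution for the ratio IQR/median
set.seed(1)
bootstrap_ratio <- vector(length=999)
for(b in seq_along(bootstrap_ratio)){
    x_b <- sample(sample_dat, replace=T)
    bootstrap_ratio[b] <- IQR(x_b)/median(x_b)
}
# Invert the CI: reject a hypothesized ratio outside this interval
quantile(bootstrap_ratio, probs=c(0.025,.975))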

Suppose you scored \(83\%\) on your exam with \(50\) questions, but think you are really a \(90\%\) student. Explain how you might test your hypothesis to your professor who insists your claim be supported by evidence. What would be the issue if we could not reject your hypothesis? Provide a computer simulation illustrating the issue.
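
One possible simulation for this exercise (a sketch only, assuming each question is an independent Bernoulli trial):

Code
# Simulate a hypothetical true 90% student on a 50-question exam
set.seed(1)
scores_90 <- rbinom(10000, size=50, prob=0.90)/50
quantile(scores_90, probs=c(.025,.975)) # does the observed 0.83 fall inside?
# The issue: failing to reject is weak evidence, since other
# hypotheses (e.g., a true 85% student) fit the data about as well
scores_85 <- rbinom(10000, size=50, prob=0.85)/50
quantile(scores_85, probs=c(.025,.975))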

Impose the Null.

We can also compute a null distribution: the sampling distribution of the statistic under the null hypothesis (i.e., assuming your null hypothesis is true). We use the bootstrap to loop through a large number of “resamples”. In each iteration of the loop, we impose the null hypothesis and re-estimate the statistic of interest. We then calculate the range of the statistic across all resamples and assess how extreme our originally observed value is.

For example, suppose you hypothesize the mean is \(9\). You then construct a \(95\%\) confidence interval around the null bootstrap distribution (resamples centered around \(9\)). If your sample mean falls outside of that interval, then even after accounting for sampling variability (which you estimate), it seems extremely unlikely that the theoretical mean actually equals \(9\), so you reject that hypothesis. (If the sample mean landed in the interval, you would “fail to reject” that the theoretical mean equals \(9\).)

Code
sample_dat <- USArrests[,'Murder']
sample_mean <- mean(sample_dat)

# Bootstrap NULL: mean=9
# Bootstrap shift: center each bootstrap resample so that the distribution satisfies the null hypothesis on average.
set.seed(1)
mu <- 9
bootstrap_means_null <- vector(length=999)
for(b in seq_along(bootstrap_means_null)){
    dat_b <- sample(sample_dat, replace=T) 
    mean_b <- mean(dat_b) + (mu - sample_mean) # impose the null via Bootstrap shift
    bootstrap_means_null[b] <- mean_b
}
hist(bootstrap_means_null, breaks=25, border=NA,
    main='',
    xlab='Null Bootstrap Samples')
ci_95 <- quantile(bootstrap_means_null, probs=c(.025, .975)) # critical region
abline(v=ci_95, lwd=2)
abline(v=sample_mean, lwd=2, col=4)
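
To summarize how extreme the observed statistic is, one common numeric summary (an addition here, not part of the original code) is the two-sided tail frequency under the null distribution:

Code
# Share of null resamples at least as far from mu as the observed mean
p_val <- mean( abs(bootstrap_means_null - mu) >= abs(sample_mean - mu) )
p_val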

7.3 Misc. Topics

Normal Approximation.

Given the sampling distribution is approximately normal, the usual confidence intervals are symmetric. For the sample mean \(M\), we can then construct the interval \([M - E, M + E]\), where \(E\) is a “margin of error” on either side of \(M\). A coverage level of \(1-\alpha\) means \(Prob( M - E < \mu < M + E)=1-\alpha\). I.e., with \(1-\alpha=0.95\), if the same sampling procedure were repeated \(100\) times from the same population, approximately \(95\) of the resulting intervals would be expected to contain the true population mean.1 We can also compute from theory that \(E = 1.96~ SE(M)\) corresponds to the critical values of the Normal distribution for a \(95\%\) interval, where \(SE(M)\) is estimated using the bootstrap distribution or theory (classical SEs: \(\hat{S}/\sqrt{n}\)).

Code
# Bootstrap Distribution with Percentile CI
sd_est_boot <- sd(bootstrap_means)
hist(bootstrap_means, breaks=25,
    main='Percentile vs Normal 95% CIs',
    font.main=1, border=NA,
    freq=F, ylim=c(0,0.7),
    xlab=expression(hat(b)[b]))
boot_ci_percentile <- quantile(bootstrap_means, probs=c(.025,.975))
abline(v=boot_ci_percentile, lty=1)

# Normal Approximation with Bootstrap SEs
dx <- seq(5,10,by=0.01)
lines(dx, dnorm(dx,sample_mean,sd_est_boot), col="blue", lty=1)
boot_ci_normal <- qnorm(c(.025,.975), sample_mean, sd_est_boot) #sample_mean+c(-1.96, +1.96)*sd_est_boot
abline(v=boot_ci_normal, col="blue", lty=3)

# Normal Approximation with IID Theory SEs
classic_se <- sd(sample_dat)/sqrt(length(sample_dat))
lines(dx, dnorm(dx,sample_mean,classic_se), col="red", lty=1)
ci_normal <-  qnorm(c(.025,.975), sample_mean, classic_se) #sample_mean+c(-1.96, +1.96)*classic_se
abline(v=ci_normal, col="red", lty=3)

Code

paste0('Bootstrap SE = ', round(sd_est_boot,3))
## [1] "Bootstrap SE = 0.6"
paste0('Classic SE = ', round(classic_se,3))
## [1] "Classic SE = 0.616"

The main advantage of the Normal approximation is that it works well for estimating extreme probabilities, where resampling methods tend to be worse. The main disadvantage is that the sampling distribution might be far from normal. In the example above, they are all quite similar, but that does not always need to be the case.

This Normal based interval can also provide an alternative to the Null Bootstrap. While we could also use a Null Jackknife distribution, that is rarely done. Altogether, there are two different types of confidence intervals that “impose the null”.

Types of Confidence Interval Estimates that “impose the null”:

  • Bootstrap Percentile: randomly resample \(n\) observations with replacement and shift
  • Normal: assume observations are i.i.d. and the normal distribution is a good approximation (can use bootstrap or classical SE’s)
Code
# Confidence Interval for each sample
xq <- apply(x_samples, 1, function(r){ #theoretical se's 
    mean(r) + c(-1,1)*sd(r)/sqrt(length(r))
})
# First 3 interval estimates
xq[, c(1,2,3)]
##           [,1]      [,2]      [,3]
## [1,] 0.4639178 0.4999675 0.4832245
## [2,] 0.5011281 0.5363727 0.5178519

# Explicit calculation
mu_true <- 0.5 # theoretical result for uniform samples
# Logical vector: whether the true mean is in each CI
covered <- mu_true >= xq[1, ] & mu_true <= xq[2, ]
# Empirical coverage rate
coverage_rate <- mean(covered)
cat(sprintf("Estimated coverage probability: %.2f%%\n", 100 * coverage_rate))
## Estimated coverage probability: 69.00%

# Theoretically: [-1 sd, +1 sd] has 2/3 coverage
# Change to [-2 sd, +2 sd] to see Precision-Accuracy tradeoff.
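
Here is the suggested \([-2~SD, +2~SD]\) variant as a quick sketch, showing the accuracy side of the trade-off:

Code
# [-2 SE, +2 SE] intervals are wider, so coverage rises (to roughly 95%)
xq2 <- apply(x_samples, 1, function(r){
    mean(r) + c(-2,2)*sd(r)/sqrt(length(r))
})
covered2 <- mu_true >= xq2[1, ] & mu_true <= xq2[2, ]
mean(covered2)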

Prediction Intervals.

Note that \(Z\%\) confidence intervals do not generally cover \(Z\%\) of the data (that is the job of the prediction intervals below). In the examples above, notice that the confidence interval for the mean differs from the confidence interval for the median, so both cannot cover \(90\%\) of the data. The confidence interval for the mean is roughly \([0.48, 0.52]\), which theoretically covers only a \(0.52-0.48=0.04\) proportion of uniform random data, much less than the proportion \(0.9\).

In addition to confidence intervals, we can also compute a prediction interval, which estimates the variability of new data rather than of a statistic. To do so, we take quantiles of the data values themselves rather than of an estimated statistic.

Code
x <- runif(1000)
# Middle 90% of values
xq0 <- quantile(x, probs=c(.05,.95))
paste0('we are 90% confident that a future data point will be between ', 
    round(xq0[1],2), ' and ', round(xq0[2],2) )
## [1] "we are 90% confident that a future data point will be between 0.04 and 0.95"

hist(x,
    breaks=seq(0,1,by=.01), border=NA,
    main='Prediction Interval', font.main=1)
abline(v=xq0)
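
As a quick check (a sketch, assuming we can draw fresh data from the same distribution), the interval should cover roughly \(90\%\) of new draws:

Code
# Fraction of fresh draws inside the prediction interval (~0.90)
x_new <- runif(10000)
mean( x_new >= xq0[1] & x_new <= xq0[2] )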

7.4 Further Reading

See


  1. Notice that \(Prob( M - E < \mu < M + E) = Prob( - E < \mu - M < + E) = Prob( \mu + E > M > \mu - E)\). So if the interval \([\mu - 10, \mu + 10]\) contains \(95\%\) of all \(M\), then the interval \([M-10, M+10]\) will also contain \(\mu\) in \(95\%\) of the samples because whenever \(M\) is within \(10\) of \(\mu\), the value \(\mu\) is also within \(10\) of \(M\). But for any particular sample, the interval \([\hat{M}-10, \hat{M}+10]\) either does or does not contain \(\mu\). Similarly, if you compute \(\hat{M}=9\) for your particular sample, a coverage level of \(1-\alpha=95\%\) does not mean \(Prob(9 - E < \mu < 9 + E)=95\%\).↩︎