6  (Re)Sampling


A sample is a subset of the population. A simple random sample is a sample where each possible sample of size \(n\) has the same probability of being selected.

Code
#All possible samples of two from {1,2,3,4}
#{1,2} {1,3} {1,4}
#{2,3} {2,4}
#{3,4}
choose(4,2)
## [1] 6

# Simple random sample (no duplicates, equal probability)
sample(1:4, 2, replace=F)
## [1] 2 3

Often, we think of the population as being infinitely large. This is an approximation that makes mathematical and computational work much simpler.

Code
#All possible samples of two from an enormous bag of numbers {1,2,3,4}
#{1,1} {1,2} {1,3} {1,4}
#{2,2} {2,3} {2,4}
#{3,3} {3,4}
#{4,4}

# Simple random sample (duplicates, equal probability)
sample(1:4, 2, replace=T)
## [1] 2 4

Intuition for infinite populations: imagine drawing names from a giant urn. If the urn has only \(10\) names, then removing one name slightly changes the composition of the urn, and the probabilities shift for the next name you draw. Now imagine the urn has \(100\) billion names, so that removing one makes no noticeable difference. We can pretend the composition never changes: each draw is essentially identical and independent (iid). We can even guarantee that by putting any names drawn back into the urn (sampling with replacement).
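
To see this numerically, here is a minimal sketch (the population and sample sizes are arbitrary choices): with a very large urn, sampling with and without replacement behave almost identically.

Code
# A large "urn": drawing with or without replacement is nearly identical
big_urn <- runif(10^6) # an arbitrary large population
mean(sample(big_urn, 10, replace=F)) # without replacement
mean(sample(big_urn, 10, replace=T)) # with replacement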

6.1 Sampling Distributions

The sampling distribution of a statistic shows us how much a statistic varies from sample to sample.

For example, the sampling distribution of the mean shows how the sample mean varies from sample to sample. The sampling distribution of the mean can also be referred to as the probability distribution of the sample mean.

Given the ages for a population of \(4\) students, compute the sampling distribution of the mean with samples of \(n=2\).

Code
X <- c(18,20,22,24) # Ages for student population
# six possible samples
m1 <- mean( X[c(1,2)] ) #{1,2}
m2 <- mean( X[c(1,3)] ) #{1,3}
m3 <- mean( X[c(1,4)] ) #{1,4}
m4 <- mean( X[c(2,3)] ) #{2,3}
m5 <- mean( X[c(2,4)] ) #{2,4}
m6 <- mean( X[c(3,4)] ) #{3,4}
# sampling distribution
sample_means <- c(m1, m2, m3, m4, m5, m6)
hist(sample_means,
    freq=F, breaks=100,
    main='', border=F)
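
Because each of the six samples is equally likely, we can also tabulate the sampling distribution directly as a probability table; a minimal sketch:

Code
# Each of the six samples has probability 1/6
table(sample_means)/length(sample_means)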

Now consider sampling from an infinite population. For example, draw three samples, each with \(n=100\) observations, and compute the mean of each sample.

Code
# Three Sample Example from infinite population
x1 <- runif(100)
m1 <- mean(x1)
x2 <- runif(100)
m2 <- mean(x2)
x3 <- runif(100)
m3 <- mean(x3)
sample_means <- c(m1, m2, m3)
sample_means
## [1] 0.5020091 0.4857290 0.5505686

# An equivalent approach: fill a vector in a loop
sample_means <- vector(length=3)
for(i in seq_along(sample_means)){
    x <- runif(100)
    m <- mean(x)
    sample_means[i] <- m
}
sample_means
## [1] 0.5030591 0.5059568 0.5393621
Code
# Three Sample Example w/ Visual
par(mfrow=c(1,3))
for(b in 1:3){
    x <- runif(100) 
    m <-  mean(x)
    hist(x,
        breaks=seq(0,1,by=.1), #for comparability
        freq=F, main=NA, border=NA)
    abline(v=m, col=2, lwd=2)
    title(paste0('mean= ', round(m,2)),  font.main=1)
}

Now examine the sampling distribution of the mean across many samples.

Code
# Many sample example
sample_means <- vector(length=500)
for(i in seq_along(sample_means)){
    x <- runif(1000)
    m <- mean(x)
    sample_means[i] <- m
}
hist(sample_means, 
    breaks=seq(0.45,0.55,by=.001),
    border=NA, freq=F,
    col=2, font.main=1, 
    xlab=expression(hat(M)),
    main='Sampling Distribution of the mean')

In this figure, you see two of the most profound results in statistics:

  • Law of Large Numbers: the sample mean is centered around the true mean.
  • Central Limit Theorem: the sampling distribution of the statistic is approximately Normal.

Law of Large Numbers.

There are different variants of the Law of Large Numbers (LLN), but they all say some version of “the sample mean is centered around the true mean”.

Code
# LLN example
m_LLN <- mean(sample_means)
round(m_LLN, 3)
## [1] 0.5
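
To see the LLN in action, here is a minimal sketch tracking the running mean of uniform draws (the sample size is an arbitrary choice): as the number of draws grows, the running mean settles toward the true mean of \(0.5\).

Code
# Running mean converges toward the true mean (0.5 for uniform draws)
x <- runif(10000)
running_mean <- cumsum(x)/seq_along(x)
plot(running_mean, type='l', col=2,
    xlab='number of draws', ylab='running mean')
abline(h=0.5, lty=2)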

Central Limit Theorem.

There are different variants of the central limit theorem (CLT), but they all say some version of “the sampling distribution of a statistic is approximately normal”. For example, the sampling distribution of the mean, shown above, is approximately normal.

Code
hist(sample_means,
    breaks=seq(0.45,0.55,by=.001),
    border=NA, freq=F,
    col=2, font.main=1,
    xlab=expression(hat(M)),
    main='Sampling Distribution of the mean')
    
## Approximately normal?
mu <- mean(sample_means)
mu_sd <- sd(sample_means)
x <- seq(0.1, 0.9, by=0.001)
fx <- dnorm(x, mu, mu_sd)
lines(x, fx, col='red')
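
Another quick check of normality is a quantile-quantile plot; a sketch: if the points fall near the reference line, the sampling distribution is approximately normal.

Code
# Compare sample quantiles to theoretical normal quantiles
qqnorm(sample_means, pch=16, col=grey(0,.25), main='')
qqline(sample_means, col='red')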

For an example with another statistic, let’s examine the sampling distribution of the standard deviation.

Code
# CLT example of the "sd" statistic
sample_sds <- vector(length=1000)
for(i in seq_along(sample_sds)){
    x <- runif(100) # same distribution
    s <- sd(x) # different statistic
    sample_sds[i] <- s
}
hist(sample_sds,
    breaks=seq(0.2,0.4,by=.01),
    border=NA, freq=F,
    col=4, font.main=1,
    xlab=expression(hat(S)),
    main='Sampling Distribution of the sd')

## Approximately normal?
mu <- mean(sample_sds)
mu_sd <- sd(sample_sds)
x <- seq(0.1, 0.9, by=0.001)
fx <- dnorm(x, mu, mu_sd)
lines(x, fx, col='blue')

Code

# Try another function, such as
my_function <- function(x){ diff(range(exp(x))) }

# try another random variable, such as rexp(100) instead of runif(100)

It is beyond the scope of this class to prove this result mathematically, but you should know that not all sampling distributions are normal. The CLT approximation is better for “large \(n\)” datasets with “well behaved” variances. The CLT does not apply to “extreme” statistics.

For examples of “extreme” statistics, examine the sampling distributions of the min, median, and max.

Code
# Create 300 samples, each with 1000 random uniform variables
x_samples <- matrix(nrow=300, ncol=1000)
for(i in seq(1,nrow(x_samples))){
    x_samples[i,] <- runif(1000)
}
# Each row is a new sample
length(x_samples[1,])
## [1] 1000

# Compute min, median, and max for each sample
x_mins <- apply(x_samples, 1, quantile, probs=0)
x_meds <- apply(x_samples, 1, quantile, probs=.5)
x_maxs <- apply(x_samples, 1, quantile, probs=1)

# Plot the sampling distributions of min, median, and max
# Median looks normal. Maximum and Minimum do not!
par(mfrow=c(1,3))
hist(x_mins, breaks=100, main='Min', font.main=1,
    xlab=expression(hat(Q)[0]), border=NA, freq=F)
hist(x_meds, breaks=100, main='Med', font.main=1,
    xlab=expression(hat(Q)[0.5]), border=NA, freq=F)
hist(x_maxs, breaks=100, main='Max', font.main=1,
    xlab=expression(hat(Q)[1]), border=NA, freq=F)
title('Sampling Distributions', outer=T, line=-1, adj=0)

Here is an example where variance is not “well behaved”.

Code
sample_means <- vector(length=999)
for(i in seq_along(sample_means)){
    x <- rcauchy(1000,0,10)
    m <- mean(x)
    sample_means[i] <- m
}
hist(sample_means, breaks=100,
    main='',
    border=NA, freq=F) # Tails look too "fat"

6.2 Resampling

Often, we only have one sample. How then can we estimate the sampling distribution of a statistic?

Code
sample_dat <- USArrests[,'Murder']
sample_mean <- mean(sample_dat)
sample_mean
## [1] 7.788

We can “resample” our data. Hesterberg (2015) provides a nice illustration of the idea. The two most basic versions are the jackknife and the bootstrap, which are discussed below.

Note that we do not use the mean of the resampled statistics as a replacement for the original estimate. This is because the resampled distributions are centered at the observed statistic, not the population parameter. (The bootstrapped mean is centered at the sample mean, for example, not the population mean.) This means that we cannot use resampling to improve on \(\hat{M}\). We use resampling to estimate sampling variability.
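
Here is a minimal sketch of this point, using simulated uniform data with known population mean \(0.5\) (the sample sizes are arbitrary choices):

Code
# Bootstrap means center on the sample mean, not the population mean
x <- runif(100) # population mean is 0.5
boot_means <- vector(length=1000)
for(b in seq_along(boot_means)){
    boot_means[b] <- mean(sample(x, replace=T))
}
c(mean(x), mean(boot_means)) # nearly identical, even when mean(x) != 0.5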

Jackknife Distribution.

Here, we compute all “leave-one-out” estimates. Specifically, for a dataset with \(n\) observations, each jackknife subsample leaves out one observation \(i\) and uses the remaining \(n-1\) observations. Taking the mean, for example, we have

  • jackknifed estimates: \(\hat{M}_{i}^{\text{jack}}=\frac{1}{n-1} \sum_{j=1, j \neq i}^{n} \hat{X}_{j}\).
  • mean of the jackknife: \(\hat{M}^{\text{jack}}=\frac{1}{n} \sum_{i}^{n} \hat{M}_{i}^{\text{jack}}\).
  • standard deviation of the jackknife: \(\hat{S}^{\text{jack}}= \sqrt{ \frac{1}{n} \sum_{i}^{n} \left[\hat{M}_{i}^{\text{jack}} - \hat{M}^{\text{jack}} \right]^2 }\).
Code
sample_dat <- USArrests[,'Murder']
sample_mean <- mean(sample_dat)

# Jackknife Estimates
n <- length(sample_dat)
jackknife_means <- vector(length=n)
for(i in seq_along(jackknife_means)){
    dat_noti <- sample_dat[-i]
    mean_noti <- mean(dat_noti)
    jackknife_means[i] <- mean_noti
}
hist(jackknife_means, breaks=25,
    border=NA, freq=F,
    main='', xlab=expression(hat(M)[-i]))
abline(v=sample_mean, col='red', lty=2)
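
Applying the formulas above to these estimates gives the mean and standard deviation of the jackknife; a sketch following the definitions as written:

Code
# Mean and standard deviation of the jackknife, per the formulas above
m_jack <- mean(jackknife_means)
s_jack <- sqrt( mean( (jackknife_means - m_jack)^2 ) )
c(m_jack, s_jack)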

Bootstrap Distribution.

Here, we draw \(n\) observations with replacement from the original data to create a bootstrap sample and calculate a statistic. Each bootstrap sample \(b=1,...,B\) uses a random resample (with replacement) of the observations to compute a statistic. We repeat that many times, say \(B=9999\), to estimate the sampling distribution. Taking the mean, for example, we have

  • bootstrapped estimate: \(\hat{M}_{b}^{\text{boot}}= \frac{1}{n} \sum_{i=1}^{n} \hat{X}_{i}^{(b)}\), for resampled data \(\hat{X}_{i}^{(b)}\)
  • mean of the bootstrap: \(\hat{M}^{\text{boot}}= \frac{1}{B} \sum_{b} \hat{M}_{b}^{\text{boot}}\).
  • standard deviation of the bootstrap: \(\hat{S}^{\text{boot}}= \sqrt{ \frac{1}{B} \sum_{b=1}^{B} \left[\hat{M}_{b}^{\text{boot}} - \hat{M}^{\text{boot}} \right]^2}\).
Code
# Bootstrap estimates
bootstrap_means <- vector(length=9999)
for(b in seq_along(bootstrap_means)){
    boot_id <- sample(1:n, replace=T)
    dat_b  <- sample_dat[boot_id] # c.f. jackknife
    mean_b <- mean(dat_b)
    bootstrap_means[b] <-mean_b
}

hist(bootstrap_means, breaks=25,
    border=NA, freq=F,
    main='', xlab=expression(hat(M)[b]))
abline(v=sample_mean, col='red', lty=2)

Why does this work? The sample: \(\{\hat{X}_{1}, \hat{X}_{2}, ... \hat{X}_{n}\}\) is drawn from a CDF \(F\). Each bootstrap sample: \(\{\hat{X}_{1}^{(b)}, \hat{X}_{2}^{(b)}, ... \hat{X}_{n}^{(b)}\}\) is drawn from the ECDF \(\hat{F}\). With \(\hat{F} \approx F\), each bootstrap sample is approximately a random sample. So when we compute a statistic on each bootstrap sample, we approximate the sampling distribution of the statistic.

Code
# theoretical probabilities
x <- c(0,1)
x_probs <- c(1/4, 3/4)
# sample draws
coin_sample <- sample(x, 1000, replace=T, prob=x_probs)
Fhat <- ecdf(coin_sample)

x_probs_boot <- c(Fhat(0), 1-Fhat(0))
x_probs_boot # approximately the theoretical value
## [1] 0.248 0.752
coin_resample <- sample(x, 999, replace=T, prob=x_probs_boot)
# any draw from here is almost the same as the original process
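
The same logic applies to continuous data; here is a sketch comparing the ECDF of a uniform sample to the true CDF:

Code
# The ECDF of a sample approximates the true CDF
x <- runif(100)
plot(ecdf(x), main='', col=grey(0,.5))
curve(punif(x), from=0, to=1, add=T, col='red') # true CDF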

Using either the bootstrap or jackknife distribution, we can estimate variability via the Standard Error: the standard deviation of your statistic across different samples. In either case, this differs from the standard deviation of the data within your sample.

Code
sd(sample_dat) # standard deviation
## [1] 4.35551
sd(bootstrap_means) # standard error
## [1] 0.6137248

6.3 Intervals

We can also estimate variability using a Confidence Interval: the range over which your statistic varies across different samples.

Percentile Intervals.

This type of confidence interval is simply the upper and lower quantiles of the sampling distribution.

For example, consider the sample mean. We simulate the sampling distribution of the sample mean and construct a \(90\%\) confidence interval by taking the \(5^{th}\) and \(95^{th}\) percentiles of the simulated means.

Code
# Create 300 samples, each with 1000 random uniform variables
x_samples <- matrix(nrow=300, ncol=1000)
for(i in seq(1,nrow(x_samples))){
    x_samples[i,] <- runif(1000)
}

# Compute the mean of each sample (row)
sample_means <- apply(x_samples, 1, mean)

# Middle 90%
mq <- quantile(sample_means, probs=c(.05,.95))
paste0('we are 90% confident that the mean is between ', 
    round(mq[1],2), ' and ', round(mq[2],2) )
## [1] "we are 90% confident that the mean is between 0.48 and 0.51"

hist(sample_means,
    breaks=seq(.4,.6, by=.001), 
    border=NA, freq=F,
    col=rgb(0,0,0,.25), font.main=1,
    main='90% Confidence Interval for the Mean')
abline(v=mq)

For another example, consider the median. We now repeat the above process to estimate the median for each sample, instead of the mean.

Code
## Sample Quantiles (medians)
sample_quants <- apply(x_samples, 1, quantile, probs=0.5)

# Middle 90% of estimates
mq <- quantile(sample_quants, probs=c(.05,.95))
paste0('we are 90% confident that the median is between ', 
    round(mq[1],2), ' and ', round(mq[2],2) )
## [1] "we are 90% confident that the median is between 0.48 and 0.53"

hist(sample_quants,
    breaks=seq(.4,.6, by=.001),
    border=NA, freq=F,
    col=rgb(0,0,0,.25), font.main=1,
    main='90% Confidence Interval for the Median')
abline(v=mq)

Note that \(Z\%\) confidence intervals do not generally cover \(Z\%\) of the data (those types of intervals are covered later). In the examples above, notice that the confidence interval for the mean differs from the confidence interval for the median, and so both cannot cover \(90\%\) of the data. The confidence interval for the mean is roughly \([0.48, 0.52]\), which theoretically covers only a \(0.52-0.48=0.04\) proportion of uniform random data, much less than \(90\%\).

Often, a \(Z\%\) confidence interval means that \(Z\%\) of the intervals we generate will contain the true mean. E.g., suppose that we repeatedly sample data and construct a \(95\%\) bootstrap confidence interval for the mean each time; then we expect that \(95\%\) of our constructed confidence intervals contain the theoretical population mean.1
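
Here is a minimal sketch of that interpretation (the sample sizes and replication counts are arbitrary choices): repeatedly draw new samples, construct a percentile bootstrap interval from each, and record how often the interval contains the true mean of \(0.5\).

Code
# Repeatedly sample, bootstrap a 95% CI, and check if it contains 0.5
ci_covers <- vector(length=200)
for(i in seq_along(ci_covers)){
    x <- runif(100) # true mean is 0.5
    boot_means_i <- vector(length=299)
    for(b in seq_along(boot_means_i)){
        boot_means_i[b] <- mean(sample(x, replace=T))
    }
    ci_i <- quantile(boot_means_i, probs=c(.025,.975))
    ci_covers[i] <- (ci_i[1] < 0.5) & (0.5 < ci_i[2])
}
mean(ci_covers) # should be roughly 0.95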

Interval Size.

Confidence intervals shrink with more data, as averaging washes out random fluctuations. Here is the intuition for estimating the weight of an apple:

  • With \(n=1\) apple, your estimate depends entirely on that one draw. If it happens to be unusually large or small, your estimate can be far off.
  • With \(n=2\) apples, the estimate averages out their idiosyncrasies. An unusually heavy apple can be balanced by a lighter one, reducing how far off you can be. You are less likely to get two extreme values than just one.
  • With \(n=100\) apples, individual apples barely move the needle. The average becomes stable.
Code
X <- c(18,20,22,24) #student ages
# six possible samples of size 2
m1 <- mean( X[c(1,2)] ) #{1,2}
m2 <- mean( X[c(1,3)] ) #{1,3}
m3 <- mean( X[c(1,4)] ) #{1,4}
m4 <- mean( X[c(2,3)] ) #{2,3}
m5 <- mean( X[c(2,4)] ) #{2,4}
m6 <- mean( X[c(3,4)] ) #{3,4}
means_2 <- c(m1, m2, m3, m4, m5, m6)
sd(means_2)
## [1] 1.414214

# four possible samples of size 3
m1 <- mean( X[c(1,2,3)] ) 
m2 <- mean( X[c(1,2,4)] ) 
m3 <- mean( X[c(1,3,4)] ) 
m4 <- mean( X[c(2,3,4)] ) 
means_3 <- c(m1, m2, m3, m4)
sd(means_3)
## [1] 0.860663
Code
# Create 300 samples, each of size n
n <- 10000
x_samples <- matrix(nrow=300, ncol=n)
for(i in seq(1,nrow(x_samples))){
    x_samples[i,] <- runif(n)
}
# Compute means for each row (for each sample)
sample_means <- apply(x_samples, 1, mean)
mq <- quantile(sample_means, probs=c(.05,.95))
paste0('we are 90% confident that the mean is between ', 
    round(mq[1],2), ' and ', round(mq[2],2) )
hist(sample_means,
    breaks=seq(.4,.6, by=.001), 
    border=NA, freq=F,
    col=rgb(0,0,0,.25), font.main=1,
    main='90% Confidence Interval for the Mean (larger n)')
abline(v=mq)

Note that both resampling methods provide imperfect estimates, and can give different numbers. Jackknife resamples are systematically less variable than they should be, and each uses \(n-1\) observations instead of \(n\). Bootstrap resamples have the right \(n\) but often contain duplicated data. Until you know more, a conservative approach is to take the larger estimate (often the bootstrap). That advice also applies when considering theoretically derived confidence intervals.

Code
# Boot CI
boot_ci <- quantile(bootstrap_means, probs=c(.025, .975))
boot_ci
##   2.5%  97.5% 
## 6.6139 8.9980

# Jack CI
jack_ci <- quantile(jackknife_means, probs=c(.025, .975))
jack_ci
##     2.5%    97.5% 
## 7.621582 7.904082

# more conservative estimate
ci <- boot_ci

6.4 Probability Theory

Means.

The LLN follows from a famous theoretical result in statistics, Linearity of Expectations: the expected value of a sum of random variables equals the sum of their individual expected values. To be concrete, suppose we take \(n\) random variables, each one denoted as \(X_{i}\). Then, for constants \(a\), \(b\), and \(1/n\), we have \[\begin{eqnarray} \mathbb{E}[a X_{1}+ b X_{2}] &=& a \mathbb{E}[X_{1}]+ b \mathbb{E}[X_{2}]\\ \mathbb{E}\left[M\right] &=& \mathbb{E}\left[ \sum_{i=1}^{n} X_{i}/n \right] = \sum_{i=1}^{n} \mathbb{E}[X_{i}]/n \end{eqnarray}\] Assuming each data point has an identical mean, \(\mathbb{E}[X_{i}]=\mu\), the expected value of the sample average is the mean: \(\mathbb{E}\left[M\right] = \sum_{i=1}^{n} \mu/n = \mu\).

Note that the estimator \(M\) differs from the particular estimate you calculated for your sample, \(\hat{M}\). For example, consider flipping a coin three times: \(M\) corresponds to a theoretical value before you flip the coins and \(\hat{M}\) corresponds to a specific value after you flip the coins.

For example, consider a coin flip with Heads \(X_{i}=1\) having probability \(p\) and Tails \(X_{i}=0\) having probability \(1-p\). First notice that \(\mathbb{E}[X_{i}]=p\). Then notice that \[\begin{align*} \mathbb{E}[X_{1}+X_{2}] &= [1+1][p \times p] + [1+0][p \times (1-p)] + [0+1][(1-p) \times p] + [0+0][(1-p) \times (1-p)] & \text{``HH + HT + TH + TT''} \\ &= [1][p \times p] + [1][p \times (1-p)] + [0][(1-p) \times p] + [0][(1-p) \times (1-p)] & \text{first outcomes times prob.} \\ &+ [1][p \times p] + [0][p \times (1-p)] + [1][(1-p) \times p] + [0][(1-p) \times (1-p)] & \text{+second outcomes times prob.} \\ &= [1][p \times p] + [1][p \times (1-p)] + [1][p \times p] + [1][(1-p) \times p] & \text{drop zeros}\\ &= 1p (p + [1-p]) + 1p (p + [1-p]) = p+p & \text{algebra}\\ &= \mathbb{E}[X_{1}] + \mathbb{E}[X_{2}] . \end{align*}\] The theoretical mean is \(\mathbb{E}[\frac{X_{1}+X_{2}}{2}]=\frac{p+p}{2}=p\).
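
We can verify this derivation numerically; a minimal sketch with \(p=0.75\) (an arbitrary choice):

Code
# Simulate many pairs of coin flips to verify E[(X1+X2)/2] = p
p <- 0.75 # an arbitrary choice
X1 <- rbinom(10^5, 1, p)
X2 <- rbinom(10^5, 1, p)
mean( (X1+X2)/2 ) # approximately p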

Variances.

Another famous theoretical result in statistics is that if we have independent and identical data (i.e., each random variable \(X_{i}\) has the same mean \(\mu\) and same variance \(\sigma^2\) and is drawn without any dependence on the previous draws), then the standard error of the sample mean shrinks at a “root \(n\)” rate: it equals the theoretical standard deviation divided by \(\sqrt{n}\). Intuitively, this follows from thinking of the variance as a type of mean (the theoretical mean squared deviation from \(\mu\)). \[\begin{eqnarray} \mathbb{V}\left( M \right) &=& \mathbb{V}\left( \frac{\sum_{i}^{n} X_{i}}{n} \right) = \sum_{i}^{n} \mathbb{V}\left(\frac{X_{i}}{n}\right) = \sum_{i}^{n} \frac{\sigma^2}{n^2} = \sigma^2/n\\ \mathbb{s}\left(M\right) &=& \sqrt{\mathbb{V}\left( M \right) } = \sqrt{\sigma^2/n} = \sigma/\sqrt{n}. \end{eqnarray}\]
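
A quick numerical check of the \(\sigma/\sqrt{n}\) formula, using uniform data where \(\sigma^2=1/12\) (the sample sizes are arbitrary choices):

Code
# Simulated standard error of the mean vs. the sigma/sqrt(n) formula
n <- 100
means_n <- vector(length=10^4)
for(i in seq_along(means_n)){
    means_n[i] <- mean(runif(n))
}
sd(means_n) # simulated standard error
sqrt(1/12)/sqrt(n) # theoretical: sigma/sqrt(n), with sigma^2 = 1/12 for uniform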

Note that the standard deviation refers to variability within a single sample, and is hence different from the standard error. Nonetheless, it can be used to estimate the variability of the mean: we can estimate \(\mathbb{s}\left(M\right)\) with \(\hat{S}/\sqrt{n}\), where \(\hat{S}\) is the standard deviation of the sample. This estimate is often a little different from the bootstrap estimate, as it is based on idealistic theoretical assumptions, whereas the bootstrap estimate is driven by data that are often not ideal.

Code
boot_se <- sd(bootstrap_means)

theory_se <- sd(sample_dat)/sqrt(length(sample_dat))

c(boot_se, theory_se)
## [1] 0.6137248 0.6159621

Also note that each additional data point you have provides more information, which ultimately decreases the standard error of your estimates. This is why statisticians will often recommend that you get more data. However, the improvement in the standard error comes at a diminishing rate. In economics, this is known as diminishing returns, and it is why economists may recommend that you do not get more data.

Code
B <- 1000 # number of bootstrap samples
Nseq <- seq(1,100, by=1) # different sample sizes

SE <- vector(length=length(Nseq))
for(n in Nseq){
    sample_statistics_n <- vector(length=B)
    for(b in seq(1,B)){
        x_b <- rnorm(n) # Sample of size n
        x_stat_b <- quantile(x_b, probs=.4) # Stat of interest
        sample_statistics_n[b] <- x_stat_b
    }
    se_n <- sd(sample_statistics_n) # How much the stat varies across samples
    SE[n] <- se_n
}


plot(Nseq, SE, pch=16, col=grey(0,.5),
    ylab='standard error', xlab='sample size')

Code

#plot(Nseq[-1], abs(diff(SE)), pch=16, col=grey(0,.5),
#    main='Marginal Gain', font.main=1,
#    ylab='decrease in standard error', xlab='sample size')

Shape.

Sometimes, the sampling distribution is approximately normal (according to the CLT). In this case, you can use a standard error and the normal distribution to get a confidence interval.

Code
# Standard Errors
sd_theory <- sd(sample_dat)/sqrt(length(sample_dat))
## Normal CI
spread_theory <- qnorm(c(0.025, 0.975))
ci <- mean(sample_dat) + spread_theory*sd_theory

CI Coverage.

Given the sample size, \(n\), is large enough for the mean to be approximately normally distributed, we can construct a symmetric confidence interval \([M - E, M + E]\), where \(E\) is some “margin of error” on either side of the mean \(M\). A coverage level of \(1-\alpha\) means \(Prob( M - E < \mu < M + E)=1-\alpha\). I.e., with \(1-\alpha=0.95\), if the same sampling procedure were repeated \(100\) times from the same population, approximately \(95\) of the resulting intervals would be expected to contain the true population mean.2 Note that a \(95\%\) coverage level does not imply a \(95\%\) probability that the true parameter lies within a particular calculated interval. E.g., if you compute \(\hat{M}=9\) for your particular sample, a coverage level of \(1-\alpha=95\%\) does not mean \(Prob(9 - E < \mu < 9 + E)=95\%\).

Given the sample size, \(n\), is large enough for the sample mean to be approximately normally distributed, what confidence interval satisfies the following: the theoretical mean \(\mu\) is inside of the interval with probability \(95\%\) (i.e., for a \(1 - 0.05\) share of samples)?

For a fixed sample size \(n\), there is a trade-off between precision (the width of a confidence interval) and accuracy (the probability that a confidence interval contains the theoretical value).

Code
# Confidence Interval for each sample
xq <- apply(x_samples, 1, function(r){ # mean +/- 1 theoretical SE
    mean(r) + c(-1,1)*sd(r)/sqrt(length(r))
})
# First 4 interval estimates
xq[,1:4]
##           [,1]      [,2]      [,3]      [,4]
## [1,] 0.4990344 0.4984030 0.4841576 0.4756585
## [2,] 0.5171785 0.5164427 0.5020911 0.4938873

# Explicit calculation
mu_true <- 0.5 # theoretical result for x_samples ~ uniform
# Logical vector: whether the true mean is in each CI
covered <- mu_true >= xq[1, ] & mu_true <= xq[2, ]
# Empirical coverage rate
coverage_rate <- mean(covered)
cat(sprintf("Estimated coverage probability: %.2f%%\n", 100 * coverage_rate))
## Estimated coverage probability: 70.33%

# Theoretically: [-1 sd, +1 sd] has 2/3 coverage
# Change to [-2 sd, +2 sd] to see Precision-Accuracy tradeoff.
Code
# Visualize first 100 confidence intervals
N <- 100
plot.new()
plot.window(xlim = range(xq), ylim = c(0, N))
for (i in 1:N) {
  col_i <- if (covered[i]) rgb(0, 0, 0, 0.3) else rgb(1, 0, 0, 0.5)
  segments(xq[1, i], i, xq[2, i], i, col = col_i, lwd = 2)
}
abline(v = mu_true, col = "blue", lwd = 2)
axis(1)
title("Visualizing CI Coverage (Red = Missed)")

6.5 Further Reading



  1. Note also that a \(95\%\) confidence interval does not imply a \(95\%\) probability that the true mean lies within the particular confidence interval you calculated for a particular sample. The interval computed from a given sample either contains the true mean or it does not.↩︎

  2. Notice that \(Prob( M - E < \mu < M + E) = Prob( - E < \mu - M < + E) = Prob( \mu + E > M > \mu - E)\). So if the interval \([\mu - 10, \mu + 10]\) contains \(95\%\) of all \(M\), then the interval \([M-10, M+10]\) will also contain \(\mu\) \(95\%\) of the time, because whenever \(M\) is within \(10\) of \(\mu\), \(\mu\) is also within \(10\) of \(M\).↩︎