8 Probability Theory
You were already introduced to random variables and probability distributions in https://jadamso.github.io/Rbooks/data.html#random-variables. In this section, we dig a little deeper into the theory behind the statistics we are most likely to use in practice.
8.1 Mean and Variance
The mean and variance are probably the two most basic and most frequently used statistics. To understand them better, we separately analyze how they are computed for discrete and continuous random variables.
Discrete.
If the sample space is discrete, we can compute the theoretical mean (or expected value) as \[ \mu = \sum_{i} x_{i} Prob(X=x_{i}), \] where \(Prob(X=x_{i})\) is the probability the random variable \(X\) takes the particular value \(x_{i}\). Similarly, we can compute the theoretical variance as \[ \sigma^2 = \sum_{i} [x_{i} - \mu]^2 Prob(X=x_{i}). \]
For example, an unfair coin with a \(.75\) probability of heads (\(x_{i}=1\)) and a \(.25\) probability of tails (\(x_{i}=0\)) has a theoretical mean of \[ \mu = 1\times.75 + 0 \times .25 = .75 \] and a theoretical variance of \[ \sigma^2 = [1 - .75]^2 \times.75 + [0 - .75]^2 \times.25 = 0.1875 \]
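These calculations are easy to reproduce numerically. Below is a minimal sketch in R, using only the outcomes and probabilities from the unfair-coin example above (the names x, p, mu, and sigma2 are illustrative):
Code
# Outcomes and their probabilities for the unfair coin.
x <- c(1, 0)        # 1 = heads, 0 = tails
p <- c(0.75, 0.25)  # Prob(X = x)
# Theoretical mean: probability-weighted sum of the outcomes.
mu <- sum(x * p)
mu
## [1] 0.75
# Theoretical variance: probability-weighted squared deviations.
sigma2 <- sum((x - mu)^2 * p)
sigma2
## [1] 0.1875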
Continuous.
If the sample space is continuous, we can compute the theoretical mean (or expected value) as \[ \mu = \int x f(x) d x, \] where \(f(x)\) is the probability density function of the random variable at the value \(x\). Similarly, we can compute the theoretical variance as \[ \sigma^2 = \int [x - \mu]^2 f(x) d x. \] For example, consider a random variable with a continuous uniform distribution over \([-1, 1]\). In this case, \(f(x)=1/[1 - (-1)]=1/2\) for each \(x\) in \([-1, 1]\), so \[ \mu = \int_{-1}^{1} \frac{x}{2} d x = \int_{-1}^{0} \frac{x}{2} d x + \int_{0}^{1} \frac{x}{2} d x = 0 \] and, since \(\mu=0\), \[ \sigma^2 = \int_{-1}^{1} x^2 \frac{1}{2} d x = \frac{1}{2} \frac{x^3}{3} \Big|_{-1}^{1} = \frac{1}{6}[1 - (-1)] = \frac{2}{6} = \frac{1}{3}. \]
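We can verify these integrals numerically with base R's integrate() function. This is a sketch assuming the uniform density on \([-1, 1]\), which base R provides as dunif():
Code
# Density of the continuous uniform distribution on [-1, 1].
f <- function(x) dunif(x, min = -1, max = 1)
# Theoretical mean: integrate x*f(x) over the support.
mu <- integrate(function(x) x * f(x), lower = -1, upper = 1)$value
round(mu, 6)
## [1] 0
# Theoretical variance: integrate (x - mu)^2 * f(x) over the support.
sigma2 <- integrate(function(x) (x - mu)^2 * f(x), lower = -1, upper = 1)$value
round(sigma2, 6)
## [1] 0.333333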
8.2 Bivariate Distributions
Suppose we have two discrete variables \(X_{1}\) and \(X_{2}\).
Their joint distribution is denoted as \[\begin{eqnarray} Prob(X_{1} = x_{1}, X_{2} = x_{2}) \end{eqnarray}\] The conditional distributions are defined as \[\begin{eqnarray} Prob(X_{1} = x_{1} | X_{2} = x_{2}) = \frac{ Prob(X_{1} = x_{1}, X_{2} = x_{2})}{ Prob( X_{2} = x_{2} )}\\ Prob(X_{2} = x_{2} | X_{1} = x_{1}) = \frac{ Prob(X_{1} = x_{1}, X_{2} = x_{2})}{ Prob( X_{1} = x_{1} )} \end{eqnarray}\] The marginal distributions can then be recovered as \[\begin{eqnarray} Prob(X_{1} = x_{1}) = \sum_{x_{2}} Prob(X_{1} = x_{1} | X_{2} = x_{2}) Prob( X_{2} = x_{2} ) \\ Prob(X_{2} = x_{2}) = \sum_{x_{1}} Prob(X_{2} = x_{2} | X_{1} = x_{1}) Prob( X_{1} = x_{1} ), \end{eqnarray}\] a relationship known as the law of total probability.
Example: Fair Coin.
As a first example, consider flipping two coins. Denote each coin by \(i \in \{1, 2\}\), and record whether “heads” is face up: \(X_{i}=1\) if heads and \(X_{i}=0\) if tails. Suppose both coins are “fair”: \(Prob(X_{1}=1)= 1/2\) and \(Prob(X_{2}=1|X_{1})=1/2\). Then the four potential outcomes have equal probabilities. The joint distribution is \[\begin{eqnarray} Prob(X_{1} = x_{1}, X_{2} = x_{2}) &=& Prob(X_{1} = x_{1}) Prob(X_{2} = x_{2})\\ Prob(X_{1} = 0, X_{2} = 0) &=& 1/2 \times 1/2 = 1/4 \\ Prob(X_{1} = 0, X_{2} = 1) &=& 1/4 \\ Prob(X_{1} = 1, X_{2} = 0) &=& 1/4 \\ Prob(X_{1} = 1, X_{2} = 1) &=& 1/4 . \end{eqnarray}\] The marginal distribution of the second coin is \[\begin{eqnarray} Prob(X_{2} = 0) &=& Prob(X_{2} = 0 | X_{1} = 0) Prob(X_{1}=0) + Prob(X_{2} = 0 | X_{1} = 1) Prob(X_{1}=1)\\ &=& 1/2 \times 1/2 + 1/2 \times 1/2 = 1/2\\ Prob(X_{2} = 1) &=& Prob(X_{2} = 1 | X_{1} = 0) Prob(X_{1}=0) + Prob(X_{2} = 1 | X_{1} = 1) Prob(X_{1}=1)\\ &=& 1/2 \times 1/2 + 1/2 \times 1/2 = 1/2 \end{eqnarray}\]
Code
# Create a 2x2 matrix for the joint distribution.
# Rows correspond to X1 (coin 1), and columns correspond to X2 (coin 2).
P_fair <- matrix(1/4, nrow = 2, ncol = 2)
rownames(P_fair) <- c("X1=0", "X1=1")
colnames(P_fair) <- c("X2=0", "X2=1")
P_fair
## X2=0 X2=1
## X1=0 0.25 0.25
## X1=1 0.25 0.25
# Compute the marginal distributions.
# Marginal for X1: sum across columns.
P_X1 <- rowSums(P_fair)
P_X1
## X1=0 X1=1
## 0.5 0.5
# Marginal for X2: sum across rows.
P_X2 <- colSums(P_fair)
P_X2
## X2=0 X2=1
## 0.5 0.5
# Compute the conditional probabilities Prob(X2 | X1).
cond_X2_given_X1 <- matrix(0, nrow = 2, ncol = 2)
for (j in 1:2) {
  # Column j conditions on X1 = j-1: divide row j of the joint by Prob(X1 = j-1).
  cond_X2_given_X1[, j] <- P_fair[j, ] / P_X1[j]
}
rownames(cond_X2_given_X1) <- c("X2=0", "X2=1")
colnames(cond_X2_given_X1) <- c("given X1=0", "given X1=1")
cond_X2_given_X1
## given X1=0 given X1=1
## X2=0 0.5 0.5
## X2=1 0.5 0.5
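As a sanity check, we can also simulate two independent fair coins and compare the empirical joint frequencies to P_fair. This is a sketch; the sample size n and the seed are arbitrary choices.
Code
set.seed(1)  # arbitrary seed, for reproducibility
n <- 10000   # arbitrary number of paired flips
x1 <- rbinom(n, size = 1, prob = 0.5)  # coin 1
x2 <- rbinom(n, size = 1, prob = 0.5)  # coin 2, flipped independently
# Empirical joint frequencies: each cell should be close to 1/4.
table(X1 = x1, X2 = x2) / n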
Example: Unfair Coin.
Consider a second example, where the second coin is “completely unfair”, so that it always lands the same as the first. The outcomes generated with a completely unfair coin are the same as if we only flipped one coin. \[\begin{eqnarray} Prob(X_{1} = x_{1}, X_{2} = x_{2}) &=& Prob(X_{1} = x_{1}) \mathbf{1}( x_{1}=x_{2} )\\ Prob(X_{1} = 0, X_{2} = 0) &=& 1/2 \\ Prob(X_{1} = 0, X_{2} = 1) &=& 0 \\ Prob(X_{1} = 1, X_{2} = 0) &=& 0 \\ Prob(X_{1} = 1, X_{2} = 1) &=& 1/2 . \end{eqnarray}\] Note that \(\mathbf{1}( x_{1}=x_{2} )\) is the indicator function: it equals \(1\) if \(x_{1}=x_{2}\) and \(0\) otherwise. The marginal distribution of the second coin is \[\begin{eqnarray} Prob(X_{2} = 0) &=& Prob(X_{2} = 0 | X_{1} = 0) Prob(X_{1}=0) + Prob(X_{2} = 0 | X_{1} = 1) Prob(X_{1}=1) \\ &=& 1 \times 1/2 + 0 \times 1/2 = 1/2\\ Prob(X_{2} = 1) &=& Prob(X_{2} = 1 | X_{1} =0) Prob( X_{1} = 0) + Prob(X_{2} = 1 | X_{1} = 1) Prob( X_{1} =1)\\ &=& 0\times 1/2 + 1 \times 1/2 = 1/2 \end{eqnarray}\] which is the same as in the first example! Different joint distributions can have the same marginal distributions.
Code
# Create the joint distribution matrix for the unfair coin case.
P_unfair <- matrix(c(0.5, 0, 0, 0.5), nrow = 2, ncol = 2, byrow = TRUE)
rownames(P_unfair) <- c("X1=0", "X1=1")
colnames(P_unfair) <- c("X2=0", "X2=1")
P_unfair
## X2=0 X2=1
## X1=0 0.5 0.0
## X1=1 0.0 0.5
# Compute the marginal distributions in the unfair case.
P_X2_unfair <- colSums(P_unfair)
P_X1_unfair <- rowSums(P_unfair)
# Compute the conditional probabilities Prob(X2 | X1) for the unfair coin.
cond_X2_given_X1_unfair <- matrix(NA, nrow = 2, ncol = 2)
for (j in 1:2) {
  # Guard against division by zero when Prob(X1 = j-1) is zero.
  if (P_X1_unfair[j] > 0) {
    # Divide row j of the joint (X1 = j-1) by the marginal Prob(X1 = j-1).
    cond_X2_given_X1_unfair[, j] <- P_unfair[j, ] / P_X1_unfair[j]
  }
}
rownames(cond_X2_given_X1_unfair) <- c("X2=0", "X2=1")
colnames(cond_X2_given_X1_unfair) <- c("given X1=0", "given X1=1")
cond_X2_given_X1_unfair
## given X1=0 given X1=1
## X2=0 1 0
## X2=1 0 1
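Simulating the completely unfair coin makes the contrast concrete: the joint frequencies now concentrate on the diagonal, but each coin's empirical marginal is still about 1/2. Again a sketch, with arbitrary n and seed.
Code
set.seed(1)
n <- 10000
x1 <- rbinom(n, size = 1, prob = 0.5)  # coin 1 is fair
x2 <- x1                               # coin 2 always copies coin 1
# Joint frequencies: mass only in the (0,0) and (1,1) cells.
table(X1 = x1, X2 = x2) / n
# Marginal frequencies of each coin are still approximately 1/2.
c(mean(x1), mean(x2))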
Bayes’ Theorem.
Finally, note Bayes’ Theorem: \[\begin{eqnarray} Prob(X_{1} = x_{1} | X_{2} = x_{2}) Prob( X_{2} = x_{2}) &=& Prob(X_{1} = x_{1}, X_{2} = x_{2}) = Prob(X_{2} = x_{2} | X_{1} = x_{1}) Prob(X_{1}=x_{1})\\ Prob(X_{1} = x_{1} | X_{2} = x_{2}) &=& \frac{ Prob(X_{2} = x_{2} | X_{1} = x_{1}) Prob(X_{1}=x_{1}) }{ Prob( X_{2} = x_{2}) } \end{eqnarray}\]
Code
# Verify Bayes' theorem for the unfair coin case:
# Compute Prob(X1=1 | X2=1) using the formula:
# Prob(X1=1 | X2=1) = [Prob(X2=1 | X1=1) * Prob(X1=1)] / Prob(X2=1)
P_X1_1 <- 0.5
P_X2_1_given_X1_1 <- 1 # Since coin 2 copies coin 1.
P_X2_1 <- P_X2_unfair["X2=1"]
bayes_result <- (P_X2_1_given_X1_1 * P_X1_1) / P_X2_1
bayes_result
## X2=1
## 1
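The same conditional probability can be read off the joint matrix directly, since Bayes' theorem is just the definition of conditional probability rearranged:
Code
# Prob(X1=1 | X2=1) = Prob(X1=1, X2=1) / Prob(X2=1)
P_unfair["X1=1", "X2=1"] / P_X2_unfair["X2=1"]
## X2=1 
##    1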
8.3 Further Reading
Many introductory econometrics textbooks have a good appendix on probability and statistics. There are also many useful texts online:
- [Refresher] https://www.khanacademy.org/math/statistics-probability/probability-library/basic-theoretical-probability/a/probability-the-basics
- https://www.r-bloggers.com/2024/03/calculating-conditional-probability-in-r/
- https://www.atmos.albany.edu/facstaff/timm/ATM315spring14/R/IPSUR.pdf
- https://math.dartmouth.edu/~prob/prob/prob.pdf
- https://bookdown.org/speegled/foundations-of-statistics/
- https://bookdown.org/probability/beta/discrete-random-variables.html
- https://www.econometrics-with-r.org/2.1-random-variables-and-probability-distributions.html
- https://probability4datascience.com/ch02.html
- https://rc2e.com/probability
- https://book.stat420.org/probability-and-statistics-in-r.html
- https://statsthinking21.github.io/statsthinking21-R-site/probability-in-r-with-lucy-king.html
- https://bookdown.org/probability/statistics/
- https://bookdown.org/probability/beta/
- https://bookdown.org/a_shaker/STM1001_Topic_3/
- https://bookdown.org/fsancier/bookdown-demo/
- https://bookdown.org/kevin_davisross/probsim-book/
- https://bookdown.org/machar1991/ITER/2-pt.html