10 Probability Theory

10.1 Mean and Variance

Discrete. If the sample space is discrete, we can compute the theoretical mean (or expected value) as \[ \mu = \sum_{i} x_{i} P(X=x_{i}), \] where \(P(X=x_{i})\) is the probability that the random variable takes the particular value \(x_{i}\). Similarly, we can compute the theoretical variance as \[ \sigma^2 = \sum_{i} [x_{i} - \mu]^2 P(X=x_{i}). \]

For example, an unfair coin with a \(.75\) probability of heads (\(x_{i}=1\)) and a \(.25\) probability of tails (\(x_{i}=0\)) has a theoretical mean of \[ \mu = 1\times.75 + 0 \times .25 = .75 \] and a theoretical variance of \[ \sigma^2 = [1 - .75]^2 \times.75 + [0 - .75]^2 \times.25 = 0.1875. \]
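
We can verify these hand calculations in R by coding the weighted sums directly (a minimal sketch; the names x_vals and probs are our own):

x_vals <- c(1, 0)
probs <- c(.75, .25)
sum(x_vals * probs)              # theoretical mean
## [1] 0.75
sum((x_vals - .75)^2 * probs)    # theoretical variance
## [1] 0.1875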

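# Simulate 10,000 flips of the unfair coin; the sample mean and
# variance should approximate the theoretical values above.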
x <- rbinom(10000, size=1, prob=.75)
round( mean(x), 4)
## [1] 0.747
round( var(x), 4)
## [1] 0.189

Continuous. If the sample space is continuous, we can compute the theoretical mean (or expected value) as \[ \mu = \int x f(x) d x, \] where \(f(x)\) is the probability density function of the random variable at the value \(x\). Similarly, we can compute the theoretical variance as \[ \sigma^2 = \int [x - \mu]^2 f(x) d x. \] For example, consider a random variable with a continuous uniform distribution over \([-1, 1]\). In this case, \(f(x)=1/[1 - (-1)]=1/2\) for each \(x\) in \([-1, 1]\) and \[ \mu = \int_{-1}^{1} \frac{x}{2} d x = \int_{-1}^{0} \frac{x}{2} d x + \int_{0}^{1} \frac{x}{2} d x = 0. \] Since \(\mu = 0\), the variance is \[ \sigma^2 = \int_{-1}^{1} x^2 \frac{1}{2} d x = \frac{1}{2} \frac{x^3}{3} \Big|_{-1}^{1} = \frac{1}{6}[1 - (-1)] = 2/6 = 1/3. \]
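
These integrals can also be checked numerically. Below is a quick sketch using R's integrate() and the built-in uniform density dunif():

f <- function(x) dunif(x, min = -1, max = 1)   # density: 1/2 on [-1, 1]
mu <- integrate(function(x) x * f(x), -1, 1)$value
sigma2 <- integrate(function(x) (x - mu)^2 * f(x), -1, 1)$value
round(c(mu, sigma2), 4)
## [1] 0.0000 0.3333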

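# Simulate 10,000 draws from the uniform distribution on [-1, 1];
# the sample moments should approximate mu = 0 and sigma^2 = 1/3.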
x <- runif(10000, -1,1)
round( mean(x), 4)
## [1] -0.0026
round( var(x), 4)
## [1] 0.3287

10.2 Bivariate Distributions

Suppose we have two discrete variables \(X_{1}\) and \(X_{2}\). Their joint distribution is denoted as \[\begin{eqnarray} P(X_{1} = x_{1}, X_{2} = x_{2}) \end{eqnarray}\] The conditional distributions are defined as \[\begin{eqnarray} P(X_{1} = x_{1} | X_{2} = x_{2}) = \frac{ P(X_{1} = x_{1}, X_{2} = x_{2})}{ P( X_{2} = x_{2} )}\\ P(X_{2} = x_{2} | X_{1} = x_{1}) = \frac{ P(X_{1} = x_{1}, X_{2} = x_{2})}{ P( X_{1} = x_{1} )} \end{eqnarray}\] The marginal distributions are then defined as \[\begin{eqnarray} P(X_{1} = x_{1}) = \sum_{x_{2}} P(X_{1} = x_{1} | X_{2} = x_{2}) P( X_{2} = x_{2} ) \\ P(X_{2} = x_{2}) = \sum_{x_{1}} P(X_{2} = x_{2} | X_{1} = x_{1}) P( X_{1} = x_{1} ), \end{eqnarray}\] which is also known as the law of total probability.
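
As a concrete sketch of these definitions (the joint table below is hypothetical, chosen only to illustrate unequal probabilities):

# A hypothetical joint distribution over two binary variables.
P_joint <- matrix(c(.1, .2,
                    .3, .4), nrow = 2, byrow = TRUE)
rownames(P_joint) <- c("X1=0", "X1=1")
colnames(P_joint) <- c("X2=0", "X2=1")
rowSums(P_joint)   # marginal of X1
## X1=0 X1=1 
##  0.3  0.7
colSums(P_joint)   # marginal of X2 (law of total probability)
## X2=0 X2=1 
##  0.4  0.6
# Conditional P(X2 | X1=1): row of the joint divided by the X1 marginal.
round( P_joint["X1=1", ] / sum(P_joint["X1=1", ]), 4)
##   X2=0   X2=1 
## 0.4286 0.5714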

For a first example, consider flipping two coins. Denote each coin by \(i \in \{1, 2\}\), and record whether “heads” is face up: \(X_{i}=1\) if heads and \(X_{i}=0\) if tails. Suppose both coins are “fair”, \(P(X_{1}=1)= 1/2\) and \(P(X_{2}=1|X_{1})=1/2\), so the four possible outcomes are equally likely. The joint distribution is \[\begin{eqnarray} P(X_{1} = x_{1}, X_{2} = x_{2}) &=& P(X_{1} = x_{1}) P(X_{2} = x_{2})\\ P(X_{1} = 0, X_{2} = 0) &=& 1/2 \times 1/2 = 1/4 \\ P(X_{1} = 0, X_{2} = 1) &=& 1/4 \\ P(X_{1} = 1, X_{2} = 0) &=& 1/4 \\ P(X_{1} = 1, X_{2} = 1) &=& 1/4 . \end{eqnarray}\] The marginal distribution of the second coin is \[\begin{eqnarray} P(X_{2} = 0) &=& P(X_{2} = 0 | X_{1} = 0) P(X_{1}=0) + P(X_{2} = 0 | X_{1} = 1) P(X_{1}=1)\\ &=& 1/2 \times 1/2 + 1/2 \times 1/2 = 1/2\\ P(X_{2} = 1) &=& P(X_{2} = 1 | X_{1} = 0) P(X_{1}=0) + P(X_{2} = 1 | X_{1} = 1) P(X_{1}=1)\\ &=& 1/2 \times 1/2 + 1/2 \times 1/2 = 1/2 \end{eqnarray}\]

# Create a 2x2 matrix for the joint distribution.
# Rows correspond to X1 (coin 1), and columns correspond to X2 (coin 2).
P_fair <- matrix(1/4, nrow = 2, ncol = 2)
rownames(P_fair) <- c("X1=0", "X1=1")
colnames(P_fair) <- c("X2=0", "X2=1")
P_fair
##      X2=0 X2=1
## X1=0 0.25 0.25
## X1=1 0.25 0.25
# Compute the marginal distributions.
# Marginal for X1: sum across columns.
P_X1 <- rowSums(P_fair)
P_X1
## X1=0 X1=1 
##  0.5  0.5
# Marginal for X2: sum across rows.
P_X2 <- colSums(P_fair)
P_X2
## X2=0 X2=1 
##  0.5  0.5
# Compute the conditional probabilities P(X2 | X1).
# Column j conditions on X1 = j - 1: take row j of the joint
# (the probabilities paired with that X1 value) and divide by P(X1 = j - 1).
cond_X2_given_X1 <- matrix(0, nrow = 2, ncol = 2)
for (j in 1:2) {
  cond_X2_given_X1[, j] <- P_fair[j, ] / P_X1[j]
}
rownames(cond_X2_given_X1) <- c("X2=0", "X2=1")
colnames(cond_X2_given_X1) <- c("given X1=0", "given X1=1")
cond_X2_given_X1
##      given X1=0 given X1=1
## X2=0        0.5        0.5
## X2=1        0.5        0.5
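
The same joint distribution can be approximated by simulation: flip two independent fair coins many times and tabulate the outcomes (a sketch; the seed is arbitrary). Each of the four cells should be near 0.25.

set.seed(1)
x1 <- rbinom(10000, size = 1, prob = .5)
x2 <- rbinom(10000, size = 1, prob = .5)
table(x1, x2) / 10000   # empirical joint distribution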

Consider a second example, where the second coin is “Completely Unfair”, so that it always lands the same way as the first. The outcomes generated with a Completely Unfair coin are the same as if we flipped only one coin. \[\begin{eqnarray} P(X_{1} = x_{1}, X_{2} = x_{2}) &=& P(X_{1} = x_{1}) \mathbf{1}( x_{1}=x_{2} )\\ P(X_{1} = 0, X_{2} = 0) &=& 1/2 \\ P(X_{1} = 0, X_{2} = 1) &=& 0 \\ P(X_{1} = 1, X_{2} = 0) &=& 0 \\ P(X_{1} = 1, X_{2} = 1) &=& 1/2 . \end{eqnarray}\] Note that \(\mathbf{1}( x_{1}=x_{2} )\) is the indicator function: it equals \(1\) if \(x_{1}=x_{2}\) and \(0\) otherwise. The marginal distribution of the second coin is \[\begin{eqnarray} P(X_{2} = 0) &=& P(X_{2} = 0 | X_{1} = 0) P(X_{1}=0) + P(X_{2} = 0 | X_{1} = 1) P(X_{1}=1) \\ &=& 1 \times 1/2 + 0 \times 1/2 = 1/2\\ P(X_{2} = 1) &=& P(X_{2} = 1 | X_{1} =0) P( X_{1} = 0) + P(X_{2} = 1 | X_{1} = 1) P( X_{1} =1)\\ &=& 0\times 1/2 + 1 \times 1/2 = 1/2 \end{eqnarray}\] which is the same as in the first example! Different joint distributions can have the same marginal distributions.

# Create the joint distribution matrix for the unfair coin case.
P_unfair <- matrix(c(0.5, 0, 0, 0.5), nrow = 2, ncol = 2, byrow = TRUE)
rownames(P_unfair) <- c("X1=0", "X1=1")
colnames(P_unfair) <- c("X2=0", "X2=1")
P_unfair
##      X2=0 X2=1
## X1=0  0.5  0.0
## X1=1  0.0  0.5
# Compute the marginal distributions in the unfair case.
P_X2_unfair <- colSums(P_unfair)
P_X1_unfair <- rowSums(P_unfair)

# Compute the conditional probabilities P(X2 | X1) for the unfair coin,
# guarding against conditioning on a zero-probability event.
cond_X2_given_X1_unfair <- matrix(NA, nrow = 2, ncol = 2)
for (j in 1:2) {
  if (P_X1_unfair[j] > 0) {
    cond_X2_given_X1_unfair[, j] <- P_unfair[j, ] / P_X1_unfair[j]
  }
}
rownames(cond_X2_given_X1_unfair) <- c("X2=0", "X2=1")
colnames(cond_X2_given_X1_unfair) <- c("given X1=0", "given X1=1")
cond_X2_given_X1_unfair
##      given X1=0 given X1=1
## X2=0          1          0
## X2=1          0          1
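
Simulation makes the contrast clear: the Completely Unfair coin puts all the probability mass on the diagonal of the joint table, yet each coin's marginal is still about 1/2 (a sketch; the seed is arbitrary).

set.seed(1)
x1 <- rbinom(10000, size = 1, prob = .5)
x2 <- x1                  # coin 2 always copies coin 1
table(x1, x2) / 10000     # empirical joint: mass only on the diagonal
c(mean(x1), mean(x2))     # both marginals near 0.5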

Finally, note Bayes’ Theorem: \[\begin{eqnarray} P(X_{1} = x_{1} | X_{2} = x_{2}) P( X_{2} = x_{2}) &=& P(X_{1} = x_{1}, X_{2} = x_{2}) = P(X_{2} = x_{2} | X_{1} = x_{1}) P(X_{1}=x_{1})\\ P(X_{1} = x_{1} | X_{2} = x_{2}) &=& \frac{ P(X_{2} = x_{2} | X_{1} = x_{1}) P(X_{1}=x_{1}) }{ P( X_{2} = x_{2}) } \end{eqnarray}\]

# Verify Bayes' theorem for the unfair coin case:
# Compute P(X1=1 | X2=1) using the formula:
#   P(X1=1 | X2=1) = [P(X2=1 | X1=1) * P(X1=1)] / P(X2=1)

P_X1_1 <- 0.5
P_X2_1_given_X1_1 <- 1  # Since coin 2 copies coin 1.
P_X2_1 <- P_X2_unfair["X2=1"]

bayes_result <- (P_X2_1_given_X1_1 * P_X1_1) / P_X2_1
bayes_result
## X2=1 
##    1
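
For comparison, applying the same formula in the fair (independent) case gives \(P(X_{1} = 1 | X_{2} = 1) = (1/2 \times 1/2)/(1/2) = 1/2\): observing the second coin tells us nothing about the first.

# Fair case: coin 2 is independent of coin 1.
P_X1_1_fair <- 0.5
P_X2_1_given_X1_1_fair <- 0.5   # independence: conditioning changes nothing
P_X2_1_fair <- P_X2["X2=1"]     # marginal from the fair example above
(P_X2_1_given_X1_1_fair * P_X1_1_fair) / P_X2_1_fair
## X2=1 
##  0.5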