We now consider a bivariate random vector \((X_{i}, Y_{i})\), which is a theoretical version of the bivariate observations \((\hat{X}_{i}, \hat{Y}_{i})\). For example, if we are going to flip two coins, then \((X_{i}, Y_{i})\) corresponds to the unflipped coins and \((\hat{X}_{i}, \hat{Y}_{i})\) corresponds to the concrete values after they are flipped.
Definitions for Discrete Data.
The joint distribution is defined as \[\begin{eqnarray}
Prob(X_{i} = x, Y_{i} = y)
\end{eqnarray}\] For example, consider a bivariate random vector with three outcomes for \(X_{i}\) and three outcomes for \(Y_{i}\). The plot below shows each outcome \((x,y)\) with deeper colors reflecting higher probability events.
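A minimal sketch in R of how such a plot could be produced. The probabilities in `P_joint` are made up for illustration, since the figure's actual values are not reproduced here; they only need to be non-negative and sum to one.

```r
# Hypothetical 3x3 joint distribution: rows are x = 0,1,2 and columns are y = 0,1,2.
# These probabilities are illustrative only.
P_joint <- matrix(c(0.05, 0.10, 0.05,
                    0.10, 0.30, 0.10,
                    0.05, 0.10, 0.15), nrow = 3, byrow = TRUE)
rownames(P_joint) <- paste0("x=", 0:2)
colnames(P_joint) <- paste0("y=", 0:2)
sum(P_joint)  # should equal 1

# Plot each outcome (x,y), with deeper colors for higher-probability events.
image(x = 0:2, y = 0:2, z = P_joint,
      col = grey(seq(1, 0, length.out = 20)),
      xlab = "x", ylab = "y", axes = FALSE)
axis(1, at = 0:2)
axis(2, at = 0:2)
```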
Note that variables are statistically independent if \(Prob(X_{i} = x, Y_{i} = y)= Prob(X_{i} = x) Prob(Y_{i} = y)\) for all \(x, y\). Independence is sometimes assumed for mathematical simplicity, not because it generally fits data well.
The conditional distributions are defined as \[\begin{eqnarray}
Prob(X_{i} = x | Y_{i} = y) = \frac{ Prob(X_{i} = x, Y_{i} = y)}{ Prob( Y_{i} = y )}\\
Prob(Y_{i} = y | X_{i} = x) = \frac{ Prob(X_{i} = x, Y_{i} = y)}{ Prob( X_{i} = x )}
\end{eqnarray}\] The marginal distributions are then defined as \[\begin{eqnarray}
Prob(X_{i} = x) = \sum_{y} Prob(X_{i} = x | Y_{i} = y) Prob( Y_{i} = y ) \\
Prob(Y_{i} = y) = \sum_{x} Prob(Y_{i} = y | X_{i} = x) Prob( X_{i} = x ),
\end{eqnarray}\] which is also known as the law of total probability.
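Continuing with the hypothetical `P_joint` matrix from the sketch above, the conditional and marginal formulas translate directly into R:

```r
# Marginal distributions: sum the joint probabilities over the other variable.
P_X <- rowSums(P_joint)   # Prob(X = x)
P_Y <- colSums(P_joint)   # Prob(Y = y)

# Conditional distribution of X given Y = y: divide each column by its marginal.
P_X_given_Y <- sweep(P_joint, 2, P_Y, "/")
colSums(P_X_given_Y)  # each conditional distribution sums to 1

# Law of total probability: recover the marginal of X from the conditionals.
P_X_check <- P_X_given_Y %*% P_Y
cbind(P_X, P_X_check)
```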
Coin Flips Example.
For example, consider flipping two coins, where we record a \(1\) if “heads” is face up and a \(0\) if “tails” is. E.g., the first coin has a value of \(x=1\) if it shows heads and \(x=0\) if it shows tails. The table below shows both the joint distribution and each marginal distribution.

|          | \(x=0\)                   | \(x=1\)                   | Marginal          |
|----------|---------------------------|---------------------------|--------------------|
| \(y=0\)  | \(Prob(X_{i}=0,Y_{i}=0)\) | \(Prob(X_{i}=1,Y_{i}=0)\) | \(Prob(Y_{i}=0)\) |
| \(y=1\)  | \(Prob(X_{i}=0,Y_{i}=1)\) | \(Prob(X_{i}=1,Y_{i}=1)\) | \(Prob(Y_{i}=1)\) |
| Marginal | \(Prob(X_{i}=0)\)         | \(Prob(X_{i}=1)\)         | \(1\)             |
Note that different joint distributions can have the same marginal distributions.
Note
Suppose both coins are “fair”: \(Prob(X_{i}=1)= 1/2\) and \(Prob(Y_{i}=1|X_{i}=x)=1/2\) for either \(x=0\) or \(x=1\). Then the four potential outcomes have equal probabilities. \[\begin{eqnarray}
Prob(X_{i} = 0, Y_{i} = 0) &=& 1/2 \times 1/2 = 1/4 \\
Prob(X_{i} = 0, Y_{i} = 1) &=& 1/4 \\
Prob(X_{i} = 1, Y_{i} = 0) &=& 1/4 \\
Prob(X_{i} = 1, Y_{i} = 1) &=& 1/4 .
\end{eqnarray}\] The joint distribution is written generally as \[\begin{eqnarray}
Prob(X_{i} = x, Y_{i} = y) &=& Prob(X_{i} = x) Prob(Y_{i} = y).
\end{eqnarray}\]
The marginal distribution of the first coin follows from the law of total probability \[\begin{eqnarray}
Prob(X_{i} = 0) &=& Prob(X_{i} = 0 | Y_{i} = 0) Prob(Y_{i}=0) + Prob(X_{i} = 0 | Y_{i} = 1) Prob(Y_{i}=1)\\
&=& 1/2 (1/2) + 1/2 (1/2) = 1/2\\
Prob(X_{i} = 1) &=& Prob(X_{i} = 1 | Y_{i} = 0) Prob(Y_{i}=0) + Prob(X_{i} = 1 | Y_{i} = 1) Prob(Y_{i}=1)\\
&=& 1/2 (1/2) + 1/2 (1/2) = 1/2
\end{eqnarray}\]
Altogether, we find

|          | \(x=0\) | \(x=1\) | Marginal |
|----------|---------|---------|----------|
| \(y=0\)  | \(1/4\) | \(1/4\) | \(1/2\)  |
| \(y=1\)  | \(1/4\) | \(1/4\) | \(1/2\)  |
| Marginal | \(1/2\) | \(1/2\) | \(1\)    |
```r
# Create a 2x2 matrix for the joint distribution.
# Rows correspond to X1 (coin 1), and columns correspond to X2 (coin 2).
P_fair <- matrix(1/4, nrow = 2, ncol = 2)
rownames(P_fair) <- c("X1=0", "X1=1")
colnames(P_fair) <- c("X2=0", "X2=1")
P_fair
##      X2=0 X2=1
## X1=0 0.25 0.25
## X1=1 0.25 0.25

# Compute the marginal distributions.
# Marginal for X1: sum across columns.
P_X1 <- rowSums(P_fair)
P_X1
## X1=0 X1=1 
##  0.5  0.5

# Marginal for X2: sum across rows.
P_X2 <- colSums(P_fair)
P_X2
## X2=0 X2=1 
##  0.5  0.5

# Compute the conditional probabilities Prob(X2 | X1):
# divide row j of the joint distribution by the marginal Prob(X1 = j).
cond_X2_given_X1 <- matrix(0, nrow = 2, ncol = 2)
for (j in c(1, 2)) {
  cond_X2_given_X1[, j] <- P_fair[j, ] / P_X1[j]
}
rownames(cond_X2_given_X1) <- c("X2=0", "X2=1")
colnames(cond_X2_given_X1) <- c("given X1=0", "given X1=1")
cond_X2_given_X1
##      given X1=0 given X1=1
## X2=0        0.5        0.5
## X2=1        0.5        0.5
```
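As a quick check of the independence condition stated earlier, one can compare the joint distribution to the product of its marginals. A small sketch using the objects defined in the block above:

```r
# For fair coins, the joint distribution factors into the product of marginals:
# Prob(X1 = x, X2 = y) = Prob(X1 = x) * Prob(X2 = y) in every cell.
P_product <- outer(P_X1, P_X2)
all.equal(P_fair, P_product, check.attributes = FALSE)
```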
Note
Now consider a second example, where the second coin is “Completely Unfair”, so that it is always the same as the first. The outcomes generated with a Completely Unfair coin are the same as if we only flipped one coin. \[\begin{eqnarray}
Prob(X_{i} = 0, Y_{i} = 0) &=& 1/2 \\
Prob(X_{i} = 0, Y_{i} = 1) &=& 0 \\
Prob(X_{i} = 1, Y_{i} = 0) &=& 0 \\
Prob(X_{i} = 1, Y_{i} = 1) &=& 1/2 .
\end{eqnarray}\] The joint distribution is written generally as \[\begin{eqnarray}
Prob(X_{i} = x, Y_{i} = y) &=& Prob(X_{i} = x) \mathbf{1}( x=y ),
\end{eqnarray}\] where \(\mathbf{1}(x=y)\) equals \(1\) if \(x=y\) and \(0\) otherwise. The marginal distribution of the second coin is \[\begin{eqnarray}
Prob(Y_{i} = 0)
&=& Prob(Y_{i} = 0 | X_{i} = 0) Prob(X_{i}=0) + Prob(Y_{i} = 0 | X_{i} = 1) Prob(X_{i} = 1)\\
&=& 1 (1/2) + 0(1/2) = 1/2 .\\
Prob(Y_{i} = 1)
&=& Prob(Y_{i} = 1 | X_{i} =0) Prob( X_{i} = 0) + Prob(Y_{i} = 1 | X_{i} = 1) Prob( X_{i} = 1)\\
&=& 0 (1/2) + 1 (1/2) = 1/2 .
\end{eqnarray}\] which is the same marginal as in the first example!
The marginal distribution of the first coin is found in the exact same way (show this yourself).
Altogether, we find

|          | \(x=0\) | \(x=1\) | Marginal |
|----------|---------|---------|----------|
| \(y=0\)  | \(1/2\) | \(0\)   | \(1/2\)  |
| \(y=1\)  | \(0\)   | \(1/2\) | \(1/2\)  |
| Marginal | \(1/2\) | \(1/2\) | \(1\)    |
```r
# Create the joint distribution matrix for the unfair coin case.
P_unfair <- matrix(c(0.5, 0, 0, 0.5), nrow = 2, ncol = 2, byrow = TRUE)
rownames(P_unfair) <- c("X1=0", "X1=1")
colnames(P_unfair) <- c("X2=0", "X2=1")
P_unfair
##      X2=0 X2=1
## X1=0  0.5  0.0
## X1=1  0.0  0.5

# Compute the marginal distributions in the unfair case.
P_X2_unfair <- colSums(P_unfair)
P_X1_unfair <- rowSums(P_unfair)

# Compute the conditional probabilities Prob(X2 | X1) for the unfair coin:
# divide row j of the joint distribution by the marginal Prob(X1 = j).
cond_X2_given_X1_unfair <- matrix(NA, nrow = 2, ncol = 2)
for (j in c(1, 2)) {
  if (P_X1_unfair[j] > 0) {
    cond_X2_given_X1_unfair[, j] <- P_unfair[j, ] / P_X1_unfair[j]
  }
}
rownames(cond_X2_given_X1_unfair) <- c("X2=0", "X2=1")
colnames(cond_X2_given_X1_unfair) <- c("given X1=0", "given X1=1")
cond_X2_given_X1_unfair
##      given X1=0 given X1=1
## X2=0          1          0
## X2=1          0          1
```
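To make the earlier point concrete, that different joint distributions can have the same marginal distributions, here is a small sketch comparing the fair and unfair cases with the matrices defined above:

```r
# Same marginal distributions as in the fair case ...
cbind(fair = rowSums(P_fair), unfair = rowSums(P_unfair))
cbind(fair = colSums(P_fair), unfair = colSums(P_unfair))

# ... but the unfair joint distribution does not factor into the product of its
# marginals, so the two coins are not statistically independent.
P_unfair
outer(P_X1_unfair, P_X2_unfair)
```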
Definitions for Continuous Data.
The joint distribution is defined as \[\begin{eqnarray}
F(x, y) &=& Prob(X_{i} \leq x, Y_{i} \leq y)
\end{eqnarray}\] The marginal distributions are then defined as \[\begin{eqnarray}
F_{X}(x) &=& F(x, \infty)\\
F_{Y}(y) &=& F(\infty, y).
\end{eqnarray}\] Variables are statistically independent if \(F(x, y) = F_{X}(x)F_{Y}(y)\) for all \(x, y\).
For example, suppose \((X_{i},Y_{i})\) is bivariate normal with means \((\mu_{X}, \mu_{Y})\), variances \((\sigma^{2}_{X}, \sigma^{2}_{Y})\), and correlation \(\rho\).
```r
# See that independent data are a special case
n <- 2e4

## 2 Independent RV
XYiid <- cbind(rnorm(n), rnorm(n))

## As a single Joint Draw
XYjoint <- mvtnorm::rmvnorm(n, c(0,0))

## Plot
par(mfrow = c(1,2))
plot(XYiid, col = grey(0,.05), pch = 16, xlim = c(-5,5), ylim = c(-5,5))
plot(XYjoint, col = grey(0,.05), pch = 16, xlim = c(-5,5), ylim = c(-5,5))

# Compare densities
#d1 <- dnorm(XYiid[,1],0)*dnorm(XYiid[,2],0)
#d2 <- mvtnorm::dmvnorm(XYiid, c(0,0))
#head(cbind(d1,d2))
```
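The independent draws above correspond to \(\rho = 0\). A minimal sketch of a correlated case (\(\rho = 0.7\) is an arbitrary illustrative value) might look like this:

```r
# Bivariate normal draws with correlation rho = 0.7 and standard normal marginals.
rho <- 0.7
Sigma <- matrix(c(1, rho, rho, 1), nrow = 2)
XYcorr <- mvtnorm::rmvnorm(2e4, mean = c(0,0), sigma = Sigma)
plot(XYcorr, col = grey(0,.05), pch = 16, xlim = c(-5,5), ylim = c(-5,5))
cor(XYcorr)  # sample correlation matrix, close to rho off the diagonal
```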
Returning to the unfair coin example, we can also verify Bayes' theorem numerically:

```r
# Verify Bayes' theorem for the unfair coin case:
# Compute Prob(X1=1 | X2=1) using the formula:
# Prob(X1=1 | X2=1) = [Prob(X2=1 | X1=1) * Prob(X1=1)] / Prob(X2=1)
P_X1_1 <- 0.5
P_X2_1_given_X1_1 <- 1  # Since coin 2 copies coin 1.
P_X2_1 <- P_X2_unfair["X2=1"]
bayes_result <- (P_X2_1_given_X1_1 * P_X1_1) / P_X2_1
bayes_result
## X2=1 
##    1
```
15.2 Theoretical Statistics
We now dig a little deeper into the theory behind the statistics we compute. When we know how the data are generated, we can often compute the theoretical value of the most basic and often-used bivariate statistic: the Pearson correlation. To see this, we focus on two discrete random variables, first showing their covariance, \(\mathbb{C}[X_{i}, Y_{i}]\), and then their correlation \(\mathbb{R}[X_{i}, Y_{i}]\). Writing \(\mu_{X}=\mathbb{E}[X_{i}]\) and \(\mu_{Y}=\mathbb{E}[Y_{i}]\), we have \[\begin{eqnarray}
\mathbb{C}[X_{i}, Y_{i}]
&=& \mathbb{E}[(X_{i} - \mu_{X})(Y_{i} - \mu_{Y})]
= \sum_{x}\sum_{y} (x – \mu_{X})(y – \mu_{Y}) Prob(X_{i} = x, Y_{i} = y)
\\
\mathbb{R}[X_{i}, Y_{i}] &=& \frac{\mathbb{C}[X_{i}, Y_{i}] }{ \sqrt{\mathbb{V}[X_{i}]} \sqrt{\mathbb{V}[Y_{i}]} }
\end{eqnarray}\]
For example, suppose we have discrete data with the following outcomes and probabilities. Note that cells reflect the probabilities of the outcomes depicted on the row and column labels, e.g. \(Prob(X_{i}=1, Y_{i}=0)=0.1\).

|           | \(x=0\)  | \(x=1\)  | \(x=2\)  |
|-----------|----------|----------|----------|
| \(y=0\)   | \(0.0\)  | \(0.1\)  | \(0.0\)  |
| \(y=10\)  | \(0.1\)  | \(0.3\)  | \(0.1\)  |
| \(y=20\)  | \(0.1\)  | \(0.1\)  | \(0.2\)  |
After verifying that the probabilities sum to \(1\), we then compute the marginal distributions \[\begin{eqnarray}
Prob(X_{i}=0)=0.2,\quad Prob(X_{i}=1)=0.5,\quad Prob(X_{i}=2) = 0.3 \\
Prob(Y_{i}=0)=0.1,\quad Prob(Y_{i}=10)=0.5,\quad Prob(Y_{i}=20) = 0.4
\end{eqnarray}\] which allows us to compute the means: \[\begin{eqnarray}
\mathbb{E}[X_{i}] &=& 0(0.2)+1(0.5)+2(0.3) = 1.1 \\
\mathbb{E}[Y_{i}] &=& 0(0.1)+10(0.5)+20(0.4) = 13
\end{eqnarray}\] We can then compute the cell-by-cell contributions, \((x-\mathbb{E}[X_{i}])(y-\mathbb{E}[Y_{i}]) Prob(X_{i} = x, Y_{i} = y)\), which we then plug into the covariance formula: \[\begin{eqnarray}
\begin{array}{l l r r r r r}
\hline
x & y & Prob(X_{i}=x, Y_{i}=y) & x-\mathbb{E}[X_{i}] & y-\mathbb{E}[Y_{i}] & (x-\mathbb{E}[X_{i}])(y-\mathbb{E}[Y_{i}]) & \text{Contribution}\\
\hline
0 & 0 & 0.0 & -1.1 & -13 & 14.3 & 0\\
0 & 10 & 0.1 & -1.1 & -3 & 3.3 & 0.330\\
0 & 20 & 0.1 & -1.1 & 7 & -7.7 & -0.770\\
1 & 0 & 0.1 & -0.1 & -13 & 1.3 & 0.130\\
1 & 10 & 0.3 & -0.1 & -3 & 0.3 & 0.090\\
1 & 20 & 0.1 & -0.1 & 7 & -0.7 & -0.070\\
2 & 0 & 0.0 & 0.9 & -13 & -11.7 & 0\\
2 & 10 & 0.1 & 0.9 & -3 & -2.7 & -0.270\\
2 & 20 & 0.2 & 0.9 & 7 & 6.3 & 1.260\\
\hline
\end{array}
\end{eqnarray}\]\[\begin{eqnarray}
\mathbb{C}[X_{i},Y_{i}] &=& \sum_{x} \sum_{y} \left(x-\mathbb{E}[X_{i}]\right)\left(y-\mathbb{E}[Y_{i}]\right) Prob\left(X_{i} = x, Y_{i} = y\right) \\
&=& 0 + 0.330 -0.770 + 0.130 + 0.090 -0.070 +0 -0.270 + 1.260
= 0.7
\end{eqnarray}\]
To compute the correlation, we first need the variances \[\begin{eqnarray}
\mathbb{V}[X_{i}] &=& \sum_{x} (x-\mathbb{E}[X_{i}])^2 Prob(X_{i} = x) \\
&=& (0-1.1)^2(0.2)+(1-1.1)^2(0.5)+(2-1.1)^2(0.3)=0.49 \\
\mathbb{V}[Y_{i}] &=& \sum_{y} (y-\mathbb{E}[Y_{i}])^2 Prob(Y_{i} = y) \\
&=& (0-13)^2(0.1)+(10-13)^2(0.5)+(20-13)^2(0.4)=41 \\
\end{eqnarray}\] Then we can find the correlation as \[\begin{eqnarray}
\frac{\mathbb{C}[X_{i},Y_{i}]}{ \sqrt{\mathbb{V}[X_{i}]} \sqrt{\mathbb{V}[Y_{i}]} }
&=& \frac{0.7}{\sqrt{0.49} \sqrt{41}} \approx 0.156,
\end{eqnarray}\] which suggests a weak positive association between the variables.
Using the Computer.
Note that you can do all of the above calculations using the computer instead of by hand.
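For instance, here is a short sketch that reproduces the covariance and correlation computed by hand above. The probabilities and outcome values are taken from the table; the object names (`P_xy`, `x_vals`, `y_vals`) are our own.

```r
# Joint distribution from the table above: rows index x = 0, 1, 2 and
# columns index y = 0, 10, 20 (the transpose of the printed table).
P_xy <- matrix(c(0.0, 0.1, 0.1,
                 0.1, 0.3, 0.1,
                 0.0, 0.1, 0.2), nrow = 3, byrow = TRUE)
x_vals <- c(0, 1, 2)
y_vals <- c(0, 10, 20)

# Marginal distributions and means.
P_X <- rowSums(P_xy)        # 0.2, 0.5, 0.3
P_Y <- colSums(P_xy)        # 0.1, 0.5, 0.4
E_X <- sum(x_vals * P_X)    # 1.1
E_Y <- sum(y_vals * P_Y)    # 13

# Covariance: sum over all cells of (x - E[X])(y - E[Y]) Prob(X = x, Y = y).
C_XY <- sum(outer(x_vals - E_X, y_vals - E_Y) * P_xy)   # 0.7

# Variances and the correlation.
V_X <- sum((x_vals - E_X)^2 * P_X)   # 0.49
V_Y <- sum((y_vals - E_Y)^2 * P_Y)   # 41
C_XY / sqrt(V_X * V_Y)               # approximately 0.156
```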
Compute the correlation for bivariate data with these probabilities, using both math (first) and the computer (second):

|          | \(x=-10\) | \(x=10\) |
|----------|-----------|----------|
| \(y=0\)  | \(0.05\)  | \(0.20\) |
| \(y=1\)  | \(0.05\)  | \(0.20\) |
| \(y=2\)  | \(0.05\)  | \(0.20\) |
| \(y=3\)  | \(0.05\)  | \(0.20\) |
Tip
Also compute the correlation for bivariate data with these probabilities:

|          | \(x=-10\) | \(x=10\) |
|----------|-----------|----------|
| \(y=0\)  | \(0.05\)  | \(0.05\) |
| \(y=1\)  | \(0.10\)  | \(0.10\) |
| \(y=2\)  | \(0.15\)  | \(0.15\) |
| \(y=3\)  | \(0.20\)  | \(0.20\) |
Also compute the correlation for bivariate data with these probabilities:

|          | \(x=-10\) | \(x=10\) |
|----------|-----------|----------|
| \(y=0\)  | \(0.05\)  | \(0.15\) |
| \(y=1\)  | \(0.05\)  | \(0.15\) |
| \(y=2\)  | \(0.10\)  | \(0.20\) |
| \(y=3\)  | \(0.10\)  | \(0.20\) |
Explain intuitively when the correlation equals \(0\) and when it does not.
15.3 Comparing two groups
Under the assumption that both populations are normally distributed, we can analytically derive the sampling distribution of the difference in means between two groups. In particular, with \(n\) observations in each group, the \(t\)-statistic \[\begin{eqnarray}
\hat{t} = \frac{
\hat{M}_{Y1} - \hat{M}_{Y2}
}{
\sqrt{\hat{S}_{Y1}+\hat{S}_{Y2}}/\sqrt{n}
}
\end{eqnarray}\] follows Student’s t-distribution. Welch’s \(t\)-statistic is an adjustment for two normally distributed populations with potentially unequal variances or sample sizes.
```r
t.test(Y1, Y2, var.equal=F)
## 
##  Welch Two Sample t-test
## 
## data:  Y1 and Y2
## t = -4.1141, df = 165.09, p-value = 6.121e-05
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -1.3576198 -0.4771004
## sample estimates:
##   mean of x   mean of y 
## -0.01724888  0.90011121
```
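As a rough sketch of what `t.test` computes, one can form Welch's statistic directly. Here `Ya` and `Yb` are simulated placeholders, since the chapter's `Y1` and `Y2` are constructed earlier.

```r
# Simulated placeholder data with unequal variances and sample sizes.
set.seed(1)
Ya <- rnorm(100, mean = 0, sd = 1)
Yb <- rnorm(80,  mean = 1, sd = 2)

# Welch's t-statistic: the difference in means over its estimated standard error,
# allowing each group its own variance and sample size.
se_welch <- sqrt(var(Ya)/length(Ya) + var(Yb)/length(Yb))
t_welch  <- (mean(Ya) - mean(Yb)) / se_welch
t_welch

# The same statistic as reported by t.test().
t.test(Ya, Yb, var.equal = FALSE)$statistic
```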
15.4 Further Reading
Many introductory econometrics textbooks have a good appendix on probability and statistics. There are many useful statistical texts online, too.
See the further reading about Probability Theory in the Statistics chapter.