9 Advanced Probability

9.1 Discrete Factorial Distributions

Binomial.

The sum of \(n\) Bernoulli’s (number of successes)

\[\begin{eqnarray} X_{i} \in \{0,1,\ldots,n\} \\ Prob(X_{i}=x) &=& \binom{n}{x}p^{x}(1-p)^{n-x} \end{eqnarray}\]

The simplest case: how many heads will I get when I flip a coin twice? We can first see that there are four potential outcomes: \(\{ (1,1), (1,0), (0,1), (0,0) \}\), each of which are equally likely (assuming the coin is fair). The corresponding sums are \(\{2,1,1,0 \}\), which leads to probabilities \(Prob(X_{i}=0)=1/4\), \(Prob(X_{i}=1)=2/4\), and \(Prob(X_{i}=2)=1/4\).

Tip

When I flip a coin three times, the potential outcomes are \[\begin{eqnarray} \{ (1,1,1), (1,1,0), (1,0,1), (1,0,0),\\ (0,1,1), (0,1,0), (0,0,1), (0,0,0) \} \end{eqnarray}\] Assuming the coin is fair, compute the probabilities for each possible sum.

In general, we use the Binomial_distribution.

Code

## Minimal example
n <- 15
p <- 0.3

# PMF plot
x <- seq(0, n)
f_x <- dbinom(x, n, p)
plot(x, f_x, type = "h", col = "blue",
    main='',  xlab = "x", ylab = "Prob(X = x)")
title(bquote(paste('Binom(',.(n),', ',.(p), ')' ) ))
points(x, f_x, pch = 16, col = "blue")

Code


## Simulation
X <- rbinom(1000, n, p)
plot(ecdf(X))

Tip

Suppose that employees at a company are \(70%\) female and \(30%\) male. If we select a random sample of eight employees, what is the probability that \(2\) in the sample are female? That \(\geq 2\) are female?

Try conducting a simulation first,

Code

n <- 8
p <- 0.3
X <- rbinom(1000, n, p)

Tip

For \(n\) coin flips, show that \(\mathbb{E}[X_{i}]=np\) and \(\mathbb{V}[X_{i}]=np(1-p)\). Build on the case \(n=2\) above, then consider case \(n=3\), and then the general case of any \(n\).

Code

# Simulate:  Compare empirical vs theoretical
x <- c(0,1)
x_probs <- c(1/4, 3/4)
X <- sample(x, 1000, prob=x_probs, replace=T)
#c(emp_mean = mean(X), th_mean = n*p)
#c(emp_var  = var(X),  th_var  = n*p*(1-p))

The de Moivre–Laplace Theorem says that as \(n\) grows large, with \(p \in (0,1)\) staying fixed, the Binomial distribution is approximately Normal with mean \(np\) and variance \(np(1-p)\)

Tip

The unemployment rate is \(10%\). Suppose that \(100\) employable people are selected randomly. What is the probability that this sample contains between \(9\) and \(12\) unemployed people. Use the normal approximation to binomial probabilities.

Parameters are \[\begin{eqnarray} \mu &\approx& n p = 10 \\ \sigma^2 &=& n p (1−p) = 100 (0.1)(0.9)=9 \\ \sigma &=& \sqrt{9}=3. \end{eqnarray}\]

The Poisson Limit Theorem says that as \(n\) grows large, with \(p\) growing smaller at the same time, and \(n p\) converging to a fixed number \(\lambda\), the Binomial distribution is approximately Poisson with parameter \(\lambda\).

Poisson.

The number of events in a fixed interval

Discrete, support \(\{0,1,2,\ldots\}\)
Probability Mass Function: \(Prob(X_{i}=x)=e^{-\lambda}\lambda^x/x!\)
See https://en.wikipedia.org/wiki/Poisson_distribution
A common use case: How many cars will show up in the lot tomorrow?

Code

## Minimal example
lambda <- 3.5

# PMF plot
x <- seq(0, 15)
f_x <- dpois(x, lambda)
plot(x, f_x, type="h", col="blue",
     xlab = "x", ylab = "Prob(X = x)")
points(x, f_x, pch = 16, col = "blue")
title(bquote(paste('Pois(',.(lambda), ')')))

Code



# Simulate: Compare empirical vs theoretical
#X <- rpois(1e4, lambda)
#c(emp_mean = mean(X), th_mean = lambda)
#c(emp_var  = var(X),  th_var  = lambda)

Tip

Show that \(\mathbb{E}[X_{i}] = \mathbb{V}[X_{i}]= \lambda\).

9.2 Continuous Factorial Distributions

Irwin–Hall.

The sum of \(n\) i.i.d. \(\text{Uniform}(0,1)\).

Continuous, support \([0,n]\)
Probability Density Function: \(f(x) = \dfrac{1}{(n-1)!} \displaystyle\sum_{k=0}^{\lfloor x \rfloor} (-1)^k \binom{n}{k} (x - k)^{n-1}\) for \(x \in [0,n]\), and \(0\) otherwise
See https://en.wikipedia.org/wiki/Irwin%E2%80%93Hall_distribution
A common use case: representing the sum of many small independent financial shocks

Code

## Minimal example (base R)
# Irwin–Hall PDF function
dirwinhall <- function(x, n) {
    f_x <- vector(length=length(x))
    for(i in seq(x)) {
        xx <- x[i]
        if(xx < 0 | xx > n){
            f_x[i] <- 0 
        } else {
            k <- seq(0, floor(xx))
            f_k <- sum((-1)^k*choose(n, k)*(xx-k)^(n-1))/factorial(n-1)
            f_x[i] <- f_k
        }
    }
    return(f_x)
}

# Parameters
n <- 2
x <- seq(0, n, length.out = 500)

# Compute and plot PDF
f_x <- dirwinhall(x, n)
plot(x, f_x, type="l", col="blue",
    main='', xlab = "x", ylab = "f(x)")
title(bquote(paste('IrwinHall(',.(n), ')')))

See also the Bates distribution, which is for the mean of \(n\) i.i.d. \(\text{Uniform}(0,1)\) random variables. It essentially rescales the Irwin-Hall distribution to range between \(0\) and \(1\).

Beta.

The sample space is any number on the unit interval, \(X_{i} \in [0,1]\), but with non-uniform probabilities.

Code

X4 <- rbeta(2000,2,2) ## two shape parameters
hist(X4, breaks=20, border=NA, main=NA, freq=F)

#See the underlying probabilities
#f_25 <- dbeta(.25, 2, 2)

x <- seq(0,1,by=.01)
fx <- dbeta(x, 2, 2)
lines(x, fx, col='blue')

The Beta distribution is mathematically complicated to write, and so we omit it. However, we can find the probability graphically using either the probability density function or cumulative distribution function.

Tip

Suppose \(X_{i}\) is a random variable with a beta distribution. Intuitively depict \(Prob(X_{i} \in [0.2, 0.8])\) by drawing an area under the density function. Numerically estimate that same probability using the CDF.

Code

plot( ecdf(X4), main=NA) # Empirical

x <- seq(0,1,by=.01) # Theoretical
Fx <- pbeta(x, 2, 2)
lines(x, Fx, col='blue')

# Middle Interval Example 
F2 <- pbeta(0.2, 2, 2)
F8 <- pbeta(0.8, 2, 2)
F_2_8 <- F8 - F2
F_2_8
## [1] 0.792

# Visualize
title('Middle between 0.2 and 0.8')
segments( 0.2, F2, -1, F2, col='red')
segments( 0.8, F8, -1, F8, col='red')

This distribution is often used, as the probability density function has two parameters that allow it to take many different shapes.

Tip

For each example below, intuitively depict \(Prob(X_{i} \leq 0.5)\) using the PDF. Repeat the exercise using a CDF instead of a PDF to calculate a numerical value.

Code

op <- par(no.readonly = TRUE); on.exit(par(op), add = TRUE)
x <- seq(0,1,by=.01)
pars <- expand.grid( c(.5,1,2), c(.5,1,2) )
par(mfrow=c(3,3))
apply(pars, 1, function(p){
    fx <- dbeta( x,p[1], p[2])
    plot(x, fx, type='l', xlim=c(0,1), ylim=c(0,4), lwd=2, col='blue')
    #hist(rbeta(2000, p[1], p[2]), breaks=50, border=NA, main=NA, freq=F)
})
title('Beta densities', outer=T, line=-1)

9.3 Set Theory

Let the sample space contain events \(A\) and \(B\), with their probability denoted as \(Prob(A)\) and \(Prob(B)\). Then

the union: \(A\cup B\), refers to either event occurring
the intersection: \(A\cap B\), refers to both events occurring
the events \(A\) and \(B\) are mutually exclusive if and only if \(A\cap B\) is empty
the inclusion–exclusion rule is: \(Prob(A\cup B)=Prob(A)+Prob(B)-Prob(A\cap B)\)
the events \(A\) and \(B\) are mutually independent if and only if \(Prob(A\cap B)=Prob(A) Prob(B)\)
the conditional probability is \(Prob(A \mid B) = Prob(A \cap B)/ P(B)\)

For example, consider a six sided die where events are the number \(>2\) or the number is even.

The sets are \(A=\{3,4,5,6\}\) and \(B=\{2,4,6\}\)
To check mutually exclusive: notice \(A\cap B=\{4,6\}\) which is not empty. So event \(A\) and event \(B\) are not mutually exclusive

Further assuming the die are fair, each side has an equal \(1/6\) probability. This lets us compute

\(Prob(A)=4/6=2/3\) and \(Prob(B)=3/6=1/2\)
\(A\cup B=\{2,3,4,5,6\}\), which implies \(Prob(A\cup B)=5/6\)
\(A\cap B=\{4,6\}\), which implies \(Prob(A\cap B)=2/6=1/3\)
To check independence: notice \(Prob(A\cap B)=1/3 = (2/3) (1/2) = Prob(A)Prob(B)\). So event \(A\) and event \(B\) are mutually independent
To check inclusion–exclusion: notice \(Prob(A) + Prob(B) - Prob(A\cap B) = 2/3 + 1/2 - 1/3 = 4/6 + 3/6 - 2/6 = 5/6 = Prob(A\cup B)\)
Finally, the conditional probability of \(Prob(\text{die is} >2 \mid \text{die shows even number}) = Prob(A \mid B) = \frac{Prob(A \cap B)}{P(B)} = \frac{1/3}{1/2} = 2/3\)

Code

# Simulation verification
set.seed(123)
x <- seq(1,6)
rolls <- sample(x, 1e6, replace = TRUE)
A <- rolls > 2
B <- rolls %% 2 == 0
P_AcondB <- mean(A & B) / mean(B)
round(P_AcondB, 3)
## [1] 0.667

Tip

Consider a six sided die with sample space \(\{1,2,3,4,5,6\}\) where each outcome is each equally likely. What is \(Prob(\text{die is odd or }<5)\)? Denote the odd events as \(A=\{1,3,5\}\) and the less than five events as \(B=\{1,2,3,4\}\). Then

\(A\cap B=\{1,3\}\), which implies \(Prob(A\cap B)=2/6=1/3\).
\(Prob(A)=3/6=1/2\) and \(Prob(B)=4/6=2/3\).
By inclusion–exclusion: \(Prob(A\cup B)=Prob(A)+Prob(B)-Prob(A\cap B)=1/2+2/3-1/3=5/6\).

Now find \(Prob(\text{die is even and }<5)\). Verify your answer with a computer simulation.

Law of Total Probability.

This law states that \(Prob(A)=Prob(A \mid B) P(B) + Prob(A \mid \lnot B) P(\lnot B)\), where \(\lnot B\) is the event “not \(B\)”.

For example, consider a six sided die where events are the number \(>2\) or the number is even. - The sets are \(A=\{3,4,5,6\}\) and \(B=\{2,4,6\}\) and \(\lnot B = \{1,3,5\}\) From before, we know that \(Prob(A)=2/3\), \(Prob(B)=1/2\), and \(Prob(A \mid B) = 2/3\). Then see that \(Prob(A \mid B) Prob(B) = \frac{2}{3} \frac{1}{2} = 2/6\)

We can also directly compute \(Prob(\lnot B)=1/2\), or infer it from \(Prob(\lnot B)=1-Prob(B)\). The conditional probability of \(Prob(\text{die is} >2 \mid \text{die shows odd number}) = Prob(A \mid \lnot B) = \frac{Prob(A \cap \lnot B)}{P(\lnot B)} = 2/3\).

Then we can see that \(Prob(A \mid B) Prob(B) + Prob(A \mid \lnot B) Prob(\lnot B) = 2/6 + 2/6 = 2/3 = Prob(A)\).