9  Advanced Probability

9.1 Drawing Samples

To generate a random variable from a known distribution, you can use some type of physical machine. E.g., you can roll a fair die to generate Discrete Uniform data, or you can roll a weighted die to generate Categorical data.

There are also several ways to computationally generate random variables from a probability distribution. Perhaps the most common one is “inverse sampling”. To generate a random variable using inverse sampling, first sample \(p\) from a uniform distribution and then find the associated quantile via the quantile function \(\hat{F}^{-1}(p)\).1

Using Data.

You can also generate a random variable from an empirical distribution. Inverse sampling randomly selects observations from the dataset with equal probabilities. To implement this, we

  • order the data and associate each observation with an ECDF value
  • draw \(p \in [0,1]\) as a uniform random variable
  • find the associated quantile via the ECDF

Here is an example of generating random murder rates for US states.

Code
# Empirical Distribution
X <- USArrests[,'Murder']
FX_hat <- ecdf(X)
plot(FX_hat, lwd=2, xlim=c(0,20),
    pch=16, col=grey(0,.5), main='')

Code

# Generating random variables via inverse ECDF
p <- runif(3000) ## Multiple Draws
QX_hat <- quantile(FX_hat, p, type=1)
QX_hat[c(1,2,3)]
## 47.39953815%  4.08322967% 65.45706145% 
##          6.8          2.1          9.0

## Can also do directly from the data
QX_hat <- quantile(X, p, type=1)
QX_hat[c(1,2,3)]
## 47.39953815%  4.08322967% 65.45706145% 
##          6.8          2.1          9.0

Using Math.

If you know the distribution function that generates the data, then you can derive the quantile function and do inverse sampling. This is a common way for computers to generate random data from a distribution.

Code
# 4 random data points from 3 different distributions
p <- runif(4) # first draw random probabilities
qunif(p) # Uniform(0,1) quantiles
qexp(p)  # Exponential(1) quantiles
qnorm(p) # Normal(0,1) quantiles

Here is an in-depth example using the Dagum distribution. The distribution function is \(F(x)=(1+(x/b)^{-a})^{-c}\). For a given probability \(p\), we can solve for the quantile as \(F^{-1}(p)=\frac{ b p^{\frac{1}{ac}} }{(1-p^{1/c})^{1/a}}\). After that, we sample \(p\) from a uniform distribution and find the associated quantile.

Code
# Theoretical Quantile Function (from VGAM::qdagum)
qdagum <- function(p, scale.b=1, shape1.a, shape2.c) {
  # Quantile function (theoretically derived from the CDF)
  ans <- scale.b * (expm1(-log(p) / shape2.c))^(-1 / shape1.a)
  # Special known cases
  ans[p == 0] <- 0
  ans[p == 1] <- Inf
  # Safety Checks
  ans[p < 0] <- NaN
  ans[p > 1] <- NaN
  if(scale.b <= 0 | shape1.a <= 0 | shape2.c <= 0){ ans <- ans*NaN }
  # Return
  return(ans)
}

# Generate Random Variables (VGAM::rdagum)
rdagum <-function(n, scale.b=1, shape1.a, shape2.c){
    p <- runif(n) # generate random probabilities
    x <- qdagum(p, scale.b=scale.b, shape1.a=shape1.a, shape2.c=shape2.c) #find the inverses
    return(x)
}

# Example
set.seed(123)
X <- rdagum(3000,1,3,1)
X[c(1,2,3)]
## [1] 0.7390476 1.5499868 0.8845006

9.2 Factorial Distributions

Binomial.

The sum of \(n\) Bernoulli trials (the number of successes).

Code
## Minimal example
n <- 15
p <- 0.3

# PMF plot
x <- seq(0, n)
f_x <- dbinom(x, n, p)
plot(x, f_x, type = "h", col = "blue",
    main='',  xlab = "x", ylab = "Prob(X = x)")
title(bquote(paste('Binom(',.(n),', ',.(p), ')' ) ))
points(x, f_x, pch = 16, col = "blue")

Code

# Simulate:  Compare empirical vs theoretical
#X <- rbinom(1e4, size = n, prob = p)
#c(emp_mean = mean(X), th_mean = n*p)
#c(emp_var  = var(X),  th_var  = n*p*(1-p))

Suppose that employees at a company are \(70\%\) female and \(30\%\) male. If we select a random sample of eight employees, what is the probability that more than \(2\) in the sample are female?
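
One way to verify this numerically (a minimal sketch using the built-in Binomial CDF; here a “success” is a sampled employee being female):

Code
# Prob(more than 2 of 8 are female) = 1 - Prob(at most 2)
1 - pbinom(2, size = 8, prob = 0.7)
## [1] 0.9887078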

Show that \(\mathbb{E}[X_{i}]=np\) and \(\mathbb{V}[X_{i}]=np(1-p)\). Start with the case \(n=1\), then the case \(n=2\), then case \(n=3\), then the general case of any \(n\).
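
One way to start the \(n=1\) case: \(\mathbb{E}[X_{i}] = 0 \cdot (1-p) + 1 \cdot p = p\), and \(\mathbb{V}[X_{i}] = \mathbb{E}[X_{i}^{2}] - \mathbb{E}[X_{i}]^{2} = p - p^{2} = p(1-p)\), which match \(np\) and \(np(1-p)\) when \(n=1\).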

The Binomial Limit Theorem (de Moivre–Laplace theorem) says that as \(n\) grows large, with \(p \in (0,1)\) staying fixed, the Binomial distribution is approximately normal with mean \(np\) and variance \(np(1-p)\).
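
To see this visually, here is a minimal sketch that overlays the normal density on the Binomial PMF (using \(n=100\) and \(p=0.1\), matching the exercise below):

Code
# Overlay the normal approximation on the Binomial PMF
n <- 100
p <- 0.1
x <- seq(0, 30)
plot(x, dbinom(x, n, p), type="h", col="blue",
    main='', xlab = "x", ylab = "Prob(X = x)")
curve(dnorm(x, mean=n*p, sd=sqrt(n*p*(1-p))), add=TRUE, col="red")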

The unemployment rate is \(10\%\). Suppose that \(100\) employable people are selected randomly. What is the probability that this sample contains between \(9\) and \(12\) unemployed people? Use the normal approximation to binomial probabilities.

The parameters are \(\mu = np = 100(0.1) = 10\) and \(\sigma^{2} = np(1-p) = 100(0.1)(0.9) = 9\), so that \(\sigma=\sqrt{9}=3\).
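
A minimal numeric check (applying a continuity correction is one common convention; the exact Binomial value is included for comparison):

Code
# Normal approximation with continuity correction
pnorm(12.5, mean=10, sd=3) - pnorm(8.5, mean=10, sd=3)
# Exact Binomial probability, for comparison
pbinom(12, 100, 0.1) - pbinom(8, 100, 0.1)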

Poisson.

The number of events in a fixed interval.

Code
## Minimal example
lambda <- 3.5

# PMF plot
x <- seq(0, 15)
f_x <- dpois(x, lambda)
plot(x, f_x, type="h", col="blue",
     xlab = "x", ylab = "Prob(X = x)")
points(x, f_x, pch = 16, col = "blue")
title(bquote(paste('Pois(',.(lambda), ')')))

Code


# Simulate: Compare empirical vs theoretical
#X <- rpois(1e4, lambda)
#c(emp_mean = mean(X), th_mean = lambda)
#c(emp_var  = var(X),  th_var  = lambda)

Show that \(\mathbb{E}[X_{i}] = \mathbb{V}[X_{i}]= \lambda\).
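
One way to start with the mean: \(\mathbb{E}[X_{i}] = \sum_{x=0}^{\infty} x \frac{e^{-\lambda}\lambda^{x}}{x!} = \lambda e^{-\lambda} \sum_{x=1}^{\infty} \frac{\lambda^{x-1}}{(x-1)!} = \lambda e^{-\lambda} e^{\lambda} = \lambda\).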

Irwin–Hall.

The sum of \(n\) i.i.d. \(\text{Uniform}(0,1)\).

  • Continuous, support \([0,n]\)
  • Probability Density Function: \(f(x) = \dfrac{1}{(n-1)!} \displaystyle\sum_{k=0}^{\lfloor x \rfloor} (-1)^k \binom{n}{k} (x - k)^{n-1}\) for \(x \in [0,n]\), and \(0\) otherwise
  • See https://en.wikipedia.org/wiki/Irwin%E2%80%93Hall_distribution
  • A common use case: representing the sum of many small independent financial shocks
Code
## Minimal example (base R)
# Irwin–Hall PDF function
dirwinhall <- function(x, n) {
    f_x <- vector(length=length(x))
    for(i in seq_along(x)) {
        xx <- x[i]
        if(xx < 0 || xx > n){
            # Density is 0 outside the support [0, n]
            f_x[i] <- 0
        } else {
            k <- seq(0, floor(xx))
            f_x[i] <- sum((-1)^k*choose(n, k)*(xx-k)^(n-1))/factorial(n-1)
        }
    }
    return(f_x)
}

# Parameters
n <- 2
x <- seq(0, n, length.out = 500)

# Compute and plot PDF
f_x <- dirwinhall(x, n)
plot(x, f_x, type="l", col="blue",
    main='', xlab = "x", ylab = "f(x)")
title(bquote(paste('IrwinHall(',.(n), ')')))
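
As a sanity check, we can simulate Irwin–Hall draws directly by summing uniforms and compare the histogram to the density above (a sketch reusing n, x, and f_x from the previous block):

Code
# Simulate sums of n uniforms and overlay the theoretical PDF
set.seed(123)
S <- rowSums(matrix(runif(5000*n), ncol=n))
hist(S, breaks=40, border=NA, freq=FALSE, main='')
lines(x, f_x, col='blue', lwd=2)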

See also the Bates distribution, which describes the mean of \(n\) i.i.d. \(\text{Uniform}(0,1)\) random variables. It essentially rescales the Irwin–Hall distribution to range between \(0\) and \(1\), as sketched below.
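
Here is a sketch of that rescaling, assuming the dirwinhall function above: if \(S\) has an Irwin–Hall distribution, then the Bates variable is \(M = S/n\), with density \(n f_{IH}(nm)\).

Code
# Bates: means of n uniforms vs. the rescaled Irwin–Hall density
set.seed(123)
n_b <- 5 # illustrative choice for the number of uniforms
M <- rowMeans(matrix(runif(5000*n_b), ncol=n_b))
hist(M, breaks=40, border=NA, freq=FALSE, main='')
m <- seq(0, 1, length.out=500)
lines(m, n_b*dirwinhall(n_b*m, n_b), col='blue', lwd=2)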

Beta.

The sample space is any number on the unit interval, \(X_{i} \in [0,1]\), but with non-uniform probabilities.

Code
X4 <- rbeta(2000,2,2) ## two shape parameters
hist(X4, breaks=20, border=NA, main=NA, freq=F)

#See the underlying probabilities
#f_25 <- dbeta(.25, 2, 2)

x <- seq(0,1,by=.01)
fx <- dbeta(x, 2, 2)
lines(x, fx, col='blue')

The probability density function of the Beta distribution is mathematically complicated to write, so we omit it. However, we can find probabilities graphically, using either the probability density function or the cumulative distribution function.

Suppose \(X_{i}\) is a random variable with a beta distribution. Intuitively depict \(Prob(X_{i} \in [0.2, 0.8])\) by drawing an area under the density function. Numerically estimate that same probability using the CDF.
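
Here is one way to sketch that area in base R, shading with polygon and checking numerically with integrate (the shape parameters \(2,2\) match the example above):

Code
# Shade Prob(0.2 <= X <= 0.8) under the Beta(2,2) density
x <- seq(0, 1, by=.01)
fx <- dbeta(x, 2, 2)
plot(x, fx, type='l', col='blue', xlab='x', ylab='f(x)')
xs <- seq(0.2, 0.8, by=.01)
polygon(c(0.2, xs, 0.8), c(0, dbeta(xs, 2, 2), 0),
    col=grey(0,.25), border=NA)
# Numerical check: should match the CDF difference below, 0.792
integrate(dbeta, 0.2, 0.8, shape1=2, shape2=2)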

Code
plot( ecdf(X4), main=NA) # Empirical

x <- seq(0,1,by=.01) # Theoretical
Fx <- pbeta(x, 2, 2)
lines(x, Fx, col='blue')

# Middle Interval Example 
F2 <- pbeta(0.2, 2, 2)
F8 <- pbeta(0.8, 2, 2)
F_2_8 <- F8 - F2
F_2_8
## [1] 0.792

# Visualize
title('Middle between 0.2 and 0.8')
segments( 0.2, F2, -1, F2, col='red')
segments( 0.8, F8, -1, F8, col='red')

This distribution is often used because its probability density function has two shape parameters that allow it to take many different shapes.

For each example below, intuitively depict \(Prob(X_{i} \leq 0.5)\) using the PDF. Repeat the exercise using a CDF instead of a PDF to calculate a numerical value.

Code
op <- par(no.readonly = TRUE) # save plotting parameters
x <- seq(0,1,by=.01)
pars <- expand.grid( c(.5,1,2), c(.5,1,2) )
par(mfrow=c(3,3))
apply(pars, 1, function(p){
    fx <- dbeta( x,p[1], p[2])
    plot(x, fx, type='l', xlim=c(0,1), ylim=c(0,4), lwd=2, col='blue')
    #hist(rbeta(2000, p[1], p[2]), breaks=50, border=NA, main=NA, freq=F)
})
title('Beta densities', outer=T, line=-1)
par(op) # restore plotting parameters
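
For the numerical part of the exercise, a sketch that computes \(Prob(X_{i} \leq 0.5)\) for the same grid of shape parameters (reusing pars from the block above):

Code
# CDF at 0.5 for each (shape1, shape2) pair plotted above
apply(pars, 1, function(p){ pbeta(0.5, p[1], p[2]) })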

9.3 Set Theory

Let the sample space contain events \(A\) and \(B\), with their probability denoted as \(Prob(A)\) and \(Prob(B)\). Then

  • the union: \(A\cup B\), refers to either event occurring
  • the intersection: \(A\cap B\), refers to both events occurring
  • the events \(A\) and \(B\) are mutually exclusive if and only if \(A\cap B\) is empty
  • the inclusion–exclusion rule is: \(Prob(A\cup B)=Prob(A)+Prob(B)-Prob(A\cap B)\)
  • the events \(A\) and \(B\) are mutually independent if and only if \(Prob(A\cap B)=Prob(A) Prob(B)\)
  • the conditional probability is \(Prob(A \mid B) = Prob(A \cap B)/Prob(B)\)

For example, consider a six-sided die, where event \(A\) is that the number is \(>2\) and event \(B\) is that the number is even.

  • The sets are \(A=\{3,4,5,6\}\) and \(B=\{2,4,6\}\)
  • To check mutually exclusive: notice \(A\cap B=\{4,6\}\) which is not empty. So event \(A\) and event \(B\) are not mutually exclusive

Further assuming the die is fair, each side has an equal \(1/6\) probability. This lets us compute

  • \(Prob(A)=4/6=2/3\) and \(Prob(B)=3/6=1/2\)
  • \(A\cup B=\{2,3,4,5,6\}\), which implies \(Prob(A\cup B)=5/6\)
  • \(A\cap B=\{4,6\}\), which implies \(Prob(A\cap B)=2/6=1/3\)
  • To check independence: notice \(Prob(A\cap B)=1/3 = (2/3) (1/2) = Prob(A)Prob(B)\). So event \(A\) and event \(B\) are mutually independent
  • To check inclusion–exclusion: notice \(Prob(A) + Prob(B) - Prob(A\cap B) = 2/3 + 1/2 - 1/3 = 4/6 + 3/6 - 2/6 = 5/6 = Prob(A\cup B)\)
  • Finally, the conditional probability of \(Prob(\text{die is} >2 \mid \text{die shows even number}) = Prob(A \mid B) = \frac{Prob(A \cap B)}{Prob(B)} = \frac{1/3}{1/2} = 2/3\)
Code
# Simulation verification
set.seed(123)
x <- seq(1,6)
rolls <- sample(x, 1e6, replace = TRUE)
A <- rolls > 2
B <- rolls %% 2 == 0
P_AcondB <- mean(A & B) / mean(B)
round(P_AcondB, 3)
## [1] 0.667

Consider a six-sided die with sample space \(\{1,2,3,4,5,6\}\), where each outcome is equally likely. What is \(Prob(\text{die is odd or }<5)\)? Denote the odd event as \(A=\{1,3,5\}\) and the less-than-five event as \(B=\{1,2,3,4\}\). Then

  • \(A\cap B=\{1,3\}\), which implies \(Prob(A\cap B)=2/6=1/3\).
  • \(Prob(A)=3/6=1/2\) and \(Prob(B)=4/6=2/3\).
  • By inclusion–exclusion: \(Prob(A\cup B)=Prob(A)+Prob(B)-Prob(A\cap B)=1/2+2/3-1/3=5/6\).
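
Following the earlier simulation pattern, here is a quick check of this example (since \(A\cup B=\{1,2,3,4,5\}\), the answer should be near \(5/6 \approx 0.833\)):

Code
# Simulation verification of Prob(die is odd or < 5)
set.seed(123)
rolls <- sample(seq(1,6), 1e6, replace = TRUE)
A <- rolls %% 2 == 1 # odd
B <- rolls < 5       # less than five
mean(A | B)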

Now find \(Prob(\text{die is even and }<5)\). Verify your answer with a computer simulation.

Law of Total Probability.

This law states that \(Prob(A)=Prob(A \mid B) Prob(B) + Prob(A \mid \lnot B) Prob(\lnot B)\), where \(\lnot B\) is the event “not \(B\)”.

For example, again consider a six-sided die where event \(A\) is that the number is \(>2\) and event \(B\) is that the number is even. The sets are \(A=\{3,4,5,6\}\), \(B=\{2,4,6\}\), and \(\lnot B = \{1,3,5\}\). From before, we know that \(Prob(A)=2/3\), \(Prob(B)=1/2\), and \(Prob(A \mid B) = 2/3\). Then see that \(Prob(A \mid B) Prob(B) = \frac{2}{3} \cdot \frac{1}{2} = 2/6\).

We can also directly compute \(Prob(\lnot B)=1/2\), or infer it from \(Prob(\lnot B)=1-Prob(B)\). The conditional probability is \(Prob(\text{die is} >2 \mid \text{die shows odd number}) = Prob(A \mid \lnot B) = \frac{Prob(A \cap \lnot B)}{Prob(\lnot B)} = \frac{1/3}{1/2} = 2/3\).

Then we can see that \(Prob(A \mid B) Prob(B) + Prob(A \mid \lnot B) Prob(\lnot B) = 2/6 + 2/6 = 4/6 = 2/3 = Prob(A)\).
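
A quick simulation check of the law of total probability, reusing the die events from above:

Code
# Simulation verification of the law of total probability
set.seed(123)
rolls <- sample(seq(1,6), 1e6, replace = TRUE)
A <- rolls > 2        # number > 2
B <- rolls %% 2 == 0  # number is even
P_A_total <- mean(A[B])*mean(B) + mean(A[!B])*mean(!B)
c(direct = mean(A), total = P_A_total)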


  1. Drawing random uniform samples with computers is actually quite complex and beyond the scope of this course.↩︎