4  Random Variables


In the last section we computed a distribution given the data, whereas now we generate individual data points given the distribution.

Random variables are variables whose values occur according to a probability distribution. As such, random variables have a sample space of possible values and associated probabilities for those values.

We think of each observation before it is actually observed as potentially taking on specific values \(x\) from the sample space with known probabilities. For example, we consider flipping a coin before knowing whether it lands on heads or tails. We denote the random variable, in this case the unflipped coin, as \(X_{i}\).

There are two basic types of sample spaces: discrete (encompassing cardinal-discrete, factor-ordered, and factor-unordered data) and continuous. This leads to two types of random variables: discrete and continuous. However, each type has many different probability distributions.

Probability.

The most common random variables are well studied and can be described using the Cumulative Distribution Function (CDF) \[\begin{eqnarray} F(x) &=& Prob(X_{i} \leq x). \end{eqnarray}\] Note that this is just like the Empirical Cumulative Distribution Function (ECDF), \(\widehat{F}(x)\), except that it is now theoretically known. You can think of \(F(x)\) as the ECDF for a dataset with an infinite number of observations. Equivalently, the ECDF is an empirical version of the CDF that is applied to observed data.

Before introducing different random variables, we first cover some basic implications of the CDF. Intuitively, probabilities must sum to one, so we can compute \(Prob(X_{i} > x) = 1- F(x)\). We also have “in” and “out” probabilities for ranges.

The probability of \(a < X_{i} \leq b\) can be written in terms of falling into a range: \(Prob(X_{i} \in (a,b])=Prob(a < X_{i} \leq b) = F(b) - F(a)\).

The opposite probability, of \(X_{i} \leq a\) or \(X_{i} > b\), is \(Prob(X_{i} \leq a \text{ or } X_{i} > b) = F(a) + [1- F(b)]\). Notice that \(F(a) + [1- F(b)] = 1 - [F(b) - F(a)]\), so that \(Prob(X_{i} \text{ out of } (a,b]) = 1 - Prob( X_{i} \in (a,b])\).
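As a quick numerical check of these identities, here is a minimal sketch using punif, R's built-in CDF for the Continuous Uniform distribution introduced later in this section, with \(a=0.2\) and \(b=0.7\):

Code
# "In" probability: F(b) - F(a)
P_in <- punif(0.7) - punif(0.2)
P_in
## [1] 0.5
# "Out" probability: F(a) + [1 - F(b)]
P_out <- punif(0.2) + (1 - punif(0.7))
P_out
## [1] 0.5
# The two probabilities sum to one
P_in + P_out
## [1] 1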

4.1 Discrete

A discrete random variable can take one of several values in a set. E.g., any number in \(\{1,2,3,...\}\) or any letter in \(\{A,B,C,...\}\). Theoretical proportions are referred to as a probability mass function, which can be thought of as a proportions bar plot for an infinitely large dataset. Equivalently, the bar plot is an empirical version of the probability mass function that is applied to observed data.

Bernoulli.

Think of a Coin Flip: Heads or Tails with equal probability. In general, a Bernoulli random variable denotes Heads as the event \(X_{i}=1\) and Tails as the event \(X_{i}=0\), and allows the probability of Heads to vary. \[\begin{eqnarray} X_{i} &\in& \{0,1\} \\ Prob(X_{i} =0) &=& 1-p \\ Prob(X_{i} =1) &=& p \\ F(x) &=& \begin{cases} 0 & x<0 \\ 1-p & x \in [0,1) \\ 1 & x\geq 1 \end{cases} \end{eqnarray}\]

Here is an example of the Bernoulli distribution with \(p=1/4\). While you might get all heads (or all tails) in the first few coin flips, the ratios level out to their theoretical values after many flips.

Code
x <- c(0,1)
x_probs <- c(3/4, 1/4)

sample(x, 1, prob=x_probs, replace=T) # 1 Flip
## [1] 0
sample(x, 4, prob=x_probs, replace=T) # 4 Flips 
## [1] 0 0 0 0
X0 <- sample(x, 400, prob=x_probs, replace=T)

# Plot Cumulative Proportion
X0_t <- seq(1, length(X0)) #head(X0_t)
X0_mt <- cumsum(X0)/X0_t #head(X0_mt)
par(mar=c(4,4,1,4))
plot(X0_t, X0_mt, type='l',
    ylab='Cumulative Proportion (p)',
    xlab='Flip #', 
    ylim=c(0,1), 
    lwd=2)
# Show individual flip outcomes
points(X0_t, X0, col=grey(0,.5),
    pch='|', cex=.3)
# Show theoretical proportions
abline(h=0.25, col='blue')

Code

# Plot Long run proportions
proportions <- table(X0)/length(X0)
plot(proportions, 
    col=grey(0, 0.5),
    xlab='Flip Outcome', 
    ylab='Proportion',
    main=NA)
points(c(0,1), c(.75, .25), pch=16, col='blue') # Theoretical values

Code

# Plot CDF
plot( ecdf(X0), col=grey(0,.5),
    pch=16, main=NA) # Empirical
# points(c(0,1), c(.75, 1), pch=16, col='blue') # Theoretical

Discrete Uniform.

Discrete numbers with equal probability, such as a die with \(K\) sides. \[\begin{eqnarray} X_{i} &\in& \{1,...K\} \\ Prob(X_{i} =1) &=& Prob(X_{i} =2) = ... = 1/K\\ F(x) &=& \begin{cases} 0 & x<1 \\ 1/K & x \in [1,2) \\ 2/K & x \in [2,3) \\ \vdots & \\ 1 & x\geq K \end{cases} \end{eqnarray}\]

Here is an example with \(K=4\). E.g., rolling a four-sided die.

The probability of a value smaller than or equal to \(3\) is \(Prob(X_{i} \leq 3)=1/4 + 1/4 + 1/4 = 3/4\).

The probability of a value larger than \(3\) is \(Prob(X_{i} > 3) = 1-Prob(X_{i} \leq 3)=1/4\).

The probability of a value \(>\) 1 and \(\leq 3\) is \(Prob(1 < X_{i} \leq 3) = Prob(X_{i} \leq 3) - Prob(X_{i} \leq 1) = 3/4 - 1/4 = 2/4\).1

The probability of a value \(\leq\) 1 or \(> 3\) is \(Prob(X_{i} \leq 1 \text{ or } X_{i} > 3) = Prob(X_{i} \leq 1) + \left[ 1- Prob(X_{i} \leq 3) \right] = 1/4 + [1 - 3/4]=2/4\).
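These probabilities can also be tallied directly from the theoretical proportions. A minimal sketch, using the same x and x_probs as the simulation code below:

Code
x <- c(1,2,3,4)
x_probs <- c(1/4, 1/4, 1/4, 1/4)
sum( x_probs[x <= 3] ) # Prob. <= 3
## [1] 0.75
sum( x_probs[x > 1 & x <= 3] ) # Prob. > 1 and <= 3
## [1] 0.5
sum( x_probs[x <= 1 | x > 3] ) # Prob. <= 1 or > 3
## [1] 0.5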

Code
x <- c(1,2,3,4)
x_probs <- c(1/4, 1/4, 1/4, 1/4)
# sample(x, 1, prob=x_probs, replace=T) # 1 roll
X1 <- sample(x, 2000, prob=x_probs, replace=T) # 2000 rolls

# Plot Long run proportions
proportions <- table(X1)/length(X1)
plot(proportions, col=grey(0,.5),
    xlab='Outcome', ylab='Proportion', main=NA)
points(x, x_probs, pch=16, col='blue') # Theoretical values

Code

# Hist w/ Theoretical Counts
# hist(X1, breaks=50, border=NA, main=NA, ylab='Count')
# points(x, x_probs*length(X1), pch='-') 

# Alternative Plot
plot( ecdf(X1), pch=16, col=grey(0,.5), main=NA)

Code

# Alternative Plot 2
#props <- table(X1)
#barplot(props, ylim = c(0, 0.35), ylab = "Proportion", xlab = "Value")
#abline(h = 1/4, lty = 2)

Note that the Discrete Uniform distribution generalizes to arbitrary intervals, although we will not exploit the generalization in this class.

Multinoulli (aka Categorical).

Numbers \(1,...K\) with unequal probabilities. \[\begin{eqnarray} X_{i} &\in& \{1,...K\} \\ Prob(X_{i} =1) &=& p_{1} \\ Prob(X_{i} =2) &=& p_{2} \\ &\vdots& \\ p_{1} + p_{2} + ... &=& 1\\ F(x) &=& \begin{cases} 0 & x<1 \\ p_{1} & x \in [1,2) \\ p_{1} + p_{2} & x \in [2,3) \\ \vdots & \\ 1 & x\geq K \end{cases} \end{eqnarray}\]
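Since the CDF simply accumulates the probabilities \(p_{1}, p_{2}, ...\), its values at \(x = 1, 2, ..., K\) can be computed with a cumulative sum. A minimal sketch with \(K=3\) and hypothetical probabilities:

Code
p <- c(3/10, 1/10, 6/10) # p1, p2, p3 (hypothetical values)
cumsum(p) # F(1), F(2), F(3)
## [1] 0.3 0.4 1.0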

We can also replace numbers with letters \((A,...Z)\) or names \((John, Jamie, ...)\), although we must be careful with the CDF when there is no longer a natural ordering. Here is an empirical example with three outcomes.

Code
x <- c('A', 'B', 'C')
x_probs <- c(3/10, 1/10, 6/10)
sum(x_probs)
## [1] 1
X2 <- sample(x, 2000, prob=x_probs, replace=T) # sample of 2000

# Plot Long run proportions
proportions <- table(X2)/length(X2)
plot(proportions, col=grey(0,.5),
    xlab='Outcome', ylab='Proportion', main=NA)
points(x_probs, pch=16, col='blue') # Theoretical values

Code

# Histogram version
# X2_alt <- X2
# X2_alt[X2_alt=='A'] <- 1
# X2_alt[X2_alt=='B'] <- 2
# X2_alt[X2_alt=='C'] <- 3
# X2_alt <- as.numeric(X2_alt)
# hist(X2_alt, breaks=50, border=NA, 
#    main=NA, ylab='Count')
# points(1:3, x_probs*length(X2_alt), pch=16) ## Theoretical Counts

# Alternative Plot
# plot( ecdf(X2), pch=16, col=grey(0,.5), main=NA)

Suppose there is an experiment with three possible outcomes, \(\{A, B, C\}\). It was repeated \(50\) times and discovered that \(A\) occurred \(10\) times, \(B\) occurred \(13\) times, and \(C\) occurred \(27\) times. The estimated probability of each outcome is found via the bar plot: \(\hat{p}_{A} = 10/50\), \(\hat{p}_{B} = 13/50\), \(\hat{p}_{C} = 27/50\). We can also estimate the “in” probabilities as \(\widehat{Prob}(A \text{ or } B)=10/50+13/50=23/50\) and \(\widehat{Prob}(B \text{ or } C)=13/50+27/50=40/50\), as well as the “out” probability as \(\widehat{Prob}(A \text{ or } C)=10/50+27/50=37/50\).
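Here is a minimal sketch of those same calculations done from the raw counts:

Code
counts <- c(A=10, B=13, C=27)
p_hat <- counts/sum(counts) # estimated probabilities
p_hat
##    A    B    C 
## 0.20 0.26 0.54
sum( p_hat[c('A','B')] ) # "in" probability of A or B
## [1] 0.46
sum( p_hat[c('A','C')] ) # "out" probability of A or C
## [1] 0.74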

Suppose there are three possible outcomes of an experiment, \(\{\text{my car dies}, \text{it rains next Tuesday},\text{a cat is born}\}\), which have corresponding probabilities \(\{3/10, 1/10, 6/10 \}\) that are known theoretically. Compute the probability that my car dies or a cat is born.
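One way to check your answer: since the outcomes are mutually exclusive, their probabilities add.

Code
3/10 + 6/10 # Prob(car dies or cat is born)
## [1] 0.9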

4.2 Continuous

A continuous random variable can take one value out of an uncountably infinite number. E.g., any number between \(0\) and \(1\) with any number of decimal points. With a continuous random variable, the probability of any individual point is zero, so we describe these variables with the cumulative distribution function (CDF), \(F\), or the probability density function (PDF), \(f\). Just as \(F\) can be thought of as the ECDF, \(\widehat{F}\), with an infinite amount of data, \(f\) can be thought of as a histogram, \(\widehat{f}\), with an infinite amount of data. Equivalently, the histogram is an empirical version of the PDF that is applied to observed data.

Often, the PDF helps you intuitively understand a random variable whereas the CDF helps you calculate numerical values. This is because probabilities are depicted as areas in the PDF and the CDF accumulates those areas: \(F(x)\) equals the area under the PDF from \(-\infty\) to \(x\). For example, \(Prob(X_{i} \leq 1)\) is depicted by the PDF as the area under \(f(x)\) from the lowest possible value until \(x=1\), which is numerically calculated simply as \(F(1)\).
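You can verify this relationship numerically: integrating the density \(f\) up to \(x\) recovers \(F(x)\). A minimal sketch using the Continuous Uniform distribution introduced next:

Code
integrate(dunif, 0, 0.5)$value # area under f from 0 to 0.5
## [1] 0.5
punif(0.5) # the same value from the CDF
## [1] 0.5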

Continuous Uniform.

Any number on a unit interval allowing for any number of decimal points, with every interval of the same size having the same probability. \[\begin{eqnarray} X_{i} &\in& [0,1] \\ f(x) &=& \begin{cases} 1 & x \in [0,1] \\ 0 & \text{Otherwise} \end{cases}\\ F(x) &=& \begin{cases} 0 & x < 0 \\ x & x \in [0,1] \\ 1 & x > 1. \end{cases} \end{eqnarray}\]

The probability of a value being exactly \(0.25\) is \(Prob(X_{i} =0.25)=0\).

The probability of a value smaller than \(0.25\) is \(F(0.25)=0.25\).

The probability of a value larger than \(0.25\) is \(1-F(0.25)=0.75\).

The probability of a value in \((0.25,0.75]\) is \(Prob(0.25 < X_{i} \leq 0.75) = Prob(X_{i} \leq 0.75) - Prob(X_{i} \leq 0.25) = 0.75 - 0.25 = 0.5\).

The probability of a value in \((0.2,0.7]\) is \(Prob(0.2 < X_{i} \leq 0.7) = Prob(X_{i} \leq 0.7) - Prob(X_{i} \leq 0.2) = 0.7 - 0.2 = 0.5\).

The probability of a value outside of \((0.2,0.7]\) is \(Prob(X_{i} \leq 0.2 \text{ or } X_{i} > 0.7) = 0.2 + [1-0.7]=0.5\). Alternatively, you can compute \(1- Prob(0.2 < X_{i} \leq 0.7)=1-0.5=0.5\).

Code
# Prob. < 0.25
punif(0.25)
## [1] 0.25

# Prob. > 0.25
1-punif(0.25)
## [1] 0.75

# Prob. in (0.2,0.7]
F_two <- punif( c(0.2, 0.7) )
F_in <- F_two[2] - F_two[1]
F_in
## [1] 0.5

# Prob. out of (0.2,0.7]
F_out <- 1- F_in
F_out
## [1] 0.5
Code
runif(3) # 3 draws
## [1] 0.8893494 0.3556231 0.2754138

# Empirical Density 
X3 <- runif(2000)
hist(X3, breaks=20, border=NA, main=NA, freq=F)
# Theoretical Density
x <- seq(-0.1,1.1,by=.001)
fx <- dunif(x)
lines(x, fx, col='blue')

Code

# CDF example 1
P_low <- punif(0.25)
P_low
## [1] 0.25
# Uncomment to show via PDF
# x_low <- seq(0,0.25,by=.001)
# fx_low <- dunif(x_low)
# polygon( c(x_low, rev(x_low)), c(fx_low,fx_low*0),
#    col=rgb(0,0,1,.25), border=NA)
    
# CDF example 2
P_high <- 1-punif(0.25)
P_high
## [1] 0.75
# Uncomment to show via PDF
# x_high <- seq(0.25,1,by=.001)
# fx_high <- dunif(x_high)
# polygon( c(x_high, rev(x_high)), c(fx_high,fx_high*0),
#    col=rgb(0,0,1,.25), border=NA)
    
# CDF example 3
P_mid <- punif(0.75) - punif(0.25)
P_mid
## [1] 0.5
# Uncomment to show via PDF
# x_mid <-  seq(0.25,0.75,by=.001)
# fx_mid <- dunif(x_mid)
# polygon( c(x_mid, rev(x_mid)), c(fx_mid,fx_mid*0),
#    col=rgb(0,0,1,.25), border=NA)

Note that the Continuous Uniform distribution generalizes to an arbitrary interval, \(X_{i} \in [a,b]\). In this case, \(f(x)=1/[b-a]\) if \(x \in [a,b]\) and \(F(x)=[x-a]/[b-a]\) if \(x \in [a,b]\).
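For example, with \(a=1\) and \(b=5\), \(F(2) = [2-1]/[5-1] = 0.25\), which matches R's built-in CDF:

Code
punif(2, min=1, max=5)
## [1] 0.25
(2-1)/(5-1)
## [1] 0.25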

Suppose \(X_{i}\) is a random variable uniformly distributed between \(a=-2\) and \(b=2\). What is the probability of a value larger than \(0.25\)? First use the computer to suggest an answer: simulate \(1000\) draws and then make a histogram and an ECDF. Then find the answer mathematically using the CDF. Finally, verify the answer is intuitively correct in a figure of the PDF. You should draw by hand both the CDF and the PDF, with correct axis labels, clearly marking the probability of a value larger than \(0.25\).

Code
# Simulation
X <- runif(1000, -2, 2)
# ... 

# Draw by computer the CDF, F(x)
x <- seq(-3,3,by=0.005)
Fx <- punif(x, -2, 2)
plot(x, Fx, type='l', col='blue')
# Answer
Fstar <- punif(0.25, -2, 2)
# Visualize Answer
segments(0.25, 0, 0.25, Fstar,
    col=rgb(0,0,1, 0.25))

Code

# Draw by computer the PDF, f(x)
fx <- dunif(x, -2, 2)
plot(x, fx, type='l', col='blue')
# Visualize Answer
fstar <- dunif(0.25, -2, 2)
polygon(
    c(-2, 0.25, 0.25, -2), 
    c(0, 0, fstar,fstar),
    col=rgb(0,0,1, 0.25), border=NA)

Suppose the flight time between Calgary and Kamloops is Uniformly distributed between \(68\) and \(78\) minutes. According to Air Canada the flight takes \(70\) minutes. What is the probability that the flight will be late?
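A minimal sketch of one way to check your answer: the flight is late whenever it takes longer than \(70\) minutes.

Code
1 - punif(70, min=68, max=78) # Prob(flight time > 70)
## [1] 0.8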

Beta.

The sample space is any number on the unit interval, \(X_{i} \in [0,1]\), but with non-uniform probabilities.

Code
X4 <- rbeta(2000,2,2) ## two shape parameters
hist(X4, breaks=20, border=NA, main=NA, freq=F)

#See the underlying probabilities
#f_25 <- dbeta(.25, 2, 2)

x <- seq(0,1,by=.01)
fx <- dbeta(x, 2, 2)
lines(x, fx, col='blue')

The Beta distribution's density function is mathematically complicated to write down, and so we omit it. However, we can find probabilities graphically using either the probability density function or the cumulative distribution function.

Suppose \(X_{i}\) is a random variable with a beta distribution. Intuitively depict \(Prob(X_{i} \in [0.2, 0.8])\) by drawing an area under the density function. Numerically estimate that same probability using the CDF.

Code
plot( ecdf(X4), main=NA) # Empirical

x <- seq(0,1,by=.01) # Theoretical
Fx <- pbeta(x, 2, 2)
lines(x, Fx, col='blue')

# Middle Interval Example 
F2 <- pbeta(0.2, 2, 2)
F8 <- pbeta(0.8, 2, 2)
F_2_8 <- F8 - F2
F_2_8
## [1] 0.792

# Visualize
title('Middle between 0.2 and 0.8')
segments( 0.2, F2, -1, F2, col='red')
segments( 0.8, F8, -1, F8, col='red')

This distribution is often used, as the probability density function has two parameters that allow it to take many different shapes.

For each example below, intuitively depict \(Prob(X_{i} \leq 0.5)\) using the PDF. Repeat the exercise using a CDF instead of a PDF to calculate a numerical value.

Code
op <- par(no.readonly = TRUE) # save graphics settings
x <- seq(0,1,by=.01)
pars <- expand.grid( c(.5,1,2), c(.5,1,2) )
par(mfrow=c(3,3))
apply(pars, 1, function(p){
    fx <- dbeta( x,p[1], p[2])
    plot(x, fx, type='l', xlim=c(0,1), ylim=c(0,4), lwd=2, col='blue')
    #hist(rbeta(2000, p[1], p[2]), breaks=50, border=NA, main=NA, freq=F)
})
title('Beta densities', outer=T, line=-1)
par(op) # restore graphics settings

Exponential.

The sample space is any positive number.2 An Exponential random variable has a single parameter, \(\lambda>0\), that governs its shape \[\begin{eqnarray} X_{i} &\in& [0,\infty) \\ f(x) &=& \begin{cases} 0 & x < 0 \\ \lambda exp\left\{ -\lambda x \right\} & x \geq 0 \end{cases} \\ F(x) &=& \begin{cases} 0 & x < 0 \\ 1- exp\left\{ -\lambda x \right\} & x \geq 0. \end{cases} \end{eqnarray}\]
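You can verify that R's built-in Exponential CDF matches the formula above. A small sketch with \(\lambda=1\) and \(x=2\):

Code
pexp(2, rate=1) # F(2)
## [1] 0.8646647
1 - exp(-1*2) # same value, by the formula
## [1] 0.8646647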

Code
rexp(3) # 3 draws
## [1] 1.6924576 0.3711728 2.7467728

X5 <- rexp(2000)
hist(X5, breaks=20,
    border=NA, main=NA,
    freq=F, ylim=c(0,1), xlim=c(0,10))
    
x <- seq(0,10,by=.1)
fx <- dexp(x)
lines(x, fx, col='blue')

Suppose the lifetime of a battery is an exponential random variable with \(\lambda=1/50\). Using the computer, find the probability that the lifetime is \(< 10\) hours. Find the probability that the lifetime is \(\geq 100\) hours. Use the computer to find the probability that the lifetime is between \(10\) and \(100\) hours.

Code
pexp(10, 1/50)
## [1] 0.1812692
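The remaining parts follow the same “out” and “in” logic as before. A sketch (note \(Prob(X_{i} \geq 100) = 1 - F(100)\), since the probability of exactly \(100\) is zero):

Code
1 - pexp(100, 1/50) # Prob. >= 100 hours
## [1] 0.1353353
pexp(100, 1/50) - pexp(10, 1/50) # Prob. between 10 and 100 hours
## [1] 0.6833955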

Normal (Gaussian).

This distribution is for any number on the real line, with bell shaped probabilities. The Normal distribution is mathematically complex and sometimes called the Gaussian distribution. We call it “Normal” because we will encounter it again and again and again. The probability density function \(f\) has two parameters: \(\mu \in (-\infty,\infty)\) and \(\sigma > 0\). \[\begin{eqnarray} X_{i} &\in& (-\infty,\infty) \\ f(x) &=& \frac{1}{\sqrt{2\pi \sigma^2}} exp\left\{ \frac{-(x-\mu)^2}{2\sigma^2} \right\} \end{eqnarray}\]

Code
rnorm(3) # 3 draws
## [1] -1.0645264  0.8261311  0.7678637

X6 <- rnorm(2000)
hist(X6, breaks=20,
    border=NA, main=NA,
    freq=F, ylim=c(0,.4), xlim=c(-4,4))

x <- seq(-10,10,by=.025)
fx <- dnorm(x)
lines(x, fx, col='blue')

Even though the distribution function is complex, we can compute CDF values using the computer.

Code
pnorm( c(-1.645, 1.645) ) # 10%
## [1] 0.04998491 0.95001509
pnorm( c(-2.576, 2.576) ) #  1%
## [1] 0.004997532 0.995002468

Suppose \(X_{i}\) is a random variable with a normal distribution with \(\mu=0\) and \(\sigma=1\). Intuitively depict \(Prob(X_{i} \in [0.2, 0.8])\) by drawing an area under the density function. Numerically estimate that same probability using the CDF.

Code
plot( ecdf(X6), main=NA) # Empirical

x <- seq(-4,4,by=.01) # Theoretical
Fx <- pnorm(x, 0, 1)
lines(x, Fx, col='blue')

# Middle Interval Example 
F2 <- pnorm(0.2, 0, 1)
F8 <- pnorm(0.8, 0, 1)
F_2_8 <- F8 - F2
F_2_8
## [1] 0.2088849

# Visualize
title('Middle between 0.2 and 0.8')
segments( 0.2, F2, -5, F2, col='red')
segments( 0.8, F8, -5, F8, col='red')

Suppose that your health status is a normally distributed random variable with \(\mu=2\) and \(\sigma=3\). If we randomly sample one person, what is the probability their health status is higher than \(4\)?

Code
# Start with a simulation of 1000 people to build intuition
X <- rnorm(1000,2,3)
hist(X, freq=F, border=NA, main=NA)

Code
sum(X>4)/1000
## [1] 0.26

# Do an exact calculation
1-pnorm(4,2,3)
## [1] 0.2524925

Draw the middle \(90\%\) of a normal distribution.

Code
# PDF
x <- seq(-10,10,by=.025)
fx <- dnorm(x)
plot(x, fx,
    col='blue', type='l',
    ylab='Theoretical Density: f(x)',
    main='Middle 90%')
# Show Middle 90%
x_90 <- seq(-1.645,1.645,by=.025)
fx_90 <- dnorm(x_90)
polygon( c(x_90, rev(x_90)), c(fx_90,fx_90*0),
    col=rgb(0,0,1,.25), border=NA)

Code
pnorm(1.645)-pnorm(-1.645)
## [1] 0.9000302

Suppose scores in math class are approximately normally distributed with \(\mu=50, \sigma=1\). If you selected one student randomly, what is the probability their score is higher than \(90\)? Is \(Prob(X_{i}\geq 90)\) higher if \(\mu=25, \sigma=2\)? What about \(\mu=10, \sigma=5\)?
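A sketch of one way to compare these cases: each probability can be computed exactly with pnorm(90, mu, sigma, lower.tail=FALSE), but since all three are astronomically small, the standardized distances \((90-\mu)/\sigma\) are easier to compare; the smallest distance gives the highest probability.

Code
(90-50)/1 # standard deviations above the mean
## [1] 40
(90-25)/2
## [1] 32.5
(90-10)/5 # closest to the mean, so highest probability
## [1] 16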

4.3 Further Reading

Note that many random variables are related to each other.

Also note that numbers randomly generated on your computer cannot be truly random; they are “pseudorandom”.
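For example, you can make pseudorandom draws reproducible by fixing the state of the random number generator. A minimal sketch:

Code
set.seed(1) # fix the generator's state
draws1 <- runif(3)
set.seed(1) # reset to the same state
draws2 <- runif(3)
identical(draws1, draws2) # the "random" draws repeat exactly
## [1] TRUE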


  1. This is the general formula using CDFs, and you can verify it works in this instance by directly adding the probabilities of the events \(X_{i}=2\) and \(X_{i}=3\): \(Prob(X_{i} = 2) + Prob(X_{i} = 3) = 1/4 + 1/4 = 2/4\).↩︎

  2. In other classes, you may further distinguish types of random variables based on whether their maximum value is theoretically finite or infinite.↩︎