4  Random Variables


In the last section we computed a distribution given the data, whereas now we generate data given the distribution.

Random variables are vectors that are generated from a known Cumulative Distribution Function or Probability Distribution Function, which describes the long-run frequencies of all possible outcomes. Random variables have a sample space: the set of all values they can possibly take.

There are only two basic types of sample spaces: discrete (encompassing cardinal-discrete, factor-ordered, and factor-unordered data) and continuous, which lead to two types of random variables. In either case, the probabilities of all possible outcomes must sum to 1.

However, each type of random variable has many different probability distributions. The most common ones are easily accessible and can be described using the Cumulative Distribution Function \[\begin{eqnarray} F(x) &=& Prob(X_{i} \leq x). \end{eqnarray}\] Note that this is just like the ECDF, \(\widehat{F}(x)\), except that it is now theoretically known instead of estimated.1

After introducing different random variables, we will also cover some basic implications of their CDF. Generally, we have \(Prob(X_{i} > x) = 1- F(x)\). We also have complementary “in” and “out” probabilities for ranges.

The probability of falling into a range, \(a < X_{i} \leq b\), can be written as \(Prob(X_{i} \in (a,b]) = Prob(a < X_{i} \leq b) = F(b) - F(a)\).

The opposite probability, of \(X_{i} \leq a\) or \(X_{i} > b\), is \(Prob(X_{i} \leq a \text{ or } X_{i} > b) = F(a) + [1- F(b)]\). Notice that \(F(a) + [1- F(b)] = 1 - [F(b) - F(a)]\), so that \(Prob(X_{i} \text{ out of } (a,b]) = 1 - Prob( X_{i} \in (a,b])\).
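As a quick illustration (a sketch using the Normal CDF, pnorm, which appears again later in this section), take \(a=-1\) and \(b=1\):

Code
a <- -1; b <- 1
pnorm(b) - pnorm(a) # Prob(X in (a,b])
pnorm(a) + (1 - pnorm(b)) # Prob(X out of (a,b])
# The two probabilities sum to 1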

4.1 Discrete

The random variable can take one of several values in a set. E.g., any number in \(\{1,2,3,...\}\) or any letter in \(\{A,B,C,...\}\).

Bernoulli.

Think of a Coin Flip: Heads=1 or Tails=0, with equal probability. In general, the probability of heads can vary. \[\begin{eqnarray} X_{i} &\in& \{0,1\} \\ Prob(X_{i} =0) &=& 1-p \\ Prob(X_{i} =1) &=& p \\ F(x) &=& \begin{cases} 0 & x<0 \\ 1-p & x \in [0,1) \\ 1 & x\geq 1 \end{cases} \end{eqnarray}\]

Here is an example of the Bernoulli distribution with \(p=0.25\).

Code
rbinom(1, 1, 0.25) # 1 Flip
## [1] 1
rbinom(4, 1, 0.25) # 4 Flips
## [1] 0 0 1 0
X0 <- rbinom(400, 1, 0.25)

# Plot Cumulative Proportion
X0_t <- seq_along(X0) # flip number
X0_mt <- cumsum(X0)/X0_t # running proportion of heads
par(mar=c(4,4,1,4))
plot(X0_t, X0_mt, type='l',
    ylab='Cumulative Proportion (p)',
    xlab='Flip #', 
    ylim=c(0,1), 
    lwd=2)
# Add individual flip outcomes
points(X0_t, X0, col=grey(0,.5),
    pch='|', cex=.3)

Code

# Plot Long run proportions
proportions <- table(X0)/length(X0)
plot(proportions, col=grey(0,.5),
    xlab='Flip Outcome', ylab='Proportion', main=NA)
points(c(0,1), c(.75, .25), pch=16, col='blue') # Theoretical values

Code

# Plot CDF
plot( ecdf(X0), pch=16, col=grey(0,.5), main=NA) #Empirical

Code
#points(c(0,1), c(.75, 1), pch=16, col='blue') # Theoretical

Discrete Uniform.

Discrete numbers with equal probability \[\begin{eqnarray} X_{i} &\in& \{1,...N\} \\ Prob(X_{i} =1) &=& Prob(X_{i} =2) = ... = 1/N\\ F(x) &=& \begin{cases} 0 & x<1 \\ 1/N & x \in [1,2) \\ 2/N & x \in [2,3) \\ \vdots & \\ 1 & x\geq N \end{cases} \end{eqnarray}\]

Here is an example with \(N=4\).

The probability of a value smaller than or equal to \(3\) is \(Prob(X_{i} \leq 3)=1/4 + 1/4 + 1/4 = 3/4\).

The probability of a value larger than \(3\) is \(Prob(X_{i} > 3) = 1-Prob(X_{i} \leq 3)=1/4\).

The probability of a value \(>\) 1 and \(\leq 3\) is \(Prob(1 < X_{i} \leq 3) = Prob(X_{i} \leq 3) - Prob(X_{i} \leq 1) = 3/4 - 1/4 = 2/4\).2

The probability of a value \(\leq\) 1 or \(> 3\) is \(Prob(X_{i} \leq 1 \text{ or } X_{i} > 3) = Prob(X_{i} \leq 1) + \left[ 1- Prob(X_{i} \leq 3) \right] = 1/4 + [1 - 3/4]=2/4\).
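These calculations are easy to verify on the computer; here is a quick sketch that builds the CDF directly from the probabilities above.

Code
p <- rep(1/4, 4) # Prob(X=1), ..., Prob(X=4)
Fx <- cumsum(p) # F(1), ..., F(4)
Fx[3] # Prob(X <= 3) = 3/4
1 - Fx[3] # Prob(X > 3) = 1/4
Fx[3] - Fx[1] # Prob(1 < X <= 3) = 2/4
Fx[1] + (1 - Fx[3]) # Prob(X <= 1 or X > 3) = 2/4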

Code
x <- 1:4
x_probs <- rep(1/4,4)
# sample(x, 1, replace=T, prob=x_probs) # sample of 1
X1 <- sample(x, 2000, replace=T, prob=x_probs)

# Plot Long run proportions
proportions <- table(X1)/length(X1)
plot(proportions, col=grey(0,.5),
    xlab='Outcome', ylab='Proportion', main=NA)
points(x, x_probs, pch=16, col='blue') # Theoretical values

Code

# Hist w/ Theoretical Counts
# hist(X1, breaks=50, border=NA, main=NA, ylab='Count')
# points(x, x_probs*length(X1), pch='-') 

# Alternative Plot
plot( ecdf(X1), pch=16, col=grey(0,.5), main=NA)

Code

# Alternative Plot 2
#props <- table(X1)
#barplot(props, ylim = c(0, 0.35), ylab = "Proportion", xlab = "Value")
#abline(h = 1/4, lty = 2)

Note that the Discrete Uniform distribution generalizes to arbitrary intervals, although we will not exploit the generalization in this class.

Multinoulli (aka Categorical).

Numbers 1,…N with unequal probabilities. \[\begin{eqnarray} X_{i} &\in& \{1,...N\} \\ Prob(X_{i} =1) &=& p_{1} \\ Prob(X_{i} =2) &=& p_{2} \\ &\vdots& \\ p_{1} + p_{2} + ... &=& 1\\ F(x) &=& \begin{cases} 0 & x<1 \\ p_{1} & x \in [1,2) \\ p_{1} + p_{2} & x \in [2,3) \\ \vdots & \\ 1 & x\geq N \end{cases} \end{eqnarray}\]

We can also replace numbers with letters (A,…,Z) or names (John, Jamie, …), although we must be careful with the CDF when there is no longer a natural ordering. Here is an empirical example with three outcomes.

Code
x <- c('A', 'B', 'C')
x_probs <- c(3,6,1)/10
sum(x_probs)
## [1] 1
X1 <- sample(x, 2000, replace=T, prob=x_probs) # sample of 2000

# Plot Long run proportions
proportions <- table(X1)/length(X1)
plot(proportions, col=grey(0,.5),
    xlab='Outcome', ylab='Proportion', main=NA)
points(seq_along(x), x_probs, pch=16, col='blue') # Theoretical values

Code

# Histogram version
#X1_alt <- X1
#X1_alt[X1_alt=='A'] <- 1
#X1_alt[X1_alt=='B'] <- 2
#X1_alt[X1_alt=='C'] <- 3
#X1_alt <- as.numeric(X1_alt)
#hist(X1_alt, breaks=50, border=NA,
#    main=NA, ylab='Count')
#points(1:3, x_probs*length(X1_alt), pch=16) # Theoretical Counts

# Alternative Plot
# plot( ecdf(X1), pch=16, col=grey(0,.5), main=NA)

An experiment with three possible outcomes (E1, E2, E3) has been repeated 50 times, and it was learned that E1 occurred 10 times, E2 occurred 13 times, and E3 occurred 27 times. Assign probabilities to the outcomes.

4.2 Continuous

The random variable can take one value out of an uncountably infinite number. We describe these variables with the cumulative distribution function \(F\), or the probability density function \(f\). With a continuous random variable, the probability of any individual point is zero.
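Here is a quick illustration using runif (introduced just below): even among thousands of continuous draws, essentially none exactly equals a prespecified point such as \(0.25\).

Code
X <- runif(2000) # 2000 draws from the Continuous Uniform
mean(X == 0.25) # proportion exactly equal to 0.25: zero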

Continuous Uniform.

Any number on a unit interval allowing for any number of decimal points, with every number having the same probability. \[\begin{eqnarray} X_{i} &\in& [0,1] \\ f(x) &=& \begin{cases} 1 & x \in [0,1] \\ 0 & \text{Otherwise} \end{cases}\\ F(x) &=& \begin{cases} 0 & x < 0 \\ x & x \in [0,1] \\ 1 & x > 1. \end{cases} \end{eqnarray}\]

The probability of a value being exactly \(0.25\) is \(Prob(X_{i} =0.25)=0\).

The probability of a value smaller than \(0.25\) is \(F(0.25)=0.25\).

The probability of a value larger than \(0.25\) is \(1-F(0.25)=0.75\).

The probability of a value in \((0.25,0.75]\) is \(Prob(0.25 < X_{i} \leq 0.75) = Prob(X_{i} \leq 0.75) - Prob(X_{i} \leq 0.25) = 0.75 - 0.25 = 0.5\).

The probability of a value outside of \((0.25,0.75]\) is \(Prob(X_{i} \leq 0.25 \text{ or } X_{i} > 0.75) = 0.25 + [1-0.75]=0.5\).

Code
runif(3) # 3 draws
## [1] 0.41897580 0.86104446 0.05647309

# Empirical Density 
X2 <- runif(2000)
hist(X2, breaks=20, border=NA, main=NA, freq=F)
# Theoretical Density
x <- seq(0,1,by=.01)
fx <- dunif(x)
lines(x, fx, col='blue')

Code

# CDF examples
punif(0.25)
## [1] 0.25
1-punif(0.25)
## [1] 0.75
punif(0.75) - punif(0.25)
## [1] 0.5

Note that the Continuous Uniform distribution generalizes to an arbitrary interval, \(X_{i} \in [a,b]\). What is the probability of a value larger than \(0.25\) when \(a=-2\) and \(b=2\)? First use the computer to suggest an answer and then show the answer mathematically.

Suppose the flight time (in minutes) between Calgary and Kamloops has Uniform distribution with parameters a = 68 and b = 78. According to Air Canada the flight takes 70 minutes. What is the probability that the flight will be late? What is the probability that a flight takes between 65 and 70 minutes?

Beta.

Any number on the unit interval, \(X_{i} \in [0,1]\), but with unequal densities: some values are more likely than others.

Code
X3 <- rbeta(2000,2,2) ## two shape parameters
hist(X3, breaks=20, border=NA, main=NA, freq=F)

# Theoretical density at a single point, e.g. f(0.25)
#f_25 <- dbeta(.25, 2, 2)

x <- seq(0,1,by=.01)
fx <- dbeta(x, 2, 2)
lines(x, fx, col='blue')

The Beta density function is mathematically complicated to write, and so we omit it. But the distribution is often used, as its two shape parameters allow the density to take many different shapes.

Code
x <- seq(0,1,by=.01)
pars <- expand.grid( c(.5,1,2), c(.5,1,2) )
op <- par(mfrow=c(3,3)) # save old settings, set a 3x3 grid
apply(pars, 1, function(p){
    fx <- dbeta( x, p[1], p[2])
    plot(x, fx, type='l', xlim=c(0,1), ylim=c(0,4), lwd=2, col='blue')
    #hist(rbeta(2000, p[1], p[2]), breaks=50, border=NA, main=NA, freq=F)
})
title('Beta densities', outer=T, line=-1)
par(op) # restore old settings

Exponential.

Any positive number.3 The Exponential distribution has a single parameter, \(\lambda>0\), that governs its shape \[\begin{eqnarray} X_{i} &\in& [0,\infty) \\ f(x) &=& \begin{cases} \lambda exp\left\{ -\lambda x \right\} & x \geq 0 \\ 0 & x < 0 \end{cases} \\ F(x) &=& \begin{cases} 0 & x < 0 \\ 1- exp\left\{ -\lambda x \right\} & x \geq 0. \end{cases} \end{eqnarray}\]

Code
rexp(3) # 3 draws (default rate lambda=1)
## [1] 1.1415113 0.3682949 1.9939697

X3 <- rexp(2000)
hist(X3, breaks=20,
    border=NA, main=NA,
    freq=F, ylim=c(0,1), xlim=c(0,10))
    
x <- seq(0,10,by=.1)
fx <- dexp(x)
lines(x, fx, col='blue')

Suppose the lifetime of a battery is an exponential random variable with \(\lambda=1/50\). Find the probability that the lifetime is \(\geq 100\) hours. Find the probability that the lifetime is between \(50\) and \(100\) hours.

Normal (Gaussian).

This distribution is for any number on the real line, with bell-shaped probabilities. The Normal distribution is mathematically complex and sometimes called the Gaussian distribution. We call it “Normal” because we will encounter it again and again and again. The density function \(f\) has two parameters \(\mu \in (-\infty,\infty)\) and \(\sigma > 0\). \[\begin{eqnarray} X_{i} &\in& (-\infty,\infty) \\ f(x) &=& \frac{1}{\sqrt{2\pi \sigma^2}} exp\left\{ \frac{-(x-\mu)^2}{2\sigma^2} \right\} \end{eqnarray}\]

Code
rnorm(3) # 3 draws (default mean=0, sd=1)
## [1] 1.5501548 0.7818682 0.5465561

X4 <- rnorm(2000)
hist(X4, breaks=20,
    border=NA, main=NA,
    freq=F, ylim=c(0,.4), xlim=c(-4,4))

x <- seq(-10,10,by=.025)
fx <- dnorm(x)
lines(x, fx, col='blue')

Even though the distribution function is complex, we can compute CDF values using the computer.

Code
pnorm( c(-1.645, 1.645) )
## [1] 0.04998491 0.95001509
pnorm( c(-2.326, 2.326) )
## [1] 0.01000928 0.98999072

Suppose scores in math class are approximately normally distributed with \(\mu=50, \sigma=1\). If you selected one student randomly, what is the probability their score is higher than \(90\)? Is \(Prob(X_{i}\geq 90)\) higher if \(\mu=25, \sigma=2\)?

4.3 Drawing Samples

To generate a random variable from known distributions, you can use some type of physical machine. E.g., you can roll a fair die to generate Discrete Uniform data or roll a weighted die to generate Categorical data.

There are also several ways to computationally generate random variables from a probability distribution. Perhaps the most common one is “inverse sampling”.

Random variables have an associated quantile function, which is the inverse of the CDF: \(Q(p)\) is the value \(x\) below which a proportion \(p\) of outcomes fall. \[\begin{eqnarray} Q(p) = F^{-1}(p), \quad p\in [0,1] \end{eqnarray}\] (Recall that the median is the value \(x\) where \(50\%\) of the data fall below \(x\), for example.) To generate a random variable using inverse sampling, first sample \(p\) from a uniform distribution and then find the associated quantile.4
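For example, here is a minimal sketch of inverse sampling for the Exponential distribution, whose quantile function is built into R as qexp. The draws are distributed just like output from rexp.

Code
# Inverse sampling: draw p uniformly, then apply the quantile function
p <- runif(2000)
X5 <- qexp(p) # distributed just like rexp(2000)

# Compare to the theoretical density
hist(X5, breaks=20, border=NA, main=NA, freq=F, xlim=c(0,10))
x <- seq(0,10,by=.1)
lines(x, dexp(x), col='blue')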

Using Data.

You can generate a random variable from a known empirical distribution. Inverse sampling randomly selects observations from the dataset with equal probabilities. To implement this, we

  • order the data and associate each observation with an ECDF value
  • draw a probability \(p\) uniformly at random
  • find the data point associated with that ECDF value
Code
# Empirical Distribution
X <- USArrests$Murder
FX_hat <- ecdf(X)
plot(FX_hat, lwd=2, xlim=c(0,20),
    pch=16, col=grey(0,.5), main='')

# Two Examples of generating a random variable
p <- c(.25, .9) # pretend these were drawn randomly
cols <- c(2,4)
QX_hat <- quantile(X, p, type=1)
segments(QX_hat, p, -10, p, col=cols)
segments(QX_hat, p, QX_hat, 0, col=cols)
mtext( round(QX_hat,2), 1, at=QX_hat, col=cols)

Code

# Multiple Draws
p <- runif(3000)
QX_hat <- quantile(X, p, type=1) # 3000 draws from the empirical distribution
QX_hat[1:5] # inspect a few draws

Using Math.

If you know the distribution function that generates the data, then you can derive the quantile function and do inverse sampling. Here is an in-depth example using the Dagum distribution. The distribution function is \(F(x)=(1+(x/b)^{-a})^{-c}\). Setting \(p=F(x)\) and solving for \(x\) gives \((x/b)^{-a} = p^{-1/c} - 1\), and hence the quantile function \(Q(p)= b\left(p^{-1/c}-1\right)^{-1/a} = \frac{ b p^{\frac{1}{ac}} }{(1-p^{1/c})^{1/a}}\). We then sample \(p\) from a uniform distribution and find the associated quantile.

Code
# Theoretical Quantile Function (from VGAM::qdagum)
qdagum <- function(p, scale.b=1, shape1.a, shape2.c) {
  # Quantile function (theoretically derived from the CDF)
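  # Note: expm1(-log(p)/shape2.c) = p^(-1/shape2.c) - 1, so ans = b*(p^(-1/c)-1)^(-1/a)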
  ans <- scale.b * (expm1(-log(p) / shape2.c))^(-1 / shape1.a)
  # Special known cases
  ans[p == 0] <- 0
  ans[p == 1] <- Inf
  # Safety Checks
  ans[p < 0] <- NaN
  ans[p > 1] <- NaN
  if(scale.b <= 0 | shape1.a <= 0 | shape2.c <= 0){ ans <- ans*NaN }
  # Return
  return(ans)
}

# Generate Random Variables (VGAM::rdagum)
rdagum <-function(n, scale.b=1, shape1.a, shape2.c){
    p <- runif(n) # generate random probabilities
    x <- qdagum(p, scale.b=scale.b, shape1.a=shape1.a, shape2.c=shape2.c) #find the inverses
    return(x)
}

# Example
set.seed(123)
X <- rdagum(3000,1,3,1)
X[1:5]
## [1] 0.7390476 1.5499868 0.8845006 1.9616251 2.5091656
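As a quick sanity check (a sketch in the style of the earlier plots), the empirical CDF of these draws should track the theoretical CDF with \(b=1\), \(a=3\), \(c=1\).

Code
# Compare empirical and theoretical CDFs
plot(ecdf(X), pch=16, col=grey(0,.5), main=NA, xlim=c(0,5))
x <- seq(0, 5, by=.01)
lines(x, (1 + x^(-3))^(-1), col='blue') # F(x) = (1+(x/b)^-a)^-c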

  1. Alternatively, you can think of \(F(x)\) as the ECDF for a dataset with an infinite number of observations.↩︎

  2. This is the general formula using CDFs, and you can verify it works in this instance by directly adding the probability of each 2 or 3 event: \(Prob(X_{i} = 2) + Prob(X_{i} = 3) = 1/4 + 1/4 = 2/4\).↩︎

  3. In other classes, you may further distinguish types of random variables based on whether their maximum value is theoretically finite or infinite.↩︎

  4. Drawing random uniform samples with computers is actually quite complex and beyond the scope of this course.↩︎