In the last section we computed a distribution given the data, whereas now we generate individual data points given the distribution.
Random variables are variables whose values occur according to a frequency distribution. As such, every random variable has a
sample space, which is the set of all possible outcomes, and a
probability for each particular set of outcomes, which is the proportion of times those outcomes occur in the long run.
We think of each observation \(X_{i}\) before it is actually observed as potentially taking on specific values \(x\) from the sample space with known probabilities. For example, we consider flipping a coin before knowing whether it lands on Heads or Tails. We denote the random variable, in this case the unflipped coin, as \(X_{i}\).
There are two basic types of sample spaces: discrete (encompassing cardinal-discrete, factor-ordered, and factor-unordered data) and continuous. This leads to two types of random variables: discrete and continuous. However, each type has many different probability distributions.
Probability.
The most common random variables are easy to work with and can be described using the Cumulative Distribution Function (CDF) \[\begin{eqnarray}
F(x) &=& Prob(X_{i} \leq x).
\end{eqnarray}\] Note that this is just like the Empirical Cumulative Distribution Function (ECDF), \(\widehat{F}(x)\), except that it is now theoretically known. You can think of \(F(x)\) as the ECDF for a dataset with an infinite number of observations. Equivalently, the ECDF is an empirical version of the CDF that is applied to observed data.
After introducing different random variables, we will also cover some basic implications of their CDF. Intuitively, probabilities must sum to one, so we can compute \(Prob(X_{i} > x) = 1- F(x)\). We also have two rules, for “in” and “out” probabilities.
The probability of \(X_{i}\leq b\) and \(X_{i}\geq a\) can be written in terms of falling into a range \(Prob(X_{i} \in [a,b])=Prob(a \leq X_{i} \leq b) = F(b) - F(a)\).
The opposite probability of \(X_{i} > b\) or \(X_{i} < a\) is \(Prob(X_{i} < a \text{ or } X_{i} > b) = F(a) + [1- F(b)]\). Notice that this opposite probability \(F(a) + [1- F(b)] =1 - [F(b) - F(a)]\), so that \(Prob(X_{i} \text{ out of } [a,b]) = 1 - Prob( X_{i} \in [a,b])\).
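These identities can be checked numerically. Here is a minimal sketch, assuming a standard Uniform random variable purely for illustration (any CDF would work), using R's built-in `punif`:

```r
# "In" probability: Prob(X in [a,b]) = F(b) - F(a)
a <- 0.2; b <- 0.7
P_in <- punif(b) - punif(a)
P_in
## [1] 0.5
# "Out" probability: F(a) + [1 - F(b)], the complement of P_in
P_out <- punif(a) + (1 - punif(b))
P_out
## [1] 0.5
```

Note that `P_in + P_out` equals one, exactly as the complement rule requires.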
4.1 Discrete
A discrete random variable can take one of several values in a set. E.g., any number in \(\{1,2,3,...\}\) or any letter in \(\{A,B,C,...\}\). Theoretical proportions are referred to as a probability mass function, which can be thought of as a proportions bar plot for an infinitely large dataset. Equivalently, the bar plot is an empirical version of the probability mass function that is applied to observed data.
Bernoulli.
Think of a Coin Flip: Heads or Tails with equal probability. In general, a Bernoulli random variable denotes Heads as the event \(X_{i}=1\) and Tails as the event \(X_{i}=0\), and allows the probability of Heads to vary. \[\begin{eqnarray}
X_{i} &\in& \{0,1\} \\
Prob(X_{i} =0) &=& 1-p \\
Prob(X_{i} =1) &=& p \\
F(x) &=& \begin{cases}
0 & x<0 \\
1-p & x \in [0,1) \\
1 & x\geq 1
\end{cases}
\end{eqnarray}\]
Here is an example of the Bernoulli distribution. While you might get all heads (or all tails) in the first few coin flips, the ratios level out to their theoretical values after many flips.
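This levelling out can be simulated. A minimal sketch, assuming a Heads probability of \(p=0.3\) (an arbitrary choice for illustration) and using R's `rbinom` to generate the Bernoulli draws:

```r
set.seed(1)  # for reproducibility
p <- 0.3     # assumed probability of Heads
flips <- rbinom(2000, size=1, prob=p)  # 2000 Bernoulli draws
# Proportion of Heads after the first n flips, for increasing n
running_prop <- cumsum(flips) / seq_along(flips)
running_prop[c(5, 50, 2000)]  # approaches p as n grows
```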
Discrete Uniform.
Discrete numbers with equal probability, such as a die with \(K\) sides. \[\begin{eqnarray}
X_{i} &\in& \{1,...,K\} \\
Prob(X_{i} =1) &=& Prob(X_{i} =2) = ... = 1/K\\
F(x) &=& \begin{cases}
0 & x<1 \\
1/K & x \in [1,2) \\
2/K & x \in [2,3) \\
\vdots & \\
1 & x\geq K
\end{cases}
\end{eqnarray}\]
Note
Here is an example with \(K=4\). E.g., rolling a four-sided die.
The probability of a value smaller than or equal to \(3\) is \(Prob(X_{i} \leq 3)=1/4 + 1/4 + 1/4 = 3/4\).
The probability of a value larger than \(3\) is \(Prob(X_{i} > 3) = 1-Prob(X_{i} \leq 3)=1/4\).
The probability of a value \(>\) 1 and \(\leq 3\) is \(Prob(1 < X_{i} \leq 3) = Prob(X_{i} \leq 3) - Prob(X_{i} \leq 1) = 3/4 - 1/4 = 2/4\).1
The probability of a value \(\leq\) 1 or \(> 3\) is \(Prob(X_{i} \leq 1 \text{ or } X_{i} > 3) = Prob(X_{i} \leq 1) + \left[ 1- Prob(X_{i} \leq 3) \right] = 1/4 + [1 - 3/4]=2/4\).
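These die probabilities can be verified by coding the probability mass function directly as a vector; a sketch for \(K=4\):

```r
K <- 4
p <- rep(1/K, K)   # mass function: each side has probability 1/4
F3 <- sum(p[1:3])  # Prob(X <= 3) = F(3)
F3
## [1] 0.75
1 - F3             # Prob(X > 3)
## [1] 0.25
F3 - p[1]          # Prob(1 < X <= 3)
## [1] 0.5
p[1] + (1 - F3)    # Prob(X <= 1 or X > 3)
## [1] 0.5
```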
We can also replace numbers with letters \((A,...,Z)\) or names \((John, Jamie, ...)\), although we must be careful with the CDF when there is no longer a natural ordering. Here is an empirical example with three outcomes.
Code
x <- c('A', 'B', 'C')
x_probs <- c(3/10, 1/10, 6/10)
sum(x_probs)
## [1] 1
X2 <- sample(x, 2000, prob=x_probs, replace=T) # sample of 2000
# Plot long-run proportions
proportions <- table(X2)/length(X2)
plot(proportions, col=grey(0,.5),
    xlab='Outcome', ylab='Proportion', main=NA)
points(x_probs, pch=16, col='blue') # Theoretical values
Suppose there is an experiment with three possible outcomes, \(\{A, B, C\}\). It was repeated \(50\) times and discovered that \(A\) occurred \(10\) times, \(B\) occurred \(13\) times, and \(C\) occurred \(27\) times. The estimated probability of each outcome is found via the bar plot: \(\hat{p}_{A} = 10/50\), \(\hat{p}_{B} = 13/50\), \(\hat{p}_{C} = 27/50\). We can also estimate the “in” probabilities as \(\widehat{Prob}(A \text{ or } B)=10/50+13/50=23/50\) and \(\widehat{Prob}(B \text{ or } C)=13/50+27/50=40/50\), as well as the “out” probability as \(\widehat{Prob}(A \text{ or } C)=10/50+27/50=37/50\).
Suppose there are three possible outcomes of an experiment, \(\{\text{my car dies}, \text{it rains next Tuesday},\text{a cat is born}\}\), which have corresponding probabilities \(\{3/10, 1/10, 6/10 \}\) that are known theoretically. Compute the probability that my car dies or a cat is born.
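A sketch of this calculation: because the three outcomes are mutually exclusive, the “or” probability is just the sum of the individual probabilities (the vector names below are hypothetical labels added for readability).

```r
# Given theoretical probabilities for the three outcomes
p <- c(car_dies=3/10, rain=1/10, cat_born=6/10)
# Mutually exclusive outcomes: Prob(car dies or cat born) is the sum
unname(p['car_dies'] + p['cat_born'])
## [1] 0.9
```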
4.2 Continuous
A continuous random variable can take one value out of an uncountably infinite number. E.g., any number between \(0\) and \(1\) with any number of decimal points. With a continuous random variable, the probability of any individual point is zero, so we describe these variables with the cumulative distribution function (CDF), \(F\), or the probability density function (PDF), \(f\). Just as \(F\) can be thought of as the ECDF, \(\widehat{F}\), with an infinite amount of data, \(f\) can be thought of as a histogram, \(\widehat{f}\), with an infinite amount of data. Equivalently, the histogram is an empirical version of the PDF that is applied to observed data.
Often, the PDF helps you intuitively understand a random variable whereas the CDF helps you calculate numerical values. This is because probabilities are depicted as areas in the PDF and the CDF accumulates those areas: \(F(x)\) equals the area under the PDF from \(-\infty\) to \(x\). For example, \(Prob(X_{i} \leq 1)\) is depicted by the PDF as the area under \(f(x)\) from the lowest possible value until \(x=1\), which is numerically calculated simply as \(F(1)\).
Continuous Uniform.
Any number on a unit interval allowing for any number of decimal points, with every interval of the same size having the same probability. \[\begin{eqnarray}
X_{i} &\in& [0,1] \\
f(x) &=& \begin{cases}
1 & x \in [0,1] \\
0 & \text{Otherwise}
\end{cases}\\
F(x) &=& \begin{cases}
0 & x < 0 \\
x & x \in [0,1] \\
1 & x > 1.
\end{cases}
\end{eqnarray}\]
Note
The probability of a value being exactly \(0.25\) is \(Prob(X_{i} =0.25)=0\).
The probability of a value smaller than \(0.25\) is \(F(0.25)=0.25\).
The probability of a value larger than \(0.25\) is \(1-F(0.25)=0.75\).
The probability of a value in \((0.25,0.75]\) is \(Prob(0.25 < X_{i} \leq 0.75) = Prob(X_{i} \leq 0.75) - Prob(X_{i} \leq 0.25) = 0.75 - 0.25 = 0.5\).
The probability of a value in \((0.2,0.7]\) is \(Prob(0.2 < X_{i} \leq 0.7) = Prob(X_{i} \leq 0.7) - Prob(X_{i} \leq 0.2) = 0.7 - 0.2 = 0.5\).
The probability of a value outside of \((0.2,0.7]\) is \(Prob(X_{i} \leq 0.2 \text{ or } X_{i} > 0.7) = 0.2 + [1-0.7]=0.5\). Alternatively, you can compute \(1- Prob(0.2 < X_{i} \leq 0.7)=1-0.5=0.5\).
# CDF example 1
P_low <- punif(0.25)
P_low
## [1] 0.25
# Uncomment to show via PDF
# x_low <- seq(0, 0.25, by=.001)
# fx_low <- dunif(x_low)
# polygon(c(x_low, rev(x_low)), c(fx_low, fx_low*0),
#     col=rgb(0,0,1,.25), border=NA)

# CDF example 2
P_high <- 1 - punif(0.25)
P_high
## [1] 0.75
# Uncomment to show via PDF
# x_high <- seq(0.25, 1, by=.001)
# fx_high <- dunif(x_high)
# polygon(c(x_high, rev(x_high)), c(fx_high, fx_high*0),
#     col=rgb(0,0,1,.25), border=NA)

# CDF example 3
P_mid <- punif(0.75) - punif(0.25)
P_mid
## [1] 0.5
# Uncomment to show via PDF
# x_mid <- seq(0.25, 0.75, by=.001)
# fx_mid <- dunif(x_mid)
# polygon(c(x_mid, rev(x_mid)), c(fx_mid, fx_mid*0),
#     col=rgb(0,0,1,.25), border=NA)
Note that the Continuous Uniform distribution generalizes to an arbitrary interval, \(X_{i} \in [a,b]\). In this case, \(f(x)=1/[b-a]\) if \(x \in [a,b]\) and \(F(x)=[x-a]/[b-a]\) if \(x \in [a,b]\).
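In R, `punif` accepts the interval endpoints directly through its `min` and `max` arguments, so the general formula can be checked numerically. A sketch with assumed values \(a=-2\), \(b=2\):

```r
a <- -2; b <- 2
x <- 0.25
F_builtin <- punif(x, min=a, max=b)  # built-in CDF
F_formula <- (x - a)/(b - a)         # F(x) = (x-a)/(b-a)
c(F_builtin, F_formula)              # the two agree
## [1] 0.5625 0.5625
```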
Note
Suppose \(X_{i}\) is a random variable continuously distributed over \(a=-2\) and \(b=2\). What is the probability of a value larger than \(0.25\)? First use the computer to suggest an answer: simulate \(1000\) draws and then make a histogram and an ECDF. Then find the answer mathematically using the CDF. Finally, verify the answer is intuitively correct in a figure of the PDF. You should draw by hand both the CDF and the PDF with correct axes labels and marking clearly the probability of a value larger than \(0.25\).
Suppose the flight time between Calgary and Kamloops is Uniformly distributed between \(68\) and \(78\) minutes. According to Air Canada the flight takes \(70\) minutes. What is the probability that the flight will be late?
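A sketch of the flight-time calculation: the flight is late if it takes longer than the advertised \(70\) minutes, so the answer is \(1-F(70)\) for a Uniform\((68,78)\) random variable.

```r
# Prob(flight time > 70) = 1 - F(70), with X ~ Uniform(68, 78)
P_late <- 1 - punif(70, min=68, max=78)
P_late
## [1] 0.8
```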
Beta.
The sample space is any number on the unit interval, \(X_{i} \in [0,1]\), but with non-uniform probabilities.
The Beta distribution's density function is mathematically complicated to write down, and so we omit it. However, we can find the probability graphically using either the probability density function or cumulative distribution function.
Tip
Suppose \(X_{i}\) is a random variable with a beta distribution. Intuitively depict \(Prob(X_{i} \in [0.2, 0.8])\) by drawing an area under the density function. Numerically estimate that same probability using the CDF.
This distribution is often used, as the probability density function has two parameters that allow it to take many different shapes.
Tip
For each example below, intuitively depict \(Prob(X_{i} \leq 0.5)\) using the PDF. Repeat the exercise using a CDF instead of a PDF to calculate a numerical value.
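In R, the Beta density and CDF are available as `dbeta` and `pbeta`. A minimal sketch of the numerical calculations above, with shape parameters \((2,2)\) chosen purely for illustration:

```r
# Prob(X <= 0.5) for an assumed Beta(2,2) random variable, via the CDF
P_half <- pbeta(0.5, shape1=2, shape2=2)
P_half
## [1] 0.5
# Prob(X in [0.2, 0.8]) = F(0.8) - F(0.2)
P_mid <- pbeta(0.8, 2, 2) - pbeta(0.2, 2, 2)
P_mid
## [1] 0.792
```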
Exponential.
The sample space is any positive number.2 An Exponential random variable has a single parameter, \(\lambda>0\), that governs its shape \[\begin{eqnarray}
X_{i} &\in& [0,\infty) \\
f(x) &=& \lambda exp\left\{ -\lambda x \right\} \\
F(x) &=& \begin{cases}
0 & x < 0 \\
1- exp\left\{ -\lambda x \right\} & x \geq 0.
\end{cases}
\end{eqnarray}\]
Suppose the lifetime of a battery is an exponential random variable with \(\lambda=1/50\). Using the computer, find the probability that the lifetime is \(< 10\) hours, the probability that it is \(\geq 100\) hours, and the probability that it is between \(10\) and \(100\) hours.
Code
pexp(10, 1/50)
## [1] 0.1812692
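The remaining two battery probabilities follow the same pattern; a sketch completing the calculation with `pexp`:

```r
lambda <- 1/50
# Prob(lifetime >= 100 hours) = 1 - F(100)
P_ge100 <- 1 - pexp(100, lambda)
P_ge100
## [1] 0.1353353
# Prob(10 <= lifetime <= 100) = F(100) - F(10)
P_mid <- pexp(100, lambda) - pexp(10, lambda)
P_mid
## [1] 0.6833955
```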
Normal (Gaussian).
This distribution is for any number on the real line, with bell-shaped probabilities. The Normal distribution is mathematically complex and sometimes called the Gaussian distribution. We call it “Normal” because we will encounter it again and again and again. The probability density function \(f\) has two parameters, \(\mu \in (-\infty,\infty)\) and \(\sigma > 0\). \[\begin{eqnarray}
X_{i} &\in& (-\infty,\infty) \\
f(x) &=& \frac{1}{\sqrt{2\pi \sigma^2}} exp\left\{ \frac{-(x-\mu)^2}{2\sigma^2} \right\}
\end{eqnarray}\]
Suppose \(X_{i}\) is a random variable with a normal distribution with \(\mu=0\) and \(\sigma=1\). Intuitively depict \(Prob(X_{i} \in [0.2, 0.8])\) by drawing an area under the density function. Numerically estimate that same probability using the CDF.
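A sketch of the numerical part of this exercise, using R's `pnorm` for the Normal CDF:

```r
# Prob(X in [0.2, 0.8]) = F(0.8) - F(0.2), for mu=0 and sigma=1
P_mid <- pnorm(0.8, mean=0, sd=1) - pnorm(0.2, mean=0, sd=1)
P_mid
## [1] 0.2088849
```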
Suppose that your health status is a normally distributed random variable with \(\mu=2\) and \(\sigma=3\). If we randomly sample one person, what is the probability their health status is higher than \(4\)?
Code
# Start with a simulation of 1000 people to build intuition
X <- rnorm(1000, 2, 3)
hist(X, freq=F, border=NA, main=NA)
Code
sum(X > 4)/1000
## [1] 0.26
# Do an exact calculation
1 - pnorm(4, 2, 3)
## [1] 0.2524925
Suppose scores in math class are approximately normally distributed with \(\mu=50, \sigma=1\). If you selected one student randomly, what is the probability their score is higher than \(90\)? Is \(Prob(X_{i}\geq 90)\) higher if \(\mu=25, \sigma=2\)? What about \(\mu=10, \sigma=5\)?
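A sketch of the exam-score comparison. The upper-tail probabilities here are astronomically small, so the code compares them on the log scale via `pnorm`'s `lower.tail` and `log.p` arguments, which avoids numerical underflow.

```r
# log of Prob(X >= 90) for each parameter setting
lp1 <- pnorm(90, mean=50, sd=1, lower.tail=FALSE, log.p=TRUE)
lp2 <- pnorm(90, mean=25, sd=2, lower.tail=FALSE, log.p=TRUE)
lp3 <- pnorm(90, mean=10, sd=5, lower.tail=FALSE, log.p=TRUE)
# Larger (less negative) log-probability means more probable:
# 90 is 40, 32.5, and 16 standard deviations above the mean, respectively
c(lp1, lp2, lp3)
```

Each successive setting puts \(90\) fewer standard deviations above the mean, so \(Prob(X_{i}\geq 90)\) increases each time.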
4.3 Further Reading
Note that many random variables are related to each other.
Also note that numbers randomly generated on your computer cannot be truly random; they are “pseudorandom”.
This is the general formula using CDFs, and you can verify it works in this instance by directly adding the probability of each 2 or 3 event: \(Prob(X_{i} = 2) + Prob(X_{i} = 3) = 1/4 + 1/4 = 2/4\).↩︎
In other classes, you may further distinguish types of random variables based on whether their maximum value is theoretically finite or infinite.↩︎