In the last section we computed a distribution given the data, whereas now we generate data given the distribution.
Random variables are vectors generated from a known Cumulative Distribution Function (CDF). They are a sample from a potentially infinite population with:
A sample space, which is the set of all possible outcomes.
A probability for each particular set of outcomes, which is the proportion of times those outcomes occur in the long run.
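The long-run interpretation can be sketched with a short simulation. This is only an illustration, assuming a fair six-sided die (an example that is not in the text above); the names `omega`, `rolls`, `event`, and `running_prop` are hypothetical.

```r
# Sample space of a fair six-sided die (illustrative assumption)
omega <- 1:6

# Draw many outcomes, each with equal probability
set.seed(1)
rolls <- sample(omega, size = 10000, replace = TRUE)

# Event: the roll is a 5 or a 6 (theoretical probability 2/6)
event <- rolls %in% c(5, 6)

# Running proportion of rolls so far in which the event occurred
running_prop <- cumsum(event) / seq_along(event)

# Compare the proportion after 10, 100, and 10000 rolls to 1/3
running_prop[c(10, 100, 10000)]
```

With more rolls, the running proportion should settle near the probability of the event.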
There are many probability distributions, and the most common ones are easily accessible. But there are only two basic types of sample spaces: discrete (encompassing cardinal-discrete, factor-ordered, and factor-unordered data) and continuous, which lead to two types of random variables.
4.1 Discrete
The random variable can take one of several values in a set. E.g., any number in \(\{1,2,3,...\}\) or any letter in \(\{A,B,C,...\}\).
Bernoulli.
Think of a Coin Flip: Heads=1 or Tails=0, with Prob. Heads = 1/2. In general, the probability can vary.
\[\begin{eqnarray}
X &\in& \{0,1\} \\
Prob(X=0) &=& p \\
Prob(X=1) &=& 1-p.
\end{eqnarray}\]
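A Bernoulli random variable can be simulated with base R's `sample`. The sketch below follows the parametrization written above, where \(p\) is the probability that \(X=0\); the value `p <- 0.3` and the variable names are illustrative assumptions.

```r
# Follows the parametrization above: Prob(X=0) = p, Prob(X=1) = 1-p
p <- 0.3   # illustrative value, not from the text
n <- 10000

# Draw n values from {0, 1} with the stated probabilities
set.seed(1)
x <- sample(c(0, 1), size = n, replace = TRUE, prob = c(p, 1 - p))

# Proportion of each outcome; these should be close to p and 1-p
prop_zero <- mean(x == 0)
prop_one <- mean(x == 1)
c(prop_zero, prop_one)
```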
Any number in \((-\infty,\infty)\), with bell-shaped probabilities. The distribution's formula is complex and not written here, but we will encounter it again and again.
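As a rough illustration, the sketch below assumes that the bell-shaped distribution described here is the normal distribution (the text does not name it) and uses base R's `rnorm` with mean 0 and standard deviation 1. Those parameter values are illustrative choices.

```r
# Assumes the bell-shaped distribution is the standard normal
set.seed(1)
x <- rnorm(10000, mean = 0, sd = 1)

# Values can be any real number, but most fall near the mean
range(x)

# Share of draws within one and two standard deviations of the mean
within_1sd <- mean(abs(x) <= 1)
within_2sd <- mean(abs(x) <= 2)
c(within_1sd, within_2sd)

# Theoretical counterparts computed from the normal CDF
c(pnorm(1) - pnorm(-1), pnorm(2) - pnorm(-2))
```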
We might further distinguish types of random variables based on whether their maximum value is theoretically finite or infinite. We will return to the theory behind probability distributions in a later chapter.
4.3 Drawing Samples
Using Computers.
There are several ways to computationally generate random variables from a probability distribution. Perhaps the most common one is "inverse sampling" (also called inverse transform sampling) for continuous random variables.
Continuous random variables have an associated quantile function, \(Q_{X}(p)\), which is the inverse of the CDF: the value \(x\) below which a proportion \(p\) of the data fall. (Recall, for example, that the median is the value \(x\) below which \(50\%\) of the data fall, so \(p=0.5\).) To generate a random variable, first sample \(p\) from a uniform distribution on \((0,1)\) and then find the associated quantile, \(x = Q_{X}(p)\).
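The two steps can be sketched with a distribution whose quantile function is easy to derive. The example below assumes an exponential distribution with CDF \(F(x) = 1 - e^{-\lambda x}\), so that \(Q_{X}(p) = -\log(1-p)/\lambda\); the function name `q_exponential` and the choice `rate = 2` are hypothetical and only for illustration.

```r
# Quantile function of the exponential distribution, obtained by
# inverting its CDF F(x) = 1 - exp(-rate * x)
q_exponential <- function(p, rate = 1) {
    -log(1 - p) / rate
}

# Step 1: sample probabilities from a uniform distribution
set.seed(1)
p <- runif(10000)

# Step 2: find the associated quantiles
x <- q_exponential(p, rate = 2)

# Check against R's built-in quantile function
probs <- c(.1, .5, .9)
all.equal(q_exponential(probs, rate = 2), qexp(probs, rate = 2))

# Compare the sample median to the theoretical median, log(2)/rate
c(sample_median = median(x), true_median = log(2) / 2)
```

In practice, base R's `rexp` draws from this distribution directly; the manual version is shown only to make the two steps explicit.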
Here is an in-depth example of drawing random variables from the Dagum distribution:
Code
# Quantile Function (VGAM::qdagum)
qdagum <- function(p, scale=1, shape1.a, shape2.p) {
    # Quantile function (theoretically derived from the CDF)
    ans <- scale * (expm1(-log(p) / shape2.p))^(-1 / shape1.a)
    # Special known cases
    ans[p == 0] <- 0
    ans[p == 1] <- Inf
    # Checks
    ans[p < 0] <- NaN
    ans[p > 1] <- NaN
    if(scale <= 0 | shape1.a <= 0 | shape2.p <= 0){ ans <- ans*NaN }
    # Return
    return(ans)
}

# Generate Random Variables (VGAM::rdagum)
rdagum <- function(n, scale=1, shape1.a, shape2.p){
    p <- runif(n) # generate random quantile probabilities
    x <- qdagum(p, scale=scale, shape1.a=shape1.a, shape2.p=shape2.p) # find the inverses
    return(x)
}

# Example
set.seed(123)
x <- rdagum(3000, 1, 3, 1)

# Empirical Distribution
Fx_hat <- ecdf(x)
plot(Fx_hat, lwd=2, xlim=c(0,5), main='')

# Two Examples of generating a random variable
p <- c(.25, .9)
cols <- c(2,4)
Qx_hat <- quantile(x, p)
segments(Qx_hat, p, -10, p, col=cols)
segments(Qx_hat, p, Qx_hat, 0, col=cols)
mtext(round(Qx_hat,2), 1, at=Qx_hat, col=cols)