Transformations can stabilize variance, reduce skewness, and make model errors closer to Gaussian.
Perhaps the most common examples are power transformations, \(y= x^\lambda\), which include \(\sqrt{x}\) and \(x^2\).
Other examples include the exponential transformation, \(y=\exp(x)\) for any \(x\in (-\infty, \infty)\), and the logarithmic transformation, \(y=\log(x)\) for any \(x>0\).
The Box–Cox Transform nests many cases. For \(x>0\) and parameter \(\lambda\), \[\begin{eqnarray}
y=\begin{cases}
\dfrac{x^\lambda-1}{\lambda}, & \lambda\neq 0,\\
\log\left(x\right) & \lambda=0.
\end{cases}
\end{eqnarray}\] This function is continuous in \(\lambda\): as \(\lambda \to 0\), \((x^\lambda-1)/\lambda \to \log(x)\).
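Here is a minimal implementation of this formula, matching the bc_transform() call in the plotting code below (the example data X is an arbitrary choice):
# Box–Cox transform: (x^lambda - 1)/lambda for lambda != 0, log(x) at lambda = 0
bc_transform <- function(x, lambda){
    if(lambda == 0){ log(x) } else { (x^lambda - 1)/lambda }
}
X <- rexp(1000) # arbitrary positive example data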
par(mfrow=c(1,3))
for(lambda in c(-1,0,1)){
    Y <- bc_transform(X, lambda)
    hist(Y, main=bquote(paste(lambda,'=',.(lambda))), border=NA, freq=F)
}
Law of the Unconscious Statistician (LOTUS).
As before, we will represent the random variable as \(X_{i}\), which can take on values \(x\) from the sample space. If \(X_{i}\) is a discrete random variable (a random variable with a discrete sample space) and \(g\) is a function, then \(\mathbb E[g(X_{i})] = \sum_x g(x)Prob(X_{i}=x)\).
Tip
Let \(X_{i}\) take values \(\{1,2,3\}\) with \[\begin{eqnarray}
Prob(X_{i}=1)=0.2,\quad Prob(X_{i}=2)=0.5,\quad Prob(X_{i}=3)=0.3.
\end{eqnarray}\] Let \(g(x)=x^2+1\). Then \(g(1)=1^2+1=2\), \(g(2)=2^2+1=5\), \(g(3)=3^2+1=10\), so \(\mathbb E[g(X_{i})] = 2(0.2)+5(0.5)+10(0.3) = 5.9\).
x <- c(1,2,3)
x_probs <- c(0.2,0.5,0.3)
g <- function(x) x^2+1
sum(g(x) * x_probs)
## [1] 5.9
Code
g <- function(x) x^2+1

# A theoretical example
x <- c(1,2,3,4)
x_probs <- c(1/4, 1/4, 1/4, 1/4)
sum(g(x) * x_probs)
## [1] 8.5

# A simulation example
X <- sample(x, prob=x_probs, size=1000, replace=T)
mean(g(X))
## [1] 8.241
Tip
If \(X_{i}\) is a continuous random variable (a random variable with a continuous sample space) and \(g\) is a function, then \(\mathbb E[g(X_{i})] = \int_{-\infty}^{\infty} g(x)f(x) dx\).
Code
x <- rexp(5e5, rate=1) # X ~ Exp(1)
mean(sqrt(x)) # LOTUS Simulation
## [1] 0.8865247
sqrt(pi)/2 # Exact via LOTUS integral
## [1] 0.8862269
Note that you have already seen the special case where \(g(X_{i})=\left(X_{i}-\mathbb{E}[X_{i}]\right)^2\), whose expectation is the variance \(\mathbb{V}[X_{i}]\).
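As a quick sketch of that special case, reuse the discrete PMF from the earlier example:
# Variance via LOTUS: g(x) = (x - E[X])^2, weighted by the PMF
x <- c(1,2,3)
x_probs <- c(0.2,0.5,0.3)
EX <- sum(x * x_probs) # E[X] = 2.1
sum((x - EX)^2 * x_probs) # V[X] = 0.49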
Jensen’s inequality.
Concave functions curve inwards, like the inside of a cave. Convex functions curve outward, the opposite of concave.
If \(g\) is a concave function, then \(g(\mathbb E[X_{i}]) \geq \mathbb E[g(X_{i})]\). If \(g\) is instead convex, the inequality reverses: \(g(\mathbb E[X_{i}]) \leq \mathbb E[g(X_{i})]\).
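A simulation sketch with the concave function \(g(x)=\sqrt{x}\), reusing the Exp(1) example from above:
# Jensen's inequality: sqrt is concave, so sqrt(E[X]) >= E[sqrt(X)]
x <- rexp(5e5, rate=1)
sqrt(mean(x)) # g(E[X]), roughly sqrt(1) = 1
mean(sqrt(x)) # E[g(X)], roughly sqrt(pi)/2 = 0.886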
To generate a random variable from a known distribution, you can use some type of physical machine. E.g., you can roll a fair die to generate Discrete Uniform data, or roll a weighted die to generate Categorical data.
There are also several ways to computationally generate random variables from a probability distribution. Perhaps the most common one is “inverse sampling”. To generate a random variable using inverse sampling, first sample \(p\) from a uniform distribution and then evaluate the associated quantile function \(\hat{F}^{-1}(p)\).1
Using Data.
You can generate a random variable from a known empirical distribution. Inverse sampling randomly selects observations from the dataset with equal probabilities. To implement this, we
order the data and associate each observation with an ECDF value
draw \(p \in [0,1]\) as a uniform random variable
find the associated quantile via the ECDF
Tip
Here is an example of generating random murder rates for US states.
# Generating random variables via inverse ECDF
# (X is the vector of murder rates; FX_hat is its ECDF, e.g. FX_hat <- ecdf(X))
p <- runif(3000) ## Multiple Draws
QX_hat <- quantile(FX_hat, p, type=1)
QX_hat[c(1,2,3)]
## 83.248211956% 50.070770434% 55.293406080% 
##          12.7           7.3           7.9

## Can also do directly from the data
QX_hat <- quantile(X, p, type=1)
QX_hat[c(1,2,3)]
## 83.248211956% 50.070770434% 55.293406080% 
##          12.7           7.3           7.9
Using Math.
If you know the distribution function that generates the data, then you can derive the quantile function and do inverse sampling. This is essentially how computers generate random data from many distributions.
Code
# 4 random data points from 3 different distributions, via inverse sampling
p <- runif(4)
qunif(p)
qexp(p)
qnorm(p)
Here is an in-depth example with the Dagum distribution. The distribution function is \(F(x)=(1+(x/b)^{-a})^{-c}\). Setting \(p=F(x)\) and solving for \(x\) gives the quantile function \(F^{-1}(p)=\frac{ b p^{\frac{1}{ac}} }{(1-p^{1/c})^{1/a}}\). We then sample \(p\) from a uniform distribution and find the associated quantile.
Code
# Theoretical Quantile Function (from VGAM::qdagum)
qdagum <- function(p, scale.b=1, shape1.a, shape2.c) {
    # Quantile function (theoretically derived from the CDF)
    ans <- scale.b * (expm1(-log(p) / shape2.c))^(-1/shape1.a)
    # Special known cases
    ans[p == 0] <- 0
    ans[p == 1] <- Inf
    # Safety Checks
    ans[p < 0] <- NaN
    ans[p > 1] <- NaN
    if(scale.b <= 0 | shape1.a <= 0 | shape2.c <= 0){
        ans <- ans*NaN
    }
    # Return
    return(ans)
}

# Generate Random Variables (VGAM::rdagum)
rdagum <- function(n, scale.b=1, shape1.a, shape2.c){
    p <- runif(n) # generate random probabilities
    x <- qdagum(p, scale.b=scale.b, shape1.a=shape1.a, shape2.c=shape2.c) # find the inverses
    return(x)
}

# Example
set.seed(123)
X <- rdagum(3000, 1, 3, 1)
X[c(1,2,3)]
## [1] 0.7390476 1.5499868 0.8845006
9.3 Set Theory
Let the sample space contain events \(A\) and \(B\), with probabilities denoted \(Prob(A)\) and \(Prob(B)\). Then
the union: \(A\cup B\), refers to either event occurring
the intersection: \(A\cap B\), refers to both events occurring
the events \(A\) and \(B\) are mutually exclusive if and only if \(A\cap B\) is empty
the inclusion–exclusion rule is: \(Prob(A\cup B)=Prob(A)+Prob(B)-Prob(A\cap B)\)
the events \(A\) and \(B\) are mutually independent if and only if \(Prob(A\cap B)=Prob(A) Prob(B)\)
the conditional probability is \(Prob(A \mid B) = Prob(A \cap B)/Prob(B)\)
For example, consider a six-sided die, where event \(A\) is that the number is \(>2\) and event \(B\) is that the number is even.
The sets are \(A=\{3,4,5,6\}\) and \(B=\{2,4,6\}\)
To check mutual exclusivity: notice \(A\cap B=\{4,6\}\), which is not empty. So events \(A\) and \(B\) are not mutually exclusive
Further assuming the die is fair, each side has an equal \(1/6\) probability. This lets us compute
\(Prob(A)=4/6=2/3\) and \(Prob(B)=3/6=1/2\)
\(A\cup B=\{2,3,4,5,6\}\), which implies \(Prob(A\cup B)=5/6\)
\(A\cap B=\{4,6\}\), which implies \(Prob(A\cap B)=2/6=1/3\)
To check independence: notice \(Prob(A\cap B)=1/3 = (2/3)(1/2) = Prob(A)Prob(B)\). So events \(A\) and \(B\) are mutually independent
Finally, the conditional probability is \(Prob(\text{die is} >2 \mid \text{die shows even number}) = Prob(A \mid B) = \frac{Prob(A \cap B)}{Prob(B)} = \frac{1/3}{1/2} = 2/3\)
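These calculations are easy to verify by simulating die rolls; here is a minimal sketch in base R:
# Simulate fair die rolls and check the example's probabilities
set.seed(1)
rolls <- sample(1:6, 1e5, replace=TRUE)
A <- rolls > 2
B <- rolls %% 2 == 0
mean(A | B) # union: ~5/6
mean(A & B) # intersection: ~1/3
mean(A[B]) # conditional Prob(A|B): ~2/3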
Consider a six-sided die with sample space \(\{1,2,3,4,5,6\}\), where each outcome is equally likely. What is \(Prob(\text{die is odd or }<5)\)? Denote the odd events as \(A=\{1,3,5\}\) and the less-than-five events as \(B=\{1,2,3,4\}\). Then
\(A\cap B=\{1,3\}\), which implies \(Prob(A\cap B)=2/6=1/3\).
\(Prob(A)=3/6=1/2\) and \(Prob(B)=4/6=2/3\).
By inclusion–exclusion: \(Prob(A\cup B)=Prob(A)+Prob(B)-Prob(A\cap B)=1/2+2/3-1/3=5/6\).
Now find \(Prob(\text{die is even and }<5)\). Verify your answer with a computer simulation.
Law of Total Probability.
This law states that \(Prob(A)=Prob(A \mid B) Prob(B) + Prob(A \mid \not B) Prob(\not B)\), where \(\not B\) is the event “not \(B\)”.
For example, consider the six-sided die again, where event \(A\) is that the number is \(>2\) and event \(B\) is that the number is even. The sets are \(A=\{3,4,5,6\}\), \(B=\{2,4,6\}\), and \(\not B = \{1,3,5\}\). From before, we know that \(Prob(A)=2/3\), \(Prob(B)=1/2\), and \(Prob(A \mid B) = 2/3\). Then see that \(Prob(A \mid B) Prob(B) = \frac{2}{3} \cdot \frac{1}{2} = 2/6\).
We can also directly compute \(Prob(\not B)=1/2\), or infer it from \(Prob(\not B)=1-Prob(B)\). The conditional probability is \(Prob(\text{die is} >2 \mid \text{die shows odd number}) = Prob(A \mid \not B) = \frac{Prob(A \cap \not B)}{Prob(\not B)} = \frac{1/3}{1/2} = 2/3\).
Then we can see that \(Prob(A \mid B) Prob(B) + Prob(A \mid \not B) Prob(\not B) = 2/6 + 2/6 = 2/3 = Prob(A)\).
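The law also checks out in simulation, continuing the die-roll sketch from above:
# Law of total probability: decompose Prob(A) over B and not-B
mean(A[B])*mean(B) + mean(A[!B])*mean(!B) # ~2/3
mean(A) # ~2/3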
9.4 Factorial Distributions
Binomial.
The sum of \(n\) Bernoulli trials (number of successes)
Discrete, support \(\{0,1,\ldots,n\}\)
Probability Mass Function: \(Prob(X_{i}=x)=\binom{n}{x}p^{x}(1-p)^{n-x}\)
Suppose that employees at a company are \(70\%\) female and \(30\%\) male. If we select a random sample of eight employees, what is the probability that more than \(2\) in the sample are female?
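One way to compute this in R (a sketch; the complement \(Prob(X_{i} \leq 2)\) is easiest):
# Prob(more than 2 females out of 8) with p = 0.7
1 - pbinom(2, size=8, prob=0.7) # approximately 0.989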
Tip
Show that \(\mathbb{E}[X_{i}]=np\) and \(\mathbb{V}[X_{i}]=np(1-p)\). Start with the case \(n=1\), then \(n=2\), then \(n=3\), and then the general case of any \(n\).
The Binomial Limit Theorem (de Moivre–Laplace theorem) says that as \(n\) grows large, with \(p \in (0,1)\) staying fixed, the Binomial distribution is approximately normal with mean \(np\) and variance \(np(1-p)\).
Tip
The unemployment rate is \(10\%\). Suppose that \(100\) employable people are selected randomly. What is the probability that this sample contains between \(9\) and \(12\) unemployed people? Use the normal approximation to binomial probabilities.
The approximating parameters are \(\mu = np = 100(0.1) = 10\), \(\sigma^2 = np(1-p) = 100(0.1)(0.9) = 9\), and \(\sigma = \sqrt{9} = 3\).
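A sketch of the computation in R, with the exact binomial probability for comparison:
# Normal approximation: X ~ N(10, 3^2)
pnorm(12, mean=10, sd=3) - pnorm(9, mean=10, sd=3) # ~0.378
# Exact binomial probability
pbinom(12, size=100, prob=0.1) - pbinom(8, size=100, prob=0.1)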
Poisson.
The number of events in a fixed interval
Discrete, support \(\{0,1,2,\ldots\}\)
Probability Mass Function: \(Prob(X_{i}=x)=e^{-\lambda}\lambda^x/x!\)
Show that \(\mathbb{E}[X_{i}] = \mathbb{V}[X_{i}]= \lambda\).
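A simulation check of this property, with \(\lambda=4\) as an arbitrary choice:
# Poisson mean and variance should both be near lambda
X <- rpois(1e5, lambda=4)
mean(X) # ~4
var(X) # ~4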
Irwin–Hall.
The sum of \(n\) i.i.d. \(\text{Uniform}(0,1)\).
Continuous, support \([0,n]\)
Probability Density Function: \(f(x) = \dfrac{1}{(n-1)!} \displaystyle\sum_{k=0}^{\lfloor x \rfloor} (-1)^k \binom{n}{k} (x - k)^{n-1}\) for \(x \in [0,n]\), and \(0\) otherwise
A common use case: representing the sum of many small independent financial shocks
Code
## Minimal example (base R)

# Irwin–Hall PDF function
dirwinhall <- function(x, n) {
    f_x <- vector(length=length(x))
    for(i in seq(x)) {
        xx <- x[i]
        if(xx < 0 | xx > n){
            f_x[i] <- 0
        } else {
            k <- seq(0, floor(xx))
            f_k <- sum((-1)^k * choose(n, k) * (xx-k)^(n-1)) / factorial(n-1)
            f_x[i] <- f_k
        }
    }
    return(f_x)
}

# Parameters
n <- 2
x <- seq(0, n, length.out=500)

# Compute and plot PDF
f_x <- dirwinhall(x, n)
plot(x, f_x, type="l", col="blue", main='', xlab="x", ylab="f(x)")
title(bquote(paste('IrwinHall(', .(n), ')')))
See also the Bates distribution, which is for the mean of \(n\) i.i.d. \(\text{Uniform}(0,1)\) random variables. It essentially rescales the Irwin-Hall distribution to range between \(0\) and \(1\).
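To see the relationship directly, here is a small simulation sketch:
# A Bates(n) draw is an Irwin–Hall(n) draw divided by n
n <- 10
irwin_hall <- replicate(5000, sum(runif(n)))
bates <- irwin_hall / n # rescaled to [0,1]
hist(bates, breaks=50, border=NA, freq=F, main='Bates(10)')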
Beta.
The sample space is any number on the unit interval, \(X_{i} \in [0,1]\), but with non-uniform probabilities.
The Beta probability density function is mathematically complicated to write, so we omit the formula. However, we can find probabilities graphically using either the probability density function or the cumulative distribution function.
Tip
Suppose \(X_{i}\) is a random variable with a beta distribution. Intuitively depict \(Prob(X_{i} \in [0.2, 0.8])\) by drawing an area under the density function. Numerically estimate that same probability using the CDF.
This distribution is often used because its probability density function has two parameters that allow it to take many different shapes.
Tip
For each example below, intuitively depict \(Prob(X_{i} \leq 0.5)\) using the PDF. Repeat the exercise using a CDF instead of a PDF to calculate a numerical value.
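For instance, with hypothetical shape parameters \(2\) and \(3\) (not tied to any particular example above):
# Prob(X <= 0.5) for a Beta(2,3) random variable
curve(dbeta(x, 2, 3), from=0, to=1, ylab='f(x)') # shade the area left of 0.5 on this PDF
pbeta(0.5, 2, 3) # CDF gives the numerical value: 0.6875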
Expansion “packages” allow for more functionality.
Most packages can be easily installed from the internet via CRAN.
Code
# common packages for tables/figures
install.packages("plotly")
install.packages("reactable")
install.packages("stargazer")

# common packages for data/teaching
install.packages("AER")
install.packages("Ecdat")
install.packages('wooldridge')

# other packages for statistics and data handling
install.packages("extraDistr")
install.packages("twosamples")
install.packages("data.table")
You only need to install packages once. But you need to load the package via library() every time you want the extended functionality. For example, to generate exotic probability distributions:
Code
# install.packages("extraDistr"), once everlibrary(extraDistr) # every time you need itpar(mfrow=c(1,2))for(p inc(-.5,0)){ x <-rgev(2000, mu=0, sigma=1, xi=p)hist(x, breaks=50, border=NA, main=NA, freq=F)}title('GEV densities', outer=T, line=-1)
While additional packages can make your code faster, they also create dependencies that can lead to problems. So learn base R well before becoming dependent on other packages.