8 Reporting


8.1 R and R Markdown

We will use R Markdown to communicate results to each other. Note that R and R Markdown are two different languages: RStudio runs R code to make statistical computations and renders R Markdown code to produce polished documents that contain both writing and statistics. Altogether, your project will use:

  • R: does statistical computations
  • R Markdown: formats statistical computations for sharing
  • RStudio: a graphical user interface that lets you easily use both R and R Markdown.

Homework reports are probably the smallest documents you can create. These little reports are almost entirely self-contained, showing both code and output. To make them, you will first need to install Pandoc on your computer and then install any required packages:

# Packages for R Markdown
install.packages("knitr")
install.packages("rmarkdown")

# Other frequently used packages (uncomment to install)
# install.packages("plotly")  # for interactive plots
# install.packages("sf")      # for spatial data
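Calling `install.packages()` every time re-downloads packages you already have. The sketch below installs only the missing ones and then checks whether rmarkdown can find Pandoc. It assumes you need just the two R Markdown packages above and that the cloud.r-project.org CRAN mirror is acceptable; both are choices for illustration, not requirements.

```r
# Packages needed to knit reports (add "plotly", "sf", ... as required)
pkgs <- c("knitr", "rmarkdown")

# Keep only the packages whose namespace cannot be loaded yet
is_installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
missing_pkgs <- pkgs[!is_installed]

# Install the missing ones (the mirror URL is an assumption; any CRAN mirror works)
if (length(missing_pkgs) > 0) {
  install.packages(missing_pkgs, repos = "https://cloud.r-project.org")
}

# Knitting to HTML fails without Pandoc, so check for it up front
if (rmarkdown::pandoc_available()) {
  message("Pandoc found, version ", rmarkdown::pandoc_version())
} else {
  message("Pandoc not found: install it before knitting")
}
```

RStudio ships with its own copy of Pandoc, so the second check usually passes when the code is run from RStudio.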

8.2 Simple Reports

We will create reproducible reports via R Markdown.

Example 1: Data Scientism.

See DataScientism.html, then recreate it:

  • Click the “Code” button in the top right, then “Download Rmd”
  • Open the file in RStudio
  • Change the name and title to your own, and make any other edits
  • Click “Knit”

Alternatively,

  • Download the source file, DataScientism.Rmd
  • Change the name and title to your own, and make any other edits
  • From the folder containing the file, run the following in the console:
rmarkdown::render('DataScientism.Rmd')
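`rmarkdown::render()` works on any .Rmd file, including one written from code. The sketch below writes the three parts of an R Markdown document (a YAML header, some prose, and an R chunk) and then renders it to HTML, much like clicking “Knit”. The file name homework.Rmd, the title, the author, and the temporary folder are all placeholders chosen for illustration.

```r
# Chunk delimiters are three backticks
ticks <- strrep("`", 3)

# Hypothetical file name, placed in a temporary folder
rmd_file <- file.path(tempdir(), "homework.Rmd")

# YAML header, prose, and one R chunk
writeLines(c(
  "---",
  "title: \"Homework 1\"",
  "author: \"Your Name\"",
  "output: html_document",
  "---",
  "",
  "Simulate 100 observations and plot them.",
  "",
  paste0(ticks, "{r}"),
  "set.seed(1)",
  "n <- 100",
  "X <- seq(n)",
  "Y <- 4 * X + 2 * rnorm(n)",
  "plot(X, Y)",
  ticks
), rmd_file)

# Knit to HTML; render() returns the path of the output file
html_file <- rmarkdown::render(rmd_file, quiet = TRUE)
print(html_file)
```

Opening `html_file` in a browser shows the knitted report. Replacing the placeholder text and chunk with your own answers gives a starting point for Example 2.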

Example 2: Homework Assignment. Below is a template of what homework questions (and answers) look like. Create a new .Rmd file from scratch and produce a .html file that looks similar to this:

Problem: Simulate 100 random observations of the form \(y=x\beta+\epsilon\) and plot the relationship. Plot and explore the data interactively via plotly, https://plotly.com/r/line-and-scatter/. Then play around with different styles, https://www.r-graph-gallery.com/13-scatter-plot.html, to best express your point.

Solution: I simulate \(100\) observations with \(\epsilon = 2Z\), where \(Z \sim N(0,1)\), and \(\beta=4\), as seen in this single chunk. Notice the upward trend.

# Simulation (seed fixed so the report is reproducible)
set.seed(1)
n <- 100
E <- rnorm(n)
X <- seq(n)
Y <- 4*X + 2*E

# Plot
library(plotly)
dat <- data.frame(X = X, Y = Y)
plot_ly(data = dat, x = ~X, y = ~Y, type = "scatter", mode = "markers")

# To Do:
# 1. Fit a regression line
# 2. Color points by their residual value
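One possible way to handle the two To Do items is sketched below. It assumes that an ordinary least squares fit via `lm()` is what “regression line” means here, and it re-simulates the data with an arbitrary seed so the block runs on its own. plotly's `add_markers()` and `add_lines()` then layer the residual-colored points and the fitted line.

```r
library(plotly)

# Re-simulate the data (the seed is arbitrary, chosen for reproducibility)
set.seed(1)
n <- 100
X <- seq(n)
Y <- 4 * X + 2 * rnorm(n)
dat <- data.frame(X = X, Y = Y)

# 1. Fit a regression line by ordinary least squares
reg <- lm(Y ~ X, data = dat)
dat$Fitted <- fitted(reg)
dat$Resid <- resid(reg)

# 2. Color points by their residual value and overlay the fitted line
p <- plot_ly(dat, x = ~X) %>%
  add_markers(y = ~Y, color = ~Resid, name = "Observed") %>%
  add_lines(y = ~Fitted, name = "Fitted")

# Display the plot (inside a chunk, this embeds it in the report)
p
```

Because each residual is the observed value minus the fitted value, the colorbar shows which points lie above (positive) or below (negative) the fitted line.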