10 Why be reproducible

Before hopping into reproducible programming, let's think about why. My main sell to you is that it is in your own self-interest.

10.1 An example workflow

Taking First Steps …

Step 1: Some Ideas and Data

\(X_{1} \to Y_{1}\)

You copy some data into a spreadsheet
do some calculations and tables in the same spreadsheet
some other analysis from here and there, using this software and that.

Step 2: Pursuing the lead for a week or two

you beef up the data you got
add some other types of data
copy in some spreadsheet data and manually aggregate it
do some more calculations and tables, same as before

Then, a Little Way Down the Road …

1 month later, someone asks about another factor: \(X_{2}\)

You repeat Step 2 with some data on \(X_{2}\).
The details from your “point and click” method are a bit fuzzy.

It takes a little time, but you successfully redo the analysis.

4 months later, someone asks about another factor: \(X_{3}\to Y_{1}\)

You again repeat Step 2 with some data on \(X_{3}\).
You’re pretty sure
- it’s the latest version of the spreadsheet.
- none of the tables you tried messed up the order of the rows or columns.

It takes more time – the data processing was not transparent.

6 months later, you want to explore: \(X_{2} \to Y_{2}\).

You found out Excel had some bugs in its statistical calculations (see e.g., https://biostat.app.vumc.org/wiki/pub/Main/TheresaScott/StatsInExcel.TAScot.handout.pdf).

2 years later, you want to replicate: \(\{ X_{1}, X_{2}, X_{3} \} \to Y_{1}\)

A rival has proposed an alternative theory. Their idea doesn’t actually make any sense, but their visuals are better and their statistics are more sophisticated.
You don’t even have that computer anymore.
A collaborator who handled the data on \(X_{2}\) has moved on.

10.2 An alternative workflow

Suppose you decided to code what you did beginning with Step 2.

It doesn’t take much time to update or replicate your results.

Your computer runs for 2 hours and reproduces the figures and tables. (You wrote your big calculations to use multiple cores, and this saves 6 hours each time.)
You decide to add some more data, and it adds almost no time.
You see the exact steps you took and find an error (glad you found it before publication!)
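
As a rough sketch of what coding Step 2 might look like, the R script below wraps the analysis in one function and reruns it for each factor across two cores. The simulated data, the function name `run_analysis`, the regression specification, and the choice of two cores are all assumptions made for illustration; they are not taken from any real project.

```r
## Hypothetical scripted version of "Step 2" (all names and data are made up)
library(parallel)

## Simulated data standing in for the spreadsheet
set.seed(1)
n <- 200
dat <- data.frame(X1 = rnorm(n), X2 = rnorm(n), X3 = rnorm(n))
dat$Y1 <- 1 + 2 * dat$X1 + rnorm(n)

## One function holds the whole analysis for a given factor
run_analysis <- function(xname, data) {
    reg <- lm(reformulate(xname, response = "Y1"), data = data)
    coef(summary(reg))[xname, c("Estimate", "Std. Error")]
}

## Asking about a new factor only means adding its name here
factors <- c("X1", "X2", "X3")

## Spread the work over cores (2 is an arbitrary choice)
cl <- makeCluster(2)
results <- parLapply(cl, factors, run_analysis, data = dat)
stopCluster(cl)

## Collect the estimates into one table, one row per factor
names(results) <- factors
results_table <- do.call(rbind, results)
print(round(results_table, 2))
```

Rerunning the script regenerates the whole table, so the question "which version of the spreadsheet was this?" does not arise.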

Your results are transparent and easier to build on.

You easily see that not much has changed with the new data.
You try out a new plot you found in The Visual Display of Quantitative Information, by Edward Tufte.
- It’s not a standard plot, but Google answers most of your questions.
- Tutorials help avoid bad practices, such as plotting 2D data as a 3D object (see e.g., https://clauswilke.com/dataviz/no-3d.html).
You try out an obscure statistical approach that’s hot in your field.
- It doesn’t make the paper, but you have some confidence that the candidate issue isn’t a big problem.

10.3 R and R-Markdown

We will use R Markdown for reproducible research, which is a good choice:

http://www.r-bloggers.com/the-reproducibility-crisis-in-science-and-prospects-for-r/
http://fmwww.bc.edu/GStat/docs/pointclick.html
https://github.com/qinwf/awesome-R#reproducible-research
A Guide to Reproducible Code in Ecology and Evolution
https://biostat.app.vumc.org/wiki/pub/Main/TheresaScott/ReproducibleResearch.TAScott.handout.pdf

Note that R and R Markdown are both languages: RStudio interprets R code to produce statistics, and RStudio interprets R Markdown code to produce pretty documents that contain both writing and statistics. (You should already be a bit familiar with R, but not necessarily R Markdown.) Altogether, your project will use:

R is our software
RStudio is our GUI
R Markdown is our document

Both are good for teaching

Homework reports are the smallest and probably the first documents you will create. We will create little homework reports using R Markdown that are almost entirely self-contained (showing both code and output). To do this, you will need to install Pandoc on your computer.
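
To give a feel for what such a report involves, here is a minimal sketch that writes a tiny R Markdown file from R and renders it only if the rmarkdown package and Pandoc are available. The file name `homework1.Rmd`, the title, and the one-line calculation are hypothetical placeholders, not part of any assignment.

```r
## Write a tiny, hypothetical homework report and (optionally) render it
fence <- strrep("`", 3)  # three backticks, which delimit code chunks
rmd_lines <- c(
    "---",
    "title: 'Homework 1'",
    "output: html_document",
    "---",
    "",
    "The average of 1 to 10 is computed below.",
    "",
    paste0(fence, "{r}"),   # start of an R code chunk
    "mean(1:10)",
    fence                   # end of the code chunk
)
rmd_file <- file.path(tempdir(), "homework1.Rmd")
writeLines(rmd_lines, rmd_file)

## Rendering needs the rmarkdown package and Pandoc; skip if unavailable
if (requireNamespace("rmarkdown", quietly = TRUE) &&
    rmarkdown::pandoc_available()) {
    out_file <- rmarkdown::render(rmd_file, quiet = TRUE)
    print(out_file)
}
```

In practice you would normally write the `.Rmd` file by hand in RStudio and knit it there; building it from a character vector here just keeps the example in one self-contained script.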

Install any required packages

## Packages for Rmarkdown
install.packages("knitr")
install.packages("rmarkdown")
install.packages("bookdown")

## Other packages used in this primer
install.packages("plotly")
install.packages("sf")
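
One possible way to confirm that the installation worked is sketched below. It only checks whether each package listed above can be loaded; the variable names are illustrative.

```r
## Check which of the packages listed above are available
pkgs <- c("knitr", "rmarkdown", "bookdown", "plotly", "sf")
installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
print(installed)

## Names of any packages that still need to be installed
missing_pkgs <- pkgs[!installed]
print(missing_pkgs)
```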

To get started with R Markdown, you can first read and work through https://jadamso.github.io/Rbooks/small-scale-projects.html, and then recreate https://jadamso.github.io/Rbooks/small-scale-projects.html#a-homework-example yourself.