10 Why be reproducible
Before hopping into reproducible programming, let's think about why. My main pitch to you is that it is in your own self-interest.
10.1 An example workflow
Taking First Steps …
Step 1: Some Ideas and Data
\(X_{1} \to Y_{1}\)
- You copy some data into a spreadsheet
- do some calculations and make tables in the same spreadsheet
- run some other analyses here and there, using this software and that.
Step 2: Pursuing the lead for a week or two
- you beef up the data you got
- add some other types of data
- copy the data into a spreadsheet and manually aggregate it
- do some more calculations and tables, same as before
Then, a Little Way Down the Road …
1 month later, someone asks about another factor: \(X_{2}\)
- You repeat Step 2 with some data on \(X_{2}\).
- The details from your “point and click” method are a bit fuzzy.
It takes a little time, but you successfully redo the analysis.
4 months later, someone asks about another factor: \(X_{3}\to Y_{1}\)
- You again repeat Step 2 with some data on \(X_{3}\).
- You’re pretty sure that
  - it’s the latest version of the spreadsheet, and
  - none of the tables you tried messed up the order of the rows or columns.
It takes more time, because the data processing was not transparent.
6 months later, you want to explore: \(X_{2} \to Y_{2}\).
- You find out that Excel had some bugs in its statistical calculations (see e.g., https://biostat.app.vumc.org/wiki/pub/Main/TheresaScott/StatsInExcel.TAScot.handout.pdf).
2 years later, you want to replicate: \(\{ X_{1}, X_{2}, X_{3} \} \to Y_{1}\)
- A rival has proposed an alternative theory. Their idea doesn’t actually make any sense, but their visuals are better and their statistics are more sophisticated.
- You don’t even have that computer anymore.
- A collaborator who handled the data on \(X_{2}\) has moved on.
10.2 An alternative workflow
Suppose you decided to code what you did beginning with Step 2.
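As a minimal sketch of what "coding Step 2" could look like, the script below replaces the copy, aggregate, and tabulate steps with code. It assumes the data fit in one table; the column names `region`, `x1`, and `y1` and the simulated values are hypothetical stand-ins for your own data, which you would normally load from raw files (e.g., with `read.csv()`).

```r
# Hypothetical sketch of Step 2 as a script instead of point-and-click.
# The data are simulated for illustration; in practice, read your raw files
# here so that every step from raw data to final table is recorded.
set.seed(1)  # fix the random draws so reruns give identical results

raw <- data.frame(
  region = rep(c("A", "B", "C"), each = 20),  # hypothetical grouping variable
  x1     = rnorm(60),                         # hypothetical factor X1
  y1     = rnorm(60)                          # hypothetical outcome Y1
)

# Aggregate by group (the step done by hand in the spreadsheet)
by_region <- aggregate(cbind(x1, y1) ~ region, data = raw, FUN = mean)

# A simple regression of Y1 on X1
fit <- lm(y1 ~ x1, data = raw)

print(by_region)
print(coef(summary(fit)))

# Save the table so it can be regenerated at any time by rerunning the script
write.csv(by_region, "by_region.csv", row.names = FALSE)
```

Adding a new factor such as \(X_{2}\) then means adding a column and editing one formula, rather than redoing the spreadsheet steps from memory.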
It doesn’t take much time to update or replicate your results.
- Your computer runs for 2 hours and reproduces the figures and tables. (You wrote your big calculations to use multiple cores, which saves 6 hours each time.)
- You decide to add some more data, and it adds almost no time.
- You see the exact steps you took and find an error (glad you found it before publication!)
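Spreading a big calculation over multiple cores can be sketched with base R's `parallel` package. In this minimal example, the "big calculation" is a hypothetical bootstrap of a mean on simulated data, and `boot_mean()` is an illustrative name for whatever your own expensive function is. `parLapply()` is used because it works on all operating systems, and `clusterSetRNGStream()` makes the random draws reproducible for a fixed number of workers.

```r
library(parallel)

# Hypothetical "big calculation": one bootstrap draw of the sample mean.
# The first argument is the iteration index (unused); `values` is the data.
# (Avoid naming the data argument `x`: parLapply passes its own `x` internally.)
boot_mean <- function(i, values) {
  mean(sample(values, replace = TRUE))
}

set.seed(1)
x <- rnorm(1000)  # simulated data, for illustration only

# Use all but one core; fall back to 1 if the core count is unknown
n_cores <- max(1L, detectCores() - 1L, na.rm = TRUE)
cl <- makeCluster(n_cores)
clusterSetRNGStream(cl, iseed = 123)  # reproducible streams for a fixed n_cores
boot_results <- unlist(parLapply(cl, 1:2000, boot_mean, values = x))
stopCluster(cl)  # always release the worker processes

print(sd(boot_results))  # bootstrap standard error of the mean
```

The same pattern applies to any loop whose iterations do not depend on each other.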
Your results are transparent and easier to build on.
- You easily see that not much has changed with the new data.
- You try out a new plot you found in The Visual Display of Quantitative Information, by Edward Tufte.
- It’s not a standard plot, but Google answers most of your questions.
- Tutorials help avoid bad practices, such as plotting 2D data as a 3D object (see e.g., https://clauswilke.com/dataviz/no-3d.html).
- You try out an obscure statistical approach that’s hot in your field.
- It doesn’t make it into the paper, but it gives you some confidence that a candidate issue isn’t a big problem.
10.3 R and R Markdown
We will use R Markdown for reproducible research, which is a good choice:
- http://www.r-bloggers.com/the-reproducibility-crisis-in-science-and-prospects-for-r/
- http://fmwww.bc.edu/GStat/docs/pointclick.html
- https://github.com/qinwf/awesome-R#reproducible-research
- A Guide to Reproducible Code in Ecology and Evolution
- https://biostat.app.vumc.org/wiki/pub/Main/TheresaScott/ReproducibleResearch.TAScott.handout.pdf
Note that R and R Markdown are both languages: R code is run to produce statistics, and R Markdown code is rendered to produce polished documents that contain both writing and statistics. RStudio lets you do both. (You should already be a bit familiar with R, but not necessarily R Markdown.) Altogether, your project will use
- R as our software
- RStudio as our GUI
- R Markdown for our documents
R and R Markdown are both well suited to teaching.
Homework reports are the smallest, and probably the first, documents you will create. We will create short homework reports with R Markdown that are almost entirely self-contained (showing both code and output). To do this, you will need to install Pandoc on your computer.
Install any required packages
## Packages for Rmarkdown
install.packages("knitr")
install.packages("rmarkdown")
install.packages("bookdown")
## Other packages used in this primer
install.packages("plotly")
install.packages("sf")
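Rerunning `install.packages()` reinstalls packages you already have. A minimal sketch of a setup helper that installs only what is missing, and then checks whether rmarkdown can find Pandoc, is below. The function name `install_if_missing()` and the choice of the `https://cloud.r-project.org` mirror are assumptions for illustration, not part of any package.

```r
# Hypothetical helper: install only the packages that are not yet installed,
# so a setup script can be rerun quickly and harmlessly.
install_if_missing <- function(pkgs, repos = "https://cloud.r-project.org") {
  missing <- setdiff(pkgs, rownames(installed.packages()))
  if (length(missing) > 0) {
    install.packages(missing, repos = repos)
  }
  invisible(missing)  # names of the packages that needed installing
}

# Pandoc converts R Markdown output into HTML, PDF, or Word documents.
# rmarkdown::pandoc_available() reports whether rmarkdown can find it.
if (requireNamespace("rmarkdown", quietly = TRUE)) {
  print(rmarkdown::pandoc_available())
}
```

For example, `install_if_missing(c("knitr", "rmarkdown", "bookdown", "plotly", "sf"))` covers the five calls above in one line.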
To get started with R Markdown, you can first read and work through https://jadamso.github.io/Rbooks/small-scale-projects.html, and then recreate the homework example at https://jadamso.github.io/Rbooks/small-scale-projects.html#a-homework-example yourself.
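To see the moving parts of a self-contained report, here is a minimal sketch (not the homework example linked above). It writes a tiny R Markdown file with a YAML header, one sentence of text, and two R chunks, then knits it with `rmarkdown::render()`, which requires Pandoc. The file name `homework1.Rmd` and its contents are hypothetical; it uses R's built-in `cars` data.

```r
# Build the chunk fence with strrep() so it does not clash with this code block
fence <- strrep("`", 3)

rmd_lines <- c(
  "---",
  "title: \"Homework 1\"",
  "output: html_document",
  "---",
  "",
  "The built-in `cars` data records speed and stopping distance.",
  "",
  paste0(fence, "{r summary}"),
  "summary(cars)",
  fence,
  "",
  paste0(fence, "{r plot}"),
  "plot(dist ~ speed, data = cars)",
  fence
)
writeLines(rmd_lines, "homework1.Rmd")  # hypothetical file name

# Knit only if rmarkdown is installed and can find Pandoc
if (requireNamespace("rmarkdown", quietly = TRUE) &&
    rmarkdown::pandoc_available()) {
  rmarkdown::render("homework1.Rmd")  # writes homework1.html next to the .Rmd
}
```

In RStudio, opening `homework1.Rmd` and clicking "Knit" does the same rendering step.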