12 Medium Scale Projects
As you scale up to a medium-sized project, however, you will have to be more organized.
A medium-sized project should have its own Project folder on your computer with files, subdirectories with files, and sub-subdirectories with files. It should look like this:
Project
    └── README.txt
    └── /Code
        └── MAKEFILE.R
        └── RBLOCK_001_DataClean.R
        └── RBLOCK_002_Figures.R
        └── RBLOCK_003_ModelsTests.R
        └── RBLOCK_004_Robust.R
        └── /Logs
            └── MAKEFILE.Rout
    └── /Data
        └── /Raw
            └── Source1.csv
            └── Source2.shp
            └── Source3.txt
        └── /Clean
            └── AllDatasets.Rdata
            └── MainDataset1.Rds
            └── MainDataset2.csv
    └── /Output
        └── MainFigure.pdf
        └── AppendixFigure.pdf
        └── MainTable.tex
        └── AppendixTable.tex
    └── /Writing
        └── /TermPaper
            └── TermPaper.tex
            └── TermPaper.bib
            └── TermPaper.pdf
        └── /Slides
            └── Slides.Rmd
            └── Slides.html
            └── Slides.pdf
        └── /Poster
            └── Poster.Rmd
            └── Poster.html
            └── Poster.pdf
        └── /Proposal
            └── Proposal.Rmd
            └── Proposal.html
            └── Proposal.pdf
12.1 MAKEFILE
There are two main meta-files:
- README.txt overviews the project structure and what the code is doing.
- MAKEFILE explicitly describes and executes all code.
If all code is written in the same language, the makefile can be written in that language too. For R this is MAKEFILE.R, which looks like
### Project Structure
home_dir <- path.expand("~/Desktop/Project/")
data_dir <- paste0(home_dir, "Data/")
data_dir_r <- paste0(data_dir, "Raw/")
data_dir_c <- paste0(data_dir, "Clean/")
out_dir <- paste0(home_dir, "Output/")
code_dir <- paste0(home_dir, "Code/")
### Execute Codes
### libraries are loaded within each RBLOCK
setwd( code_dir )
source( "RBLOCK_001_DataClean.R" )
source( "RBLOCK_002_Figures.R" )
source( "RBLOCK_003_ModelsTests.R" )
source( "RBLOCK_004_Robust.R" )
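The makefile only defines paths and sources each block in order, so each block is expected to use those path variables itself. As a rough sketch of what a block such as RBLOCK_001_DataClean.R might contain (the column names, the toy data, and the cleaning step are all made up for illustration, and temporary folders stand in for the real project paths so the example runs on its own):

```r
# Sketch of a data-cleaning block. Everything here is illustrative:
# the real block would use the paths defined in MAKEFILE.R.
data_dir   <- file.path(tempdir(), "Project", "Data")
data_dir_r <- file.path(data_dir, "Raw")
data_dir_c <- file.path(data_dir, "Clean")
dir.create(data_dir_r, recursive = TRUE, showWarnings = FALSE)
dir.create(data_dir_c, recursive = TRUE, showWarnings = FALSE)

# Toy stand-in for Source1.csv (one row has a missing value)
write.csv(data.frame(id = 1:3, y = c(2.5, NA, 4.0)),
    file.path(data_dir_r, "Source1.csv"), row.names = FALSE)

# --- the block itself: read raw data, clean it, save the result ---
raw   <- read.csv(file.path(data_dir_r, "Source1.csv"))
clean <- raw[complete.cases(raw), ]   # drop rows with missing values
saveRDS(clean, file.path(data_dir_c, "MainDataset1.Rds"))
```

Reading only from the Raw folder and writing only to the Clean folder keeps the original sources untouched, so the whole pipeline can be rerun from scratch.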
If some folders have not been created yet, you can create them from within R
# create directory called 'Data'
dir.create('Data')
# list the files and directories
list.files(recursive=TRUE, include.dirs=TRUE)
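To create the whole folder skeleton in one pass, something like the following sketch could work. The `project_root` location is a hypothetical placeholder (a temporary folder here, so nothing on your Desktop is touched), and placing Logs inside Code is an assumption about the layout.

```r
# Sketch: build the project folder skeleton under a root directory.
# `project_root` is a placeholder; replace it with your own project path.
project_root <- file.path(tempdir(), "Project")

# Sub-folders from the project layout (Logs under Code is an assumption)
sub_dirs <- c(
    "Code", "Code/Logs",
    "Data/Raw", "Data/Clean",
    "Output",
    "Writing/TermPaper", "Writing/Slides", "Writing/Poster", "Writing/Proposal")

for (d in file.path(project_root, sub_dirs)) {
    # recursive=TRUE also creates missing parent folders;
    # showWarnings=FALSE keeps the call quiet if the folder already exists
    dir.create(d, recursive = TRUE, showWarnings = FALSE)
}

# list what now exists under the project root
created <- list.files(project_root, recursive = TRUE, include.dirs = TRUE)
print(created)
```

Because existing folders are left alone, the loop can be rerun safely at any time.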
12.2 Logging/Sinking
You can then execute the makefile and log the output in one of three ways:
- Inserting some code that logs/sinks the output
### Project Structure
home_dir <- path.expand("~/Desktop/Project/")
data_dir <- paste0(home_dir, "Data/")
data_dir_r <- paste0(data_dir, "Raw/")
data_dir_c <- paste0(data_dir, "Clean/")
out_dir <- paste0(home_dir, "Output/")
code_dir <- paste0(home_dir, "Code/")
### Log Output
setwd( code_dir )
sink("MAKEFILE.Rout", append=TRUE, split=TRUE)
### Execute Codes
source( "RBLOCK_001_DataClean.R" )
source( "RBLOCK_002_Figures.R" )
source( "RBLOCK_003_ModelsTests.R" )
source( "RBLOCK_004_Robust.R" )
### Stop Logging Output
sink()
- Starting a session that logs/sinks the output while you source the makefile
sink("MAKEFILE.Rout", append=TRUE, split=TRUE)
source("MAKEFILE.R")
sink()
- Executing the makefile via the command line
R CMD BATCH MAKEFILE.R MAKEFILE.Rout
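One caveat with the sink-based approaches above: if a sourced script stops with an error, the closing `sink()` is never reached and later output keeps going to the log. A minimal sketch of one way around this wraps the calls in a function that closes the sink on exit. The helper name `run_logged` is hypothetical, and a tiny temporary script stands in for MAKEFILE.R so the example runs anywhere.

```r
# Sketch: log a sourced script and always close the sink, even on error.
# `run_logged` is a hypothetical helper, not part of base R.
run_logged <- function(script, logfile) {
    sink(logfile, append = TRUE, split = TRUE)
    # on.exit() runs when the function exits for any reason, errors included
    on.exit(sink(), add = TRUE)
    # echo=TRUE writes each command to the log next to its output
    source(script, echo = TRUE, max.deparse.length = Inf)
    invisible(logfile)
}

# Stand-in for MAKEFILE.R
demo_script <- file.path(tempdir(), "MAKEFILE_demo.R")
writeLines(c("x <- 1 + 1", "print(x)"), demo_script)
demo_log <- file.path(tempdir(), "MAKEFILE_demo.Rout")

run_logged(demo_script, demo_log)
log_lines <- readLines(demo_log)
```

Note that `sink()` as used here diverts only standard output, so warnings and error messages still appear on the console rather than in the log, whereas `R CMD BATCH` writes both to the .Rout file.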
12.3 Final Step
Zip your project into a single file that is easy for others to identify: Class_Project_LASTNAME_FIRSTNAME.zip
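The zipping can also be done from R. The sketch below is only one possible way: `zip_project` is a hypothetical helper, the temporary folder stands in for your real project, and `utils::zip()` relies on an external `zip` program being installed, so the call is skipped when none is found.

```r
# Sketch: build the archive name and zip a project folder.
# `zip_project` is a hypothetical helper, not part of base R.
zip_project <- function(project_dir, last, first) {
    zip_name <- paste0("Class_Project_", last, "_", first, ".zip")
    zip_path <- file.path(dirname(project_dir), zip_name)
    if (nzchar(Sys.which("zip"))) {
        # zip from the parent folder so the archive stores relative paths
        old_wd <- setwd(dirname(project_dir))
        on.exit(setwd(old_wd), add = TRUE)
        utils::zip(zip_name, files = basename(project_dir))
    }
    zip_path
}

# Demo on a throwaway project folder
demo_dir <- file.path(tempdir(), "Project")
dir.create(demo_dir, recursive = TRUE, showWarnings = FALSE)
writeLines("Project overview goes here.", file.path(demo_dir, "README.txt"))

zip_path <- zip_project(demo_dir, last = "LASTNAME", first = "FIRSTNAME")
print(basename(zip_path))
```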
Your code should be readable and error-free. For code writing guides, see
- https://google.github.io/styleguide/Rguide.html
- https://style.tidyverse.org/
- https://dept.stat.lsa.umich.edu/~jerrick/courses/stat701/notes/codestyle.html
- http://adv-r.had.co.nz/Style.html
- https://www.burns-stat.com/pages/Tutor/R_inferno.pdf
For organization guidelines, see
- https://guides.lib.berkeley.edu/c.php?g=652220&p=4575532
- https://kbroman.org/steps2rr/pages/organize.html
- https://drivendata.github.io/cookiecutter-data-science/
- https://ecorepsci.github.io/reproducible-science/project-organization.html
For additional logging capabilities, see https://cran.r-project.org/web/packages/logr/
For very large projects, there are many more tools available at https://cran.r-project.org/web/views/ReproducibleResearch.html