16 Large Projects
As a project scales up, you will need to be more organized.
16.1 Scripting
Basics.
Save the following code as MyFirstScript.R
Code
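A minimal sketch of such a script is given below. The names sum_squared and y are taken from the exercise that follows; the function body, the vector x, and all values are assumptions made for illustration and may differ from the intended script.

```r
# MyFirstScript.R (illustrative sketch; function body and values are assumed)

# Sum of squared element-wise differences between two numeric vectors
sum_squared <- function(x1, x2) {
    sum((x1 - x2)^2)
}

# Example inputs (arbitrary values)
x <- c(0, 1, 3, 10, 6)
y <- c(1, 2, 3, 4, 5)

# Apply the function and show the result
print(sum_squared(x, y))
```

Keeping function definitions at the top of a script and the calls that use them at the bottom makes the file easy to rerun from a clean session.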
Restart RStudio.
Replicate in another tab via
Note that you may first need to call setwd() so your computer knows where you saved your code.
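Rerunning a saved script usually comes down to pointing R at the right folder and then calling source(). The sketch below writes a stand-in script to a temporary folder so that it can be run anywhere; the folder name and the one-line script contents are hypothetical.

```r
# Create a throwaway folder and a stand-in script (hypothetical contents)
script_dir <- file.path(tempdir(), "MyProject")
dir.create(script_dir, showWarnings = FALSE)
writeLines("z <- 1 + 1", file.path(script_dir, "MyFirstScript.R"))

# Point R at the folder that holds the script; setwd() returns the old folder
old_dir <- setwd(script_dir)
print(getwd())

# Run every line of the script in the current session
source("MyFirstScript.R")
print(z)

# Return to the original working directory
setwd(old_dir)
```

In practice, replace script_dir with the folder where you actually saved MyFirstScript.R.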
After you get this working:
- add the line print(sum_squared(y, y)) to the bottom of MyFirstScript.R
- apply the function to vectors specified outside of that script
- record the session information
Code
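One possible way to record the session information is sketched here. The output file name session_info.txt and its location in a temporary folder are assumptions for illustration.

```r
# Collect the R version, operating system, and attached packages
info <- sessionInfo()
print(info)

# Optionally save the same information to a text file for later reference
info_file <- file.path(tempdir(), "session_info.txt")  # hypothetical file name
writeLines(capture.output(print(info)), info_file)
```

Saving this output next to your results lets others see which R and package versions produced them.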
Note that you can open a new terminal in RStudio from the top bar by clicking ‘Tools > Terminal > New Terminal’.
Logging/Sinking.
When executing the makefile, you can also log the output in three different ways:
- Inserting some code into the makefile that “sinks” the output
# Project Structure
home_dir <- path.expand("~/Desktop/Project/")
data_dir_r <- paste0(home_dir, "Data/Raw/")
data_dir_c <- paste0(home_dir, "Data/Clean/")
out_dir <- paste0(home_dir, "Output/")
code_dir <- paste0(home_dir, "Code/")
# Log Output
setwd( code_dir )
sink("MAKEFILE.Rout", append=TRUE, split=TRUE)
# Execute Codes
source( "RBLOCK_001_DataClean.R" )
source( "RBLOCK_002_Figures.R" )
source( "RBLOCK_003_ModelsTests.R" )
source( "RBLOCK_004_Robust.R" )
sessionInfo()
# Stop Logging Output
sink()
- Starting a session that “sinks” the makefile
- Execute the makefile via the commandline
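The second and third options can be sketched as follows. To keep the example self-contained, it writes a two-line stand-in for MAKEFILE.R to a temporary folder; the stand-in contents and the temporary location are assumptions, not part of the project.

```r
# Stand-in for MAKEFILE.R so the example can run anywhere (hypothetical contents)
work_dir <- tempdir()
make_file <- file.path(work_dir, "MAKEFILE.R")
writeLines(c('print("running makefile")', "print(1 + 1)"), make_file)
log_file <- file.path(work_dir, "MAKEFILE.Rout")

# Option 2: start logging in the session, then run the makefile
sink(log_file, split = TRUE)    # split = TRUE also shows output on screen
source(make_file, echo = TRUE)  # echo = TRUE records the commands as well
sink()                          # stop logging

# Inspect what was recorded
log_lines <- readLines(log_file)

# Option 3: run from a terminal (not from within R), for example
#   R CMD BATCH MAKEFILE.R
#   Rscript MAKEFILE.R > MAKEFILE.Rout 2>&1
```

R CMD BATCH writes its log to MAKEFILE.Rout by default, whereas Rscript needs the output redirected explicitly.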
16.2 Organizing
Project Structure.
Large projects should have their own Project folder on your computer, containing files, subdirectories with files, and sub-subdirectories with files. It should look like this:
Project
    └── README.txt
    └── /Code
        └── MAKEFILE.R
        └── RBLOCK_001_DataClean.R
        └── RBLOCK_002_Figures.R
        └── RBLOCK_003_ModelsTests.R
        └── RBLOCK_004_Robust.R
        └── /Logs
            └── MAKEFILE.Rout
    └── /Data
        └── /Raw
            └── Source1.csv
            └── Source2.shp
            └── Source3.txt
        └── /Clean
            └── AllDatasets.Rdata
            └── MainDataset1.Rds
            └── MainDataset2.csv
    └── /Output
        └── MainFigure.pdf
        └── AppendixFigure.pdf
        └── MainTable.tex
        └── AppendixTable.tex
    └── /Writing
        └── /TermPaper
            └── TermPaper.tex
            └── TermPaper.bib
            └── TermPaper.pdf
        └── /Slides
            └── Slides.Rmd
            └── Slides.html
            └── Slides.pdf
        └── /Poster
            └── Poster.Rmd
            └── Poster.html
            └── Poster.pdf
        └── /Proposal
            └── Proposal.Rmd
            └── Proposal.html
            └── Proposal.pdf
There are two main meta-files:
- README.txt gives an overview of the project structure and what the code does
- MAKEFILE explicitly describes and executes all code (and typically logs the output)
Class Projects. Zip your project into a single file that is easy for others to identify: Class_Project_LASTNAME_FIRSTNAME.zip
MAKEFILE.
If all code is written with the same program (such as R) the makefile can be written in a single language. For us, this looks like
# Project Structure
home_dir <- path.expand("~/Desktop/Project/")
data_dir_r <- paste0(home_dir, "Data/Raw/")
data_dir_c <- paste0(home_dir, "Data/Clean/")
out_dir <- paste0(home_dir, "Output/")
code_dir <- paste0(home_dir, "Code/")
# Execute Codes
# libraries are loaded within each RBLOCK
setwd( code_dir )
source( "RBLOCK_001_DataClean.R" )
source( "RBLOCK_002_Figures.R" )
source( "RBLOCK_003_ModelsTests.R" )
source( "RBLOCK_004_Robust.R" )
# Report all information relevant for replication
sessionInfo()
Notice there is a lot of documentation (# like this), which is crucial for large projects. Also notice that anyone should be able to replicate the entire project by downloading a zip file and simply changing home_dir.
If some folders or files need to be created, you can do this within R
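For instance, the folders used by the makefile could be created as in the sketch below. Here home_dir points to a temporary folder rather than ~/Desktop/Project so the example can be run safely; the exact list of subfolders is an assumption based on the directory variables defined above.

```r
# Stand-in project root (replace with your own home_dir)
home_dir <- file.path(tempdir(), "Project")

# Subfolders to create, relative to the project root
sub_dirs <- c("Code", "Data/Raw", "Data/Clean", "Output")

# Create each folder only if it does not already exist
for (d in file.path(home_dir, sub_dirs)) {
    if (!dir.exists(d)) dir.create(d, recursive = TRUE)
}

# Create an empty README file if there is none yet
readme <- file.path(home_dir, "README.txt")
if (!file.exists(readme)) file.create(readme)
```

Checking with dir.exists() and file.exists() first means the code can be rerun without warnings or overwriting anything.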
16.3 Posters and Slides
Posters and presentations are another important type of scientific document. R Markdown can create both on its own, and does so especially well with some additional packages. We will use flexdashboard for posters and beamer for presentations.
Poster. See DataScientism_Poster.html and recreate from the source file DataScientism_Poster.Rmd. Simply change the name to your own, and knit the document.
Slides. See DataScientism_Slides.pdf and recreate from the source file DataScientism_Slides.Rmd.
Since beamer produces PDF output, you will need to install LaTeX. Alternatively, you can install a lightweight version, TinyTeX, from within R.
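One way to do this is sketched below, using the tinytex package. The wrapper function install_latex_if_needed is a hypothetical helper introduced here so that nothing is downloaded until you call it; installation requires an internet connection.

```r
# Hypothetical helper: installs the tinytex package and TinyTeX when missing
install_latex_if_needed <- function() {
    # Install the R package from CRAN if it is not available yet
    if (!requireNamespace("tinytex", quietly = TRUE)) {
        install.packages("tinytex")
    }
    # Download and install the TinyTeX distribution (only needed once)
    if (!tinytex::is_tinytex()) {
        tinytex::install_tinytex()
    }
    invisible(TRUE)
}
```

Run install_latex_if_needed() once in the console, then restart RStudio before knitting to PDF.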
If you cannot install LaTeX, then you must specify a different output. For example, change output: beamer_presentation to output: ioslides_presentation on line 6 of the source file.
16.4 Applications
Shiny is an R package to build web applications.
Shiny Flexdashboards are nicely formatted Shiny Apps. While it is possible to use Shiny without the Flexdashboard formatting, I think it is easier to remember that
- .R files are code for statistical analysis
- .Rmd files are for communicating: reports, slides, posters, and apps
Example: Histogram.
Download the source file TrialApp1_Histogram_Dashboard.Rmd and open it with RStudio. Then run it with rmarkdown::run('TrialApp1_Histogram_Dashboard.Rmd').
Within the app, experiment with how larger sample sizes change the distribution.
Edit the app to let the user specify the number of breaks in the histogram.
If you are having difficulty, you can try working first with the barebones shiny code. To do this, download TrialApp0_Histogram.Rmd and edit it in RStudio. You can run the code with rmarkdown::run('TrialApp0_Histogram.Rmd').
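To give a sense of what barebones Shiny code involves, here is a minimal standalone sketch of a histogram app. It is not the contents of TrialApp0_Histogram.Rmd; the input name n, the output name hist_plot, the slider range, and the use of normal draws are all assumptions for illustration.

```r
library(shiny)

# User interface: one slider for the sample size and one plot
ui <- fluidPage(
    titlePanel("Histogram of Random Draws"),
    sliderInput("n", "Sample size:", min = 10, max = 1000, value = 100),
    plotOutput("hist_plot")
)

# Server logic: redraw the histogram whenever the slider moves
server <- function(input, output) {
    output$hist_plot <- renderPlot({
        x <- rnorm(input$n)  # assumed distribution, for illustration only
        hist(x, main = paste("n =", input$n), xlab = "x")
    })
}

# Build the app object; assigning it does not launch the app
app <- shinyApp(ui = ui, server = server)
```

In an interactive session, runApp(app) launches the app. A second input for the number of breaks could be added in the same way as the sample-size slider.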
16.5 Further Reading
Your code should be readable and error free. For code writing guides, see
- https://google.github.io/styleguide/Rguide.html
- https://style.tidyverse.org/
- https://dept.stat.lsa.umich.edu/~jerrick/courses/stat701/notes/codestyle.html
- http://adv-r.had.co.nz/Style.html
- https://www.burns-stat.com/pages/Tutor/R_inferno.pdf
For organization guidelines, see
- https://guides.lib.berkeley.edu/c.php?g=652220&p=4575532
- https://kbroman.org/steps2rr/pages/organize.html
- https://drivendata.github.io/cookiecutter-data-science/
- https://ecorepsci.github.io/reproducible-science/project-organization.html
For additional logging capabilities, see https://cran.r-project.org/web/packages/logr/
For very large projects, there are many more tools available at https://cran.r-project.org/web/views/ReproducibleResearch.html
For larger scale projects, use scripts
Some other good packages for posters/presenting you should be aware of
- https://github.com/mathematicalcoffee/beamerposter-rmarkdown-example
- https://github.com/rstudio/pagedown
- https://github.com/brentthorne/posterdown
- https://odeleongt.github.io/postr/
- https://wytham.rbind.io/post/making-a-poster-in-r/
- https://www.animateyour.science/post/How-to-design-an-award-winning-conference-poster
Overview of Applications
- https://bookdown.org/yihui/rmarkdown/shiny-documents.html
- https://shiny.rstudio.com/tutorial/
- https://shiny.rstudio.com/articles/
- https://shiny.rstudio.com/gallery/
- https://rstudio.github.io/leaflet/shiny.html
- https://mastering-shiny.org/
More Help with Shiny Apps
- https://shiny.rstudio.com/tutorial/written-tutorial/lesson1/
- https://mastering-shiny.org/basic-app.html
- https://towardsdatascience.com/beginners-guide-to-creating-an-r-shiny-app-1664387d95b3
- https://shiny.rstudio.com/articles/interactive-docs.html
- https://bookdown.org/yihui/rmarkdown/shiny-documents.html
- https://shiny.rstudio.com/gallery/plot-interaction-basic.html
- https://www.brodrigues.co/blog/2021-03-02-no_shiny_dashboard/
- https://bookdown.org/yihui/rmarkdown/shiny.html
- https://shinyserv.es/shiny/
- https://bookdown.org/egarpor/NP-UC3M/kre-i-kre.html#fig:kreg
- https://engineering-shiny.org/