3 That’s Great, But How?
3.1 Existing Resources
There is no shortage of individuals proselytizing the virtues of open science, replication, and FAIR Data; however, these principles are rarely demonstrated practically or effectively. Peer reviewed publications discussing open science occasionally present vague summaries or lists of available tools for reproducible science, but no comprehensive tutorials. Conceptualized workflows such as the Replication Recipe are common (Brandt et al. 2014). Allen and Mehler (2019) note the benefits of Git, R Markdown, and Jupyter Notebooks in passing. Ihle et al. (2017) present a sidebar with talking points on the benefits of RStudio projects, version control with Git, and reports with R and Shiny. Lastly, Hampton et al. (2015) provide a wonderfully organized list of mostly open source software and tools to aid in open science.
Other available resources for open science and reproducible research tend to be either cumbersome books with extraneous statistical theory (Gandrud 2018) or blog posts that are extremely light on details (Burchell 2016). The workflowr package is a recently developed R package designed to aid scientists in creating reproducible research. It employs most of the techniques I will showcase through the rest of this tutorial; however, it wraps them in helper functions and a somewhat dogmatic approach to reproducible science (Blischak, Carbonetto, and Stephens 2019). It is aimed at users of all skill levels. The workflowr GitHub site is a good place to familiarize yourself with the package’s offerings (Blischak, Carbonetto, and Stephens 2020). In the remainder of this guide I feature a more flexible, lower level approach to R packages for reproducible research than workflowr offers. I believe these techniques are accessible to novice users, and make the most sense for those who will be using R consistently for their research.
3.2 Benefits of R for Reproducible Research
The R programming language, RStudio IDE, and R packages are well suited for open science. R and RStudio are end to end truly open source solutions for open science. While there are other analytic platforms that permit open and reproducible research, I will not review those here, because, simply put, I only know R. If you work primarily with Python, SAS, MATLAB, or any other software platform I encourage you to seek out ways to facilitate open and distributable science on those platforms. R and RStudio have capabilities that cover all aspects of open science, replication, reproduction, distribution, FAIR Data, open code, and appropriate applied statistical methods:
- Cutting edge applied statistical analysis and visualizations.
  - Python is the current leader for most machine learning applications, but R offers better high end statistical modeling developed from peer reviewed methodologies.
- Open source platforms that are free and allow anyone to interact with your code, documentation, and analysis.
- Tools to organize code into functions with easily generated help documentation.
- Tools for scripted data acquisition, data pre-processing, embedding processed data, and easily generated documentation.
- Embedded short form analysis and reports with package vignettes and R Markdown.
  - Vignettes can be written to several formats including HTML and Microsoft Word.
- Embedded professional manuscripts using R Markdown PDF outputs built on LaTeX.
- Embedded slide presentations with R Markdown.
  - Slide presentations can be written to several formats including PowerPoint, LaTeX Beamer, ioslides, and Reveal.js.
- Excellent Git integration with the RStudio IDE.
  - Git hosting allows your package and research to be installed by anyone from inside the R console, without navigating the GitHub or GitLab websites.
  - Git allows you to easily generate a free website for your package with a welcome page, reference manual with function and dataset documentation, manual
Transitioning from being an R user to creating R packages can be intimidating. I was a high level R user for 7-8 years before taking on my first package. Converting from an RStudio project with dozens of loosely organized and partially documented scripts and the occasional README text file to a fully organized, documented, Git protected, and web-hosted package feels like a big step. The step is harder because there are few centralized, detailed guides to this process that are aimed specifically at reproducible research rather than traditional package development. Moreover, from 2010 to 2020 there were significant changes to the organization and best practices of the R packages that assist with R package development. Because there are very few consolidated resources for using R packages for reproducible research, the methods I apply here are an amalgamation of techniques I pulled from a variety of sources, mostly designed for developers of traditional R packages. I developed this guide to demonstrate a detailed approach to open science with R packages.
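To give a sense of what that conversion might look like, here is a minimal sketch rather than the exact workflow this guide follows. It assumes the usethis and devtools packages are installed. The function names standardize and scaffold_research_package, the file names, and the package name mystudy are all hypothetical placeholders chosen for illustration.

```r
# A hypothetical analysis helper documented with roxygen2-style comments.
# Inside a package, devtools::document() turns these comments into a
# help page (?standardize).

#' Standardize a numeric vector
#'
#' @param x A numeric vector.
#' @return `x` centered on its mean and scaled by its standard deviation.
#' @export
standardize <- function(x) {
  stopifnot(is.numeric(x))
  (x - mean(x, na.rm = TRUE)) / stats::sd(x, na.rm = TRUE)
}

# Sketch of moving from loose scripts to a package skeleton.
# Nothing here runs until the function is called, and all names
# passed to usethis are illustrative placeholders.
scaffold_research_package <- function(path) {
  usethis::create_package(path, open = FALSE) # DESCRIPTION, NAMESPACE, R/
  usethis::proj_set(path)                     # make the new package the active project
  usethis::use_r("standardize")               # R/standardize.R for documented functions
  usethis::use_data_raw("survey")             # data-raw/survey.R for scripted data preparation
  usethis::use_vignette("analysis")           # vignettes/analysis.Rmd for a short report
  usethis::use_git()                          # place the project under version control
  devtools::document(path)                    # build help files from roxygen comments
  invisible(path)
}

# Example call (assumes usethis and devtools are installed):
# scaffold_research_package(file.path(tempdir(), "mystudy"))
```

Once a package like this is pushed to a GitHub repository, others could install it from the R console with something like remotes::install_github("user/mystudy"), where the repository name is again a placeholder.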
References
Allen, Christopher, and David M. A. Mehler. 2019. “Open Science Challenges, Benefits and Tips in Early Career and Beyond.” PLOS Biology 17 (5): e3000246. https://doi.org/10.1371/journal.pbio.3000246.
Blischak, John, Peter Carbonetto, and Matthew Stephens. 2020. “A Framework for Reproducible and Collaborative Data Science.” 2020. https://jdblischak.github.io/workflowr/index.html.
Blischak, John D., Peter Carbonetto, and Matthew Stephens. 2019. “Creating and Sharing Reproducible Research Code the Workflowr Way.” F1000Research 8 (October): 1749. https://doi.org/10.12688/f1000research.20843.1.
Brandt, Mark J., Hans IJzerman, Ap Dijksterhuis, Frank J. Farach, Jason Geller, Roger Giner-Sorolla, James A. Grange, Marco Perugini, Jeffrey R. Spies, and Anna van ’t Veer. 2014. “The Replication Recipe: What Makes for a Convincing Replication?” Journal of Experimental Social Psychology 50 (January): 217–24. https://doi.org/10.1016/j.jesp.2013.10.005.
Burchell, Jodie. 2016. “A Crash Course in Reproducible Research in R.” October 14, 2016. http://t-redactyl.io/blog/2016/10/a-crash-course-in-reproducible-research-in-r.html.
Gandrud, Christopher. 2018. Reproducible Research with R and R Studio. Chapman and Hall/CRC. https://doi.org/10.1201/9781315382548.
Hampton, Stephanie E., Sean S. Anderson, Sarah C. Bagby, Corinna Gries, Xueying Han, Edmund M. Hart, Matthew B. Jones, et al. 2015. “The Tao of Open Science for Ecology.” Ecosphere 6 (7): art120. https://doi.org/10.1890/ES14-00402.1.
Ihle, Malika, Isabel S. Winney, Anna Krystalli, and Michael Croucher. 2017. “Striving for Transparent and Credible Research: Practical Guidelines for Behavioral Ecologists.” Behavioral Ecology 28 (2): 348–54. https://doi.org/10.1093/beheco/arx003.