4 Developing Your Own R Package for Research
4.1 What is an R Package?
R packages are the fundamental unit of distributable code in the R programming language. Packages are organized into several directories and files that bundle code, data, reference materials, tests, and vignettes (Wickham 2015). The most common and well-known packages deliver functions that assist users with statistical modeling, data processing, and data visualization. Some of the most popular include ggplot2, caret, data.table, and dplyr. These packages undergo quality control checks, are available on the Comprehensive R Archive Network (CRAN), and can be installed from the R console with install.packages().
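As a minimal sketch of installing a CRAN package, the snippet below uses ggplot2 purely as an example. The interactive() guard and the is_installed variable are additions for illustration, so that sourcing the snippet in a script does not trigger a download.

```r
# Install a single package from CRAN; ggplot2 is only an example.
pkg <- "ggplot2"

# Only download when a person is at the console.
if (interactive()) {
  install.packages(pkg)
}

# Check whether the package can be found, without attaching it.
is_installed <- requireNamespace(pkg, quietly = TRUE)
```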
But not all packages have to be great advancements in data science for the masses. At its simplest, a package can be a collection of your personal functions with no intention to distribute. Packages provide an organizational template, help detect inconsistencies or errors in your code, and ultimately save you time. There is a middle ground between a top-10 most-downloaded CRAN package and a few functions you made for a pet project. Developing an R package for a research grant or corporate client, in conjunction with Git version control, is an excellent way to create a fully open source, replicable, reproducible, and distributable set of analyses.
4.2 Getting Started
I will demonstrate this process using Windows because it has the greater market share; however, these techniques will work (with slight differences) on Linux, macOS, or an RStudio Server session through a web browser (my preferred method). There are several great resources for R package development that I suggest you review following this guide. The RStudio Team has support for much of the R universe, not just packages (RStudio Team 2020a). Hadley Wickham and the greater tidyverse (Tidyverse Development Team 2020) collection of R resources are by far the most popular resources for modern R data science.
To assist you in getting started, every package or resource listed is accompanied by the official website or download page, and (when available) the official citation. To begin, you will need a copy of R (R Core Team 2020b, 2020a) and RStudio (RStudio Team 2020c, 2020d) running on your local machine.
4.3 Helper Packages
The R community has several packages designed specifically to assist users developing packages. Some are called directly with functions you type into the console, and others operate either fully or partially behind the scenes. These are the most common:
4.3.1 Direct Packages
usethis automates several procedures for package and project development (Wickham and Bryan 2020b). usethis assists with creating a new package, adding licenses, adding dependencies, creating news feeds, embedding data, enabling different Git functionality, and numerous additional helper functions. I generally stick to the simpler functions (embed data, add vignettes, create license, create logo), but its developers are constantly adding new features. Whenever I start a new project or begin a new development cycle I find it a good exercise to explore additional functions of usethis and the next package.
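A rough sketch of the simpler usethis helpers mentioned above follows. The function names come from usethis, but the wrapper function, author name, dataset, and vignette name are hypothetical placeholders. The calls are wrapped in a function so nothing happens until you call it from the root of a package project.

```r
# Hypothetical setup routine; call it from the root of a package project.
setup_research_package <- function(author = "Your Name") {
  # Toy dataset used only for illustration.
  survey_results <- data.frame(id = 1:3, score = c(2.5, 3.1, 4.0))

  usethis::use_mit_license(author)            # add an MIT license
  usethis::use_data(survey_results)           # embed the dataset under data/
  usethis::use_vignette("analysis-overview")  # add a vignette skeleton
  usethis::use_news_md()                      # create NEWS.md
  invisible(TRUE)
}
```

Because each helper writes files into the active project, it is worth running them one at a time the first time through so you can see what each one changes.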
pkgdown provides automated helpers to generate a website for your package (Wickham, Hesselberth, and Team 2020). The website can be generated locally for internal documentation, or combined with GitLab or GitHub’s Continuous Integration to create a free website hosted externally on GitHub or GitLab. There are some customization options, but at its core, pkgdown produces a static website with a welcome page, a reference manual detailing all included functions and datasets, a news feed, and an article section with all your research vignettes and analyses.
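Building the site locally can be sketched in two steps, assuming both usethis and pkgdown are installed and the working directory is the package root. The two wrapper names below are hypothetical; the calls inside them are the real entry points.

```r
# One-time setup: adds a pkgdown configuration file to the package.
init_site <- function() {
  usethis::use_pkgdown()
}

# Re-run whenever documentation, NEWS, or vignettes change.
rebuild_site <- function() {
  pkgdown::build_site()
}
```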
4.3.2 Indirect Packages
devtools is the workhorse behind R package development (Wickham, Hester, and Chang 2020). In fact, it’s so vital to package development that many core devtools features are integrated into RStudio’s graphical package interface, and do not need to be called directly by the user in the console. It’s important to note that many usethis and pkgdown functions were originally part of devtools. When searching for information regarding package development you are very likely to stumble upon older, yet still popular, resources such as the first edition of Hadley Wickham’s text for package development. Be cognizant that these functions are now in different packages.
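Although RStudio exposes much of this through its interface, the same workflow can be driven from the console. The sketch below assumes devtools is installed; the wrapper name dev_cycle is hypothetical and only groups three common calls.

```r
# Hypothetical wrapper around a typical edit-check-install cycle.
dev_cycle <- function(pkg = ".") {
  devtools::load_all(pkg)  # load the package source into the current session
  devtools::check(pkg)     # build the package and run R CMD check on it
  devtools::install(pkg)   # install the package into your library
  invisible(pkg)
}
```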
roxygen2 facilitates the creation of automated reference manuals for your package functions and datasets (Wickham et al. 2020). You almost never have to call functions from roxygen2 directly, but all functions and embedded datasets you create use roxygen’s syntax and comments to generate reference materials.
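To make the roxygen syntax concrete, here is a small documented function. The function itself (score_to_percent) is a made-up example; the #' comment prefix and the @param, @return, @examples, and @export tags are standard roxygen2 tags.

```r
#' Convert a raw score to a percentage
#'
#' @param score Numeric vector of raw scores.
#' @param max_score Single positive number giving the maximum possible score.
#' @return Numeric vector of percentages.
#' @examples
#' score_to_percent(c(5, 10), max_score = 20)
#' @export
score_to_percent <- function(score, max_score) {
  stopifnot(is.numeric(score), is.numeric(max_score), max_score > 0)
  100 * score / max_score
}
```

When the package documentation is regenerated, roxygen2 turns this comment block into a help page for the function and adds the export to NAMESPACE.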
rmarkdown (RStudio Team 2020b; Xie, Allaire, and Grolemund 2020), knitr (Xie 2020a), tinytex (Xie 2020b), and Pandoc (John MacFarlane 2020) work together to form the backbone of embedded reproducible reports and manuscripts. rmarkdown is an extension of the markdown (John Gruber and Aaron Swartz 2004) markup language that converts plain text (.Rmd) files into a number of different file formats; in this context these are usually HTML and PDF outputs. If you want to render PDF outputs, you also need a LaTeX distribution installed on the system along with R and RStudio. You can use whatever popular installation you are comfortable with; however, I strongly suggest you use tinytex. It’s small and plays very nicely with R, rmarkdown, RStudio, and Pandoc. Whatever you do, do not run multiple LaTeX distributions on the same system; it will only bring you pain and suffering. knitr executes the code “chunks” embedded within the .Rmd file and “knits” them together with the text to form the output. Pandoc is a standalone software package (not an R package) designed to convert documents from one format to another. When you write vignettes, reports, manuscripts, or slide deck presentations:
- They’re written in plain text with an rmarkdown file (.Rmd).
- knitr executes any embedded code in the rmarkdown file (.Rmd), “knits” the results together with the text, and produces a markdown file (.md).
- Pandoc converts the markdown (.md) file into the specified output format.
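The pipeline above can be sketched by hand, assuming knitr, rmarkdown, and Pandoc are all available. The helper names md_path and render_two_step are hypothetical, and HTML is chosen as the output format only for illustration. In everyday use, rmarkdown::render() performs both stages in a single call.

```r
# Derive the intermediate markdown file name from the .Rmd file name.
md_path <- function(rmd_file) {
  sub("\\.[Rr]md$", ".md", rmd_file)
}

# Run the two stages explicitly: knitr first, then Pandoc.
render_two_step <- function(rmd_file, output_file) {
  md_file <- md_path(rmd_file)

  # Stage 1: execute the code chunks and write a plain markdown file.
  knitr::knit(input = rmd_file, output = md_file)

  # Stage 2: hand the markdown file to Pandoc for conversion to HTML.
  rmarkdown::pandoc_convert(
    input   = md_file,
    to      = "html",
    output  = output_file,
    options = "--standalone"
  )
}
```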
Before moving on you should make sure these packages are installed and updated. If you’re using a preconfigured academic or corporate installation of R and RStudio they may already be installed, but there’s no harm in running the commands again to ensure the packages are up to date.
install.packages('pkgdown')
install.packages('usethis')
install.packages('devtools')
install.packages('roxygen2')
install.packages('knitr')
install.packages('rmarkdown')
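If you would rather see what is missing before reinstalling everything, a small check along these lines may help. The helper missing_packages is hypothetical and not part of any package. It only detects packages that are absent; it does not detect outdated ones.

```r
# The helper packages listed in this section.
required <- c("pkgdown", "usethis", "devtools", "roxygen2", "knitr", "rmarkdown")

# Return the subset of pkgs that cannot be found in the current library paths.
missing_packages <- function(pkgs) {
  installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  pkgs[!installed]
}

to_install <- missing_packages(required)

# Only download when a person is at the console and something is missing.
if (interactive() && length(to_install) > 0) {
  install.packages(to_install)
}
```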
If you want to create PDF outputs, also install tinytex. Make sure to first uninstall any existing LaTeX distribution such as TeX Live.
After the tinytex package completes its installation, you must install the actual LaTeX distribution. tinytex has a function to do this from inside the R console.
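The two tinytex steps described above might look like the following. The wrapper name setup_tinytex is hypothetical; it exists only so the downloads do not start until you call it. The calls inside it are the standard ones.

```r
# Hypothetical wrapper so nothing is downloaded until you call it.
setup_tinytex <- function() {
  install.packages("tinytex")  # step 1: the R package
  tinytex::install_tinytex()   # step 2: the LaTeX distribution itself
}

# After restarting R, tinytex::is_tinytex() reports whether the
# TinyTeX distribution is the LaTeX installation in use.
```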
Pandoc comes bundled with RStudio, where rmarkdown picks it up automatically, but if you’re using an IDE other than RStudio it’s a good idea to install Pandoc manually.
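To check whether rmarkdown can find a Pandoc installation, something like the sketch below can be run in the console. The helper name pandoc_status and its messages are hypothetical; pandoc_available() and pandoc_version() are rmarkdown functions.

```r
# Report whether rmarkdown can locate Pandoc, as a single message string.
pandoc_status <- function() {
  if (!requireNamespace("rmarkdown", quietly = TRUE)) {
    return("rmarkdown is not installed")
  }
  if (rmarkdown::pandoc_available()) {
    paste("Pandoc", as.character(rmarkdown::pandoc_version()), "found")
  } else {
    "Pandoc not found; install it manually"
  }
}

message(pandoc_status())
```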
4.4 Creating the New Package
Start by creating a new project: from RStudio’s menu, select File > New Project... and then select New Directory.

Figure: The RStudio New Project interface.
Then select R Package.

Figure: The RStudio New Project-New Directory interface.
This opens up the New Package interface. At this point you select a name for your package. For this exercise we’ll call our new package myresearch. Package names must be alphanumeric (periods are also allowed), must start with a letter, and cannot contain spaces or other special characters. It’s a real pain to change your package name, so choose wisely. If you regret your package name, it’s easiest to back up your functions, scripts, and vignettes and create a new package to dump them in, especially if you’re using version control with GitLab or GitHub. Leave Create git repository un-checked. I find it easier to add Git to the package later using the terminal. Lastly, choose your directory for installation. The default is your home directory, but I placed this package in a sub-directory for packages. Click Create Project when you’re finished.
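For readers who prefer the console, the sketch below checks a candidate name and then creates the package with usethis instead of the menus. The validator is_valid_package_name is a hypothetical helper based on the naming rule in Writing R Extensions (ASCII letters, numbers, and periods; at least two characters; starts with a letter; does not end with a period). The directory ~/packages is an assumed location.

```r
# Hypothetical check of the CRAN naming rule for packages.
is_valid_package_name <- function(name) {
  grepl("^[A-Za-z][A-Za-z0-9.]*[A-Za-z0-9]$", name)
}

pkg_name <- "myresearch"

# Console alternative to File > New Project...; the path is an assumption.
if (interactive() && is_valid_package_name(pkg_name)) {
  usethis::create_package(file.path("~/packages", pkg_name))
}
```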
4.5 RStudio’s New Package Window Layout
You will be greeted by a fresh RStudio window for your new package. The package name is listed as the RStudio Project name (top right), the script window displays a sample function file, hello.R, the Files window defaults to your package root directory, and the Environment window now has an additional tab for package development named Build.

Figure: RStudio window layout for newly created package.
Before exploring the individual components of the new package layout, we’ll briefly configure some package settings. Go to the Environment window and click Build > More > Configure Build Tools... to open the build options. In this dialogue, check the box for Generate documentation with Roxygen, which will bring up the Roxygen Options window. It’s a good idea to check the Install and Restart box under Automatically roxygenize when running. This option ensures that package documentation and reference materials are updated every time you rebuild locally after changes.

Figure: RStudio Build Tools and Roxygen Options interfaces.
You can also check the Vignettes box under the Roxygen Options; however, this does not always behave as expected in the current RStudio build. Another consideration: if you have large, time-intensive vignettes, you may not want to rebuild them every time you make a minor change and rebuild your package.
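If the checkbox options misbehave, the same steps can be run by hand from the console. The sketch below assumes devtools is installed. The wrapper refresh_docs and its vignettes switch are hypothetical; they keep slow vignette builds opt-in.

```r
# Hypothetical helper: regenerate reference docs, optionally rebuild vignettes.
refresh_docs <- function(pkg = ".", vignettes = FALSE) {
  # Regenerate the help files and NAMESPACE from the roxygen comments.
  devtools::document(pkg)

  # Vignettes can be slow, so only rebuild them when explicitly requested.
  if (vignettes) {
    devtools::build_vignettes(pkg)
  }
  invisible(pkg)
}
```

Calling refresh_docs() after small edits and refresh_docs(vignettes = TRUE) before sharing the package is one way to split the fast and slow steps.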
References
John Gruber, and Aaron Swartz. 2004. Markdown (version 1.0.1). https://daringfireball.net/projects/markdown/.
John MacFarlane. 2020. Pandoc: A Universal Document Converter (version 2.9.2.1). pandoc.org.
R Core Team. 2020a. R: A Language and Environment for Statistical Computing. (version 4.01). Vienna: R Foundation for Statistical Computing. https://cran.r-project.org.
R Core Team. 2020b. “R: The R Project for Statistical Computing.” 2020. https://www.r-project.org/.
RStudio Team. 2020a. “Developing Packages with RStudio.” RStudio Support. 2020. http://support.rstudio.com/hc/en-us/articles/200486488.
RStudio Team. 2020b. “R Markdown.” R Markdown from RStudio. 2020. https://rmarkdown.rstudio.com/.
RStudio Team. 2020c. RStudio: Integrated Development for R. Boston, MA: RStudio, PBC. http://www.rstudio.com.
RStudio Team. 2020d. “RStudio | Open Source & Professional Software for Data Science Teams.” 2020. https://rstudio.com/.
Tidyverse Development Team. 2020. “Tidyverse.” 2020. https://www.tidyverse.org/.
Wickham, Hadley. 2015. “R Packages.” R Packages by Hadley Wickham. 2015. http://r-pkgs.had.co.nz/.
Wickham, Hadley, and Jennifer Bryan. 2020b. “Usethis: Automate Package and Project Setup.” 2020. https://usethis.r-lib.org/.
Wickham, Hadley, Peter Danenberg, Gábor Csárdi, and Manuel Eugster. 2020. “Roxygen2: In-Line Documentation for R.” Roxygen2. 2020. https://roxygen2.r-lib.org/.
Wickham, Hadley, Jay Hesselberth, and RStudio Team. 2020. “Pkgdown: Build Websites for R Packages.” Pkgdown. 2020. https://pkgdown.r-lib.org/.
Wickham, Hadley, Jim Hester, and Winston Chang. 2020. “Devtools: Tools to Make Developing R Packages Easier.” Devtools. 2020. https://devtools.r-lib.org/.
Xie, Yihui. 2020a. “Knitr: Elegant, Flexible, and Fast Dynamic Report Generation with R.” 2020. https://yihui.org/knitr/.
Xie, Yihui. 2020b. “TinyTeX: A Lightweight, Cross-Platform, Portable, and Easy-to-Maintain LaTeX Distribution Based on TeX Live.” 2020. https://yihui.org/tinytex/.
Xie, Yihui, J. J. Allaire, and Garrett Grolemund. 2020. R Markdown: The Definitive Guide. https://bookdown.org/yihui/rmarkdown/.