10 Integrating With Git

Git. A scary word if you lack a computer science background. Git is a free and open source system for version control (Torvalds and Hamano 2020). Integrating Git with any research workflow, not just package development, is almost a necessity in the workforce; it’s a core component of modern data science and research. Git provides 4 major benefits to your research:

  1. Research becomes distributable and easily installable for colleagues.
  2. In turn, collaboration is much easier.
  3. Version control provides a safety net for coding and other mistakes.
  4. When used in conjunction with GitHub or GitLab, Git provides architecture for a free website for your project.

10.1 Getting Started

First off, Git, GitHub, and GitLab are separate entities, which is confusing for new users. Git is version control software that runs on your computer; GitHub and GitLab are web platforms built around Git that offer free hosting for online repositories, along with other services such as Continuous Integration (CI) and project websites. GitHub is by far the most popular platform for R users and developers, but for the purposes of this project I will demonstrate setup with GitLab, because that is our company platform. There are numerous resources from the RStudio team, usethis, pkgdown, the general tidyverse, and bookdown to get you started with GitHub and R or R packages. Additionally, most of the basic operations are extremely similar between platforms.

10.2 Installing Git

Before you begin, make sure Git is installed on your local computer. You can install Git for Windows or macOS by downloading the corresponding installer. Linux users on Debian or Ubuntu can install Git from the terminal by issuing the following command (administrator rights are required, hence sudo):

sudo apt-get install git
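
Once the installer or package manager has finished, a quick way to confirm that Git is available is to ask it for its version. This is a minimal check, assuming a bash-like shell; the variable name git_version is only for illustration.

```shell
# Ask Git for its version. An error such as "command not found"
# means Git is not installed or is not on your PATH.
git_version="$(git --version)"
echo "$git_version"
```

The exact version number will differ from machine to machine; any output beginning with "git version" indicates a working installation.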

10.3 The RStudio Git Interface

RStudio has a basic Git interface that helps new users track changes, stage files, make commits, and push to or pull from repositories, but you may perform all Git actions through the RStudio Terminal tab if you prefer. Windows users are advised to integrate the RStudio Terminal with the Git installation: select Tools > Global Options... to open the Global Options dialog, click on the Terminal tab to reveal the New terminals open with: drop-down menu, and select Git Bash. Linux and macOS users have this functionality by default through the RStudio Terminal tab.

The RStudio Windows Terminal interface.

RStudio has recently made other options available such as Windows PowerShell. You may find these tools helpful in future projects if you are a heavy Windows user.

Despite RStudio’s Git integration, there’s no point-and-click way to connect a project to an active GitLab or GitHub repository. There is an option in the New Project Wizard interface to initialize a Git repository with project creation, but this does not seamlessly connect an RStudio project with a remote repository.

The RStudio New Project Wizard interface.

10.4 Creating an Empty Repository

To integrate Git with our myresearch package, we’ll begin by creating or signing in to our GitLab account and starting a New Project, then selecting Create blank project.

Creating a new project on GitLab.

The next page has you fill out basic repository information like the name, a brief description, and the privacy settings.

Filling out basic project information.

10.5 Connecting to the Remote Repository

After you create the new repository, its homepage opens. The repository is currently empty, but several sets of instructions for populating it are listed on that page. Because the package has already been set up on the local computer, we will use the Push an existing folder instructions.

New GitLab repository instructions.

Click on RStudio’s Terminal tab, situated between the Console and Jobs tabs in the lower left pane if you’re using the default layout. If you successfully installed Git for Windows and set Git Bash as your RStudio default Terminal (per the instructions above), you will see the Windows Git Bash prompt in this tab. Linux and macOS users will be presented with their default bash terminals.

The Windows Git Bash RStudio terminal tab.

By default, it should show the root directory of the myresearch package. In the terminal window, issue the following commands:

git init
git remote add origin git@gitlab.com:repository-address.git

If you have not set up SSH keys on your GitLab or GitHub account, the git remote add origin command in the instructions may show an https:// repository address instead; use whatever command is listed.
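
To check which address a project is connected to, or to change it later, something along these lines should work. It is a sketch that assumes a bash-like shell and runs in a throwaway directory so that no real project is touched. The SSH address is the placeholder from the commands above, and the HTTPS address is a hypothetical counterpart; replace both with the addresses shown on your own repository page.

```shell
# Work in a throwaway directory so the sketch does not touch a real project.
demo_dir="$(mktemp -d)"
cd "$demo_dir"
git init --quiet

# Placeholder SSH address; copy the real one from your repository page.
git remote add origin git@gitlab.com:repository-address.git

# List the configured remotes and the addresses used for fetch and push.
git remote -v

# Point the same remote at an HTTPS address instead (hypothetical URL),
# for example when SSH keys are not set up.
git remote set-url origin https://gitlab.com/repository-address.git
remote_url="$(git remote get-url origin)"
echo "$remote_url"
```

Changing the address this way keeps the remote name origin, so later push and pull commands stay the same.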

Git is version control software; GitHub and GitLab are web platforms that host Git repositories. Git commands used in the terminal work across Git web platforms and hosting sites.

At this point you may issue the git add . command as listed in the instructions. However, this will stage every file in your package directory, and all of them will be sent to the remote repository with your next commit and push. In most instances this is undesirable, because you do not want or need every file in the package directory in the Git repository. Files you do not want to include may be personal files, files used only for development, or files that are overly large. These may include:

  • The .RData file tracking your environment objects.
  • RStudio project and package files not necessary to build your package, use functions, or compile vignettes (.Rhistory, .Ruserdata, .Rproj.user).
  • Large raw data files used to embed processed data sets.
  • Files generated by R CMD check (.Rcheck/).
  • Compiled vignettes and package reference materials automatically generated by pkgdown in the docs/ folder.

These, or any other file(s), may be excluded by using a .gitignore file. The usethis package has a command for opening the .gitignore for editing, but you can also create the file yourself as a plain text file.

usethis::edit_git_ignore(scope = "project")
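
If you prefer to create the file by hand, a sketch along these lines may help. It assumes a bash-like shell and works in a throwaway directory. The patterns mirror the file types listed above; the raw-data/ folder name is hypothetical, and *.Rcheck/ assumes the check directory is named after the package with an .Rcheck suffix.

```shell
# Work in a throwaway directory so the sketch does not touch a real project.
demo_dir="$(mktemp -d)"
cd "$demo_dir"
git init --quiet

# Write a small .gitignore; lines starting with # are comments.
cat > .gitignore <<'EOF'
# R session and RStudio files
.RData
.Rhistory
.Ruserdata
.Rproj.user/
# Output of R CMD check
*.Rcheck/
# Generated documentation site
docs/
# Hypothetical folder of large raw data files
raw-data/
EOF

# Create one file that should be ignored and one that should be tracked.
touch .RData DESCRIPTION

# Ask Git which of the two match an ignore rule; only .RData is printed.
ignored="$(git check-ignore .RData DESCRIPTION)"
echo "$ignored"

# Stage everything that is not ignored, then review what was staged.
git add .
git status --short
```

With the ignore rules in place, git add . becomes much safer, because the listed files are skipped automatically.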

Simple Google searches will reveal numerous user-defined .gitignore templates for R package development. There’s even an R package, gitignore, specifically designed for managing .gitignore files. An important thing to remember, and this goes for most of Git, is that you will inevitably make mistakes: you will accidentally include files you did not want to, try to retroactively remove them, find that they’re still there, and then work through several forum posts. It’s a learning process; don’t be discouraged.
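
One common version of this mistake is committing a file before adding it to .gitignore. Git keeps tracking a file it already knows about, even after an ignore rule is added. The sketch below shows one way to stop tracking such a file while keeping it on disk. It assumes a bash-like shell and a throwaway repository; the demo user name and email are placeholders needed only so the commits succeed.

```shell
# Set up a throwaway repository with a placeholder identity.
demo_dir="$(mktemp -d)"
cd "$demo_dir"
git init --quiet
git config user.name "Demo User"
git config user.email "demo@example.com"

# Simulate the mistake: commit .RData before any ignore rule exists.
touch .RData
git add .RData
git commit --quiet -m "Accidentally add .RData"

# Add the ignore rule, then remove the file from the index only.
# --cached leaves the local copy of the file untouched.
echo ".RData" >> .gitignore
git rm --cached --quiet .RData
git add .gitignore
git commit --quiet -m "Stop tracking .RData"

# List the tracked files: only .gitignore remains.
tracked="$(git ls-files)"
echo "$tracked"
```

Note that this only affects new commits. The file is still present in earlier commits, which is why it can seem to be "still there"; removing it from the history is a separate and more involved task.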

10.6 Enabling the RStudio Git Interface

Git is now initialized in the package root directory and the local directory is connected to the remote repository we created, but RStudio has yet to recognize these changes and make the RStudio Git interface available. In order for RStudio to recognize the Git connection, you must close the current project and re-open it. The easiest way to accomplish this is by using the project drop-down menu to close the project, and then re-open it from the list of recently used projects in the same drop-down menu. The project drop-down menu is located in the top right of the RStudio layout.

The RStudio project drop-down menu.
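
Once the Git tab appears, its staging checkboxes and its Commit and Push buttons correspond to ordinary terminal commands. The sketch below walks through the equivalent steps, assuming a bash-like shell. So that it runs without a GitLab account, a local bare repository stands in for the remote; the demo user name, email, and DESCRIPTION contents are placeholders.

```shell
# Create a stand-in remote and a local project in a throwaway directory.
work_dir="$(mktemp -d)"
git init --quiet --bare "$work_dir/remote.git"
git init --quiet "$work_dir/myresearch"
cd "$work_dir/myresearch"
git config user.name "Demo User"
git config user.email "demo@example.com"
git remote add origin "$work_dir/remote.git"

# Stage a file (the checkbox in the Git tab).
echo "Package: myresearch" > DESCRIPTION
git add DESCRIPTION

# Record the staged changes (the Commit button).
git commit --quiet -m "Initial commit"

# Send the commit to the remote (the Push button). The branch name is
# looked up because the default differs between Git installations.
branch="$(git symbolic-ref --short HEAD)"
git push --quiet -u origin "$branch"

# Confirm that the remote now has the branch.
pushed="$(git ls-remote --heads origin)"
echo "$pushed"
```

With a real GitLab or GitHub remote the commands are the same; only the address given to git remote add origin changes.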

References

Torvalds, Linus, and Junio Hamano. 2020. Git (version 2.27.0). https://git-scm.com/.