Introduction

R code can be written in two general ways: (1) as literate programs using R Markdown Notebooks and (2) as scripts. Scripts are programs that can be run within an IDE such as RStudio or executed directly from the command line. Scripts can be stepped through with a debugger, whereas R Notebook code chunks are more difficult to debug. Because scripts can be run from the command line, they can also be scheduled as cron jobs, i.e., set up to run automatically at a given time. Finally, R scripts can be called from shell scripts (on Unix, Linux, and macOS) and .bat batch files (on Windows).

R Notebooks are markdown documents that contain embedded code in R and other languages. They are most useful for reports, memos, and analytics journals where we need to intersperse narrative text with code. An R Notebook produces a document (most commonly HTML or PDF) when it is “run”. An R script, by contrast, is a program that runs like a script in any other language.

R scripts are preferable when you need stand-alone R programs that can be run directly from the command line or within a shell script. Another main benefit of R scripts is that we can use the debugger within RStudio.

Writing R Scripts in RStudio

An R script is a text file containing R statements. Statements are normally written one per line; several statements can share a line if they are separated by semicolons, and a long statement can continue across lines. A script file can be created in any text editor, but not in a word processor. One of the easiest ways to create an R script is to use RStudio, which provides debugging, syntax-aware editing, and an object navigator, among other features.
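As a minimal sketch of how statements are laid out in a script file, the lines below show the usual one-statement-per-line style, two statements separated by a semicolon, and one statement continued across two lines. The variable names are made up for illustration.

```r
# One statement per line is the usual style.
x <- 10
y <- 20

# Two statements on one line need a semicolon between them.
a <- 1; b <- 2

# A statement continues on the next line when the first line is incomplete
# (here the line ends with an operator that still needs a right-hand side).
total <- x +
  y

print(total)
```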

While not actually required, R scripts should have the extension .R.

File and Path References

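Unlike an R Notebook, an R script does not treat the folder (directory) in which it is located as the current working directory. When run from the command line, the working directory is the folder from which the script was launched. You must either specify full paths to all files or call setwd() to set the working directory for the duration of the script. This is a particular issue when running within an R Project, where the working directory is typically the project folder rather than the script's folder.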
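The two options for locating files can be sketched as follows. This is an illustration only: the file name scores.csv and its contents are invented, and a temporary directory stands in for a real data folder.

```r
# Hypothetical data folder; tempdir() stands in for a real project folder.
data_dir <- tempdir()

# Create a small example file so the sketch is self-contained.
csv_path <- file.path(data_dir, "scores.csv")
write.csv(data.frame(id = 1:3, score = c(90, 85, 77)), csv_path, row.names = FALSE)

# Option 1: build the full path explicitly with file.path().
scores <- read.csv(csv_path)

# Option 2: change the working directory, then use a relative path.
old_wd <- setwd(data_dir)        # setwd() returns the previous working directory
scores_rel <- read.csv("scores.csv")
setwd(old_wd)                    # restore the original working directory
```

Building paths with file.path() leaves the working directory untouched, which tends to be easier to reason about when several scripts are chained together.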

Running R Scripts from the Command Line

Running R from the command line provides much more flexibility than running R only within an IDE such as R Studio. For example, we can process multiple files at once through loops in a shell script. We can write several smaller R programs and chain them together so that each program carries out one step in the analysis. The programs become more modular and can be used independently.

Running (or executing) an R script requires only that base R is installed; RStudio does not need to be installed. To run an R script from the command line, open a command line shell (also known as a terminal or simply a shell). On Windows, run cmd or PowerShell; on macOS, launch Terminal. While they look similar, cmd is a traditional DOS-style shell, while PowerShell is more similar to a Unix shell. Terminal runs a standard Unix shell. Since macOS Catalina (10.15) the default user shell is the Z shell (zsh); earlier versions of macOS default to bash (Bourne Again Shell). macOS also includes the TENEX C shell (tcsh) and the Korn shell (ksh). The choice of shell generally does not matter for simple command line interactions but matters if you wish to write shell scripts.

Assume that the file ScriptA.R is an R program. We can run it directly from the command line using the Rscript program that is installed when you install R:

Rscript ScriptA.R
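A script started with Rscript can read any arguments typed after the script name through commandArgs(), which is one way to pass file names to it from a loop in a shell script. The sketch below is an assumption about what ScriptA.R might contain; the helper describe_args() is hypothetical.

```r
# Hypothetical helper: summarise the file names passed to the script.
describe_args <- function(args) {
  if (length(args) == 0) {
    return("No input files given")
  }
  paste("Processing", length(args), "file(s):", paste(args, collapse = ", "))
}

# commandArgs(trailingOnly = TRUE) returns only the arguments that follow
# the script name on the command line.
args <- commandArgs(trailingOnly = TRUE)
cat(describe_args(args), "\n")
```

With this content, running the script as Rscript ScriptA.R a.csv b.csv would hand the two file names to describe_args().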

To launch R for interactive work from the command line, you can launch it as follows:

R

Running Rscript or R from the command line presumes that they are in your “path”. If you get an error that they cannot be found, you need to either add the installation folder for R to your path or specify the full path to Rscript or R.

The example below shows how to run an R program from within a shell script on Unix-like systems (Linux and macOS).

#!/bin/sh

Rscript ScriptA.R

After creating a new shell script, don’t forget to set execute permissions on the file. For example, if the shell script is saved as analyze:

chmod +x analyze

The video tutorial demonstrates how to run an R program from the command line and within shell scripts.

Nesting R Scripts

You can “call” or run another R script from within an R script using either source() or sys.source(). The example below uses source():

source("another-script.R")
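The difference between the two functions can be sketched with a small helper file. The file is written to a temporary folder here so the example is self-contained, and the function square() is invented for illustration.

```r
# Write a small helper script to a temporary file
# (it stands in for another-script.R).
helper_path <- file.path(tempdir(), "another-script.R")
writeLines("square <- function(x) x^2", helper_path)

# By default, source() evaluates the file in the global environment,
# so square() becomes available to the rest of the script.
source(helper_path)
print(square(4))

# sys.source() evaluates the file in an environment that you supply,
# which keeps the helper's objects out of the global environment.
helper_env <- new.env()
sys.source(helper_path, envir = helper_env)
print(helper_env$square(5))
```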

Program Structure

R scripts are scripts in the sense that their statements are executed in order, starting with the first line. There is no “entry point” as in C, where execution starts at the function main(). However, we can adopt that structure by placing all code within a main() function and making the last line of the script a call to main(). The call must come after the definition because R has to evaluate the function definition before main() can be called.
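A minimal sketch of a script organized around main() is shown here. The helper mean_score() and the data are hypothetical. Note that the call to main() sits at the end of the file, because R must have evaluated the function definitions before main() can be called.

```r
# Hypothetical helper used by main().
mean_score <- function(scores) {
  mean(scores)
}

# All top-level logic lives in main().
main <- function() {
  scores <- c(90, 85, 77)
  avg <- mean_score(scores)
  cat("Average score:", avg, "\n")
  invisible(avg)   # return the result without printing it a second time
}

# Call main() last, after every function has been defined.
main()
```

Keeping the logic inside functions also keeps intermediate variables such as scores out of the global environment.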

Conclusion

R is a powerful data manipulation and statistical programming language that can be used to create standalone programs.


Files & Resources

All Files for Lesson 6.109


