Preface
This tutorial presumes that you have R and RStudio installed, or that you have an account on posit.cloud (formerly rstudio.cloud). If you do not already have R and/or RStudio, you will need to download and install them. You must first install R from the R Project and then the RStudio IDE from Posit.
R Notebooks vs R Scripts
There are several distinct ways to write R code and to program in R: R Notebooks, Quarto Documents, and R Scripts. An R Markdown Notebook (.Rmd file) and an R Script (.R file) are both tools used in the R programming environment, but they serve different purposes and have distinct features:
R Markdown Notebook
- Integrated Documentation and Code:
  - R Markdown allows you to combine narrative text, code, and output in a single document. It supports Markdown syntax for text formatting, making it easy to create well-documented reports, presentations, and interactive notebooks. The code blocks can be a mix of R, Python, SQL, among many other programming languages. R Notebooks are similar to Jupyter Notebooks often used in Python.
- Dynamic Report Generation:
  - You can execute code chunks within the document and immediately see the results, including plots, tables, and other outputs. This makes R Markdown ideal for creating dynamic and reproducible research documents.
- Interactive Features:
  - Notebooks in R Markdown can include interactive features, such as HTML widgets and Shiny apps, which can be embedded directly into the document.
- Output Formats:
  - R Markdown documents can be rendered into various formats, including HTML, PDF, Word, and slideshows. This flexibility allows for easy sharing and presentation of the results.
- Ease of Collaboration:
  - The combination of code and its output with narrative text makes it easier to share complete analyses with collaborators, who can see both the methods and the results in context.
R Script (.R file)
- Code Only:
  - An R Script is a plain text file containing only R code. It does not support Markdown or other types of text formatting.
- Execution:
  - R Scripts are typically run line-by-line or as a whole in an R console or RStudio environment. They are straightforward for writing and executing R code but do not provide integrated documentation or immediate visual feedback within the script itself.
- Simplicity:
  - The simplicity of R Scripts makes them ideal for writing and testing R code quickly. They are less cumbersome than R Markdown files when you only need to write and execute code without the need for extensive documentation or presentation of results.
- Use Cases:
  - R Scripts are commonly used for scripting tasks, data manipulation, analysis pipelines, and batch processing where the primary focus is on code execution rather than presentation.
R Markdown Notebooks (.Rmd files) are best for creating comprehensive, reproducible documents that integrate code, results, and narrative text. They can contain a mix of code chunks (blocks) in R, Python, SQL, C++, Java, D3, and many other languages, and they are ideal for reports, interactive documents, and presentations. R Scripts (.R files), on the other hand, are preferable for straightforward coding tasks, scripting, and batch processing where the focus is solely on the code without the need for integrated documentation or presentation. They are “programs” similar to Python programs. Both types of “programs” are essential in the R ecosystem, and their use depends on the specific needs of your project. This tutorial will focus on R Markdown Notebooks.
R Notebooks
R Notebooks are documents that combine code and text, allowing for reproducible research and data analysis. They are created in RStudio, an integrated development environment (IDE) for the R and Python programming languages that is developed by Posit (the company formerly named RStudio).
R Notebooks use a text-based document writing system called R Markdown for literate programming, which allows you to embed executable R (as well as Python, C++, SQL, and bash) code within Markdown text. This means you can write narrative text, add headings, insert figures, embed images, link to URLs, and include mathematical LaTeX-formatted equations in your document, while also executing R code chunks and displaying the results inline.
The R Markdown syntax is based on the Markdown syntax, which is a lightweight markup language used for formatting text documents. R Notebooks allow you to write documents that are both human-readable and executable. They make it easy to share your code, data, and analysis with others, while also providing a clear and organized record of your analysis process.
Workflow
All of your work must be done within an R Project, so create a project first, or, if you already have one, open the project before doing any work.
New R Notebook
To build a new R Notebook, you should follow these steps:
- Open RStudio
- Click on “File” in the top left-hand corner, and then click on “New File” and select “R Notebook”.
- This will create a new R Notebook with some pre-populated text and code. You can modify this as needed.
- You can add new code chunks by clicking on the “+” icon in the top left-hand corner of the notebook or by using the keyboard shortcut “Ctrl + Alt + I”.
- Write your R code within the code chunks.
- Write narrative text using Markdown syntax in between the code chunks.
- Test each code chunk by clicking the “Run” button within the chunk or using the keyboard shortcut “Ctrl + Shift + Enter”; “Ctrl + Enter” runs only the current line or selection.
- To save your R Notebook, click on “File” and then click on “Save As…”. You can then save your notebook in your desired location.
- Knit the R Notebook to the desired document format (HTML or PDF).
With an R Notebook, you can mix R code with narrative text, which makes it easy to document your analysis and share it with others. Additionally, you can include visualizations and other outputs directly in the notebook, which makes it easy to present your results.
Knitting
Once your R Notebook is written, it must be knitted. Knitting generates a Markdown document, which is then converted to an output document, commonly PDF or HTML. This process is called knitting or rendering; it is a form of compilation. During knitting, code chunks are executed and any output is embedded in the resulting document, unless a chunk is marked as not to be evaluated or not to be included.
While writing the R Notebook, code chunks can be executed individually which can be helpful for development. During knitting, the chunks are run in sequence starting with the first code chunk.
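Knitting can also be started from code rather than from the Knit button. The sketch below assumes that the rmarkdown package is installed (and, for PDF output, a TeX installation); the helper name `knit_notebook` and the file name `analysis.Rmd` are hypothetical and used only for illustration.

```r
# Minimal sketch: knit an R Markdown file from the console or from a script.
# Assumes the rmarkdown package is installed; knit_notebook is a made-up helper name.
knit_notebook <- function(path, format = "html_document") {
  # render() executes the chunks in sequence, starting with the first one,
  # and converts the result to the requested output format
  rmarkdown::render(input = path, output_format = format)
}

# Example calls (hypothetical file name, so they are left commented out):
# knit_notebook("analysis.Rmd")                           # knit to HTML
# knit_notebook("analysis.Rmd", format = "pdf_document")  # knit to PDF
```

Wrapping the call in a small function is only a convenience; calling `rmarkdown::render()` directly works the same way.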
Tutorial
Before proceeding with the remainder of this lesson and the specific R Notebook writing elements, watch this short narrated tutorial that demonstrates how to build R Notebooks and knit them to HTML and PDF documents.
Knitting to PDF requires installation of TeX tools, which may prove to be difficult on some operating systems, notably macOS. A work-around is to upload your R Notebook and supporting files to posit.cloud, knit there, and then download the resulting PDF to your local project folder. The tutorial below demonstrates this workflow.
R Studio Projects
Projects are a better way to manage code than creating individual R Notebooks, R Scripts, and other code files. Projects allow all files, including data files, to be managed as a single unit, shared, and version controlled using tools and services such as git and GitHub. Consult Lesson 6.202 – Working with R Projects for more details on how to create R Projects.
Literate Programming
Literate Programming is a document-centric programming paradigm introduced by Donald Knuth around 1984. A literate computer program is a “document” written in a natural language such as English, interspersed with chunks of source code in various programming languages, from which compilable source code can be generated. This programming approach is commonly used in scientific computing, data programming, and data science with the goal of producing reproducible data pipelines and data analyses. Jupyter Notebooks and R Notebooks are two common implementations of this paradigm.
A literate program must be “compiled” into code suitable for execution and a representation that can be viewed. The latter is commonly an HTML or PDF document. The former is commonly R or Python code interspersed with C++, Java, and SQL code and embedded formatting directives in a markup language such as HTML, Markdown, or LaTeX.
The knitr package implements the programming paradigm within R Notebooks, which are, in turn, a variant of R Markdown. R Markdown is a superset of Markdown.
From the perspective of information science, a document is an aggregation of information objects, some of which are created programmatically.
R Markdown in a Nutshell
The key markdown directives are illustrated below with their result right afterwards:
This is _markdown_. Markdown contains embedded formatting
such as _italics_, **boldfacing**, and ~~strikes~~.
Of course we can also do superscripts
like x^2^ or subscripts like x~i~.
Headers are marked with hash marks ###
#### LaTeX Equations
More complex formatting of math expressions can be done
with embedded [LaTeX](https://latex-tutorial.com/)
like this $x_i$ or on a separate line:
$\bar{x}=\frac{1}{n}\sum_{i=0}^{n}(x_i)$
or centered like this:
$$
\bar{x}=\frac{1}{n}\sum_{i=0}^{n}(x_i)
$$
Did you see how I created an embedded link to a URL?
Numbered lists are produced with:
1. item 1
2. item 2
    a. sub item a
    b. sub item b
3. item 3
Bulleted lists can be produced with:
- bullet item
- bullet item
Task lists can be produced with:
- [ ] task 1
- [ ] task 2
- [x] task 3 is done
> Lastly, you can use > to make blockquotes for calling out important
  information.
Did you notice how the second line in the block quote was indented?
The markdown above is knitted to the following format:
This is markdown. Markdown contains embedded formatting such as italics, boldfacing, and strikes. Of course we can also do superscripts like x² or subscripts like xᵢ.
Headers are marked with hash marks ###
LaTeX Equations
More complex formatting of math expressions can be done with embedded LaTeX like this \(x_i\) or on a separate line:
\(\bar{x}=\frac{1}{n}\sum_{i=0}^{n}(x_i)\)
or centered like this:
\[
\bar{x}=\frac{1}{n}\sum_{i=0}^{n}(x_i)
\]
Did you see how I created an embedded link to a URL?
Numbered lists are produced with:
1. item 1
2. item 2
    a. sub item a
    b. sub item b
3. item 3
Bulleted lists can be produced with:
- bullet item
- bullet item
Task lists can be produced with:
- ☐ task 1
- ☐ task 2
- ☑ task 3 is done
Lastly, you can use > to make blockquotes for calling out important information.
Did you notice how the second line in the block quote was indented?
See the R Markdown Reference Guide for a complete list of markdown elements.
R Notebook Elements
Embedded Code
One of the most useful elements of an R Notebook is the ability to embed code fragments within text. Code within a code block (or code chunk or code fence) is generally in R but can also be in Python, Java, C++, bash, or SQL.
Embedding R Code
A code chunk is created with three backticks in a row followed by a set of braces with the programming language.
```{r echo=F}
```
Code can also be embedded inline within text like so: `r a+b`
The result of the code is then added to the output document.
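To see what happens to an inline expression during knitting, a single line of R Markdown text can be knitted directly from R. This is a minimal sketch that assumes the knitr package is installed; the variables `a` and `b` and the sentence itself are made up for illustration.

```r
# Minimal sketch of how an inline expression is evaluated during knitting.
# Assumes the knitr package is installed; a, b, and rmd_text are example values.
a <- 2
b <- 3

# One line of R Markdown text containing the inline expression `r a + b`
rmd_text <- "The sum of a and b is `r a + b`."

# knit(text = ...) returns the knitted text instead of writing a file
knitted <- knitr::knit(text = rmd_text, quiet = TRUE)
print(knitted)
```

In the knitted text, the inline expression is replaced by its value, so the reader sees the result rather than the code.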
Named Code Chunks
Code chunks should be named so they can be more easily found. Adding a name after the language creates a label for the code chunk:
```{r chunk-name-without-spaces}
# code goes here
```
Hiding Code Chunks in Output
Often, we want the code to run but we do not want the code to be part of the output document. To do that, add “echo=F” after the code chunk name, like so:
```{r chunk-name-no-spaces, echo=F}
# code goes here
```
Suppress Evaluation
A code chunk can be shown in the document without its code being run. This is controlled with the “eval=F” parameter.
```{r chunk-name-no-spaces, eval=F}
# code goes here
```
Code Chunks with Line Numbers
To add line numbers to the code, add “attr.source='.numberLines'” to the parameters.
```{r codeNoLines, attr.source='.numberLines'}
if (TRUE) {
  x <- 1:10
  x + 1
}
```
The above will produce:
if (TRUE) {
  x <- 1:10
  x + 1
}
Navigating Code Chunks
In general, a code chunk starts with the language followed by an optional name for the chunk. The name (or chunk label) is useful for quick navigation and for referencing within the document. In RStudio, it can be used for quickly jumping to a section of code.
Jumping to A Chunk by Name
Embedding C++ Code
To embed C++ code, use either {Rcpp} or {r engine='Rcpp'} for the fence header.
```{Rcpp fibCode, eval=T}
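#include <Rcpp.h>

// Naive recursive Fibonacci; exported so that R chunks can call it
// [[Rcpp::export]]
int Fibonacci(const int x) {
  // Base cases: Fibonacci(0) = 0 and Fibonacci(1) = 1
  if (x == 0 || x == 1)
    return x;
  return Fibonacci(x - 1) + Fibonacci(x - 2);
}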
```
Because the function Fibonacci() was defined with the Rcpp::export attribute, it is accessible to R code chunks as a normal R function:
```{r eval=F, echo=T}
print(paste0("Fibonacci(10) = ", Fibonacci(10L)))
fib20 <- Fibonacci(20L)
print(paste0("Fibonacci(20) = ", fib20))
```
Note that caching should not be used with Rcpp code chunks (since the compiled C++ function will not survive past the lifetime of the current R session).
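When the C++ toolchain is not available, the values that the exported function should return can be cross-checked with a pure-R counterpart. The sketch below is not part of the original lesson; the name `fibonacci_r` is hypothetical, and the function simply mirrors the same naive recursion.

```r
# Pure-R counterpart of the C++ Fibonacci() function, useful as a cross-check.
# The name fibonacci_r is hypothetical and used only for illustration.
fibonacci_r <- function(x) {
  # Base cases: fibonacci_r(0) is 0 and fibonacci_r(1) is 1
  if (x == 0 || x == 1) {
    return(x)
  }
  fibonacci_r(x - 1) + fibonacci_r(x - 2)
}

print(paste0("Fibonacci(10) = ", fibonacci_r(10L)))
print(paste0("Fibonacci(20) = ", fibonacci_r(20L)))
```

The recursive R version is slow for large inputs, which is one reason a compiled C++ chunk can be attractive.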
Headings
Heading levels are specified with hashmarks (#). One hashmark marks a first-level header (the title of the document), two hashmarks mark a second-level header (a section heading), and so forth.
---
title: "Predicting Car Sales"
date: "August 23, 2022"
author: "<small>Martin Schedlbauer</small>"
---
## Introduction
## Analysis
### Data Cleaning
### Factor Analysis
## Conclusion
Embedding LaTeX
LaTeX equations can be embedded in three different ways:
- inline, by enclosing the equation in single $ within a text block, like so: $z = (x - \mu) / \sigma$
- in a separate block, by enclosing it within single $ but placing the equation on its own line
- in double $$ which puts the equation in its own block and is centered
The fragment below illustrates how to embed LaTeX.
In mathematical terms, two relations $R_1$ and $R_2$ are union compatible if and only if
$degree(R_1)=degree(R_2)$, and $\mathfrak{D}(R^{A_i}_{1}) = \mathfrak{D}(R^{A_i}_{2})$, where $A_i$ is the *ith* attribute and $\mathfrak{D}(R^{A_i}_{k})$ is the domain of the *ith* attribute of relation $R_k$.
For example, the two relations $A$ and $B$ below are union compatible:
$A(a_1(char),a_2(char),a_3(date))$
$B(b_1(char),b_2(char),b_3(date))$
Embedding Images
Images can be embedded in several ways, the most common being the markdown shown below:
![image caption](URL or filename){style-parameters}
You can optionally add style parameters such as {width=50%} which are only meaningful when knitting to HTML.
Here’s an example:
![CRISP-DM Process](images/crisp-dm.png){width=30%}
The above would render as:
CRISP-DM Process
Embedding HTML
If the target document format is HTML, then it is permitted to insert HTML code fragments directly into the markdown document. This is useful when embedding external objects in HTML documents as illustrated below:
<iframe src="https://player.vimeo.com/video/821398"
width="480" height="270" frameborder="0"
title="Journaling with Markdown"
data-external="1"></iframe>
To ensure that embedded videos are playable, you must add the attribute data-external="1" to any <iframe>.
Code Style
To render text as “code” using a monospaced font, enclose the text in single backticks. The fragment below illustrates this:
The correlation coefficient can be calculated using the
`cor()` function in R.
The above markdown would render as:
The correlation coefficient can be calculated using the cor() function in R.
Chunk Options
Chunk output can be customized with knitr options, which are arguments set in the {} of a chunk header. Commonly used options include:
include = FALSE
prevents code and results from appearing in the finished file. R Markdown still runs the code in the chunk, and the results can be used by other chunks.
echo = FALSE
prevents code, but not the results from appearing in the finished file. This is a useful way to embed figures.
message = FALSE
prevents messages that are generated by code from appearing in the finished file.
warning = FALSE
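prevents warnings that are generated by code from appearing in the finished file.
fig.cap = "…"
adds a caption to graphical results.
comment = "…"
sets the prefix that is added to each line of text output (the default is "##"); use comment = "" to remove the prefix.
attr.source = ".numberLines"
adds line numbers to a code chunk.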
See the R Markdown Reference Guide for a complete list of knitr chunk options.
Global Options
To set global options that apply to every chunk in your file, call knitr::opts_chunk$set in a code chunk. Knitr will treat each option that you pass to knitr::opts_chunk$set as a global default that can be overridden in individual chunk headers.
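A setup chunk near the top of the notebook is a common place for this call. The sketch below assumes the knitr package is installed; the particular defaults chosen here are illustrative, not a recommendation from the lesson.

```r
# Minimal sketch of the body of a setup chunk; assumes knitr is installed.
# The defaults chosen below are examples only.
knitr::opts_chunk$set(
  echo = FALSE,     # hide code unless a chunk header sets echo=TRUE
  message = FALSE,  # suppress messages generated by code
  warning = FALSE   # suppress warnings generated by code
)

# Inspect the current global default for a single option
knitr::opts_chunk$get("echo")
```

Any chunk can still override these defaults by setting the option in its own header, for example with echo=TRUE.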
Markdown Tables
While tables can be generated through R code using various R functions such as kable(), they can also be created in markdown. Naturally, tables that contain calculated data or data from files should be included using R code chunks.
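As a sketch of the code-generated route, the example below builds a small table with a calculated total row and passes it to kable(). It assumes the knitr package is installed; the `budget` data frame holds made-up example values.

```r
# Minimal sketch: build a table from data with kable(); assumes knitr is installed.
# The budget data frame contains made-up example data.
budget <- data.frame(
  Month  = c("January", "February", "March"),
  Budget = c(2500, 3000, 4200)
)

# Append a calculated row so the table reflects a computed value
total_row <- data.frame(Month = "Total", Budget = sum(budget$Budget))
budget <- rbind(budget, total_row)

# format = "pipe" emits a Markdown pipe table; align sets the column alignment
budget_table <- knitr::kable(budget, format = "pipe", align = c("l", "r"))
print(budget_table)
```

Inside a notebook chunk, the format argument can usually be omitted, since kable() picks a format suited to the output document.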
In Markdown, a table is a structured presentation of data, organized into rows and columns. To create a table in Markdown, use the vertical line “|” to separate the columns and a row of three or more dashes “---” per column beneath the header row to separate the header from the table body. Additionally, you should include a vertical line at both ends of each row to separate it from other content.
The markdown below produces this table:
| Month | Budget |
| -------- | -------|
| January | $2500 |
| February | $3000 |
| March | $4200 |
Month      Budget
January    $2500
February   $3000
March      $4200
The cell widths in the markdown source do not have to align.
Align text in the columns to the left, right, or center by adding a colon “:” to the left, right, or both sides of the dashes in the separator row beneath the header. The example is rendered below:
| Course | Credits | Term |
| :---------------- | :------:| ----: |
| Intro to Python | 2 | Fall |
| SQL for Beginners | 3 | Fall |
| Statistics with R | 4 | Spring |
| OOD with C++ | 4 | Summer |
Course               Credits     Term
Intro to Python         2        Fall
SQL for Beginners       3        Fall
Statistics with R       4      Spring
OOD with C++            4      Summer
- :--- means the column is left aligned
- ---: means the column is right aligned
- :---: means the column is center aligned
Text can be formatted within tables using general markdown. However, some formatting options are not available within tables, including:
- Embedded R
- Headings
- Blockquotes
- Horizontal Lines
- Images
- Lists
- Embedded HTML
Summary
R Markdown is a tool that allows you to combine text and code to create documents that are both human-readable and executable. R Notebooks are a specific type of R Markdown document designed for interactive data analysis and reproducible research. By using R Markdown and R Notebooks, you can write documents that include code, text, equations, figures, and interactive visualizations, making it easy to document your analysis process and share your results with others. In this tutorial, you learned how to create R Markdown documents and R Notebooks, and how to use Markdown syntax to format your text and embed code chunks in your documents.
Combining code and text in a Markdown document offers several benefits:
Reproducibility: By embedding code in a Markdown document, you can ensure that others can easily reproduce your work. They can see exactly what code you used to generate your results, and they can run that code themselves to verify your findings.
Transparency: By providing a clear and detailed record of your analysis process, you can increase transparency and build trust with your audience. This is especially important in fields like data science, where reproducibility and transparency are essential.
Documentation: Embedding code in a Markdown document can also make it easier to document your work. By including narrative text alongside your code, you can provide context and explanations for your analysis, making it easier for others to understand and replicate your work.
Presentation: Markdown documents can be easily converted into a variety of formats, such as HTML, PDF, and Word documents. This makes it easy to present your analysis in a polished and professional manner, without having to manually copy and paste your code and results into a separate document.
Overall, embedding code in a Markdown document can make your analysis more transparent, reproducible, and well-documented, which can improve the quality and credibility of your analysis.
This lesson provided an introduction. There are many more advanced markdown and R Notebook techniques, including writing books and building full websites. In fact, the :artificium lesson repository was fully built in R Notebooks and the source for this lesson is an R Notebook.
Errata
None collected yet. Let us know.
