Introduction
Programming is not easy, and developing code is never a straight path free of problems; rather, it is a highway littered with frustration, anger, and eventual joy. Even with proper analysis, careful planning, the application of programming and design patterns, and sound programming practices, mistakes still occur and “bugs” must be located. Debugging is thus essential for any programming effort, including programming in R.
This lesson introduces common strategies for locating errors in R programs. Most of the practices shown apply only to actual “programs”, i.e., R scripts, and may not work well in R Markdown, R Notebooks, or shiny web apps. However, code can be copied from an R Notebook to an R script for debugging purposes. R Markdown is not easily debugged because the code is run during knitting, where output and code are intercepted by the knitr package.
Common Strategies
Debugging
This section presents a series of common strategies for dealing with bugs and errors.
Google
Whenever you encounter an error message, start by searching for the message online. Chances are that someone else has encountered this error before and there’s a solution posted. When “googling”, improve the likelihood of a good match by removing any variable names or values that are specific to your code, and enclose your error message in quotes when you use it as a search term.
Of course, be sure to acknowledge the solution in your code, post your own solutions, and contribute to knowledge bases, at least by “liking” an answer if it was helpful or by explaining why an answer did not work for you.
Isolate Defective Code
Finding an error in a large program can be very difficult. Start by extracting the code that does not work into a separate source file, and reduce any data structures or files to a small sample that still allows you to reproduce the error.
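As a rough illustration of shrinking the data, the sketch below cuts a large data frame down to a handful of rows and writes them to a separate file for the isolated script. The data frame `big`, its contents, and the file name `sample.csv` are made up for this example; substitute your own data.

```r
# Hypothetical stand-in for a large data set (not the lesson's cereal file)
big <- data.frame(
  Manufacturer = rep(c("Nabisco", "Kellogg", "Ralston Purina"), times = 1000),
  Calories = rep(c(90, 110, 120), times = 1000),
  stringsAsFactors = FALSE
)

# Keep only the first few rows; often a handful is enough to trigger the bug
small <- head(big, 10)

# Write the sample to its own file; tempdir() is used here so the example
# leaves your project folder untouched
sample_file <- file.path(tempdir(), "sample.csv")
write.csv(small, file = sample_file, row.names = FALSE)

# Read it back the same way the failing code would
check <- read.csv(sample_file, stringsAsFactors = FALSE)
print(dim(check))
```

If the error only appears for particular rows, select those rows instead of the first ten.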
Debug in Sandbox
Do your debugging in separate code files. Do not debug in your production code if at all possible. Experiment in that sandbox.
Add Debug Statements
One of the most common debugging tricks is to insert various forms of print statements into the code to tell the programmer where the program is executing and what the values of variables are. R provides numerous mechanisms for inserting debug statements into R code.
For more information on this approach, see the section on Debugging R Scripts below.
Use Interactive Debugger
An interactive debugger is often very helpful in tracking down errors. It allows a programmer to set breakpoints, inspect the call stack, and see the contents of variables.
For more information on how to use the interactive debugger in R Studio, see Lesson 6.195 Using the Debugger in R Studio.
Debugging R Scripts
There are several common approaches for debugging your own code. Of course, carefully reading the error message and line number that R provides is often a first clue. The actual mistake may be on the line before the one reported, so be sure to look there as well.
In R Studio, use the Environment window tab to watch the value of variables as code executes.
Display diagnostic information, variable contents, and run counters using a combination of calls to print(), cat(), or message().
Call traceback() to get a stack trace to see where a reported error is occurring.
Call browser() to open an interactive debugger window.
Call debug() to automatically open a debugger at the start of a function call.
Call trace() to set a breakpoint by opening a debugger at a location inside a function.
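A minimal sketch of browser() and debug(), using a made-up function `count_values()`. The `interactive()` guard is an addition of this example so that the script also runs unattended; the debugger prompt only appears when someone is at the console.

```r
# Hypothetical helper used only for this illustration
count_values <- function(values) {
  counts <- table(values)
  # Pause here to inspect 'counts', but only in an interactive session
  if (interactive()) browser()
  counts
}

# Flag the function: the next call would start in the debugger at its first line
debug(count_values)
print(isdebugged(count_values))

# Remove the flag again once you are done stepping through
undebug(count_values)
print(isdebugged(count_values))

# A normal call; in an interactive session this stops at the browser() line
print(count_values(c("a", "b", "a")))
```

Remember to remove browser() calls and to undebug() functions before the code goes back into production.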
Diagnostic Messages
Below is a short piece of code that reads a CSV of cereals, cereal content, and name of manufacturer and then counts how many cereals each manufacturer produces. We will modify the code to demonstrate various debugging practices.
df <- read.csv(file = "CerealDataCSV.csv",
               header = TRUE,
               stringsAsFactors = FALSE)
# total number of cereals
n <- nrow(df)
# new data frame holding manufacturers and count
df.manu <- data.frame(Manufacturer = vector(mode = "character", 0),
                      NumCereals = vector(mode = "numeric", 0))
for (i in 1:n)
{
  # next manufacturer in the data
  m <- df$Manufacturer[i]
  # is it already in the list of manufacturers?
  f <- which(df.manu$Manufacturer == m)
  if (length(f) == 0) {
    # not yet -- so add manufacturer with count of 1
    newRow <- nrow(df.manu) + 1
    df.manu[newRow, 2] <- 1
    df.manu[newRow, 1] <- m
  } else {
    # yes -- bump the count
    df.manu[f, 2] <- df.manu[f, 2] + 1
  }
}
print(head(df.manu, 3))
##     Manufacturer NumCereals
## 1        Nabisco          6
## 2        Kellogg         21
## 3 Ralston Purina          6
If your code does not behave as expected, a common tactic is to add print() or cat() calls to your code and print out diagnostic messages that include values of variables and objects. Use str() to print more details about an object’s structure.
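The three calls behave a little differently, which the following sketch tries to show on a small made-up data frame (`df.demo` is not part of the lesson's data; the two rows simply reuse counts shown above).

```r
# Small made-up data frame for illustration
df.demo <- data.frame(
  Manufacturer = c("Nabisco", "Kellogg"),
  NumCereals = c(6, 21),
  stringsAsFactors = FALSE
)

# print() shows the object the way the console would
print(df.demo)

# cat() concatenates plain values into one line of text
cat("rows:", nrow(df.demo), "first manufacturer:", df.demo$Manufacturer[1], "\n")

# str() prints the structure itself and returns NULL, so call it on its own
# line rather than nesting it inside cat()
str(df.demo)
```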
One main drawback of this approach is that values are not printed automatically inside functions or loops, so you must add explicit calls to print() or cat() and remove them again afterwards. In addition, if added to a long loop, the messages can be overwhelming. Finally, diagnostic messages take a relatively long time to render, so code will run slower, particularly long loops. In that case, reduce the number of iterations of a loop or use a subset of a dataframe or file.
df <- read.csv(file = "CerealDataCSV.csv",
               header = TRUE,
               stringsAsFactors = FALSE)
# total number of cereals
n <- nrow(df)
# new data frame holding manufacturers and count
df.manu <- data.frame(Manufacturer = vector(mode = "character", 0),
                      NumCereals = vector(mode = "numeric", 0))
# str() prints its own output and returns NULL, so call it on its own line
cat("Structure of df.manu:\n")
str(df.manu)
## Structure of df.manu:
## 'data.frame':    0 obs. of  2 variables:
##  $ Manufacturer: chr
##  $ NumCereals  : num
# only the first three rows while debugging
for (i in 1:3)
{
  # next manufacturer in the data
  m <- df$Manufacturer[i]
  # is it already in the list of manufacturers?
  f <- which(df.manu$Manufacturer == m)
  if (length(f) == 0) {
    # not yet -- so add manufacturer with count of 1
    cat("new manu: ", m, " ")
    newRow <- nrow(df.manu) + 1
    df.manu[newRow, 2] <- 1
    df.manu[newRow, 1] <- m
    # report the row number and the name that was just stored
    cat(newRow, "/", df.manu[newRow, 1], "\n")
  } else {
    # yes -- bump the count
    df.manu[f, 2] <- df.manu[f, 2] + 1
  }
}
## new manu: Nabisco 1 / Nabisco
## new manu: Kellogg 2 / Kellogg
Note: The “\n” in cat() adds a newline, i.e., the next output appears on a new line.
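To keep the output of a long loop manageable, one option is to report only every so many iterations and to guard the messages with a flag. The sketch below assumes two made-up names, `debug_on` and `report_every`, and a toy loop that just sums numbers.

```r
# Assumed names for this sketch: 'debug_on' and 'report_every'
debug_on <- TRUE
report_every <- 250

total <- 0
for (i in 1:1000) {
  total <- total + i
  # Report only every 250th iteration, and only while debugging
  if (debug_on && i %% report_every == 0) {
    message("iteration ", i, ": running total = ", total)
  }
}
print(total)
```

Because message() writes to the standard error stream rather than to standard output, wrapping a call in suppressMessages() silences the diagnostics without editing the code.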
traceback()
The traceback() function prints the sequence of function calls that were active when the error occurred. Programmers often refer to this as a call stack, stack trace, or backtrace. When code stops running, programmers often say that it “crashed”, “barfed”, or “dumped core”. The latter harks back to the days when computers had “core” memory made up of tiny magnetic rings.
You can call traceback() directly from the console immediately after the error has occurred, or insert the call into your code and run it again.
df <- read.csv(file = "CerealDataCSV.csv",
               header = TRUE,
               stringsAsFactors = FALSE)
# total number of cereals
n <- nrow(df)
# new data frame holding manufacturers and count
df.manu <- data.frame(Manufacturer = vector(mode = "character", 0),
                      NumCereals = vector(mode = "numeric", 0))
for (i in 1:3)
{
  # next manufacturer in the data
  m <- df$Manufacturer[i]
  # is it already in the list of manufacturers?
  f <- which(df.manu$Manufacturer == m)
  if (length(f) == 0) {
    # not yet -- so add manufacturer with count of 1
    newRow <- nrow(df.manu) + 1
    # this next line will cause an error
    df.manu[newRow] <- 1
    df.manu[newRow] <- m
  } else {
    # yes -- bump the count
    df.manu[f, 2] <- df.manu[f, 2] + 1
  }
}
When running code chunks in an R Notebook, the cursor will remain on the line where the error occurred, giving you a clue as to where to insert traceback().
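Since traceback() is meant to be called by hand after an error, a script that should keep running needs a different route to the same information. One possible approach, sketched here with two made-up functions, is to record the call stack with sys.calls() inside withCallingHandlers() while the failing calls are still active. This is an illustration of the idea, not a replacement for traceback().

```r
# Hypothetical functions that fail two levels down
load_counts <- function(x) update_count(x)
update_count <- function(x) stop("count must be numeric")

calls_at_error <- NULL
msg <- tryCatch(
  withCallingHandlers(
    load_counts("a"),
    # Runs while the failing calls are still on the stack
    error = function(e) calls_at_error <<- sys.calls()
  ),
  # Afterwards, recover from the error and keep its message
  error = function(e) conditionMessage(e)
)

# One line per call, outermost first (traceback() lists the innermost first)
stack <- vapply(as.list(calls_at_error),
                function(cl) deparse(cl)[1],
                character(1))
print(stack)
print(msg)
```

The printed stack also contains the tryCatch() and handler frames; the lines of interest are the calls to `load_counts()` and `update_count()`.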
You can add an automatic call to traceback() so that you’ll see the stack trace after every error. In R Studio you will need to click on the “Show Traceback” link in the error message.
options(error = traceback)
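Because this option stays in effect for the rest of the session, it can be worth saving the previous setting and restoring it when you are done. A minimal sketch, where `old_opts` is just a name chosen for this example:

```r
# options() returns the previous values of the options it changes
old_opts <- options(error = traceback)

# ... run the code being debugged here ...

# Restore whatever error handler was set before
options(old_opts)
```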
browser()
trace()
Errata
None collected yet. Let us know.
