---
title: "Measure Run-Time Performance of R Code"
params:
  category: 6
  number: 134
  time: 45
  level: beginner
  tags: "R,debugging,runtime,time,Sys.time,tictoc,rbenchmark"
  description: "Demonstrates how to measure the execution time of R
                code for profiling, debugging, and performance
                improvement."
date: "<small>`r Sys.Date()`</small>"
author: "<small>Martin Schedlbauer</small>"
email: "m.schedlbauer@neu.edu"
affiliation: "Northeastern University"
output: 
  bookdown::html_document2:
    toc: true
    toc_float: true
    collapsed: false
    number_sections: false
    code_download: true
    theme: spacelab
    highlight: tango
---

---
title: "<small>`r params$category`.`r params$number`</small><br/><span style='color: #2E4053; font-size: 0.9em'>`r rmarkdown::metadata$title`</span>"
---

```{r code=xfun::read_utf8(paste0(here::here(),'/R/_insert2DB.R')), include = FALSE}
```

## Introduction

Measuring the execution time (run-time) of an R function or code chunk is often necessary when profiling code to find bottlenecks or when comparing alternative implementations. It can also help to estimate empirically the run-time complexity of package functions. There are several packages for benchmarking R code, as well as two built-in Base R functions for measuring time. This lesson looks at the most commonly used options.

## Using "Sys.time"

The function <code>Sys.time()</code> is part of Base R, so no additional packages are required. It returns the current date and time as a "POSIXct" object. Calling the function before and after the code to be measured and then subtracting the two values provides a simple way to measure the run-time of a chunk of code. This is one of the simplest and most flexible approaches and is generally sufficient. The subtraction yields a "difftime" object, which must be converted to a numeric type for any further computations. Because R chooses the units of a "difftime" automatically (seconds, minutes, and so on), it is safest to request seconds explicitly with <code>difftime(..., units = "secs")</code>.

The example below illustrates that approach. It loads a CSV and counts the number of female customers using a loop.

```{r usingSys.time1, eval=T, echo=T, comment=""}

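# record the start time; the timed region includes reading the CSV
bt <- Sys.time()

df <- read.csv(file = "customertxndata.csv", header = TRUE)

femCounter <- 0

# visit each row and count the non-missing "Female" entries
for (i in seq_len(nrow(df)))
{
  if (!is.na(df$Gender[i]) && df$Gender[i] == "Female")
    femCounter <- femCounter + 1
}

et <- Sys.time()

# request seconds explicitly; difftime otherwise chooses its own units
t.loop <- as.numeric(difftime(et, bt, units = "secs"))

cat("Time elapsed: ", round(t.loop, 3), " sec")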
```

The code below also counts the number of female customers but uses <code>which()</code> instead of a loop in order to measure the time difference between the two approaches.

```{r usingSys.time2, eval=T, echo=T, comment=""}

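# record the start time; the timed region includes reading the CSV
bt <- Sys.time()

df <- read.csv(file = "customertxndata.csv", header = TRUE)

# which() returns the indices of the matching rows and skips NA values
n <- which(df$Gender == "Female")
femCounter <- length(n)

et <- Sys.time()

# request seconds explicitly; difftime otherwise chooses its own units
t.which <- as.numeric(difftime(et, bt, units = "secs"))

cat("Time elapsed: ", round(t.which, 3), " sec")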
```

So, based on these measurements (both of which include the time to read the CSV), a linear search of a data frame with `r nrow(df)` rows using a loop takes `r round(t.loop,3)` seconds, while using <code>which()</code> is `r round(t.loop - t.which, 3)` seconds faster.
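
The sketch below shows one way to convert the difference of two <code>Sys.time()</code> values to a plain number of seconds. It uses a small synthetic vector instead of the lesson's CSV so that only the counting step is timed. The helper name `elapsedSecs` and the generated `gender` vector are illustrative assumptions, not part of the lesson files.

```r
# Hypothetical helper: seconds elapsed since a start time, as a plain number.
elapsedSecs <- function(startTime) {
  as.numeric(difftime(Sys.time(), startTime, units = "secs"))
}

# Synthetic stand-in for df$Gender (assumption: not the lesson's data).
set.seed(42)
gender <- sample(c("Female", "Male", NA), size = 100000, replace = TRUE)

# Time only the counting step.
bt <- Sys.time()
femCounter <- sum(gender == "Female", na.rm = TRUE)
t.sum <- elapsedSecs(bt)

cat("Time elapsed:", round(t.sum, 3), "sec\n")
```

Keeping the data loading outside the timed region makes it easier to attribute the measured time to the computation itself.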

## Using "system.time"

The function <code>system.time()</code> takes an R expression as an argument and returns a report on the time spent executing it. It reports the CPU time spent running the code itself (user), the CPU time spent by the operating system on behalf of the code (system), and the total elapsed (wall-clock) time, which includes any time during which the process is blocked.

The code below illustrates the use of this function.

```{r using.system.time, eval=T, echo=T, comment=""}

# read the CSV and count the female customers; the count is returned
processDF <- function ()
{
  df <- read.csv(file = "customertxndata.csv", header = TRUE)
  
  n <- which(df$Gender == "Female")
  femCounter <- length(n)
  
  return(femCounter)
}

system.time(processDF())
```

Generally, the most useful reported time is *elapsed* time, as it is the wall-clock time needed to run the expression. Note that calling <code>system.time()</code> itself adds a small overhead, so the reported time may be slightly longer than with another method, but that is rarely of any practical consequence.
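
When only the elapsed time is needed, the value returned by <code>system.time()</code> can be indexed by name. The following is a minimal sketch on synthetic data. The function `countFemales` and the `gender` vector are hypothetical stand-ins, not the lesson's data.

```r
# Synthetic stand-in for the Gender column (assumption: not the lesson's CSV).
set.seed(1)
gender <- sample(c("Female", "Male", NA), size = 200000, replace = TRUE)

# Hypothetical function to be timed.
countFemales <- function(x) length(which(x == "Female"))

# The assignment inside system.time() keeps the result of the timed call.
timing <- system.time(n <- countFemales(gender))

# timing is a named numeric vector of class "proc_time".
elapsed <- timing[["elapsed"]]
cat("Elapsed:", elapsed, "sec\n")
```

Storing the elapsed value as a plain number makes it easy to compare several runs or to collect timings in a vector.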

## Other Methods

The **rbenchmark** package provides the function <code>benchmark()</code>, while the package **microbenchmark** has the function <code>microbenchmark()</code>. Both take expressions as arguments, run each one repeatedly, and report the run-time of each, which makes code comparisons simpler. The **microbenchmark** package also offers visualizations of benchmark results built using **ggplot2**.
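
As a rough sketch of how such a comparison might look, the code below benchmarks a loop-based count against a <code>which()</code>-based count using <code>microbenchmark()</code>. It assumes the **microbenchmark** package is installed and skips the benchmark otherwise. The functions `countLoop` and `countWhich` and the synthetic `gender` vector are illustrative names, not part of the lesson files.

```r
# Synthetic stand-in for the Gender column (assumption: not the lesson's CSV).
set.seed(7)
gender <- sample(c("Female", "Male", NA), size = 10000, replace = TRUE)

# Loop-based count, mirroring the loop shown earlier in the lesson.
countLoop <- function(x) {
  k <- 0
  for (i in seq_along(x)) {
    if (!is.na(x[i]) && x[i] == "Female") k <- k + 1
  }
  k
}

# Vectorized count using which().
countWhich <- function(x) length(which(x == "Female"))

# Run the benchmark only if the package is available.
if (requireNamespace("microbenchmark", quietly = TRUE)) {
  res <- microbenchmark::microbenchmark(
    loop  = countLoop(gender),
    which = countWhich(gender),
    times = 20
  )
  print(res)
}
```

Because each expression is run many times, the reported summary statistics are less sensitive to one-off delays than a single <code>Sys.time()</code> measurement.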

## Summary

The most common and simplest approach to measuring the execution time of an R expression or a code chunk is to use either <code>Sys.time()</code> or <code>system.time()</code>.

## Files & Resources

```{r zipFiles, echo=FALSE}
zipName = sprintf("LessonFiles-%s-%s.zip", 
                 params$category,
                 params$number)

textALink = paste0("All Files for Lesson ", 
               params$category,".",params$number)

# downloadFilesLink() is included from _insert2DB.R
knitr::raw_html(downloadFilesLink(".", zipName, textALink))
```

------------------------------------------------------------------------

## References

[5 ways to measure running time of R code](https://www.alexejgossmann.com/benchmarking_r/)

## Errata

None collected yet. Let us know.

```{=html}
<script src="https://form.jotform.com/static/feedback2.js" type="text/javascript">
  new JotformFeedback({
    formId: "212187072784157",
    buttonText: "Feedback",
    base: "https://form.jotform.com/",
    background: "#F59202",
    fontColor: "#FFFFFF",
    buttonSide: "left",
    buttonAlign: "center",
    type: false,
    width: 700,
    height: 500,
    isCardForm: false
  });
</script>
```
```{r code=xfun::read_utf8(paste0(here::here(),'/R/_deployKnit.R')), include = FALSE}
```
