Introduction

Although it may not be intuitive, and although R differs from general-purpose object-oriented languages such as C++ and Java, we can, in fact, do object-oriented programming in R. For starters, everything in R is an object: vectors are objects, functions are objects, data frames are objects. This tutorial, however, focuses on how to create user-defined objects with attributes and methods.

In a generic sense, an object is an abstract data structure that contains attributes and defines methods which process those attributes. A class can be thought of as a blueprint for objects: it defines the structure of the attributes and the methods. An object is an instance of a class. The process of creating a new object from a class definition is generally called instantiation.

While most programming languages have a single mechanism for defining classes, R actually has three class systems: S3, S4, and the more recent Reference class system. Each has its own features and peculiarities. Choosing which to use is mostly a matter of personal preference.

S3 Class System

An S3 class is simple but somewhat primitive in nature. It lacks a formal class definition; instances are created simply by adding a class attribute to an object, typically a list. This simplicity is one of the reasons that it is widely used by R programmers. In fact, most of the R built-in classes are S3 classes.
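As a minimal sketch of the S3 approach described above, the code below tags a plain list with a class attribute and defines a print method for it. The class name instructor and the method body are illustrative assumptions, not part of the lesson's later code.

```r
# build a plain list and tag it with a class attribute
# (hypothetical class name "instructor")
jeff <- list(iid = 1, name = "Jeff Alden", rank = "FT-Associate")
class(jeff) <- "instructor"

# an S3 method is an ordinary function named <generic>.<class>
print.instructor <- function(x, ...) {
  cat("Instructor", x$iid, "-", x$name, "\n")
  invisible(x)
}

# print() dispatches to print.instructor() based on the class attribute
print(jeff)

# check the class membership
inherits(jeff, "instructor")
```

Nothing prevents us from attaching the same class attribute to an object that lacks the expected elements, which is the informality the paragraph above refers to.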

S4 Class System

The S4 class system is stricter in the sense that it has a formal way to define classes and a uniform way to instantiate objects. This makes the process safer, more like other object-oriented languages, and prevents programmers from defining objects incorrectly.
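For comparison, here is a small sketch of a formal S4 definition. The class name InstructorS4 is a hypothetical choice for illustration; setClass() and new() are the standard functions from the methods package.

```r
library(methods)

# formal class definition with typed slots (hypothetical class name)
setClass("InstructorS4",
         slots = c(iid = "numeric",
                   name = "character",
                   rank = "character"))

# uniform instantiation through new()
s4 <- new("InstructorS4", iid = 1, name = "Jeff Alden", rank = "FT-Associate")

# slots are accessed with the @ operator
s4@name

# a value of the wrong type is rejected when the object is created
result <- tryCatch(
  new("InstructorS4", iid = "one", name = "Jeff Alden", rank = "FT-Associate"),
  error = function(e) "rejected")
result
```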

Reference Class System

The Reference class system in R is similar to the object-oriented programming structures common in languages like C++, Java, and Python. Unlike S3 and S4 classes, where methods are implementations of generic functions, methods belong to the class itself. Reference classes are internally implemented as S4 classes with an environment surrounding them.

The remainder of this lesson will focus on reference classes.

Defining a Reference Class

Defining a reference class is done with the setRefClass() function.

Member variables (attributes) of a reference class must be included as part of the class definition. In a reference class they are referred to as fields. The same concept is called member variables in C++, fields or attributes in Java, and slots in ontology definitions.

The code below defines a class Instructor with three fields.

Instructor <- setRefClass("Instructor", 
                          fields = list(iid="numeric", 
                                        name="character", 
                                        rank="character")
                          )

Instantiating a Reference Class

Instantiating an object means creating an instance of the class and allocating memory for its fields. The terms instance, instance of a class, and object are therefore used interchangeably.

There are two ways to instantiate a reference class: by calling the generator object returned by setRefClass() as a function, or by calling the generator's new() method, which is commonly paired with an initialize method in the class definition. We will not show the latter approach just yet.

Here, instantiation is done with the name of the class used as a generator function, as demonstrated below.

i <- Instructor(iid = 1, name = 'Jeff Alden', rank = 'FT-Associate')

The code above creates an instance of the class “Instructor”, or, said another way, it allocates an object of type “Instructor”. Since the class “Instructor” has three fields (iid, name, and rank), we supply their initial values. Instantiation means allocation of memory. In this case R allocates memory for a number (iid) and memory for each of the character strings. Upon completion of the memory allocation, we get back a reference to the object (i.e., a “pointer” or “link” to the block of memory where the object was allocated). We must keep track of that reference to be able to do something with the object or call any of its methods.

print(i)
## Reference class object of class "Instructor"
## Field "iid":
## [1] 1
## Field "name":
## [1] "Jeff Alden"
## Field "rank":
## [1] "FT-Associate"

Accessing Fields

As with list-based S3 objects, fields are accessed with the $ operator. They can also be modified that way. There is no notion of “private” fields or methods as there is in Java and C++. All members (fields and methods) are “public” in an R reference class object.

i <- Instructor(iid = 1, name = 'Jeff Alden', rank = 'FT-Associate')

# read a field's value
n <- i$name

# update a field's value
i$name <- 'Jeffrey Alden'
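Although all fields are public, each field keeps the class declared for it in setRefClass(). The sketch below repeats the three-field Instructor definition so that it runs on its own, lists the declared fields with the generator's fields() method, and shows that assigning a value of the wrong class is refused. The variable status is only an illustrative helper.

```r
library(methods)

Instructor <- setRefClass("Instructor",
                          fields = list(iid = "numeric",
                                        name = "character",
                                        rank = "character"))

i <- Instructor(iid = 1, name = 'Jeff Alden', rank = 'FT-Associate')

# list the declared fields and their classes
Instructor$fields()

# assigning a character value to the numeric field iid raises an error
status <- tryCatch({
  i$iid <- "one"
  "accepted"
}, error = function(e) "rejected")
status

# the field still holds its original value
i$iid
```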

Objects are References

When instantiating a reference object, R generates an internal object in memory and returns a reference to the object (hence the name). So, assigning an object to another variable actually assigns the reference and does not make a copy. In the code below, i1 and i2 are references to the same object. This is similar to Java but unlike C++, where assigning one object to another copies it.

i1 <- Instructor(iid = 1, name = 'Jeff Alden', rank = 'FT-Associate')
i2 <- i1

i2$name <- 'Xin Wang'

print(i1)
## Reference class object of class "Instructor"
## Field "iid":
## [1] 1
## Field "name":
## [1] "Xin Wang"
## Field "rank":
## [1] "FT-Associate"

In the code above, we create a new instance of the class Instructor and get a reference back which we store in the variable i1. We then assign i1 to i2, but what we actually assign is the reference (a pointer) to the object. Think of i1 as holding the location in memory where the object is stored. Any modification made through the reference i2 modifies the same object that i1 points to. So, be cautious when assigning reference objects.
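The same caution applies when a reference object is passed to a function: the function receives the reference, not a copy. The following sketch illustrates this; the helper function promote() is hypothetical and the class definition is repeated so the example is self-contained.

```r
library(methods)

Instructor <- setRefClass("Instructor",
                          fields = list(iid = "numeric",
                                        name = "character",
                                        rank = "character"))

# hypothetical helper: modifies the object it is given
promote <- function(instr, newRank) {
  instr$rank <- newRank
  invisible(instr)
}

i1 <- Instructor(iid = 1, name = 'Jeff Alden', rank = 'FT-Associate')
promote(i1, 'FT-Full')

# the caller's object was modified, unlike with ordinary R vectors or lists
i1$rank
```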

To make an actual copy, use the inherited method copy().

i1 <- Instructor(iid = 1, name = 'Jeff Alden', rank = 'FT-Associate')

i2 <- i1$copy()

# modifying i2 does not modify i1
i2$name <- 'Susan Wollaston'

print(i1)
## Reference class object of class "Instructor"
## Field "iid":
## [1] 1
## Field "name":
## [1] "Jeff Alden"
## Field "rank":
## [1] "FT-Associate"

Defining Methods

All reference classes have a set of predefined methods inherited from the superclass envRefClass. This is similar to all Java classes being subclasses of the Object class.

New methods are defined inline in the methods argument, a named list of functions.

Notice the operator <<- used to modify fields within a method (see applyRaise() below). Using the simple assignment operator <- would instead create a local variable called salary and leave the field unchanged. Fortunately, R issues a warning in such a case.

Also note the , after the } to separate the method function definitions.

Instructor <- setRefClass("Instructor", 
                          fields = list(iid="numeric", 
                                        name="character", 
                                        rank="character",
                                        salary="numeric"
                                        ),
                          methods = list(
                            getMonthlySalary = function() {
                              return (salary / 12)
                            },
                            
                            applyRaise = function(merit) {
                              salary <<- salary * (1 + merit)
                            }
                          ))
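The difference between <- and <<- inside a method can be checked with a small sketch. The class Account and its methods are hypothetical and exist only for this comparison; expect R to print a warning about the local assignment when the class is defined.

```r
library(methods)

# hypothetical class used only to compare the two assignment operators
Account <- setRefClass("Account",
  fields = list(balance = "numeric"),
  methods = list(
    depositLocal = function(x) {
      # creates a local variable named balance; the field is unchanged
      balance <- balance + x
      invisible(.self)
    },
    deposit = function(x) {
      # modifies the field of the object
      balance <<- balance + x
      invisible(.self)
    }
  ))

a <- Account(balance = 100)

a$depositLocal(50)
afterLocal <- a$balance    # still 100

a$deposit(50)
afterField <- a$balance    # now 150
```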

We can use .self, a reference to the object on which the method is called, to make it clearer to the reader when we access a field within a method. It also avoids clashes when the name of a field is the same as the name of a method argument or a local variable. This is analogous to this in Java and C++. The code below demonstrates this alternative.

Instructor <- setRefClass("Instructor", 
                          fields = list(iid="numeric", 
                                        name="character", 
                                        rank="character",
                                        salary="numeric"
                                        ),
                          methods = list(
                            getMonthlySalary = function() {
                              return (.self$salary / 12)
                            },
                            
                            applyRaise = function(merit) {
                              .self$salary <- .self$salary * (1 + merit)
                            }
                          ))

Here is what we mean by .self being a reference to the object on which the method is called. Consider the code fragment below where we instantiate two instances (objects) of “Instructor” and assign their references to two variables in this context: i and f. So, i is a reference to a block of memory that contains the fields {11, ‘Kaleb Ahmad’, ‘FT-Associate’} and f is a reference to a block of memory that contains the fields {476, ‘Leena Patel’, ‘PT’}. Remember that instantiation means allocation of memory for the object.

When we then call i$getMonthlySalary(), we call the method getMonthlySalary() on the object referenced by i, and therefore, inside the function getMonthlySalary(), .self refers to the block of memory pointed at by i. So, .self$name would be ‘Kaleb Ahmad’. If we had called f$getMonthlySalary(), then .self$name would be ‘Leena Patel’ within getMonthlySalary(). So, .self within a method of an object is always a reference to the object on which the method is called.

The variable .self is automatically created and always initialized to be a reference to the object on which the method is called.

i <- Instructor(iid = 11, name = 'Kaleb Ahmad', 
                rank = 'FT-Associate', salary = 200000)
f <- Instructor(iid = 476, name = 'Leena Patel', 
                rank = 'PT', salary = 68000)

i$getMonthlySalary()
## [1] 16666.67

Accessing Methods

Methods are accessed the same way as fields – with the $ operator.

i <- Instructor(iid = 2, name = 'Dua Dipa', rank = 'T-Assistant', salary = 128000)

m.bef <- i$getMonthlySalary()
i$applyRaise(0.045)

m.aft <- i$getMonthlySalary()

cat("Salary raised from $", m.bef, "to $", m.aft, "per month")
## Salary raised from $ 10666.67 to $ 11146.67 per month
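One of the predefined methods that can be overridden is show(), which R calls when a reference object is printed. The sketch below is one possible way to customize the printed form; the output format is an arbitrary choice and the class is reduced to three fields to keep the example short.

```r
library(methods)

Instructor <- setRefClass("Instructor",
  fields = list(iid = "numeric",
                name = "character",
                rank = "character"),
  methods = list(
    # overrides the default show() method used for printing
    show = function() {
      cat("Instructor #", iid, ": ", name, " (", rank, ")\n", sep = "")
    }
  ))

i <- Instructor(iid = 2, name = 'Dua Dipa', rank = 'T-Assistant')

# printing the object now uses our show() method
print(i)

# the method can also be called directly, like any other method
i$show()
```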

Inheritance

Inheritance is a key mechanism in object-oriented programming. It allows a programmer to define a new class (subclass or derived class) from an existing class (superclass or base class). Derived classes can add new fields and methods. All fields and methods of the base class are automatically fields and methods of the derived class. This increases reusability of code and allows programmers to represent domain objects more accurately.

Inheritance is supported in all three class systems, but the Reference class system handles it most like other object-oriented languages do. We will restrict ourselves to this class system.

In the example below, we have a base class Person with three fields and one method. We then define a derived class Instructor, which extends Person with two additional fields and two methods. The inheritance is declared by passing the name of the base class, “Person”, to the contains argument.

Person <- setRefClass("Person", 
                      fields = list(pid="numeric", 
                                    name="character",
                                    yob = "numeric"),
                      methods = list(
                        getAge = function() {
                          currYear <- as.numeric(format(Sys.time(), "%Y"))
                          return (currYear - yob)
                        }
                      ))

Instructor <- setRefClass("Instructor", 
                      contains = "Person",
                      fields = list(rank="character",
                                    salary="numeric"
                      ),
                      methods = list(
                        getMonthlySalary = function() {
                          return (salary / 12)
                        },
                        
                        applyRaise = function(merit) {
                          salary <<- salary * (1 + merit)
                        }
                      ))

We can then instantiate the derived class Instructor and find that it has all of the fields and methods of Person in addition to its additional fields and methods.

anInstructor <- Instructor(pid = 100, 
                           name = 'Raj Metha', 
                           rank = 'FT-Full',
                           yob = 1968,
                           salary = 182972)

anInstructor$getMonthlySalary()
## [1] 15247.67
anInstructor$getAge()
## [1] 56
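A derived class can also override a method of its base class and still reuse the inherited version through callSuper(). The sketch below uses deliberately simplified versions of Person and Instructor; the method describe() is a hypothetical addition for illustration.

```r
library(methods)

# simplified base class with one hypothetical method
Person <- setRefClass("Person",
  fields = list(name = "character"),
  methods = list(
    describe = function() {
      paste("Person:", name)
    }
  ))

# the derived class overrides describe() and extends its result
Instructor <- setRefClass("Instructor",
  contains = "Person",
  fields = list(rank = "character"),
  methods = list(
    describe = function() {
      base <- callSuper()            # calls Person's describe()
      paste0(base, ", rank: ", rank)
    }
  ))

anInstructor <- Instructor(name = 'Raj Metha', rank = 'FT-Full')

anInstructor$describe()

# an Instructor is also a Person
is(anInstructor, "Person")
```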

Object Aggregation

In an aggregation relationship between objects, there is a whole/part or container/part hierarchy. In ontology terms, there is a partonomy. In an aggregation, one object “contains” other objects, although the containment does not have to be “physical”, i.e., the part objects do not have to be part of the same memory structure. The whole/part relationship can be by reference where the container object (whole or aggregate) contains references to the contained (part) objects.

Let’s implement the part hierarchy expressed by the UML Class Diagram below:

Member <- setRefClass("Member", fields = list(
  mID = "numeric", 
  name = "character",
  yearJoined = "numeric"))

Club <- setRefClass("Club", fields = list(
  name = "character",
  yearFounded = "numeric",
  maxMemID = "numeric",
  members = "list"),
                    
  methods = list(
    getNumMembers = function() {
      return (length(members))
    },
    
    addMember = function(m) {
      # defensive guard only: a "list" field defaults to an empty list
      if (is.null(members))
          members <<- list()
      
      # add a member ID for the new member
      m$mID <- maxMemID + 1
      maxMemID <<- maxMemID + 1
      
      # add the member to internal list
      members[[length(members)+1]] <<- m
      
      return (1)
    }
  ))

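A few noteworthy points about the above code. The field members is intended as an internal (“private” by convention) member variable that keeps track of all of the members added to the club. Because it is declared with class list, it starts out as an empty list, so the first member can be appended without any explicit allocation; the is.null() guard in addMember() is purely defensive.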

Now that we have the classes defined, let’s create some sample instances for testing. We won’t set a member ID for new members as those are assigned to them when they get added to the club.

# create a Club
aClub <- Club(name = 'DATA Club', 
              yearFounded = 2015,
              maxMemID = 0)

# create a few members and add them to the club
s <- aClub$addMember(
  Member(name = 'Jeff Garol', yearJoined = 2022))

s <- aClub$addMember(
  Member(name = 'Ursula Van Leiden', yearJoined = 2022))

s <- aClub$addMember(
  Member(name = 'Garrett Liew', yearJoined = 2022))

# number of club members should be correct
aClub$getNumMembers()
## [1] 3
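Because the club stores references to its members, the aggregation is by reference: a change made to a Member object after it was added is visible through the club. The sketch below uses reduced versions of the two classes, and the method getMemberNames() is a hypothetical addition.

```r
library(methods)

Member <- setRefClass("Member", fields = list(
  mID = "numeric",
  name = "character",
  yearJoined = "numeric"))

# reduced Club class with a hypothetical accessor for member names
Club <- setRefClass("Club",
  fields = list(name = "character",
                members = "list"),
  methods = list(
    addMember = function(m) {
      members[[length(members) + 1]] <<- m
      invisible(.self)
    },
    getMemberNames = function() {
      vapply(members, function(m) m$name, character(1))
    }
  ))

aClub <- Club(name = 'DATA Club')
jeff <- Member(name = 'Jeff Garol', yearJoined = 2022)
aClub$addMember(jeff)

# modify the member through the original reference
jeff$name <- 'Jeffrey Garol'

# the club sees the change because it holds a reference, not a copy
aClub$getMemberNames()
```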

Accessing Fields

Fields are instance variables; they have a value for each instance. For example, let’s initialize two instances of the class Club and let’s add them to a list so we have a way of keeping track of all the clubs – of course creating an aggregation class would be even better, perhaps calling that class Clubs. But, for now, we’ll just build a “free” list, in other words, a list that exists outside of any class:

# our list of clubs
clubs <- list()

# create club and add it to our list of clubs
clubs[[1]] <- Club(name = 'Volleyball Club', 
              yearFounded = 1997,
              maxMemID = 0)

# create club and add it to our list of clubs
clubs[[2]] <- Club(name = 'Tech Club', 
              yearFounded = 2018,
              maxMemID = 0)

# let's add a member to one of the clubs
clubs[[1]]$addMember(
  Member(name = 'Lesley Walter', yearJoined = 2020))
## [1] 1

Let’s inspect more closely what happens when we call a member function, i.e., when we call clubs[[1]]$addMember(Member(name = 'Lesley Walter', yearJoined = 2020)). The method addMember() is passed an instance of the class Member as an argument. To understand what occurs, let’s look at the code for that function by itself.

...

addMember = function(m) {
  # defensive guard only: a "list" field defaults to an empty list
  if (is.null(members))
      members <<- list()
  
  # add a member ID for the new member
  m$mID <- maxMemID + 1
  maxMemID <<- maxMemID + 1
  
  # add the member to internal list
  members[[length(members)+1]] <<- m
  
  return (1)
}

...

So, for the call clubs[[1]]$addMember(Member(name = 'Lesley Walter', yearJoined = 2020)), the object on which addMember() is called is clubs[[1]]. So, within the function addMember(), when referring to a field of the class Club, we refer to the values of those fields for the instance clubs[[1]]. To review, here is the code that created that instance of the club:

clubs[[1]] <- Club(name = 'Volleyball Club', 
              yearFounded = 1997,
              maxMemID = 0)

So, within addMember() for the call clubs[[1]]$addMember(Member(name = 'Lesley Walter', yearJoined = 2020)), maxMemID would have the value 0 and yearFounded would be 1997. So, referring to those variables within addMember() would refer to those instance variables.

The .self Reference

.self is a pre-defined variable that refers to the object on which a method is called. So, if you called method M on an instance c of the class C having field X, then when calling c$M(), .self would be a reference to c. Notice the dot prefix. To refer to the field X of the instance on which the method is called, write .self$X within the method. This is useful if you need to access a field that is “hidden” because either a parameter of the method M or a local variable within M is also called X. Alternatively, programmers often use .self$X when referring to the field X to make it clear to the reader of the code that they intend to access a field rather than a local variable or an argument; it adds to code clarity.

We could therefore rewrite the code for addMember() using .self as follows:

...

addMember = function(m) {
  if (is.null(.self$members))
      .self$members <- list()
  
  # add a member ID for the new member
  m$mID <- .self$maxMemID + 1
  .self$maxMemID <- .self$maxMemID + 1
  
  # add the member to internal list
  .self$members[[length(.self$members)+1]] <- m
  
  return (1)
}

...
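The case of a “hidden” field can be sketched as follows. The reduced Member class and its method rename() are hypothetical; the method argument is deliberately given the same name as the field.

```r
library(methods)

# reduced class with a hypothetical method whose argument hides a field
Member <- setRefClass("Member",
  fields = list(name = "character"),
  methods = list(
    rename = function(name) {
      # 'name' alone refers to the argument; .self$name refers to the field
      .self$name <- name
      invisible(.self)
    }
  ))

m <- Member(name = 'Liz Chao')
m$rename('Elizabeth Chao')

m$name
```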

Instantiation with new

This section presents an alternative way to instantiate and initialize a reference class object. It is more like those mechanisms found in object-oriented languages like Java.

Initialization refers to the process of setting up an object when it is created. In the reference class system in R, this is done by defining an initialize method for a class. The initialize method is called automatically when a new object of the class is created, and it takes care of setting up the object’s internal state. In Java and C++, this function is referred to as the constructor.

For example, you might use the initialize method to set the initial values of fields, load an object’s state from an external file or a database, or perform any other kind of initialization.

The example below adds an initialize method to our previous class Member and shows how that method is automatically invoked. Here we instantiate the object by calling the generator's new() method.

Member <- setRefClass("Member", 
                      fields = list(
                        mID = "numeric", 
                        name = "character",
                        yearJoined = "numeric"),
                      
                      methods = list(
                        initialize = function(name, year) {
                          .self$name <- name
                          .self$yearJoined <- year
                        }
                      ))


# create an instance with implicit initialization
aMember <- Member$new('Liz Chao', 2023)

In this example, the initialize method takes two arguments, name and year, which are used to initialize the name and yearJoined fields of the object, respectively. When a new object of the class is created with new, the initialize() method is called automatically and the fields are initialized accordingly.
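An initialize method with required arguments can get in the way when R creates an object without arguments, for example when copy() is called. A common alternative, sketched below under the assumption that a default join year is acceptable, is to accept ... and forward it with callSuper(...) so that named field values still work. The class name ClubMember is hypothetical and is used to avoid overwriting Member.

```r
library(methods)

# hypothetical variant of Member with a more permissive initializer
ClubMember <- setRefClass("ClubMember",
  fields = list(
    mID = "numeric",
    name = "character",
    yearJoined = "numeric"),

  methods = list(
    initialize = function(...) {
      # assumed default: the current year
      yearJoined <<- as.numeric(format(Sys.Date(), "%Y"))
      # named arguments set (or override) fields as usual
      callSuper(...)
      invisible(.self)
    }
  ))

# both forms of instantiation invoke initialize()
a <- ClubMember$new(name = 'Liz Chao')
b <- ClubMember(name = 'Liz Chao', yearJoined = 2023)

# copy() creates the new object without arguments, which this initializer allows
cp <- b$copy()
```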

Tutorial I: Classes, Objects, and Instantiation

After having read the lesson above, watch the tutorial and revisit the various sections of the lesson and try the code yourself.

Container Objects

A container object, also known as a collection object, is a type of object that holds a collection of other objects.

In object-oriented programming, a container object is an object that is used to store and manage other objects. The idea behind a container object is to provide a convenient and efficient way of grouping and organizing related objects.

They are necessary for storing instances of classes as there are no “natural” containers. In the previous example, a Club object acted as a container for all Member objects. But what if we had more than one Club object? Who would keep track of all of those objects? Naturally, we could use a list to store them, or we could build a container class and create an instance of that class as a container object. The class would then have the usual methods of adding an object, removing an object, counting the objects, and finding objects based on different criteria. Some containers also provide iterators to iterate over the elements stored in the container.

Using container objects can be beneficial in several ways:

  • Abstraction: By using a container object, you can abstract away the details of how the elements are stored and manipulated, making your code more readable and easier to maintain.

  • Encapsulation: Container objects encapsulate the elements they contain, hiding their implementation details and making it easier to change the underlying implementation without affecting the rest of the code.

  • Reusability: Container objects can be used as building blocks in larger systems, allowing for code reuse and reducing duplication.

  • Performance: Container objects can often provide optimized implementations for common operations, such as adding or removing elements, making them more efficient than using basic data structures like vectors or lists.

Overall, container objects are a key aspect of object-oriented programming and can help to simplify and optimize the development of complex systems. They are necessary in all object-oriented programming languages, including Java and C++, and not just R.
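A container class for clubs could be sketched as follows. The class ClubRegistry and its method names are hypothetical, the Club class is reduced to two fields, and a plain list is used for storage; this is one possible design rather than a definitive implementation.

```r
library(methods)

# reduced Club class for this example
Club <- setRefClass("Club",
  fields = list(name = "character",
                yearFounded = "numeric"))

# hypothetical container class that manages Club objects
ClubRegistry <- setRefClass("ClubRegistry",
  fields = list(clubs = "list"),
  methods = list(
    add = function(club) {
      clubs[[length(clubs) + 1]] <<- club
      invisible(.self)
    },
    count = function() {
      length(clubs)
    },
    findByName = function(clubName) {
      # returns the first matching club, or NULL if there is none
      for (cl in clubs) {
        if (cl$name == clubName) return(cl)
      }
      NULL
    },
    removeByName = function(clubName) {
      keep <- vapply(clubs, function(cl) cl$name != clubName, logical(1))
      clubs <<- clubs[keep]
      invisible(.self)
    }
  ))

registry <- ClubRegistry()
registry$add(Club(name = 'Volleyball Club', yearFounded = 1997))
registry$add(Club(name = 'Tech Club', yearFounded = 2018))

registry$count()
registry$findByName('Tech Club')$yearFounded

registry$removeByName('Volleyball Club')
registry$count()
```

Callers only use add(), count(), findByName(), and removeByName(), so the internal list could later be replaced by another structure without changing the calling code.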

Tutorial II: Object Aggregation & Container Objects

Conclusion

Object-orientation is a common way to create abstraction and structure complex information. While R is not a fully object-oriented language, many of the information abstraction mechanisms provided by classes, objects, and methods are supported by R, albeit in a way that may be unfamiliar to programmers coming to R from C++, Java, or similar languages. Unlike other languages, R has three distinct ways in which to define classes and objects, with the Reference Classes being the most similar to other object-oriented languages.


Files & Resources

All Files for Lesson 6.122


