Introduction
While R supports object-oriented programming structures, it is not an object-oriented language like Java or C#. However, when writing large R programs that process structured information, it is convenient to represent objects from the domain as objects in R. Being able to persist R objects is a natural extension. By persistence we mean that R objects can be externalized and reconstructed the next time the program runs.
In this tutorial we will look at how to build an object persistence mechanism using XML as the external object format. Lesson 6.124 Externalizing R Objects to SQL demonstrates how to store objects in a relational SQL database. Of course, objects could also be externalized by writing them in the RData format as binary objects.
We will start with a review of the Reference Class System for object-oriented programming and introduce a simple set of R classes representing a subset of some domain ontology.
Reference Class System
The Reference class system in R is similar to the object-oriented programming structures common in languages like C++, Java, Python, etc.
Defining a Reference Class
Defining a reference class is done with the setRefClass() function.
Member variables (attributes) of a reference class must be included as part of the class definition. Member variables of a reference class are referred to as fields in R.
The code below defines a class Instructor with three fields: iid (number), name (string/text), and rank (string/text).
Instructor <- setRefClass("Instructor",
fields = list(iid="numeric",
name="character",
rank="character")
)
Instantiating a Reference Class
Instantiation is done with the generator returned by setRefClass(): calling the generator creates a new object of that class. In the example below, i is a reference to an instance of the class Instructor.
i <- Instructor(iid = 1, name = 'Jeff Alden', rank = 'FT-Associate')
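As a minimal sketch (the instructor "Ada Park" is a made-up example value), the generator can also be invoked through its new() method, which is equivalent to calling the generator directly. Fields that are not supplied are left as empty vectors of their declared type.

```r
# Same class definition as above, repeated so the sketch runs on its own
Instructor <- setRefClass("Instructor",
  fields = list(iid = "numeric",
                name = "character",
                rank = "character"))

# Instructor$new(...) is equivalent to Instructor(...)
j <- Instructor$new(iid = 2, name = 'Ada Park')

# 'rank' was not supplied, so it is an empty character vector
length(j$rank)
## [1] 0
```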
Printing an object can be done with print(). By default, print() lists the name of the class and the values of the fields. Note that print() can also be overridden.
print(i)
## Reference class object of class "Instructor"
## Field "iid":
## [1] 1
## Field "name":
## [1] "Jeff Alden"
## Field "rank":
## [1] "FT-Associate"
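For reference classes, the usual way to customize what print() displays is to define a show() method in the class. The sketch below uses a separate, hypothetical class name (InstructorCompact) so that it does not change the Instructor class used in the rest of this lesson.

```r
# Hypothetical class used only to illustrate a custom show() method
InstructorCompact <- setRefClass("InstructorCompact",
  fields = list(iid = "numeric",
                name = "character",
                rank = "character"),
  methods = list(
    show = function() {
      # print a one-line summary instead of the default field listing
      cat("Instructor #", iid, ": ", name, " (", rank, ")\n", sep = "")
    }
  ))

ic <- InstructorCompact(iid = 1, name = 'Jeff Alden', rank = 'FT-Associate')
print(ic)
## Instructor #1: Jeff Alden (FT-Associate)
```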
Accessing Fields
Fields are accessed (and modified) with the $ operator. In the example below, i is an instance of the class Instructor (or, to say it another way, i is an object of type Instructor). To access the value of the name field of the instance i, use i$name. It can be used as either an rvalue or an lvalue. If used as an lvalue, then the field can be updated. R does not support private or protected fields as is common in Java or C++; all fields are public.
i <- Instructor(iid = 1, name = 'Jeff Alden', rank = 'FT-Associate')
# read a field's value
n <- i$name
# update a field's value
i$name <- 'Jeffrey Alden'
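Although all fields are public, a field declared with a class only accepts values of that class. The following sketch, which repeats the class definition so it can run on its own, shows an assignment of the wrong type being rejected; tryCatch() is used only to keep the script running.

```r
Instructor <- setRefClass("Instructor",
  fields = list(iid = "numeric",
                name = "character",
                rank = "character"))

i <- Instructor(iid = 1, name = 'Jeff Alden', rank = 'FT-Associate')

# try to store a character value in the numeric field 'iid'
result <- tryCatch({
  i$iid <- "one"
  "assigned"
}, error = function(e) "rejected")

result
## [1] "rejected"

# the field keeps its previous value
i$iid
## [1] 1
```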
Objects are References
When instantiating a reference object, R generates an internal object and returns a reference to the object (hence the name). So, assigning an object to another variable actually assigns the reference and does not make a copy. In the code below, i1 and i2 are references to the same Instructor object. This is similar to Java but unlike C++, where assigning an object by value makes a copy (for example, through a copy constructor).
i1 <- Instructor(iid = 1, name = 'Jeff Alden', rank = 'FT-Associate')
i2 <- i1
i2$name <- 'Xin Wang'
print(i1)
## Reference class object of class "Instructor"
## Field "iid":
## [1] 1
## Field "name":
## [1] "Xin Wang"
## Field "rank":
## [1] "FT-Associate"
In the code above, we create a new instance of the class Instructor and get a reference back which we store in the variable i1. We then assign i1 to i2 – but we are actually assigning the reference (or a pointer) to the object. Think of i1 being the location in memory where the object is stored – or think of i1 as the ID of the object. Any modification of the memory through the reference i2 modifies the same object that is pointed to by i1. So, be careful when “copying” objects.
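The same caution applies when passing an object to a function: the function receives the reference, so changes made inside the function are visible to the caller. A minimal sketch, in which promote() is a hypothetical helper that is not part of the lesson's classes:

```r
Instructor <- setRefClass("Instructor",
  fields = list(iid = "numeric",
                name = "character",
                rank = "character"))

# hypothetical helper: changes the rank of the instructor it is given
promote <- function(instr, newRank) {
  instr$rank <- newRank
  invisible(NULL)
}

i1 <- Instructor(iid = 1, name = 'Jeff Alden', rank = 'FT-Associate')
promote(i1, 'FT-Full')

# the caller's object was modified by the function
i1$rank
## [1] "FT-Full"
```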
Copying Objects
To make an actual copy, use the inherited method copy(). This creates a new object that is equivalent to the original object.
i1 <- Instructor(iid = 1, name = 'Jeff Alden', rank = 'FT-Associate')
i2 <- i1$copy()
# modifying i2 does not modify i1
i2$name <- 'Susan Wollaston'
print(i1)
## Reference class object of class "Instructor"
## Field "iid":
## [1] 1
## Field "name":
## [1] "Jeff Alden"
## Field "rank":
## [1] "FT-Associate"
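One way to check whether two variables refer to the same object or to independent copies is identical(). The sketch below repeats the class definition so it runs on its own; the variable names alias and clone are chosen for illustration.

```r
Instructor <- setRefClass("Instructor",
  fields = list(iid = "numeric",
                name = "character",
                rank = "character"))

i1 <- Instructor(iid = 1, name = 'Jeff Alden', rank = 'FT-Associate')
alias <- i1          # second reference to the same object
clone <- i1$copy()   # independent object with equal field values

identical(i1, alias)
## [1] TRUE
identical(i1, clone)
## [1] FALSE

# the copy starts out with the same field values
clone$name == i1$name
## [1] TRUE
```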
Defining Methods
Methods are defined for a reference class and do not belong to generic functions as in S3 and S4 classes. All reference classes have a set of predefined methods inherited from the superclass envRefClass. This is similar to all Java classes being subclasses of the Object class.
Methods can be defined inline in the list passed as the methods argument of setRefClass().
Notice the operator <<- used to modify fields within a method. Using the simple assignment operator <- would instead create a local variable called salary and leave the field unchanged. Fortunately, R will issue a warning in such a case.
Also note the , after the } to separate the method function definitions.
Instructor <- setRefClass("Instructor",
fields = list(iid="numeric",
name="character",
rank="character",
salary="numeric"
),
methods = list(
getMonthlySalary = function() {
return (salary / 12)
},
applyRaise = function(merit) {
salary <<- salary * (1 + merit)
}
))
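The difference between <- and <<- inside a method can be seen in the following sketch. Counter is a hypothetical class introduced only for this illustration; R is expected to warn about the local assignment in addWrong().

```r
# Hypothetical class used only to contrast <- and <<- inside methods
Counter <- setRefClass("Counter",
  fields = list(count = "numeric"),
  methods = list(
    addWrong = function() {
      # creates a local variable; the field 'count' is not changed
      count <- count + 1
      invisible(NULL)
    },
    addRight = function() {
      # updates the field 'count' of the object
      count <<- count + 1
      invisible(NULL)
    }
  ))

cn <- Counter(count = 0)
cn$addWrong()
cn$count
## [1] 0
cn$addRight()
cn$count
## [1] 1
```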
Accessing Methods
Methods are accessed the same way as fields – with the $ operator.
i <- Instructor(iid = 2,
name = 'Dua Dipa',
rank = 'T-Assistant',
salary = 128000)
m.bef <- i$getMonthlySalary()
i$applyRaise(0.045)
m.aft <- i$getMonthlySalary()
cat("Salary raised from $", m.bef, "to $", m.aft, "per month")
## Salary raised from $ 10666.67 to $ 11146.67 per month
Inheritance
Inheritance is a key mechanism in object-oriented programming. It allows a programmer to define a new class (subclass or derived class) from an existing class (superclass or base class). Derived classes can add new fields and methods. All fields and methods of the base class are automatically fields and methods of the derived class. This increases reusability of code and allows programmers to represent domain objects more accurately.
Inheritance is supported in all three class systems (S3, S4, and Reference classes) but works most like other object-oriented languages in the Reference class system. We will restrict ourselves to this class system.
In the example below, we have a base class Person with three fields and a method. We then define a derived class Instructor, which extends Person with two additional fields and two methods, by passing the name of the base class, Person, in the contains argument.
Person <- setRefClass("Person",
fields = list(pid="numeric",
name="character",
yob = "numeric"),
methods = list(
getAge = function() {
currYear <- as.numeric(format(Sys.time(), "%Y"))
return (currYear - yob)
}
))
Instructor <- setRefClass("Instructor",
contains = "Person",
fields = list(rank="character",
salary="numeric"
),
methods = list(
getMonthlySalary = function() {
return (salary / 12)
},
applyRaise = function(merit) {
salary <<- salary * (1 + merit)
}
))
We can then instantiate the derived class Instructor and find that it has all of the fields and methods of Person in addition to its own fields and methods.
anInstructor <- Instructor(pid = 100,
name = 'Raj Metha',
rank = 'FT-Full',
yob = 1968,
salary = 182972)
anInstructor$getMonthlySalary()
## [1] 15247.67
anInstructor$getAge()
## [1] 56
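A derived class can also override a method of its base class and still reuse the base implementation through callSuper(). The sketch below is a trimmed-down variant of the two classes above; the describe() method is a hypothetical addition for illustration.

```r
Person <- setRefClass("Person",
  fields = list(pid = "numeric",
                name = "character",
                yob = "numeric"),
  methods = list(
    # hypothetical method, not part of the lesson's Person class
    describe = function() {
      paste0(name, " (born ", yob, ")")
    }
  ))

Instructor <- setRefClass("Instructor",
  contains = "Person",
  fields = list(rank = "character"),
  methods = list(
    # overrides Person's describe() and extends its result
    describe = function() {
      paste0(callSuper(), ", rank ", rank)
    }
  ))

anInstructor <- Instructor(pid = 100, name = 'Raj Metha',
                           yob = 1968, rank = 'FT-Full')
anInstructor$describe()
## [1] "Raj Metha (born 1968), rank FT-Full"

# an Instructor is also a Person
is(anInstructor, "Person")
## [1] TRUE
```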
Object Aggregation
In an aggregation relationship between objects, there is a whole/part or container/part hierarchy. In ontology terms, there is a partonomy. In an aggregation, one object “contains” other objects, although the containment does not have to be “physical”, i.e., the part objects do not have to be part of the same memory structure. The whole/part relationship can be by reference where the container object (whole or aggregate) contains references to the contained (part) objects.
Let’s implement the part hierarchy expressed by the UML Class Diagram below:
Member <- setRefClass("Member", fields = list(
mID = "numeric",
name = "character",
yearJoined = "numeric"))
Club <- setRefClass("Club",
fields = list(
name = "character",
yearFounded = "numeric",
maxMemID = "numeric",
members = "list"),
methods = list(
getNumMembers = function() {
return (length(members))
},
addMember = function(m) {
# add a member ID for the new member
m$mID <- maxMemID + 1
maxMemID <<- maxMemID + 1
# add the member to internal list
members[[length(members)+1]] <<- m
return (1)
}
))
A few noteworthy points about the above code. The field members is intended as a “private” member variable (by convention only, since all fields are public) that keeps track of all of the members added to the club. A field of type list starts out as an empty list, so members can be appended to it directly without allocating it first.
Now that we have the classes defined, let’s create some sample instances for testing. We won’t set a member ID for new members as those are assigned to them when they get added to the club.
# create a Club
aClub <- Club(name = 'DATA Club',
yearFounded = 2015,
maxMemID = 0)
# create a few members and add them to the club
s <- aClub$addMember(
Member(name = 'Jeff Garol', yearJoined = 2022))
s <- aClub$addMember(
Member(name = 'Ursula Van Leiden', yearJoined = 2022))
s <- aClub$addMember(
Member(name = 'Garrett Liew', yearJoined = 2022))
# number of club members should be correct
aClub$getNumMembers()
## [1] 3
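Because the club stores references to its members, a change made to a member through any other reference is also visible through the club. The following sketch repeats simplified class definitions so it can run on its own; the renaming of the member is made up for illustration.

```r
Member <- setRefClass("Member", fields = list(
  mID = "numeric",
  name = "character",
  yearJoined = "numeric"))

Club <- setRefClass("Club",
  fields = list(
    name = "character",
    yearFounded = "numeric",
    maxMemID = "numeric",
    members = "list"),
  methods = list(
    addMember = function(m) {
      # assign the next member ID and store a reference to the member
      m$mID <- maxMemID + 1
      maxMemID <<- maxMemID + 1
      members[[length(members) + 1]] <<- m
      return (1)
    }
  ))

aClub <- Club(name = 'DATA Club', yearFounded = 2015, maxMemID = 0)
m <- Member(name = 'Jeff Garol', yearJoined = 2022)
s <- aClub$addMember(m)

# the ID assigned inside addMember() is visible through m
m$mID
## [1] 1

# a change made through m is visible through the club
m$name <- 'Jeffrey Garol'
sapply(aClub$members, function(x) x$name)
## [1] "Jeffrey Garol"
```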
Building XML Documents
Now that we have an understanding of the mechanisms to build R objects, we need to turn our attention to the mechanisms for constructing XML documents from within R.
The functions used to build an in-memory XML document are from the XML package. That same package is also used to parse XML elements in an XML document.
Using XML Package
Let’s start with a simple example that externalizes a data frame as XML. It will show us how to use the functions to generate an in-memory XML DOM, which can then be saved to a file.
library(XML)
# Data in 3 columns in a data frame
df <- data.frame(refID = c(100, 200),
upc = c('20190818',
'20190823'),
desc = c('eReader 8',
'USB-C Cable')
)
Now that we have a data frame, let’s externalize the data frame to some XML structure.
# build XML structure
XMLdoc = newXMLDoc()
# root node is <catalog>
rootNode = newXMLNode("catalog", doc = XMLdoc)
# add elements to the XML underneath the root node
mvNode = newXMLNode("catVersion", "1.0.0", parent = rootNode)
# write each of the rows in the data frame
for (i in 1:nrow(df)){
# add a node with an attribute
prodNode = newXMLNode("product",
attrs = c(refID = df$refID[i]),
parent = rootNode)
# add details for each product as child nodes
newXMLNode("upc", df$upc[i], parent = prodNode)
newXMLNode("desc", df$desc[i], parent = prodNode)
}
# add an empty "flag" node
vwNode = newXMLNode("locked", parent = rootNode)
# save XML to a file
saveXML(XMLdoc, file = "prod-catalog.xml")
## [1] "prod-catalog.xml"
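To check that a saved document can be read back, the same package can parse the file and extract values with XPath. The following is a minimal sketch that builds a small catalog of its own and writes it to a temporary file rather than reusing the file above; it assumes the XML package is installed.

```r
library(XML)

# build a small document with one product
doc <- newXMLDoc()
root <- newXMLNode("catalog", doc = doc)
p <- newXMLNode("product", attrs = c(refID = "100"), parent = root)
newXMLNode("desc", "eReader 8", parent = p)

# save to a temporary file, then parse the file back into a DOM
tmp <- tempfile(fileext = ".xml")
invisible(saveXML(doc, file = tmp))
parsed <- xmlParse(tmp)

# extract element text and attribute values with XPath
descs <- xpathSApply(parsed, "//product/desc", xmlValue)
ids <- xpathSApply(parsed, "//product", xmlGetAttr, "refID")

descs
## [1] "eReader 8"
ids
## [1] "100"
```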
Using String Concatenation
A more flexible, often faster, but more error-prone and laborious process is to construct the XML document from concatenated character strings. After all, an XML document is a plain text document.
The code below constructs the same XML document as in the prior section. It repeatedly appends to a character variable using the function paste0(), the string concatenation function in R.
Note that the \n characters insert newlines into the resulting document for a cleaner look when viewed; they are not strictly necessary from an XML syntactic point of view. Also note the way that quotes are added within quotes. We use single quotes within the XML and double quotes to enclose strings in R; we could have done it the other way around as well, since both R and XML accept single and double quotes for string enclosure. The strings include leading spaces to, again, make the XML more “viewable”.
# start with the preamble and root tag
xml <- "<?xml version='1.0'?>\n\n<catalog>\n"
# add child element
xml <- paste0(xml, " <catVersion>", "1.0.0", "</catVersion>\n")
# write each of the rows in the data frame
for (i in 1:nrow(df)){
# add a node with an attribute
xml <- paste0(xml, " ",
"<product refID='", df$refID[i], "'>\n")
# add details for each product as child nodes
xml <- paste0(xml, " ",
"<upc>", df$upc[i], "</upc>\n")
xml <- paste0(xml, " ",
"<desc>", df$desc[i], "</desc>\n")
# terminate <product> element
xml <- paste0(xml, " ", "</product>\n")
}
# add an empty "flag" node
xml <- paste0(xml, " <locked />\n")
# terminate the root tag <catalog>
xml <- paste0(xml, "</catalog>")
# save the XML to a file
f <- file("prod-catalog-v2.xml")
writeLines(xml, f)
close(f)
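One reason string concatenation is error-prone is that characters such as & and < inside a value are not escaped automatically, which would make the resulting document malformed. A minimal sketch of a hypothetical helper, escapeXML(), that replaces the five predefined XML entities using base R only (the product description is a made-up value):

```r
# hypothetical helper: escape characters that have special meaning in XML
escapeXML <- function(x) {
  x <- gsub("&", "&amp;", x, fixed = TRUE)   # must be replaced first
  x <- gsub("<", "&lt;", x, fixed = TRUE)
  x <- gsub(">", "&gt;", x, fixed = TRUE)
  x <- gsub("'", "&apos;", x, fixed = TRUE)
  x <- gsub('"', "&quot;", x, fixed = TRUE)
  x
}

desc <- "USB-C Cable <1m> & Adapter"
paste0("<desc>", escapeXML(desc), "</desc>")
## [1] "<desc>USB-C Cable &lt;1m&gt; &amp; Adapter</desc>"
```

The ampersand is replaced first so that the ampersands introduced by the other replacements are not escaped a second time.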
Externalizing an Object
Now that we understand how to generate a DOM, we can externalize an object’s field values. A common approach is to add a method that takes care of the externalization of a class.
Let’s try this by adding a new method called ext2XML() to the class Member defined above. It takes as its input argument the node of a pre-created XML DOM to which the object should be added.
Member$methods(ext2XML = function(parentNode) {
n = newXMLNode("member",
.self$mID,
parent = parentNode)
newXMLNode("name", .self$name,
parent = n)
newXMLNode("year-joined", .self$yearJoined,
parent = n)
return(TRUE)
})
We’ll start by creating a new XML document. We then call the newly created method ext2XML() on a new instance of the class Member and display the DOM.
library(XML)
# create new (empty) XML document
xml = newXMLDoc()
# add <members> as the root node
rootNode = newXMLNode("members", doc = xml)
# externalize a member object to the XML
aMember <- Member(name = 'Ozzy Osbourne', yearJoined = 1971)
isSuccess <- aMember$ext2XML(rootNode)
# inspect the XML
print(xml)
## <?xml version="1.0"?>
## <members>
## <member>
## <name>Ozzy Osbourne</name>
## <year-joined>1971</year-joined>
## </member>
## </members>
##
Now that we know how to externalize a single object, we can externalize a container, which in turn externalizes its member elements. Note the ext2xml() method in each class. The code below demonstrates this:
Member <- setRefClass("Member",
fields = list(
mID = "numeric",
name = "character",
yearJoined = "numeric"),
methods = list(
ext2xml = function () {
# create a detached <member> node; the caller attaches it to its parent
memNode <- newXMLNode("member")
newXMLNode("name", .self$name, parent = memNode)
newXMLNode("yearJoined", .self$yearJoined, parent = memNode)
return (memNode)
}
))
Club <- setRefClass("Club",
fields = list(
name = "character",
yearFounded = "numeric",
maxMemID = "numeric",
members = "list"),
methods = list(
getNumMembers = function() {
return (length(members))
},
addMember = function(m) {
# members starts out as an empty list, so no allocation is needed
# add a member ID for the new member
m$mID <- maxMemID + 1
maxMemID <<- maxMemID + 1
# add the member to internal list
members[[length(members)+1]] <<- m
return (1)
},
ext2xml = function () {
extXML <- newXMLDoc()
rootNode = newXMLNode("club",
attrs = c(name = .self$name),
doc = extXML)
newXMLNode("yearFounded",
.self$yearFounded, parent = rootNode)
# seq_along() also handles a club without members
for (m in seq_along(members)) {
aMember <- .self$members[[m]]
addChildren(rootNode, aMember$ext2xml())
}
return (extXML)
}
))
# create a Club
aClub <- Club(name = 'DATA Club',
yearFounded = 2015,
maxMemID = 0)
# create a few members and add them to the club
s <- aClub$addMember(
Member(name = 'Jeff Garol', yearJoined = 2022))
s <- aClub$addMember(
Member(name = 'Ursula Van Leiden', yearJoined = 2022))
s <- aClub$addMember(
Member(name = 'Garrett Liew', yearJoined = 2022))
# number of club members should be correct
aClub$getNumMembers()
## [1] 3
# externalize the Club and its Members to XML
xmlClubDoc <- aClub$ext2xml()
# save XML to a file
saveXML(xmlClubDoc, file = "club.xml")
## [1] "club.xml"
The “club.xml” file looks like this:
<?xml version="1.0"?>
<club name="DATA Club">
<yearFounded>2015</yearFounded>
<member>
<name>Jeff Garol</name>
<yearJoined>2022</yearJoined>
</member>
<member>
<name>Ursula Van Leiden</name>
<yearJoined>2022</yearJoined>
</member>
<member>
<name>Garrett Liew</name>
<yearJoined>2022</yearJoined>
</member>
</club>
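Persistence also requires reconstructing the objects from the external format. The sketch below shows one possible way to rebuild Member objects from XML of the shape shown above. It parses an inline XML string rather than the file, repeats a minimal Member class so it runs on its own, and assumes the XML package is installed.

```r
library(XML)

# minimal Member class, repeated so the sketch is self-contained
Member <- setRefClass("Member", fields = list(
  mID = "numeric",
  name = "character",
  yearJoined = "numeric"))

# XML in the same shape as club.xml (shortened to two members)
xmlText <- '<?xml version="1.0"?>
<club name="DATA Club">
  <yearFounded>2015</yearFounded>
  <member><name>Jeff Garol</name><yearJoined>2022</yearJoined></member>
  <member><name>Ursula Van Leiden</name><yearJoined>2022</yearJoined></member>
</club>'

doc <- xmlParse(xmlText, asText = TRUE)
root <- xmlRoot(doc)

# read the club-level values
clubName <- xmlGetAttr(root, "name")
yearFounded <- as.numeric(xmlValue(root[["yearFounded"]]))

# rebuild one Member object per <member> element
memberNodes <- getNodeSet(doc, "/club/member")
members <- lapply(memberNodes, function(node) {
  Member(name = xmlValue(node[["name"]]),
         yearJoined = as.numeric(xmlValue(node[["yearJoined"]])))
})

clubName
## [1] "DATA Club"
sapply(members, function(m) m$name)
## [1] "Jeff Garol"        "Ursula Van Leiden"
```

Member IDs are not part of the XML shown above, so they are left unset here; they could be reassigned by adding the reconstructed members to a new Club object.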
