list.of.packages <- c("r2r")
new.packages <- list.of.packages[!(list.of.packages %in% installed.packages()[,"Package"])]
if(length(new.packages)) install.packages(new.packages, repos='http://cran.us.r-project.org', verbose = F, quiet = T)

Introduction

A hash map (also often called a hash table or associative array), is a common key-value data structure. While most programming languages provide some implementation of this vital data structure, there is no equivalent in R due to the way R manages memory and doesn’t allow dynamic data structures such as linked lists. There are some options, nevertheless, which we will demonstrate in this lesson.

The principal reason for using a hash map is that looking up a value based on its key is constant time (\(O(1)\)) regardless of the number of elements in the map. In contrast, looking up a value in an array or list of \(O(n)\), i.e., it increases linearly with the number of elements in the structure.

Key/Value Pairs

A key/value pair is a key plus some value that is stored for that key. For example, a table in a relational database can be considered a set of of key/value pairs where the key is the “primary key” of the table and the value is the set of non-prime attributes. Another example is a journal identified by its ISSN and the value would be the journal’s title.

For the experiments in the remainder of the lesson we will need some sample key/value pairs. The code below generates a set of those with unique keys with corresponding values.

n <- 10
keys.10 <- as.character(sample(100001:999999, n))
vals.10 <- paste0("value-", keys.10)

n <- 100
keys.100 <- as.character(sample(100001:999999, n))
vals.100 <- paste0("value-", keys.100)

n <- 1000
keys.1000 <- as.character(sample(100001:999999, n))
vals.1000 <- paste0("value-", keys.1000)

n <- 10000
keys.10000 <- as.character(sample(100001:999999, n))
vals.10000 <- paste0("value-", keys.10000)

n <- 100000
keys.100000 <- as.character(sample(100001:999999, n))
vals.100000 <- paste0("value-", keys.100000)

n <- 1000000
keys.1000000 <- as.character(sample(1000001:9999999, n))
vals.1000000 <- paste0("value-", keys.1000000)

Linear Lookup

While R does support named vectors and lists and allows for associative access, the performance is \(O(n)\) and does not scale. Essentially, any lookup is a linear search through the vector or list.

lookup.linear <- function (x, keys, vals)
{
  s <- 1:(length(m))
  
  for (i in s) {
    if (keys[i] == x)
      return (vals[i])
  }
  
  return (NA)
}

Let’s add some data to this data structure so we can perform some timings on lookup performance. Keys have to be unique so we need to generate some number of unique key values.

Named Vectors

Hash Maps

Operations on Hash Tables

There are three main operations that must be implemented for a functioning hash map data structure. Given a key (as a string) and a value associated with that key:

  • INSERT will add the key/value pair to the hash map
  • LOOKUP will search the hash map for a key and return the associated value if it exists
  • DELETE will remove a value based on a key

There are other useful operations, such as EXISTS that returns true if the key exists and false otherwise.

Package r2r

library(r2r)

hm <- hashmap()

Summary


Files & Resources

All Files for Lesson 6.355

Errata

Let us know.

LS0tCnRpdGxlOiAiU3BlZWRpbmcgdXAgTG9va3VwIHdpdGggSGFzaCBNYXBzIGluIFIiCnBhcmFtczoKICBjYXRlZ29yeTogNgogIG51bWJlcjogMzU1CiAgdGltZTogMzAKICBsZXZlbDogYmVnaW5uZXIKICB0YWdzOiAicixoYXNobWFwcyxlbnYiCiAgZGVzY3JpcHRpb246ICJIYXNoIG1hcHMgKGhhc2ggdGFibGVzKSBhcmUgYW4gaW1wb3J0YW50IGRhdGEgc3RydWN0dXJlCiAgICAgICAgICAgICAgICBmb3Igc3RvcmluZyBrZXkvdmFsdWUgcGFpcnMgYW5kIGJlaW5nIGFibGUgdG8KICAgICAgICAgICAgICAgIHJldHJpZXZlIHRoZW0gYmFzZWQgb24gYSBrZXkgaW4gY29uc3RhbnQgdGltZQogICAgICAgICAgICAgICAgcmVnYXJkbGVzcyBvZiB0aGUgbnVtYmVyIG9mIGtleS92YWx1ZSBwYWlycy4gVGhpcwogICAgICAgICAgICAgICAgbGVzc29uIHNob3dzIGhvdyB0byB1c2UgaGFzaCBtYXBzIGluIFIuIgpkYXRlOiAiPHNtYWxsPmByIFN5cy5EYXRlKClgPC9zbWFsbD4iCmF1dGhvcjogIjxzbWFsbD5NYXJ0aW4gU2NoZWRsYmF1ZXI8L3NtYWxsPiIKZW1haWw6ICJtLnNjaGVkbGJhdWVyQG5ldS5lZHUiCmFmZmlsaXRhdGlvbjogIk5vcnRoZWFzdGVybiBVbml2ZXJzaXR5IgpvdXRwdXQ6IAogIGJvb2tkb3duOjpodG1sX2RvY3VtZW50MjoKICAgIHRvYzogdHJ1ZQogICAgdG9jX2Zsb2F0OiB0cnVlCiAgICBjb2xsYXBzZWQ6IGZhbHNlCiAgICBudW1iZXJfc2VjdGlvbnM6IGZhbHNlCiAgICBjb2RlX2Rvd25sb2FkOiB0cnVlCiAgICB0aGVtZTogc3BhY2VsYWIKICAgIGhpZ2hsaWdodDogdGFuZ28KLS0tCgotLS0KdGl0bGU6ICI8c21hbGw+YHIgcGFyYW1zJGNhdGVnb3J5YC5gciBwYXJhbXMkbnVtYmVyYDwvc21hbGw+PGJyLz48c3BhbiBzdHlsZT0nY29sb3I6ICMyRTQwNTM7IGZvbnQtc2l6ZTogMC45ZW0nPmByIHJtYXJrZG93bjo6bWV0YWRhdGEkdGl0bGVgPC9zcGFuPiIKLS0tCgpgYGB7ciBjb2RlPXhmdW46OnJlYWRfdXRmOChwYXN0ZTAoaGVyZTo6aGVyZSgpLCcvUi9faW5zZXJ0MkRCLlInKSksIGluY2x1ZGUgPSBGQUxTRX0KYGBgCgpgYGB7ciBpbnN0YWxsUGFja2FnZXMsIHdhcm5pbmc9RkFMU0V9Cmxpc3Qub2YucGFja2FnZXMgPC0gYygicjJyIikKbmV3LnBhY2thZ2VzIDwtIGxpc3Qub2YucGFja2FnZXNbIShsaXN0Lm9mLnBhY2thZ2VzICVpbiUgaW5zdGFsbGVkLnBhY2thZ2VzKClbLCJQYWNrYWdlIl0pXQppZihsZW5ndGgobmV3LnBhY2thZ2VzKSkgaW5zdGFsbC5wYWNrYWdlcyhuZXcucGFja2FnZXMsIHJlcG9zPSdodHRwOi8vY3Jhbi51cy5yLXByb2plY3Qub3JnJywgdmVyYm9zZSA9IEYsIHF1aWV0ID0gVCkKYGBgCgojIyBJbnRyb2R1Y3Rpb24KCkEgW2hhc2ggbWFwXShodHRwczovL2VuLndpa2lwZWRpYS5vcmcvd2lraS9IYXNoX3RhYmxlKSAoYWxzbyBvZnRlbiBjYWxsZWQgYSBoYXNoIHRhYmxlIG9yIGFzc29jaWF0aXZlIGFycmF5KSwgaXMgYSBjb21tb24ga2V5LXZhbHVlIGRhdGEgc3RydWN0dXJlLiBXaGlsZSBtb3N0IHByb2dyYW1taW5nIGxhbmd1YWdlcyBwcm92aWRlIHNvbWUgaW1wbGVtZW50YXRpb24gb2YgdGhpcyB2aXRhbCBkYXRhIHN0cnVjdHVyZSwgdGhlcmUgaXMgbm8gZXF1aXZhbGVudCBpbiBSIGR1ZSB0byB0aGUgd2F5IFIgbWFuYWdlcyBtZW1vcnkgYW5kIGRvZXNuJ3QgYWxsb3cgZHluYW1pYyBkYXRhIHN0cnVjdHVyZXMgc3VjaCBhcyBsaW5rZWQgbGlzdHMuIFRoZXJlIGFyZSBzb21lIG9wdGlvbnMsIG5ldmVydGhlbGVzcywgd2hpY2ggd2Ugd2lsbCBkZW1vbnN0cmF0ZSBpbiB0aGlzIGxlc3Nvbi4KClRoZSBwcmluY2lwYWwgcmVhc29uIGZvciB1c2luZyBhIGhhc2ggbWFwIGlzIHRoYXQgbG9va2luZyB1cCBhIHZhbHVlIGJhc2VkIG9uIGl0cyBrZXkgaXMgY29uc3RhbnQgdGltZSAoJE8oMSkkKSByZWdhcmRsZXNzIG9mIHRoZSBudW1iZXIgb2YgZWxlbWVudHMgaW4gdGhlIG1hcC4gSW4gY29udHJhc3QsIGxvb2tpbmcgdXAgYSB2YWx1ZSBpbiBhbiBhcnJheSBvciBsaXN0IG9mICRPKG4pJCwgKmkuZS4qLCBpdCBpbmNyZWFzZXMgbGluZWFybHkgd2l0aCB0aGUgbnVtYmVyIG9mIGVsZW1lbnRzIGluIHRoZSBzdHJ1Y3R1cmUuCgojIyBLZXkvVmFsdWUgUGFpcnMKCkEga2V5L3ZhbHVlIHBhaXIgaXMgYSBrZXkgcGx1cyBzb21lIHZhbHVlIHRoYXQgaXMgc3RvcmVkIGZvciB0aGF0IGtleS4gRm9yIGV4YW1wbGUsIGEgdGFibGUgaW4gYSByZWxhdGlvbmFsIGRhdGFiYXNlIGNhbiBiZSBjb25zaWRlcmVkIGEgc2V0IG9mIG9mIGtleS92YWx1ZSBwYWlycyB3aGVyZSB0aGUga2V5IGlzIHRoZSAicHJpbWFyeSBrZXkiIG9mIHRoZSB0YWJsZSBhbmQgdGhlIHZhbHVlIGlzIHRoZSBzZXQgb2Ygbm9uLXByaW1lIGF0dHJpYnV0ZXMuIEFub3RoZXIgZXhhbXBsZSBpcyBhIGpvdXJuYWwgaWRlbnRpZmllZCBieSBpdHMgSVNTTiBhbmQgdGhlIHZhbHVlIHdvdWxkIGJlIHRoZSBqb3VybmFsJ3MgdGl0bGUuCgpGb3IgdGhlIGV4cGVyaW1lbnRzIGluIHRoZSByZW1haW5kZXIgb2YgdGhlIGxlc3NvbiB3ZSB3aWxsIG5lZWQgc29tZSBzYW1wbGUga2V5L3ZhbHVlIHBhaXJzLiBUaGUgY29kZSBiZWxvdyBnZW5lcmF0ZXMgYSBzZXQgb2YgdGhvc2Ugd2l0aCB1bmlxdWUga2V5cyB3aXRoIGNvcnJlc3BvbmRpbmcgdmFsdWVzLgoKYGBge3IgZ2VuS2V5VmFsUGFpcnN9Cm4gPC0gMTAKa2V5cy4xMCA8LSBhcy5jaGFyYWN0ZXIoc2FtcGxlKDEwMDAwMTo5OTk5OTksIG4pKQp2YWxzLjEwIDwtIHBhc3RlMCgidmFsdWUtIiwga2V5cy4xMCkKCm4gPC0gMTAwCmtleXMuMTAwIDwtIGFzLmNoYXJhY3RlcihzYW1wbGUoMTAwMDAxOjk5OTk5OSwgbikpCnZhbHMuMTAwIDwtIHBhc3RlMCgidmFsdWUtIiwga2V5cy4xMDApCgpuIDwtIDEwMDAKa2V5cy4xMDAwIDwtIGFzLmNoYXJhY3RlcihzYW1wbGUoMTAwMDAxOjk5OTk5OSwgbikpCnZhbHMuMTAwMCA8LSBwYXN0ZTAoInZhbHVlLSIsIGtleXMuMTAwMCkKCm4gPC0gMTAwMDAKa2V5cy4xMDAwMCA8LSBhcy5jaGFyYWN0ZXIoc2FtcGxlKDEwMDAwMTo5OTk5OTksIG4pKQp2YWxzLjEwMDAwIDwtIHBhc3RlMCgidmFsdWUtIiwga2V5cy4xMDAwMCkKCm4gPC0gMTAwMDAwCmtleXMuMTAwMDAwIDwtIGFzLmNoYXJhY3RlcihzYW1wbGUoMTAwMDAxOjk5OTk5OSwgbikpCnZhbHMuMTAwMDAwIDwtIHBhc3RlMCgidmFsdWUtIiwga2V5cy4xMDAwMDApCgpuIDwtIDEwMDAwMDAKa2V5cy4xMDAwMDAwIDwtIGFzLmNoYXJhY3RlcihzYW1wbGUoMTAwMDAwMTo5OTk5OTk5LCBuKSkKdmFscy4xMDAwMDAwIDwtIHBhc3RlMCgidmFsdWUtIiwga2V5cy4xMDAwMDAwKQpgYGAKCiMjIExpbmVhciBMb29rdXAKCldoaWxlIFIgZG9lcyBzdXBwb3J0IG5hbWVkIHZlY3RvcnMgYW5kIGxpc3RzIGFuZCBhbGxvd3MgZm9yIGFzc29jaWF0aXZlIGFjY2VzcywgdGhlIHBlcmZvcm1hbmNlIGlzICRPKG4pJCBhbmQgZG9lcyBub3Qgc2NhbGUuIEVzc2VudGlhbGx5LCBhbnkgbG9va3VwIGlzIGEgbGluZWFyIHNlYXJjaCB0aHJvdWdoIHRoZSB2ZWN0b3Igb3IgbGlzdC4KCmBgYHtyfQpsb29rdXAubGluZWFyIDwtIGZ1bmN0aW9uICh4LCBrZXlzLCB2YWxzKQp7CiAgcyA8LSAxOihsZW5ndGgobSkpCiAgCiAgZm9yIChpIGluIHMpIHsKICAgIGlmIChrZXlzW2ldID09IHgpCiAgICAgIHJldHVybiAodmFsc1tpXSkKICB9CiAgCiAgcmV0dXJuIChOQSkKfQpgYGAKCkxldCdzIGFkZCBzb21lIGRhdGEgdG8gdGhpcyBkYXRhIHN0cnVjdHVyZSBzbyB3ZSBjYW4gcGVyZm9ybSBzb21lIHRpbWluZ3Mgb24gbG9va3VwIHBlcmZvcm1hbmNlLiBLZXlzIGhhdmUgdG8gYmUgdW5pcXVlIHNvIHdlIG5lZWQgdG8gZ2VuZXJhdGUgc29tZSBudW1iZXIgb2YgdW5pcXVlIGtleSB2YWx1ZXMuCgpgYGB7cn0KCmBgYAoKIyMgTmFtZWQgVmVjdG9ycwoKYGBge3J9CgpgYGAKCiMjIEhhc2ggTWFwcwoKIyMjIE9wZXJhdGlvbnMgb24gSGFzaCBUYWJsZXMKClRoZXJlIGFyZSB0aHJlZSBtYWluIG9wZXJhdGlvbnMgdGhhdCBtdXN0IGJlIGltcGxlbWVudGVkIGZvciBhIGZ1bmN0aW9uaW5nIGhhc2ggbWFwIGRhdGEgc3RydWN0dXJlLiBHaXZlbiBhIGtleSAoYXMgYSBzdHJpbmcpIGFuZCBhIHZhbHVlIGFzc29jaWF0ZWQgd2l0aCB0aGF0IGtleToKCi0gICBJTlNFUlQgd2lsbCBhZGQgdGhlIGtleS92YWx1ZSBwYWlyIHRvIHRoZSBoYXNoIG1hcAotICAgTE9PS1VQIHdpbGwgc2VhcmNoIHRoZSBoYXNoIG1hcCBmb3IgYSBrZXkgYW5kIHJldHVybiB0aGUgYXNzb2NpYXRlZCB2YWx1ZSBpZiBpdCBleGlzdHMKLSAgIERFTEVURSB3aWxsIHJlbW92ZSBhIHZhbHVlIGJhc2VkIG9uIGEga2V5CgpUaGVyZSBhcmUgb3RoZXIgdXNlZnVsIG9wZXJhdGlvbnMsIHN1Y2ggYXMgRVhJU1RTIHRoYXQgcmV0dXJucyAqdHJ1ZSogaWYgdGhlIGtleSBleGlzdHMgYW5kICpmYWxzZSogb3RoZXJ3aXNlLgoKIyMjIFBhY2thZ2UgPGI+cjJyPC9iPgoKYGBge3IgZXZhbD1GfQpsaWJyYXJ5KHIycikKCmhtIDwtIGhhc2htYXAoKQpgYGAKCiMjIFN1bW1hcnkKCi0tLS0tLS0tLS0tLS0tLS0tLS0tLS0tLS0tLS0tLS0tLS0tLS0tLS0tLS0tLS0tLS0tLS0tLS0tLS0tLS0tLS0tLS0tLS0tLQoKIyMgRmlsZXMgJiBSZXNvdXJjZXMKCmBgYHtyIHppcEZpbGVzLCBlY2hvPUZBTFNFfQp6aXBOYW1lID0gc3ByaW50ZigiTGVzc29uRmlsZXMtJXMtJXMuemlwIiwgCiAgICAgICAgICAgICAgICAgcGFyYW1zJGNhdGVnb3J5LAogICAgICAgICAgICAgICAgIHBhcmFtcyRudW1iZXIpCgp0ZXh0QUxpbmsgPSBwYXN0ZTAoIkFsbCBGaWxlcyBmb3IgTGVzc29uICIsIAogICAgICAgICAgICAgICBwYXJhbXMkY2F0ZWdvcnksIi4iLHBhcmFtcyRudW1iZXIpCgojIGRvd25sb2FkRmlsZXNMaW5rKCkgaXMgaW5jbHVkZWQgZnJvbSBfaW5zZXJ0MkRCLlIKa25pdHI6OnJhd19odG1sKGRvd25sb2FkRmlsZXNMaW5rKCIuIiwgemlwTmFtZSwgdGV4dEFMaW5rKSkKYGBgCgotLS0tLS0tLS0tLS0tLS0tLS0tLS0tLS0tLS0tLS0tLS0tLS0tLS0tLS0tLS0tLS0tLS0tLS0tLS0tLS0tLS0tLS0tLS0tLS0KCiMjIFJlZmVyZW5jZXMKCltIb3JuZXIsIEouIChNYXJjaCAyNCwgMjAxNSkuIEhhc2ggVGFibGUgUGVyZm9ybWFuY2UgaW4gUjogUGFydCBJLiBSLWJsb2dnZXJzLl0oaHR0cHM6Ly93d3cuci1ibG9nZ2Vycy5jb20vMjAxNS8wMy9oYXNoLXRhYmxlLXBlcmZvcm1hbmNlLWluLXItcGFydC1pLykKCltyMnI6IFItT2JqZWN0IHRvIFItT2JqZWN0IEhhc2ggTWFwc10oaHR0cHM6Ly9jcmFuLnJzdHVkaW8uY29tL3dlYi9wYWNrYWdlcy9yMnIvaW5kZXguaHRtbCkKCiMjIEVycmF0YQoKW0xldCB1cyBrbm93XShodHRwczovL2Zvcm0uam90Zm9ybS5jb20vMjEyMTg3MDcyNzg0MTU3KXt0YXJnZXQ9Il9ibGFuayJ9Lgo=