Introduction
Factor variables are a type of variable in R that can take on one of a set of enumerated values and are used to implement categorical variables. Factor variables have “levels”, which are the set of values that make up the variable's domain. For example, blood type is a categorical variable because it has a defined set of enumerated values: \(\{A, B, AB, O\}\).
Factor variables can represent categorical data whose values are either numeric or character (string/text). Categorical data can, of course, be stored as plain numeric or character vectors, but representing them as factors has a few benefits. One of the most important is that inferential statistical models and machine-learning classification algorithms treat a factor as a set of categories rather than as a number or free text, so each factor contributes the correct degrees of freedom.
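To make the degrees-of-freedom point concrete, the following is a minimal sketch (not part of the original lesson) of how R expands a factor when it builds a design matrix. It assumes R's default treatment contrasts; the six sample observations are made up for illustration.

```r
# Hypothetical sample of observed blood types (illustrative data only)
blood <- factor(c("A", "B", "AB", "O", "O", "A"))

# Build the design matrix that a linear model would use for this factor
design <- model.matrix(~ blood)

# One intercept column plus one indicator column per non-reference level
colnames(design)

# Four levels yield three indicator columns (the first level, "A",
# serves as the reference category under the default contrasts)
nlevels(blood)
ncol(design) - 1
```

Had the same data been stored as a plain numeric code, a model would treat it as a single continuous predictor, which is rarely what is intended for categories.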
This tutorial focuses on defining factor variables; it does not cover their use in statistical modeling or machine learning.
Defining Factor Variables
Factor variables are defined using the factor() function, which requires a vector of values. The class or type of a factor variable is, as expected, factor.
# define the levels
blood.types <- c("A","B","AB","O")
# create a new variable as a factor
blood.factor <- factor(blood.types)
# check to ensure it is a factor
class(blood.factor)
## [1] "factor"
is.factor(blood.factor)
## [1] TRUE
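In practice a factor is usually built from observed data that contains repeated values, not from the list of categories itself. The sketch below assumes a small made-up sample and uses the levels argument of factor() to declare the full set of allowed values up front; the variable names are illustrative.

```r
# The full domain of the categorical variable
blood.levels <- c("A", "B", "AB", "O")

# Hypothetical observations (note that "AB" never occurs)
observed <- c("O", "A", "A", "B", "O", "O")

# Declare the allowed values explicitly via the levels argument
observed.factor <- factor(observed, levels = blood.levels)

# table() reports every declared level, including those with zero count
counts <- table(observed.factor)
print(counts)

# A value outside the declared levels becomes NA instead of a new category
with.typo <- factor(c("A", "X"), levels = blood.levels)
is.na(with.typo)
```

Declaring the levels explicitly means that unseen categories are still counted and that unexpected values surface as NA, something a plain character vector would not do.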
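The different values that a factor variable can take on are called its levels and can be found using the levels() function. By default, the levels are the sorted unique values of the input vector, which is why "AB" precedes "B" in the output.
levels(blood.factor)
## [1] "A" "AB" "B" "O"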
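The order of the levels determines how a factor is displayed and tabulated, so it is sometimes useful to control it. As a small sketch (the variable names default.order and custom.order are illustrative), the levels argument of factor() sets the order explicitly, and nlevels() returns the number of levels.

```r
blood.types <- c("A", "B", "AB", "O")

# Without a levels argument, R sorts the unique values
default.order <- factor(blood.types)
levels(default.order)

# Passing levels explicitly preserves the order given
custom.order <- factor(blood.types, levels = blood.types)
levels(custom.order)

# Number of levels, regardless of their order
nlevels(custom.order)
```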
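Internally, R stores a factor as a vector of integer codes, one per element, where each code is an index into the levels. This encoding speeds up performance and reduces memory requirements. The str() function displays the levels followed by the integer codes.
str(blood.factor)
## Factor w/ 4 levels "A","AB","B","O": 1 3 2 4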
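The integer encoding can also be inspected directly. The following sketch, using the same four values as above, extracts the codes with as.integer() and then maps them back to their labels by indexing into the levels.

```r
blood.factor <- factor(c("A", "B", "AB", "O"))

# The underlying integer codes: positions within levels(blood.factor)
codes <- as.integer(blood.factor)
codes

# The storage type of a factor is integer, even though its class is "factor"
typeof(blood.factor)

# Indexing the levels with the codes recovers the original labels
labels <- levels(blood.factor)[codes]
labels

# This is the same result that as.character() produces
identical(labels, as.character(blood.factor))
```

One consequence worth keeping in mind: for a factor whose labels look like numbers, as.integer() returns the codes and not the labelled values, so converting through as.character() first is usually the safer route.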
Factors vs Character When Reading a CSV File
Factor Fields in Reference Classes
Conclusion
Tutorial
The video tutorial demonstrates the constructs introduced in this lesson.
Errata
None collected yet. Let us know.