Read all instructions before starting.

Motivation

In this assignment you will have an opportunity to practice programming in R and to apply the kNN (k Nearest neighbor) machine learning algorithm to predict a categorical target variable. Additionally, you will have a chance to learn how to use training data to build a model and validation data to determine the fit of the model and its power to generate useful and accurate predictions.

Learning Outcomes

Completing this assignment will provide you with practice opportunities to:

Format

Working individually is recommended, but working in pairs may be helpful.

Prerequisites

Before working on this assignment, we suggest that you review these lessons and refer to them during the assignment:

Tools Needed

Setup

Create a new project in RStudio and then, within that project, create a new R Notebook. Set the title parameter of the notebook to “Practice / kNN”, the author parameter to your name, and the date parameter to today’s date.

Follow the instructions below and build an R code chunk for each of the tasks. If you are unsure how to proceed or do not understand the instructions, work through the prerequisite tutorials first.

You will need to install and load several packages; the instructions give the details. Install each package before loading it. Installing a new package may occasionally require additional packages or updates to packages you have already installed.
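One way to handle the install-then-load step is a chunk that installs only what is missing and then attaches everything. This is a minimal sketch: caret is named in the tasks, while class (which provides a widely used knn() function) is an assumption about what the tutorial uses, so adjust the list to match your tutorial.

```r
# Packages assumed for this assignment: class (knn()) is an assumption;
# caret (train(), confusionMatrix()) is named in the tasks. Edit as needed.
required_pkgs <- c("class", "caret")

# Install only the packages that are not installed yet. The default
# dependencies setting also pulls in the packages they require.
missing_pkgs <- required_pkgs[!required_pkgs %in% rownames(installed.packages())]
if (length(missing_pkgs) > 0) {
  install.packages(missing_pkgs)
}

# Attach every package; library() stops with an error if one is still missing.
invisible(lapply(required_pkgs, library, character.only = TRUE))
```

Checking installed.packages() first avoids reinstalling packages on every run of the notebook.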

Use a level 3 header (using ###) for each part of the exercise, e.g., ### Load Data. Label your code chunks.

Tasks

  1. Download the data set for the tutorial and save it in your project folder.

  2. Follow this tutorial on applying kNN to prostate cancer detection and implement all of its steps in your R Notebook. Use an appropriate header for each step to structure your notebook, and explain what each step does. (Note: The data set provided with this assignment has been slightly modified from the one used in the tutorial, so expect small deviations in the results.)

  3. Once you’ve completed the tutorial, try a kNN implementation from another package, such as the caret package. Compare the accuracy of the two implementations.

  4. Use the confusionMatrix() function from the caret package to determine the accuracy of both algorithms.
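The sketch below shows the general shape of tasks 3 and 4: fit kNN with class::knn() and with caret::train(method = "knn"), then score both with caret::confusionMatrix(). It is not a solution. The built-in iris data set stands in for the prostate cancer data, and k = 5, the 70/30 split, and the min-max normalize() helper are illustrative choices. Depending on your caret version, confusionMatrix() may ask you to install the e1071 package.

```r
library(class)  # knn()
library(caret)  # createDataPartition(), train(), confusionMatrix()

set.seed(42)  # make the split and knn() tie-breaking reproducible

# Hypothetical helper: min-max normalize so no feature dominates the distance.
normalize <- function(x) (x - min(x)) / (max(x) - min(x))
features <- as.data.frame(lapply(iris[, 1:4], normalize))
labels <- iris$Species

# Stratified 70/30 split into training and validation sets.
train_idx <- createDataPartition(labels, p = 0.7, list = FALSE)
train_x <- features[train_idx, ]
test_x  <- features[-train_idx, ]
train_y <- labels[train_idx]
test_y  <- labels[-train_idx]

# Implementation 1: class::knn() classifies the validation rows directly.
pred_class <- knn(train = train_x, test = test_x, cl = train_y, k = 5)

# Implementation 2: caret's kNN with a fixed k = 5 and no resampling.
fit_caret <- train(x = train_x, y = train_y, method = "knn",
                   tuneGrid = data.frame(k = 5),
                   trControl = trainControl(method = "none"))
pred_caret <- predict(fit_caret, newdata = test_x)

# confusionMatrix(predicted, reference) reports accuracy among other statistics.
cm_class <- confusionMatrix(pred_class, test_y)
cm_caret <- confusionMatrix(pred_caret, test_y)
acc <- c(class = unname(cm_class$overall["Accuracy"]),
         caret = unname(cm_caret$overall["Accuracy"]))
print(acc)
```

Fixing k and turning off resampling keeps the comparison between the two packages like-for-like; once that works, you could let caret tune k instead.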


Hints & Resources

Packages are occasionally updated in ways that break other packages. For example, an update to dplyr on Jan 31 2020 caused an incompatibility with the caret package. In such cases you can often wait for a fixed release, skip installing the update, or downgrade to an earlier version of the package. Here’s how to downgrade a package: webpage. To check the installed version of a package, use sessionInfo(), e.g., sessionInfo("dplyr"), or packageVersion(), e.g., packageVersion("dplyr").
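A small version-check chunk might look like the following. It is a sketch that assumes dplyr is installed. The downgrade line uses remotes::install_version(), which is one option rather than necessarily the method on the linked page, and "x.y.z" is a placeholder, not a known-good version.

```r
# packageVersion() returns a comparable package_version object.
v <- packageVersion("dplyr")
print(v)

# sessionInfo() with a package name prints session details for that package.
print(sessionInfo("dplyr"))

# Hypothetical downgrade: replace "x.y.z" with the version you need.
# Commented out so the chunk runs without changing your installation.
# remotes::install_version("dplyr", version = "x.y.z")
```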


Solution

Not available.