Read all instructions before starting.

Motivation

In this assignment you will have an opportunity to practice programming in R.

Learning Outcomes

Completing this assignment will provide you with practice opportunities to:

Format

Working individually is recommended, but working in pairs may be helpful.

Prerequisites

Prior to working on this assignment, it is suggested that you review these lessons and refer to them during the assignment:

Tools Needed

Setup

Create a new project in RStudio and then, within that project, create a new R Notebook. Set the title parameter of the notebook to “Practice / Forecasting”, set the author parameter to your name, and set the date parameter to today’s date.

Follow the instructions below and build an R code chunk for each question. If you don’t know how to proceed or don’t understand the instructions, review the prerequisite tutorials.

You should not use any additional packages (such as purrr or tidyverse); you should learn to do the tasks using only ‘Base R’.

Use a level 2 header (using ##) for each new question and use the question number as your title, e.g., ## Question 3.

Label each code chunk with the question number and the objective, e.g.,

```{r Q1_LoadCSV}
   # ... your code goes here ...
```

Task Set I

  1. The built-in dataset USArrests contains statistics about violent crime rates in the 50 US states. Determine which states are outliers in terms of their murder rate (the Murder column). Outliers, for the sake of this question, are defined as values that are more than 1.5 standard deviations from the mean.
  2. For the same dataset, is there a correlation between urban population and murder, i.e., as one goes up, does the other go up as well? Comment on the strength of the correlation. Which correlation method is appropriate: Pearson, Spearman, or Kendall? How would you decide between them? What would be the effect of choosing an inappropriate one?
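One possible Base R sketch for Task Set I follows. It assumes "standard deviation" means the sample standard deviation returned by sd(), and it uses Shapiro-Wilk tests as one optional way to judge whether Pearson's normality assumption is reasonable; that check is a suggestion, not a requirement of the assignment. The variable names are illustrative.

```r
# Task Set I sketch (Base R only; USArrests ships with R's datasets package)

# Question 1: z-score of each state's Murder rate, then flag |z| > 1.5.
# sd() is the sample standard deviation (divides by n - 1).
murder <- USArrests$Murder
z <- (murder - mean(murder)) / sd(murder)
murderOutliers <- rownames(USArrests)[abs(z) > 1.5]
print(murderOutliers)

# Question 2: correlation between UrbanPop and Murder under each method.
# Pearson measures linear association and is sensitive to outliers;
# Spearman and Kendall are rank-based and only assume a monotonic relationship.
corMethods <- c("pearson", "spearman", "kendall")
cors <- sapply(corMethods, function(m)
  cor(USArrests$UrbanPop, USArrests$Murder, method = m))
print(round(cors, 3))

# One (imperfect) check of Pearson's normality assumption
print(shapiro.test(USArrests$Murder))
print(shapiro.test(USArrests$UrbanPop))

# A scatter plot is a useful visual check for linearity and outliers:
# plot(USArrests$UrbanPop, USArrests$Murder)
```

If the three coefficients disagree noticeably, that is itself evidence that outliers or non-linearity are influencing the Pearson result.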

Task Set II

  1. Based on the data on the growth of mobile phone use in Brazil (you’ll need to copy the data and create a CSV that you can load into R, or use the gsheet2tbl() function from the gsheet package), forecast phone use for the next time period using a 2-year weighted moving average (with a weight of 5 for the most recent year and 2 for the year before), exponential smoothing (alpha of 0.4), and a linear regression trendline.

  2. Calculate the squared error for each model, i.e., use the model to calculate a forecast for each time period provided in the data set and then square the error.

  3. Calculate the average (mean) squared error for each model.

  4. Which model has the smallest mean squared error (MSE)?

  5. Write a function called ensembleForecast() that calculates a weighted average of the three forecasts using the following weights: 4 for the trend line, 2 for exponential smoothing, and 1 for the weighted moving average. Remember to divide by the sum of the weights in a weighted average.
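A possible Base R sketch for Task Set II follows. The Brazil data is not reproduced here, so the code assumes you have loaded it into a numeric vector ordered from oldest to newest year. The helper names wmaForecast(), esForecasts(), trendModel(), trendForecast(), and squaredErrors(), as well as the argument names of ensembleForecast(), are hypothetical choices, not part of the assignment. Seeding exponential smoothing with the first observation is also an assumption (a common convention), and each model's errors are computed only over the periods it can actually forecast.

```r
# Task Set II sketch. `x` is a numeric vector of yearly values, oldest first.

# 2-year weighted moving average: weight 5 on the latest year, 2 on the one before
wmaForecast <- function(x, w_recent = 5, w_prior = 2) {
  n <- length(x)
  (w_recent * x[n] + w_prior * x[n - 1]) / (w_recent + w_prior)
}

# Simple exponential smoothing. Returns forecasts for periods 1..n+1;
# element n+1 is the next-period forecast. F[1] is seeded with x[1] (assumption).
esForecasts <- function(x, alpha = 0.4) {
  f <- numeric(length(x) + 1)
  f[1] <- x[1]
  for (i in seq_along(x)) {
    f[i + 1] <- f[i] + alpha * (x[i] - f[i])
  }
  f
}

# Linear trend line: regress the value on the period index 1..n
trendModel <- function(x) {
  lm(value ~ period, data = data.frame(period = seq_along(x), value = x))
}
trendForecast <- function(x, period = length(x) + 1) {
  unname(predict(trendModel(x), newdata = data.frame(period = period)))
}

# Squared errors of each model over the periods it can forecast:
# WMA needs two prior values (periods 3..n); ES skips period 1, whose
# forecast is just the seed; the trend line is evaluated at every period.
squaredErrors <- function(x, alpha = 0.4) {
  n <- length(x)
  stopifnot(n >= 3)
  idx <- 3:n
  wmaFc <- sapply(idx, function(i) wmaForecast(x[seq_len(i - 1)]))
  esFc <- esForecasts(x, alpha)[2:n]
  trendFc <- unname(fitted(trendModel(x)))
  list(wma   = (x[idx] - wmaFc)^2,
       es    = (x[2:n] - esFc)^2,
       trend = (x - trendFc)^2)
}

# Weighted ensemble of the three next-period forecasts (weights 4, 2, 1)
ensembleForecast <- function(trend, es, wma, w_trend = 4, w_es = 2, w_wma = 1) {
  (w_trend * trend + w_es * es + w_wma * wma) / (w_trend + w_es + w_wma)
}

# Usage, once the data is in a numeric vector `subs` (hypothetical name):
# se  <- squaredErrors(subs)
# mse <- sapply(se, mean)
# names(which.min(mse))
# ensembleForecast(trendForecast(subs), esForecasts(subs)[length(subs) + 1],
#                  wmaForecast(subs))
```

Because the three models are scored over slightly different periods here, you may prefer to compare their MSEs over a common range (e.g., periods 3 through n) instead; either choice should be stated in your answer.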


Hints & Resources

None yet.


Solution

A-3.102-Solution.Rmd