Assignment 6.104-v1 Working with Data Frames in R

Read all instructions before starting.

Motivation

In this assignment you will have an opportunity to practice programming in R.

Learning Outcomes

Completing this assignment will give you an opportunity to:

create an R Notebook with a project to organize data processing work
load a CSV file containing data organized in rows and columns into a data frame
query a data frame
calculate descriptive statistics

Format

Working individually is recommended, but working in pairs may be helpful.

Prerequisites

Prior to working on this assignment, it is suggested that you review these lessons and refer to them during the assignment:

Tools & Files Needed

RStudio or posit.cloud
CSV File: whitewines.csv

Setup

Create a new project in RStudio and then, within that project, create a new R Notebook. Set the title parameter of the notebook to “Practice / Working with Data Frames”; set the author parameter of the notebook to your name; set the date parameter to today’s date.

Follow the instructions below and build an R code chunk for each question. If you don’t know how to proceed or don’t understand the instructions, work through the prerequisite tutorials first.

You should not use any additional packages (such as purrr or tidyverse); you should learn to do the tasks using only ‘Base R’.
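
As a rough illustration of what "Base R only" looks like when querying a data frame, the sketch below filters and counts rows with logical indexing. The data frame `df.toy` and its columns `x` and `y` are made up for this example; they are not the assignment data, and this is only one of several ways to write such queries.

```r
# Hypothetical toy data, not the wine data set
df.toy <- data.frame(
  x = c(1.5, 0.2, 3.0, 0.9, 2.4),
  y = c(8, 12, 6, 10, 9)
)

# A comparison on a column yields a logical vector, one entry per row;
# & combines two conditions element-wise
is.match <- df.toy$x > 0.5 & df.toy$y > 7

# Count matching rows: TRUE counts as 1 when summed
n.match <- sum(is.match)

# Equivalent count obtained by subsetting the rows first
n.match.alt <- nrow(df.toy[is.match, ])

# An inclusive range check combines >= and <=
n.range <- sum(df.toy$y >= 8 & df.toy$y <= 10)

# any() answers "is there at least one row that satisfies this?"
has.large <- any(df.toy$y > 14)

print(n.match)
print(n.range)
print(has.large)
```

Summing a logical vector and counting the rows of a subset give the same answer here; the first avoids building an intermediate data frame.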

Use a level 2 header (using ##) for each new question and use the question number as your title, e.g., ## Question 3.

Label each code chunk with the question number and the objective, e.g.,

```{r Q1_LoadCSV}
   ... your code goes here ...
```

Tasks

Load the CSV file from the URL provided above into a data frame called df.wines. Do not load the text (strings) as factors. Check to see if the CSV contains column headers and load appropriately.
Inspect the data frame and determine its structure.
How many wines have a residual sugar level above 0.5 and an alcohol level above 7?
How many wines have an alcohol content between 9.5 and 11.5 (inclusive) and a quality rating below 7?
Are there any wines with an alcohol content above 14?
What are the median and mean alcohol content of all wines?
Add a new column to the data frame df.wines called swill_index, calculated by dividing the alcohol content by the quality and multiplying the result by the residual sugar content.
Add a new column to the data frame df.wines called alcohol.z that holds the z-score for alcohol. The z-score for a feature is the number of standard deviations that a value lies from the mean. So, calculate the mean and the standard deviation for alcohol and then set the value of the alcohol.z column to: [alcohol - mean(alcohol)] / sd(alcohol).
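
The loading, summarizing, and derived-column tasks above follow a common Base R pattern, sketched below on a tiny inline CSV so that it runs without any file. The data, the data frame name `df.demo`, and the column names `name`, `score`, and `weight` are all hypothetical. For the assignment you would pass the location of whitewines.csv to `read.csv()` instead of the `text` argument. The z-score here uses the conventional definition, value minus mean, divided by the standard deviation.

```r
# Hypothetical inline CSV standing in for a real file
csv.text <- "name,score,weight
a,2,10
b,4,20
c,6,30
d,8,40"

# header = TRUE treats the first line as column names;
# stringsAsFactors = FALSE keeps text columns as character vectors
df.demo <- read.csv(text = csv.text, header = TRUE, stringsAsFactors = FALSE)

# Inspect the structure: number of rows, column names, and column types
str(df.demo)

# Descriptive statistics on a single column
score.mean <- mean(df.demo$score)
score.median <- median(df.demo$score)

# A new column computed element-wise from existing columns
df.demo$ratio <- df.demo$weight / df.demo$score

# z-score: distance from the mean in units of standard deviation
df.demo$score.z <- (df.demo$score - score.mean) / sd(df.demo$score)

print(df.demo)
```

A quick sanity check on any z-score column is that its mean is approximately 0 and its standard deviation is approximately 1.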

Hints & Resources

None yet.

Solution

Solution A-6.104-SOL.Rmd


2024-05-09 / Northeastern University