Format: individual (without the assistance of a third party or AI)

Estimated Time to Complete: 90 min

Read the complete assignment before you start.

Motivation

In this assignment, learners will have the opportunity to formulate queries in SQL.

Learning Outcomes

Completing this assignment, will allow you to

Prerequisistes

Prior to working on this assignment, review these lessons and refer to them during the assignment:

Materials & Tools Needed

Requirements

You must write the queries in SQL unless otherwise specified.


Database Schema

Instructions

Download the database Media Database1 . Move the downloaded database file into your R project folder. Create an R Notebook (within an R Project) named LastNameF.QueryDB.Rmd. Set the meta field title to “Query a Database with SQL”, the author to your “lastname, first name”, and the date to the current term or today’s date.

In an R code chunk, connect to the database in the usual manner using dbConnect().

Questions

Read through all of the questions first and then work through them in any sequence. Use the above ERD to understand the structure of the database. Use SQL or database commands to get additional details on the schema.

You may use {sql} or {r} code chunks to solve the queries.

Question 1

List the last name, first name, title, and hire data of all employees who support at least one customer and are not in “Calgary”? Do not list duplicates.

Question 2

List the last name, first name of each employee hired after January 1, 2004 ordered ascending by last name. Hint: explore the use of the functions strftime and cast

Question 3

List the number of tracks by genre, ordered from most to least, restricted to those genres for which there are at least ten tracks.

Question 4

For each employee, list their last name, first name, title, year of hire, and the last name of who they report to ordered by last name of manager. Rename the columns for the manager’s first and last name to “LastNameMgr” and “FirstNameMgr”.

Question 5

List the number of employees employed in each city. Rename the column for the number of employees to “NumEmpl”.

Question 6

How many employees do not support a customer, i.e., they appear in the employees table but not in the customers table.

Question 7

How many albums contain the substring “Rock” but not “Rio” in the album title?

Question 8

What is the total “time” of all tracks per genre in hours, i.e., add up the “Milliseconds” column in tracks? Display the genre’s name and the time in hours rounded to one decimal. Only consider genres for which there is at least one track.

Question 9

Which playlists are the longest (greater than 0), regardless of type and format? List the playlist name(s) and the time in two columns: one for hours and one for minutes.

Question 10

For the last question, use an {r} code chunk and use R functions to query the database. Do not use a {sql} code chunk.

Using a SQL query (from R), retrieve the “GenreId”, “Name” (or a genre), and “UnitPrice” (of a track) for all tracks and store the result in a dataframe. Using the data in the dataframe and R code, calculate the mean (µ) and standard deviation (σ) of “UnitPrice” and then calculate the 95% confidence interval for the mean, defined for this assignment as: µ ± 1.96σ.

Display the result in the following format (but not that color and font):

The 95% CI for the mean unit price ranges from 88.3 to 109.2.

Of course, the numbers above are only to show the format; they are not the actual values.


Submission

Submit the R Notebook (.Rmd). The notebook must not contain any absolute path names.


Hints & Resources

None yet.



  1. To download the SQLite database file from the link, right click on the link and choose Save As… or Save Link As…, depending on your browser. Save the file in your R Project folder.↩︎