Introduction

In this lesson, you will learn about how to use the SQL SELECT statement to retrieve rows from a single table.

The Role of SQL

SQL is the standard query language for retrieving data from relational databases. It is supported by virtually all relational databases in the same manner. Queries built for one database generally run without modification against another database. Many vendors provide extensions to SQL for analytics and special data types.

SQL is also often used to process tabular data outside of relational databases, e.g., spreadsheets and data frames. For example, Excel supports PowerQuery which allows SQL statements to be run against Excel worksheets; R and Python have packages for executing SQL queries against data frames.

The first query language based on relational algebra, the foundation of relational databases, was Alpha, developed by E. F. Codd (the creator of the relational model of data storage). After that, ISBL was created, and that language in turn was the basis for SQL. SQL is loosely based on relational algebra, though the operands in SQL do not quite map to relational algebra. One issues is that tables are not, in fact, relations and therefore several useful theorems about the relational algebra do not hold in the SQL counterpart. The reason is that the SQL table model is a bag (multiset), rather than a set. Note that SQL is no longer considered an acronym – although some believe it stands for Structured or Standard Query Language, but that is not true.

The tutorial below, in the form of a narrated slide deck (Slide Deck), summarizes the key points in this section. To download the slide deck, right click on the link and choose Save As or Save Link As – or whatever your browser uses to download a link.

Performing Queries

SQL queries require a connection to the database and a means to execute a query. This can be done through a program or via a query tool for ad hoc queries such as MySQL Workbench or the SQLite Console. Most common is embedded SQL for retrieving data into a programming language for analytics and further processing of the data.

Flow of a Query

The client running a program containing embedded SQL or running an ad hoc query tool sends a SQL query (SELECT) to the database management system (typically running on a remote server). The DBMS interprets the query, formulates a query plan, and then accesses the stored data. The result of the query is returned to the client program as a result set in the form of a table. The client program then processes the data further, stores it locally, visualizes the data, or generates a report containing the data.

Flow of SQL Query
Flow of SQL Query

Client/Server Model

Flow of SQL Query
Flow of SQL Query

While most database management systems are client/server databases, some are not. For example, SQLite and Microsoft Access are serverless, file-based databases where all database processing occurs within the application. MySQL, Oracle, SQL Server, Informix, etc are server-based databases and require a network connection and logging in with a user name and password.

Query Process Flow

  1. Client connects to a specific database on the database server by supplying connection parameters (IP address, username, password)
  2. Client stores connection information in a connection object
  3. Client issues a query by a sending SQL statement to the database server over the network connection using the connection object
  4. Server checks to ensure that connected user has permission to access data and that the SQL statement is syntactically correct
  5. Server optimizes query and formulates a query plan based on least cost for time and memory
  6. Server carries out the query by retrieving the data from the storage devices and packaging the result into a temporary table, the result set
  7. Server sends result set back to the client
  8. Client stores the result set in local memory and processes the data
  9. Client disconnects from the database by closing the connection to the server and releasing all resources

Relational Model

The relational data model is among the most used database structures:

  • physical data storage aspects are hidden
  • data records are logically stored in tables
  • each table contains related entries in rows
  • each row represents a data record
  • columns represents attributes
  • rows are unordered
  • columns are unordered
  • SQL is used to access the data records

Sample Database

The remainder of the lesson uses the AssetDB SQLite database to demonstrate the SQL SELECT statement.

AssetDB ER Diagram
AssetDB ER Diagram

The best way to learn a programming language is to follow along in your own program and to experiment with the statements.

To follow along, follow these steps:

  • download the sample database (AssetDB.db)
  • create your own R Notebook, Python Notebook, or Java program in your favorite IDE; or use SQLite Browser or the SQLite command line tools
  • connect to the database

Once you are connected, add code blocks, calls to the database, or ad hoc SQL, for each of the statements and experiment with them. The key is to type the SQL statements, play with them, and experiment; reading is not enough to learn – you need to actually execute the statements.

The quick tutorial below illustrates the different ways to connect to the SQLite database and issue SQL commands.

Tutorial Goes Here!

Tutorial I

The tutorial below, in the form of a narrated slide deck (Slide Deck: SQL Primer - SELECT), summarizes the basic use of SELECT to retrieve data from a single table. You might find if useful to watch before or after reading the lesson.

Tutorial II

In the tutorial below, Khoury Boston’s Prof. Schedlbauer demonstrates the use of the WHERE clause in a SQL SELECT to filter the rows that are retrieved. In addition, the tutorial covers UNIONs.

Covers: SELECT, WHERE, compound queries, LIKE, BETWEEN, IN, ANY, ALL, UNION

Tutorial III

In this tutorial, Khoury Boston’s Prof. Schedlbauer demonstrates how to aggregate data and perform simple computations directly in SQL. Particular attention is paid to how NULL values are handled during aggregation. The lessons finishes with a look at simple grouping with GROUP BY.

Covers: SELECT, WHERE, COUNT, SUM, AVG, MIN, MAX, NULL, IS NULL

SQL Style Recommendations

This section summarizes common style practices for writing and formatting SQL statements and creating SQL scripts1.

General Recommendations

  1. Use consistent and descriptive identifiers and names
  2. Make judicious use of white space and indentation to make code easier to read; keep indentation consistent.
  3. Store ISO 8601 compliant time and date information, i.e., YYYY-MM-DD HH:MM:SS.SSSSS
  4. As much as possible, use standard SQL functions instead of vendor-specific functions for reasons of portability
  5. Keep code succinct and devoid of redundant SQL, such as unnecessary quoting or parentheses or WHERE clauses that can otherwise be derived
  6. Include comments in SQL code where necessary; use C style opening /* and closing */ where possible, otherwise precede comments with – and finish them with a new line
  7. Be consistent with the use of semi-colon to terminate SQL statements, even when not required
  8. Use AS when introducing aliases despite not being required
  9. Disambiguate column names by prefixing with table name or table alias even when not necessary
  10. Avoid CamelCase for column and table names as it is difficult to scan quickly, but if you choose to use, be consistent
  11. Do not use descriptive prefixes or Hungarian notation such as sp_ or tbl.
  12. Avoid using plurals; instead use the more accepted collective term where possible instead, e.g., “staff” instead of “employees” or “people” instead of “individuals”
  13. Do not use periods or spaces in column or table names as it requires awkward escape mechanisms such as the use of backticks, i.e., name a table “foreign_clients” rather than “foreign.clients”
  14. Do not use reserved keywords for table or column names it is confusing and requires backticks to escape the meaning leading to unnecessary complexity
  15. Never give a table the same name as one of its columns and vice versa.
  16. Avoid, where possible, concatenating two table names together to create the name of a junction table, e.g., name the table “services” rather than “cars_mechanics”
  17. Always use the singular name for column.
  18. Where possible avoid simply using id as the primary identifier for the table, use cid or cust_id instead
  19. As much as possible, use lowercase letters for columns and tables
  20. Use upper case letters for all SQL keywords when writing queries
  21. Use the plural for tables names, e.g., “files” instead of “file”
  22. Stored procedure names must start with a verb and indicate an outcome or action, e.g., “InsertNewFlight”
  23. CamelCase is acceptable for stored procedure names
  24. The keywords in a SQL query should right align
  25. Use parenthesis rather than relying on implied order of precedence of operations

Examples

SELECT f.uid, f.pwd  -- user ID and password
  FROM files AS f
 WHERE f.name = '.bashrc';
/* Updating the file record after writing to the file */
UPDATE file_system
   SET file_modified_date = '2023-01-01 00:00:00.00000',
       file_size = 209732
 WHERE file_name = '.vimrc';

Prefered Query Formalisms

  1. Use of BETWEEN where possible instead of combining multiple statements with AND
  2. Use set membership operators such as IN instead of multiple OR clause.
  3. Where a value needs to be interpreted before leaving the database use the CASE expression; CASE statements can be nested to form more complex logical structures
  4. Avoid the use of UNION clauses and temporary tables where possible. If the schema can be optimized to remove the reliance on these features then it most likely should be

Data Types and Defaults

  1. Whenever possible, do not use vendor-specific data types as these are not portable and may not be available in older versions of the same vendor’s software
  2. Only use REAL or FLOAT types where it is strictly necessary for floating point mathematics; otherwise prefer NUMERIC and DECIMAL at all times due to floating point rounding errors
  3. Specify default values whenever a reasonable value is available
  4. The default value must be the same type as the column, e.g., if a column is declared as DECIMAL do not provide an INTEGER default value
  5. Default values must follow the data type declaration and come before any NOT NULL statement

Conclusion

Files & Resources

All Files for Lesson 70.102

References

None.

Errata

Let us know.

---
title: "Data Retrieval with SQL SELECT"
params:
  category: 70
  number: 102
  time: 30
  level: beginner
  tags: "sql,SELECT,AS,LIMIT,ORDER BY,DISTINCT"
  description: "Explains how to retrieve data using the SQL SELECT statement.
                Covers selection, aggregation, aliasing, limits, ordering, 
                distinct rows, built-in functions, and literals. Provides
                explanation of order of execution."
date: "<small>`r Sys.Date()`</small>"
author: "<small>Martin Schedlbauer</small>"
email: "m.schedlbauer@neu.edu"
affilitation: "Northeastern University"
output: 
  bookdown::html_document2:
    toc: true
    toc_float: true
    collapsed: false
    number_sections: false
    code_download: true
    theme: spacelab
    highlight: tango
---

---
title: "<small>`r params$category`.`r params$number`</small><br/><span style='color: #2E4053; font-size: 0.9em'>`r rmarkdown::metadata$title`</span>"
---

```{r code=xfun::read_utf8(paste0(here::here(),'/R/_insert2DB.R')), include = FALSE}
```

## Introduction

In this lesson, you will learn about how to use the SQL SELECT statement to retrieve rows from a single table.

## The Role of SQL

SQL is the standard query language for retrieving data from relational databases. It is supported by virtually all relational databases in the same manner. Queries built for one database generally run without modification against another database. Many vendors provide extensions to SQL for analytics and special data types.

SQL is also often used to process tabular data outside of relational databases, *e.g.*, spreadsheets and data frames. For example, Excel supports PowerQuery which allows SQL statements to be run against Excel worksheets; R and Python have packages for executing SQL queries against data frames.

The first query language based on relational algebra, the foundation of relational databases, was Alpha, developed by E. F. Codd (the creator of the relational model of data storage). After that, ISBL was created, and that language in turn was the basis for SQL. SQL is loosely based on relational algebra, though the operands in SQL do not quite map to relational algebra. One issues is that tables are not, in fact, relations and therefore several useful theorems about the relational algebra do not hold in the SQL counterpart. The reason is that the SQL table model is a bag (multiset), rather than a set. Note that SQL is no longer considered an acronym -- although some believe it stands for Structured or Standard Query Language, but that is not true.

The tutorial below, in the form of a narrated slide deck ([Slide Deck](s-70-102-sql-select.pptx)), summarizes the key points in this section. To download the slide deck, right click on the link and choose *Save As* or *Save Link As* -- or whatever your browser uses to download a link.

```{=html}
<iframe src="https://northeastern.hosted.panopto.com/Panopto/Pages/Embed.aspx?id=7f7725ef-b7cb-4c73-a54a-ac52014a1946&amp;autoplay=false&amp;offerviewer=true&amp;showtitle=false&amp;showbrand=false&amp;start=0&amp;interactivity=all" width="480" height="270" allowfullscreen="allowfullscreen" allow="autoplay" data-external="1"></iframe>
```
## Performing Queries

SQL queries require a connection to the database and a means to execute a query. This can be done through a program or via a query tool for *ad hoc* queries such as MySQL Workbench or the SQLite Console. Most common is embedded SQL for retrieving data into a programming language for analytics and further processing of the data.

## Flow of a Query

The client running a program containing embedded SQL or running an *ad hoc* query tool sends a SQL query (SELECT) to the database management system (typically running on a remote server). The DBMS interprets the query, formulates a query plan, and then accesses the stored data. The result of the query is returned to the client program as a result set in the form of a table. The client program then processes the data further, stores it locally, visualizes the data, or generates a report containing the data.

![Flow of SQL Query](images/flow-of-sql-query-client-server.jpg){width="50%"}

## Client/Server Model

![Flow of SQL Query](images/client-server-model-fat-vs-thin-client.jpg){width="50%"}

While most database management systems are client/server databases, some are not. For example, SQLite and Microsoft Access are serverless, file-based databases where all database processing occurs within the application. MySQL, Oracle, SQL Server, Informix, *etc* are server-based databases and require a network connection and logging in with a user name and password.

### Query Process Flow

1.  Client connects to a specific database on the database server by supplying connection parameters (IP address, username, password)
2.  Client stores connection information in a connection object
3.  Client issues a query by a sending SQL statement to the database server over the network connection using the connection object
4.  Server checks to ensure that connected user has permission to access data and that the SQL statement is syntactically correct
5.  Server optimizes query and formulates a query plan based on least cost for time and memory
6.  Server carries out the query by retrieving the data from the storage devices and packaging the result into a temporary table, the result set
7.  Server sends result set back to the client
8.  Client stores the result set in local memory and processes the data
9.  Client disconnects from the database by closing the connection to the server and releasing all resources

## Relational Model

The relational data model is among the most used database structures:

-   physical data storage aspects are hidden
-   data records are logically stored in tables
-   each table contains related entries in rows
-   each row represents a data record
-   columns represents attributes
-   rows are unordered
-   columns are unordered
-   SQL is used to access the data records

## Sample Database

The remainder of the lesson uses the *AssetDB* SQLite database to demonstrate the SQL SELECT statement.

![AssetDB ER Diagram](images/assetdb-erd.jpg){width="70%"}

The best way to learn a programming language is to follow along in your own program and to experiment with the statements.

To follow along, follow these steps:

-   download the sample database ([AssetDB.db](AssetDB.db))
-   create your own R Notebook, Python Notebook, or Java program in your favorite IDE; or use [SQLite Browser](https://sqlitebrowser.org/dl/) or the SQLite command line tools
-   connect to the database

Once you are connected, add code blocks, calls to the database, or *ad hoc* SQL, for each of the statements and experiment with them. The key is to type the SQL statements, play with them, and experiment; reading is not enough to learn -- you need to actually execute the statements.

The quick tutorial below illustrates the different ways to connect to the SQLite database and issue SQL commands.

> Tutorial Goes Here!

## Tutorial I

The tutorial below, in the form of a narrated slide deck ([Slide Deck: SQL Primer - SELECT](s-70-102-sql-select.pptx)), summarizes the basic use of SELECT to retrieve data from a single table. You might find if useful to watch before or after reading the lesson.

```{=html}
<iframe src="https://northeastern.hosted.panopto.com/Panopto/Pages/Embed.aspx?id=b43f7938-f0d5-4e87-bd86-ac5400292bfd&amp;autoplay=false&amp;offerviewer=true&amp;showtitle=false&amp;showbrand=false&amp;start=0&amp;interactivity=all" width="480" height="270" allowfullscreen="allowfullscreen" allow="autoplay" data-external="1"></iframe>
```
## Tutorial II

In the tutorial below, Khoury Boston's Prof. Schedlbauer demonstrates the use of the WHERE clause in a SQL SELECT to filter the rows that are retrieved. In addition, the tutorial covers UNIONs.

<iframe style="border: 1px solid #464646;" src="https://northeastern.hosted.panopto.com/Panopto/Pages/Embed.aspx?id=e064745c-71a7-43d0-8c09-accd017ad22d&amp;autoplay=false&amp;offerviewer=true&amp;showtitle=false&amp;showbrand=false&amp;start=0&amp;interactivity=all" width="480" height="270" allowfullscreen="allowfullscreen" allow="autoplay" data-external="1">

</iframe>

**Covers**: SELECT, WHERE, compound queries, LIKE, BETWEEN, IN, ANY, ALL, UNION

## Tutorial III

In this tutorial, Khoury Boston's Prof. Schedlbauer demonstrates how to aggregate data and perform simple computations directly in SQL. Particular attention is paid to how NULL values are handled during aggregation. The lessons finishes with a look at simple grouping with GROUP BY.

<iframe style="border: 1px solid #464646;" src="https://northeastern.hosted.panopto.com/Panopto/Pages/Embed.aspx?id=9dd5c7a0-64aa-4a66-b884-ac9900ec6d6e&amp;autoplay=false&amp;offerviewer=true&amp;showtitle=false&amp;showbrand=false&amp;start=0&amp;interactivity=all" width="480" height="270" allowfullscreen="allowfullscreen" allow="autoplay" data-external="1">

</iframe>

**Covers**: SELECT, WHERE, COUNT, SUM, AVG, MIN, MAX, NULL, IS NULL

## SQL Style Recommendations

This section summarizes common style practices for writing and formatting SQL statements and creating SQL scripts[^1].

[^1]: Portions derived from SQL Style Guide <https://www.sqlstyle.guide/>. by [Simon Holywell](<https://www.simonholywell.com/?utm_source=sqlstyle.guide&utm_medium=link&utm_campaign=md-document>. which is licensed under a [Creative Commons Attribution-ShareAlike 4.0 International License](https://creativecommons.org/licenses/by-sa/4.0/).

### General Recommendations

1.  Use consistent and descriptive identifiers and names
2.  Make judicious use of white space and indentation to make code easier to read; keep indentation consistent.
3.  Store [ISO 8601](https://en.wikipedia.org/wiki/ISO_8601) compliant time and date information, *i.e.*, YYYY-MM-DD HH:MM:SS.SSSSS
4.  As much as possible, use standard SQL functions instead of vendor-specific functions for reasons of portability
5.  Keep code succinct and devoid of redundant SQL, such as unnecessary quoting or parentheses or WHERE clauses that can otherwise be derived
6.  Include comments in SQL code where necessary; use C style opening /\* and closing \*/ where possible, otherwise precede comments with -- and finish them with a new line
7.  Be consistent with the use of semi-colon to terminate SQL statements, even when not required
8.  Use *AS* when introducing aliases despite not being required
9.  Disambiguate column names by prefixing with table name or table alias even when not necessary
10. Avoid CamelCase for column and table names as it is difficult to scan quickly, but if you choose to use, be consistent
11. Do not use descriptive prefixes or Hungarian notation such as *sp\_* or *tbl*.
12. Avoid using plurals; instead use the more accepted collective term where possible instead, *e.g.*, "staff" instead of "employees" or "people" instead of "individuals"
13. Do not use periods or spaces in column or table names as it requires awkward escape mechanisms such as the use of backticks, *i.e.*, name a table "foreign_clients" rather than "foreign.clients"
14. Do not use reserved keywords for table or column names it is confusing and requires backticks to escape the meaning leading to unnecessary complexity
15. Never give a table the same name as one of its columns and *vice versa*.
16. Avoid, where possible, concatenating two table names together to create the name of a junction table, *e.g.*, name the table "services" rather than "cars_mechanics"
17. Always use the singular name for column.
18. Where possible avoid simply using *id* as the primary identifier for the table, use *cid* or *cust_id* instead
19. As much as possible, use lowercase letters for columns and tables
20. Use upper case letters for all SQL keywords when writing queries
21. Use the plural for tables names, *e.g.*, "files" instead of "file"
22. Stored procedure names must start with a verb and indicate an outcome or action, *e.g.*, "InsertNewFlight"
23. CamelCase is acceptable for stored procedure names
24. The keywords in a SQL query should right align
25. Use parenthesis rather than relying on implied order of precedence of operations

### Examples

``` sql
SELECT f.uid, f.pwd  -- user ID and password
  FROM files AS f
 WHERE f.name = '.bashrc';
```

``` sql
/* Updating the file record after writing to the file */
UPDATE file_system
   SET file_modified_date = '2023-01-01 00:00:00.00000',
       file_size = 209732
 WHERE file_name = '.vimrc';
```

### Prefered Query Formalisms

1.  Use of `BETWEEN` where possible instead of combining multiple statements with `AND`
2.  Use set membership operators such as `IN` instead of multiple `OR` clause.
3.  Where a value needs to be interpreted before leaving the database use the `CASE` expression; `CASE` statements can be nested to form more complex logical structures
4.  Avoid the use of `UNION` clauses and temporary tables where possible. If the schema can be optimized to remove the reliance on these features then it most likely should be

### Data Types and Defaults

1.  Whenever possible, do not use vendor-specific data types as these are not portable and may not be available in older versions of the same vendor's software
2.  Only use *REAL* or *FLOAT* types where it is strictly necessary for floating point mathematics; otherwise prefer *NUMERIC* and *DECIMAL* at all times due to floating point rounding errors
3.  Specify default values whenever a reasonable value is available
4.  The default value must be the same type as the column, *e.g.*, if a column is declared as *DECIMAL* do not provide an *INTEGER* default value
5.  Default values must follow the data type declaration and come before any `NOT NULL` statement

## Conclusion

## See Also

-   [70.801 The SQLite Database](http://artificium.us/lessons/70.sql/l-70-801-intro-sqlite/l-70-801.html)
-   [6.301 Working with Databases in R](http://artificium.us/lessons/06.r/l-6-301-sqlite-from-r/l-6-301.html)
-   [9.602 Working with Databases in Java](http://artificium.us/lessons/09.java/l-9-602-sqlite-from-java/l-9-602.html)

------------------------------------------------------------------------

## Files & Resources

```{r zipFiles, echo=FALSE}
zipName = sprintf("LessonFiles-%s-%s.zip", 
                 params$category,
                 params$number)

textALink = paste0("All Files for Lesson ", 
               params$category,".",params$number)

# downloadFilesLink() is included from _insert2DB.R
knitr::raw_html(downloadFilesLink(".", zipName, textALink))
```

------------------------------------------------------------------------

## References

None.

## Errata

[Let us know](https://form.jotform.com/212187072784157){target="_blank"}.
