Introduction
A common source of confusion in SQL arises from the distinction between the syntactic (coding) order in which a query is written and the logical execution order in which the database management system (DBMS) processes that query. Although SQL queries are written in a declarative form beginning with SELECT, the logical processing model evaluates clauses in a different order.
This distinction is essential for:
- Correct query formulation
- Avoidance of semantic errors
- Proper use of aggregation and filtering
- Reasoning about query optimization and relational algebra equivalences
This lesson presents the logical query processing model with examples in R and SQLite, although the discussion is agnostic of the specific relational database used.
Declarative Nature of SQL
SQL is a declarative language grounded in relational algebra. The programmer specifies what result is desired rather than how it is to be computed. Internally, the database query engine translates the SQL statement into an algebraic expression and generates an execution plan. For more information on query planning and execution, see 60.551 – Query Planning and Processing in Relational Databases
Importantly, the textual order of clauses does not correspond to the logical evaluation order.
Syntax Order vs. Logical Processing Order
SQL queries are written in the following canonical form:
SELECT ...
FROM ...
WHERE ...
GROUP BY ...
HAVING ...
ORDER BY ...
LIMIT ...
While a programmer might think that the syntactic order of the clauses is the execution order of the clauses by the database query engine, that is not the case. The query engine evaluates the query in the following logical order:
- FROM – Identify source relations and compute joins
- WHERE – Apply tuple-level predicates
- GROUP BY – Partition tuples into groups
- HAVING – Apply group-level predicates
- SELECT – Compute projection and expressions plus window functions
- DISTINCT – Eliminate duplicates (if specified)
- ORDER BY – Sort the result
- LIMIT / OFFSET – Restrict the result set
This order reflects the relational algebra transformation pipeline rather than physical execution details. Not understanding the execution (or evaluation) order is a common source of mistakes. The graphic below summarizes the syntax vs logical order of SQL clauses:

Example in SQLite using R
For expediency, the example below uses a (temporary) in-memory database rather than an on-disk database.
library(RSQLite)
conn <- dbConnect(RSQLite::SQLite(), ":memory:")
Nest, we will create a table and insert sample data into the table.
s <- dbExecute(conn, "
CREATE TABLE employees (
id INTEGER PRIMARY KEY,
department TEXT,
salary INTEGER
);
")
s <- dbExecute(conn, "
INSERT INTO employees (department, salary) VALUES
('Engineering', 90000),
('Engineering', 110000),
('HR', 60000),
('HR', 65000),
('Marketing', 70000);
")
Example 1: Tuple-Level Filtering
The SQL statement below filters tuples (rows) based on a logical expression. Of course, the expression could be more complex using a combination of AND, OR, NOT, and IS NULL, alongside <, >, and =.
SELECT department, salary AS wage
FROM employees
WHERE salary > 80000;
Table 1: 2 records
| Engineering |
90000 |
| Engineering |
110000 |
The query engine evaluates the query in the following order:
FROM employees Construct the base relation.
WHERE salary > 80000 Apply selection predicate ( _{salary > 80000}(R) ).
SELECT department, salary Project specified attributes.
N.B.: The WHERE clause cannot reference the alias wage defined in SELECT because projection occurs after selection in the logical processing order.
Example 2: Aggregation and Group-Level Filtering
Let’s evaluate the query below which contains two filtering clauses: a WHERE clause and a HAVING clause. The WHERE clause is executed first before any grouping is done. Once the groupings has been done, the HAVING clause is executed on the groups to filter the partitions. The ORDER BY clause is executed last, after the columns specified in the SELECT have been projected out.
SELECT department, AVG(salary) AS avg_salary
FROM employees
WHERE salary > 60000
GROUP BY department
HAVING AVG(salary) > 80000
ORDER BY avg_salary DESC;
Table 2: 1 records
| Engineering |
1e+05 |
To summarize, the query is evaluated in the following sequence:
FROM employees Construct base relation.
WHERE salary > 60000 Filter tuples.
GROUP BY department Partition tuples by department.
Aggregate Computation Compute AVG(salary) per group.
HAVING AVG(salary) > 80000 Filter groups.
SELECT department, AVG(salary) Project attributes and computed aggregates.
ORDER BY avg_salary DESC Sort final result.
Alias Scope
This query fails because any alias defined in the SELECT clause is not available in the WHERE since the WHERE clause is evaluated before the SELECT. However, an alias for a table is available as the FROM is executed first:
SELECT salary * 1.1 AS increased_salary
FROM employees AS e
WHERE e.increased_salary > 80000;
So, because WHERE is logically evaluated before SELECT, the alias increased_salary does not yet exist.
The correct formulation should be:
SELECT salary * 1.1 AS increased_salary
FROM employees
WHERE salary * 1.1 > 80000;
WHERE vs. HAVING
A common confusion is misunderstanding WHERE versus HAVING: WHERE filters tuples prior to grouping whereas HAVING filters groups after grouping.
Thus, the clause:
is valid, while:
is not.
Logical vs. Physical Execution
It is important to distinguish between:
- Logical processing order (semantic model)
- Physical execution plan (optimizer-dependent)
SQLite, like other DBMSs, may reorder operations for efficiency (predicate pushdown, index scans, join reordering). However, these transformations preserve logical equivalence under relational algebra rules. See 60.551 – Query Planning and Processing in Relational Databases for more information on query planning and execution.
Summary
Understanding logical query processing is key to virtually every aspect of working with relational databases, especially for writing semantically correct analytical queries, and being able to correct alternative, and perhaps more efficient, query formulations.
The apparent contradiction between SQL’s textual structure and its logical processing order is not accidental; it reflects SQL’s declarative foundation in relational algebra.
Acknowledgement
Claude 4.6 (Sonnet) was used to assist in the preparation of this lesson.
---
title: "Execution vs Syntax Order of SQL Statements"
params:
  category: 70
  number: 118
  time: 30
  level: beginner
  tags: "sql,order,group by,having,where"
  description: "Examines the distinction between the syntactic structure of SQL 
                queries and their logical query processing order, grounding 
                the discussion in relational algebra and database theory. 
                Provides practical SQLite examples in R."
date: "<small>`r Sys.Date()`</small>"
author: "<small>Martin Schedlbauer</small>"
email: "m.schedlbauer@neu.edu"
affilitation: "Northeastern University"
output: 
  bookdown::html_document2:
    toc: true
    toc_float: true
    collapsed: false
    number_sections: false
    code_download: true
    theme: journal
    highlight: tango
---

---
title: "<small>`r params$category`.`r params$number`</small><br/><span style='color: #2E4053; font-size: 0.9em'>`r rmarkdown::metadata$title`</span>"
---

```{r echo=F}
# Package names
packages <- c("here", "RSQLite")

# Install packages not yet installed
installed_packages <- packages %in% rownames(installed.packages())
if (any(installed_packages == FALSE)) {
  install.packages(packages[!installed_packages], repos = "https://cloud.r-project.org")
}

# Packages loading
suppressPackageStartupMessages(invisible(lapply(packages, library, character.only = TRUE)))
```

```{r code=xfun::read_utf8(paste0(here::here(),'/R/_insert2DB.R')), include = FALSE}
```

## Introduction

A common source of confusion in SQL arises from the distinction between the **syntactic (coding) order** in which a query is written and the **logical execution order** in which the database management system (DBMS) processes that query. Although SQL queries are written in a declarative form beginning with *SELECT*, the logical processing model evaluates clauses in a different order.

This distinction is essential for:

-   Correct query formulation
-   Avoidance of semantic errors
-   Proper use of aggregation and filtering
-   Reasoning about query optimization and relational algebra equivalences

This lesson presents the logical query processing model with examples in R and SQLite, although the discussion is agnostic of the specific relational database used.

## Declarative Nature of SQL

SQL is a **declarative language** grounded in relational algebra. The programmer specifies *what* result is desired rather than *how* it is to be computed. Internally, the database query engine translates the SQL statement into an algebraic expression and generates an execution plan. For more information on query planning and execution, see [60.551 -- Query Planning and Processing in Relational Databases](http://artificium.us/lessons/60.dbdesign/l-60-551-query-processing/l-60-551.html)

Importantly, the textual order of clauses **does not** correspond to the logical evaluation order.

## Syntax Order vs. Logical Processing Order

SQL queries are written in the following canonical form:

``` sql
SELECT ...
FROM ...
WHERE ...
GROUP BY ...
HAVING ...
ORDER BY ...
LIMIT ...
```

While a programmer might think that the syntactic order of the clauses is the execution order of the clauses by the database query engine, that is not the case. The query engine evaluates the query in the following logical order:

1.  **FROM** – Identify source relations and compute joins
2.  **WHERE** – Apply tuple-level predicates
3.  **GROUP BY** – Partition tuples into groups
4.  **HAVING** – Apply group-level predicates
5.  **SELECT** – Compute projection and expressions plus window functions
6.  **DISTINCT** – Eliminate duplicates (if specified)
7.  **ORDER BY** – Sort the result
8.  **LIMIT / OFFSET** – Restrict the result set

This order reflects the relational algebra transformation pipeline rather than physical execution details. Not understanding the execution (or evaluation) order is a common source of mistakes. The graphic below summarizes the syntax vs logical order of SQL clauses:

![](images/sql-execution-vs-logical-order.png){width="70%"}

## Formal Interpretation in Relational Terms

Given a base relation ( *R* ):

-   *FROM* corresponds to relation construction (including joins).
-   *WHERE* corresponds to **selection (σ)**.
-   *GROUP BY* corresponds to partitioning prior to aggregation.
-   Aggregation operators (e.g., `COUNT`, `SUM`) apply over each partition/group.
-   *HAVING* corresponds to selection over grouped results.
-   *SELECT* corresponds to **projection (π)** and computed attributes.
-   *ORDER BY* introduces a total ordering (non-relational operation).

Thus, SQL’s logical processing model aligns closely with classical relational algebra. Note that any aggregation function is applied to each partition if a PARTITION BY or GROUP BY clause is present, otherwise, in the absence of such a clause, it is applied to the entire table. It is not possible to combine table-level and partition-level aggregation in the same query.

A query may have both a GROUP BY and a PARTITION BY clause; in such a scenario, the GROUP BY is executed before the PARTITION BY.

## Example in SQLite using R

For expediency, the example below uses a (temporary) in-memory database rather than an on-disk database.

```{r}
library(RSQLite)

conn <- dbConnect(RSQLite::SQLite(), ":memory:")
```

Nest, we will create a table and insert sample data into the table.

```{r}
s <- dbExecute(conn, "
CREATE TABLE employees (
  id INTEGER PRIMARY KEY,
  department TEXT,
  salary INTEGER
);
")

s <- dbExecute(conn, "
INSERT INTO employees (department, salary) VALUES
('Engineering', 90000),
('Engineering', 110000),
('HR', 60000),
('HR', 65000),
('Marketing', 70000);
")
```

## Example 1: Tuple-Level Filtering

The SQL statement below filters tuples (rows) based on a logical expression. Of course, the expression could be more complex using a combination of *AND*, *OR*, *NOT*, and *IS NULL*, alongside \<, \>, and =.

```{sql connection=conn}
SELECT department, salary AS wage
FROM employees
WHERE salary > 80000;
```

The query engine evaluates the query in the following order:

1.  **FROM employees** Construct the base relation.

2.  **WHERE salary \> 80000** Apply selection predicate ( \sigma\_{salary \> 80000}(R) ).

3.  **SELECT department, salary** Project specified attributes.

**N.B.**: The *WHERE* clause cannot reference the alias *wage* defined in *SELECT* because projection occurs after selection in the logical processing order.

------------------------------------------------------------------------

## Example 2: Aggregation and Group-Level Filtering

Let's evaluate the query below which contains two filtering clauses: a *WHERE* clause and a *HAVING* clause. The *WHERE* clause is executed first before any grouping is done. Once the groupings has been done, the *HAVING* clause is executed on the groups to filter the partitions. The *ORDER BY* clause is executed last, after the columns specified in the *SELECT* have been projected out.

```{sql connection=conn}
SELECT department, AVG(salary) AS avg_salary
FROM employees
WHERE salary > 60000
GROUP BY department
HAVING AVG(salary) > 80000
ORDER BY avg_salary DESC;
```

To summarize, the query is evaluated in the following sequence:

1.  **FROM employees** Construct base relation.

2.  **WHERE salary \> 60000** Filter tuples.

3.  **GROUP BY department** Partition tuples by department.

4.  **Aggregate Computation** Compute `AVG(salary)` per group.

5.  **HAVING AVG(salary) \> 80000** Filter groups.

6.  **SELECT department, AVG(salary)** Project attributes and computed aggregates.

7.  **ORDER BY avg_salary DESC** Sort final result.

## Alias Scope

This query fails because any alias defined in the *SELECT* clause is not available in the *WHERE* since the *WHERE* clause is evaluated before the *SELECT*. However, an alias for a table is available as the *FROM* is executed first:

``` sql
SELECT salary * 1.1 AS increased_salary
FROM employees AS e
WHERE e.increased_salary > 80000;
```

So, because *WHERE* is logically evaluated before *SELECT*, the alias *increased_salary* does not yet exist.

The correct formulation should be:

``` sql
SELECT salary * 1.1 AS increased_salary
FROM employees
WHERE salary * 1.1 > 80000;
```

## WHERE vs. HAVING

A common confusion is misunderstanding *WHERE* versus *HAVING*: *WHERE* filters **tuples prior to grouping** whereas *HAVING* filters **groups after grouping**.

Thus, the clause:

``` sql
HAVING COUNT(*) > 1
```

is valid, while:

``` sql
WHERE COUNT(*) > 1
```

is not.

## Logical vs. Physical Execution

It is important to distinguish between:

-   **Logical processing order** (semantic model)
-   **Physical execution plan** (optimizer-dependent)

SQLite, like other DBMSs, may reorder operations for efficiency (predicate pushdown, index scans, join reordering). However, these transformations preserve logical equivalence under relational algebra rules. See [60.551 -- Query Planning and Processing in Relational Databases](http://artificium.us/lessons/60.dbdesign/l-60-551-query-processing/l-60-551.html) for more information on query planning and execution.

## Summary

Understanding logical query processing is key to virtually every aspect of working with relational databases, especially for writing semantically correct analytical queries, and being able to correct alternative, and perhaps more efficient, query formulations.

The apparent contradiction between SQL’s textual structure and its logical processing order is not accidental; it reflects SQL’s declarative foundation in relational algebra.

------------------------------------------------------------------------

## Files & Resources

```{r zipFiles, echo=FALSE}
zipName = sprintf("LessonFiles-%s-%s.zip", 
                 params$category,
                 params$number)

textALink = paste0("All Files for Lesson ", 
               params$category,".",params$number)

# downloadFilesLink() is included from _insert2DB.R
knitr::raw_html(downloadFilesLink(".", zipName, textALink))
```

------------------------------------------------------------------------

## Acknowledgement

Claude 4.6 (Sonnet) was used to assist in the preparation of this lesson.

## Errata

[Let us know](https://form.jotform.com/212187072784157){target="_blank"}.
