Learning Objectives

Upon completion of this short lesson, you will be able to:

  • appreciate role of views
  • create views
  • use views in queries
  • describe view materialization
  • use views for schema abstraction

Introduction

Views are an important mechanism in relational databases for queries. In a nutshell, a view is a virtual table that provides a way to present a subset of data from one or more tables. Unlike regular tables, a view does not store data itself but rather stores a SQL query that defines how the data is presented. The result of the query is generated on-the-fly whenever the view is accessed. Views are useful for several reasons, including simplifying complex queries, enhancing security by restricting access to certain columns or rows, and providing a layer of abstraction over the base tables.

While supported by virtually all databases and being part of the SQL-92 standard, there are some minor differences in restrictions around the use of views, so be sure to consult your database vendor’s documentation.

Why Views?

A well-designed schema-independent database only provides queries through an interface in the form of views and does not allow users or client applications to query the tables in the schema directly. Thus, views provide an abstraction of the actual schema that ensures that schema changes do not affect SQL queries. If the name of a table or the definitions of columns changes, then the database architect updates the view definition and client and user queries against the view continue to function without modification.

In addition, views also help enforce security as the data can only be accessed through views and the data in the actual tables is hidden from all users.

Views can also be used to present aggregate data, such as totals and averages, without exposing the underlying data.

Views vs Tables

A view is a virtual table defined by a SQL SELECT statement and behaves in queries exactly like a table. The content of the view is materialized generally only when the view is involved in a query, although databases may cache views to improve performance. Since a view is not stored, it do not consume storage; only the view definition is stored in the database.

Views for Secure Data Access

A view provides an abstraction of the data in one or more tables and can present the data in a different format, in redacted form, or only select rows and columns depending on access privileges. So, rather than exposes an entire table, different views can present different parts of the data in one or more tables to different users depending on their access privileges and needs.

Restrictions on Views

There are several restrictions on how views can be used:

  1. No Storage: Views do not store any data themselves. They are simply a stored query, and the data is fetched from the underlying tables when the view is queried.

  2. Non-Updatable Views: Not all views can be updated. Views that include JOIN operations, aggregate functions, or DISTINCT clauses are typically not updatable. In such cases, you will need to update the base tables directly.

  3. Performance: Since views are just stored queries, every time a view is queried, the underlying query is re-executed. This can have performance implications, especially if the view is based on complex queries or large tables.

  4. Schema Changes: If the structure of the underlying base tables changes (e.g., a column is dropped), the view may become invalid and need to be redefined.

Creating Views

A view is defined using the CREATE VIEW statement in SQL. This statement specifies the query that generates the view’s result set. Views can be based on one or more tables or other views. The structure of a view is identical to that of a table, in the sense that it has columns and rows, but it is dynamic and updates in real-time as the underlying data in the base tables changes.

Here’s the basic syntax to define a view:

CREATE VIEW view_name AS
SELECT column1, column2, ...
FROM table_name
WHERE condition;

Let’s say that email addresses in the Author table of some marketing databases should not be accessible for non-marketing personnel but should at least be partially available for matching. So, rather than providing access to the data in the Author table, we define a view that the non-marketing personnel and applications use.

The view is defined below as the result of a SQL query. Notice the renaming of columns from the original table.

CREATE VIEW AuthorNM 
 (fullname, email)
 AS
 SELECT name,
        substr(email,1,2) || '****' || substr(email,-5)
   FROM Author;

Note that SQLite uses || for string concatenation while MySQL uses the concat() function.

So, in shortm views are used in much the same way as regular tables. You can run SELECT queries against views, join them with other tables or views, and even use them in INSERT, UPDATE, and DELETE operations under certain conditions.

As another example, suppose we have two tables in a SQLite database, employees and departments:

CREATE TABLE employees (
    emp_id INTEGER PRIMARY KEY,
    emp_name TEXT,
    department_id INTEGER,
    salary REAL
);

CREATE TABLE departments (
    department_id INTEGER PRIMARY KEY,
    department_name TEXT
);

We can define a view that presents the employees along with their department names:

CREATE VIEW employee_department_view AS
SELECT e.emp_name, d.department_name, e.salary
FROM employees e
JOIN departments d ON e.department_id = d.department_id;

Now, querying the view will display the employee details along with their department names:

SELECT * FROM employee_department_view;

Using a View in a Query

A view can be used anywhere a table can be used in a SQL SELECT statement.

SELECT * FROM AuthorNM;

Common Uses Cases for Views

  1. Define a view that contain select rows and columns and provide authorization for some users to access the view but not the underlying tables that define the view.

  2. Aggregate data from a complex query in a view to simplify common data access.

  3. Create views as alternatives to commonly used subqueries to make queries simpler and to provide a degree of reusability.

Updating Views

In SQLite, like Oracle, PostgreSQL, and SQL Server, views can be updatable, which means that you can perform INSERT, UPDATE, or DELETE operations on the view, and these changes will be reflected in the underlying base tables. However, this is not always the case, and there are some restrictions on when a view can be updated.

Views can generally only be used for data retrieval: SELECT but not in INSERT, UPDATE, or DELETE statements. Some views might be updatable if some specific criteria are met and if the database supports the feature.

A view is generally updatable if the following criteria are met:

  • The view must reference a single table.
  • The view must include all non-nullable columns from the base table
  • No aggregations (sum(), avg(), count(), etc.)
  • View does not contain a GROUP BY or HAVING clause
  • View definition does not use subqueries
  • View is not the result of a UNION, ALL, or DISTINCT query
  • The FROM clause in the view definition contains only updatable tables and views and only uses inner joins and no outer joins

Custom update logic on views can be provided using triggers which then update the underlying tables.

Here is an example of an updatable view:

CREATE VIEW employee_salary_view AS
SELECT emp_id, emp_name, salary
FROM employees;

Since this view only references a single table (employees) and does not contain any restrictions or aggregations, it can be updated. For example, we can update an employee’s salary through the view:

UPDATE employee_salary_view
SET salary = 75000
WHERE emp_id = 1;

This operation will directly modify the salary column in the underlying employees table.

However, if we try to update a view that involves a JOIN, SQLite will not allow it, because there’s ambiguity about which table should be updated. For example, trying to update the employee_department_view defined earlier would fail.

Let’s consider a view that is not updatable because it includes a JOIN:

CREATE VIEW employee_salary_department AS
SELECT e.emp_name, d.department_name, e.salary
FROM employees e
JOIN departments d ON e.department_id = d.department_id;

Trying to update this view would lead to an error, because it is not clear whether the update should apply to the employees table or the departments table:

UPDATE employee_salary_department
SET salary = 85000
WHERE emp_name = 'John Doe';

This will result in an error similar to:

Error: view employee_salary_department is not updatable

Dropping Views

If a view is no longer needed, it can be removed from the database using the DROP VIEW statement:

DROP VIEW IF EXISTS employee_department_view;

This statement removes the view from the database, but it does not affect the underlying tables or data.

View Materialization

View materialization is an optimization technique used by database management systems (DBMS) to improve the performance of queries that involve complex views. Under normal circumstances, a view is a virtual table, meaning that every time a view is queried, the DBMS must recompute the result by executing the underlying SQL query that defines the view. This is known as virtual view processing. For simple views or views that are rarely accessed, this is efficient. However, for complex views or views queried frequently, recomputing the result each time can be inefficient and slow.

Materialized views address this performance concern by precomputing and storing the results of the view. Instead of recalculating the view’s result every time it is accessed, the result is stored in a physical table (either as a physical table in storage or as a virtual table in memory), allowing much faster access. Essentially, materializing a view transforms it from a virtual view (which is dynamically generated on-demand) into a persistent, real table containing data whose contents is derived from other tables and views.

Materialization of views happens in specific scenarios, typically driven by the need to optimize query performance. There are two primary ways in which materialized views can be created and managed:

Manual Materialization (Materialized Views): In some databases, you explicitly define a materialized view. Unlike regular views, a materialized view is a view whose result is stored in the database as a physical table. The result of the query used to define the view is precomputed and saved, and future accesses to the materialized view retrieve the data from this stored result rather than recomputing it.

Example of creating a materialized view (not supported in SQLite but common in other databases like PostgreSQL and Oracle):

CREATE MATERIALIZED VIEW employee_summary AS
SELECT department_id, COUNT(emp_id) AS employee_count, AVG(salary) AS avg_salary
FROM employees
GROUP BY department_id;

In this case, the view stores the number of employees and the average salary per department. Each time you query this view, the DBMS does not have to recompute the aggregate functions, but simply returns the precomputed data from storage.

Automatic Materialization (Internal Optimization): Some advanced database engines automatically materialize certain views or subqueries behind the scenes as part of query optimization. The DBMS may decide to materialize the result of a complex query or a frequently accessed view, especially if recomputation is expensive. This is transparent to the user but can provide significant performance benefits.

The decision to materialize a view in these cases is up to the database engine and is typically based on one or more of these considerations:

  • The complexity of the view (e.g., it involves multiple joins or aggregate functions).
  • The frequency with which the view is queried.
  • The cost of recalculating the view’s result each time.
  • Available system resources like memory and storage.

Once a view is materialized, the next concern is how to maintain the correctness of the materialized view, especially when the underlying data changes. There are two common approaches to this:

Lazy (On-demand) Refresh: In this approach, the materialized view is updated only when it is queried. If the underlying tables have been modified since the materialized view was last computed, the DBMS will refresh the materialized view just before returning the query result. This method can delay the retrieval of data but avoids unnecessary refreshes when the view is not accessed frequently.

For example, the employee_summary materialized view may be refreshed only when a query like SELECT * FROM employee_summary is run. If new employees have been added to the employees table since the last materialization, the view will be updated at that point.

Eager (Immediate) Refresh: In this case, the materialized view is updated immediately whenever the underlying base tables are modified. This ensures that the materialized view is always up-to-date but can incur additional overhead every time there’s an insert, update, or delete operation on the base tables.

For example, if an INSERT INTO employees statement adds a new employee, the system automatically updates the corresponding entry in the materialized view employee_summary. This ensures the view reflects the most recent data, but it adds overhead to every write operation.

Some systems support hybrid models where materialized views can be incrementally refreshed. This means instead of recalculating the entire view from scratch when the data changes, the system computes only the changes and applies them to the materialized view, making the update process more efficient.

Materialized Views in Practice

While SQLite, a lightweight relational database, does not natively support materialized views (only virtual views), larger DBMSs like Oracle, PostgreSQL, and Microsoft SQL Server have explicit support for materialized views. In these systems, materialized views are often used for:

  1. Data Warehousing: Materialized views are ideal for large-scale, complex queries over datasets that don’t change frequently. For instance, in a data warehouse, historical data is often queried for reporting purposes, and materialized views can provide pre-aggregated results for fast retrieval.

  2. Performance Optimization: When dealing with complex queries involving multiple joins, aggregations, or subqueries, materialized views allow the DBMS to store the precomputed result, significantly reducing the time required to execute these queries.

  3. Replication and Distributed Databases: Materialized views are sometimes used to replicate data across databases or distribute a consistent snapshot of data across systems. They allow for faster access to read-heavy workloads in distributed environments.

Example of Caching with Materialized Views

In a database that supports materialized views, the following example shows how caching might be used to improve query performance:

CREATE MATERIALIZED VIEW sales_summary AS
SELECT region_id, SUM(sales_amount) AS total_sales
FROM sales
GROUP BY region_id;

This materialized view stores the total sales per region. Normally, computing this aggregate query over a large sales table could be expensive. By materializing the view, the result is cached in the database, allowing fast access when the view is queried.

Now, if you run a query:

SELECT * FROM sales_summary WHERE region_id = 1;

Instead of scanning the entire sales table, the DBMS simply retrieves the precomputed result from the materialized view, which is much faster. The materialized view remains up-to-date according to its refresh policy (e.g., on-demand or automatically).

Advantages and Disadvantages of Materialized Views

Advantages:

  1. Performance: The primary benefit of materialized views is faster query performance. By caching the result of complex queries, materialized views avoid the overhead of recomputing the query every time it is executed.

  2. Efficient Aggregation: For queries that involve expensive operations like joins or aggregations (e.g., GROUP BY), materialized views store the precomputed results, reducing the load on the system when querying these views.

  3. Snapshot of Data: Materialized views provide a consistent snapshot of the data at the time they were last refreshed, which can be useful in data analysis or reporting applications where a “frozen” state of the data is required.

Disadvantages:

  1. Storage Overhead: Since materialized views physically store data, they require additional storage space. This can be significant for large datasets.

  2. Maintenance Overhead: Keeping materialized views up-to-date requires additional processing during INSERT, UPDATE, or DELETE operations on the base tables. In write-heavy environments, this overhead can be substantial, especially for views that need to be refreshed frequently.

  3. Staleness: If a materialized view is not refreshed frequently, its data can become stale, meaning it no longer reflects the current state of the base tables. This is particularly a concern for dynamic datasets with frequent changes.

To summarize, view materialization is an essential optimization technique in which the results of complex or frequently queried views are pre-materialized and cached. By storing the result of the view in a physical table, materialized views provide faster query execution at the cost of additional storage and maintenance overhead. The decision of when to materialize views depends on the complexity of the view, the frequency of query execution, and the balance between query performance and data freshness. While SQLite does not support materialized views, many advanced database systems leverage them to enhance performance, particularly in large-scale applications like data warehousing and reporting.

Summary

Views are an important element in the database architect’s toolkit for building scalable and maintainable databases as they allow abstraction and simplification of interactions with database data. They can significantly improve query readability, provide security by restricting access to certain data, and create a level of abstraction from the underlying table structures. However, views do generally not allow updates, and understanding when and how to use them appropriately is key to leveraging their full potential.

Tutorial

In this short kickstarter lesson, guest speaker Socratica provides an overview of views and how they can be used to abstract tables in a relational databases and provide schema independence for queries.

Files & Resources

All Files for Lesson 70.108

References

None.

Errata

Let us know.

---
title: "Query and Schema Abstraction with Views"
params:
  category: 70
  number: 108
  time: 30
  level: beginner
  tags: "sql,SELECT,AS,LIMIT,ORDER BY,DISTINCT"
  description: "Explains how to define virtual tables using views."
date: "<small>`r Sys.Date()`</small>"
author: "<small>Martin Schedlbauer</small>"
email: "m.schedlbauer@neu.edu"
affilitation: "Northeastern University"
output: 
  bookdown::html_document2:
    toc: true
    toc_float: true
    collapsed: false
    number_sections: false
    code_download: true
    theme: spacelab
    highlight: tango
---

---
title: "<small>`r params$category`.`r params$number`</small><br/><span style='color: #2E4053; font-size: 0.9em'>`r rmarkdown::metadata$title`</span>"
---

```{r code=xfun::read_utf8(paste0(here::here(),'/R/_insert2DB.R')), include = FALSE}
```

------------------------------------------------------------------------

## Learning Objectives

Upon completion of this short lesson, you will be able to:

-   appreciate role of views
-   create views
-   use views in queries
-   describe view materialization
-   use views for schema abstraction

------------------------------------------------------------------------

## Introduction

Views are an important mechanism in relational databases for queries. In a nutshell, a **view** is a virtual table that provides a way to present a subset of data from one or more tables. Unlike regular tables, a view does not store data itself but rather stores a SQL query that defines how the data is presented. The result of the query is generated on-the-fly whenever the view is accessed. Views are useful for several reasons, including simplifying complex queries, enhancing security by restricting access to certain columns or rows, and providing a layer of abstraction over the base tables.

While supported by virtually all databases and being part of the SQL-92 standard, there are some minor differences in restrictions around the use of views, so be sure to consult your database vendor's documentation.

## Why Views?

A well-designed schema-independent database only provides queries through an interface in the form of views and does not allow users or client applications to query the tables in the schema directly. Thus, views provide an abstraction of the actual schema that ensures that schema changes do not affect SQL queries. If the name of a table or the definitions of columns changes, then the database architect updates the view definition and client and user queries against the view continue to function without modification.

In addition, views also help enforce security as the data can only be accessed through views and the data in the actual tables is hidden from all users.

Views can also be used to present aggregate data, such as totals and averages, without exposing the underlying data.

## Views vs Tables

A view is a virtual table defined by a SQL *SELECT* statement and behaves in queries exactly like a table. The content of the view is materialized generally only when the view is involved in a query, although databases may cache views to improve performance. Since a view is not stored, it do not consume storage; only the view definition is stored in the database.

## Views for Secure Data Access

A view provides an abstraction of the data in one or more tables and can present the data in a different format, in redacted form, or only select rows and columns depending on access privileges. So, rather than exposes an entire table, different views can present different parts of the data in one or more tables to different users depending on their access privileges and needs.

## Restrictions on Views

There are several restrictions on how views can be used:

1.  **No Storage**: Views do not store any data themselves. They are simply a stored query, and the data is fetched from the underlying tables when the view is queried.

2.  **Non-Updatable Views**: Not all views can be updated. Views that include `JOIN` operations, aggregate functions, or `DISTINCT` clauses are typically not updatable. In such cases, you will need to update the base tables directly.

3.  **Performance**: Since views are just stored queries, every time a view is queried, the underlying query is re-executed. This can have performance implications, especially if the view is based on complex queries or large tables.

4.  **Schema Changes**: If the structure of the underlying base tables changes (e.g., a column is dropped), the view may become invalid and need to be redefined.

## Creating Views

A **view** is defined using the `CREATE VIEW` statement in SQL. This statement specifies the query that generates the view's result set. Views can be based on one or more tables or other views. The structure of a view is identical to that of a table, in the sense that it has columns and rows, but it is dynamic and updates in real-time as the underlying data in the base tables changes.

Here’s the basic syntax to define a view:

``` sql
CREATE VIEW view_name AS
SELECT column1, column2, ...
FROM table_name
WHERE condition;
```

Let's say that email addresses in the *Author* table of some marketing databases should not be accessible for non-marketing personnel but should at least be partially available for matching. So, rather than providing access to the data in the Author table, we define a view that the non-marketing personnel and applications use.

The view is defined below as the result of a SQL query. Notice the renaming of columns from the original table.

``` sql
CREATE VIEW AuthorNM 
 (fullname, email)
 AS
 SELECT name,
        substr(email,1,2) || '****' || substr(email,-5)
   FROM Author;
```

Note that SQLite uses \|\| for string concatenation while MySQL uses the `concat()` function.

So, in shortm views are used in much the same way as regular tables. You can run `SELECT` queries against views, join them with other tables or views, and even use them in `INSERT`, `UPDATE`, and `DELETE` operations under certain conditions.

As another example, suppose we have two tables in a SQLite database, `employees` and `departments`:

``` sql
CREATE TABLE employees (
    emp_id INTEGER PRIMARY KEY,
    emp_name TEXT,
    department_id INTEGER,
    salary REAL
);

CREATE TABLE departments (
    department_id INTEGER PRIMARY KEY,
    department_name TEXT
);
```

We can define a view that presents the employees along with their department names:

``` sql
CREATE VIEW employee_department_view AS
SELECT e.emp_name, d.department_name, e.salary
FROM employees e
JOIN departments d ON e.department_id = d.department_id;
```

Now, querying the view will display the employee details along with their department names:

``` sql
SELECT * FROM employee_department_view;
```

## Using a View in a Query

A view can be used anywhere a table can be used in a SQL SELECT statement.

``` sql
SELECT * FROM AuthorNM;
```

## Common Uses Cases for Views

1.  Define a view that contain select rows and columns and provide authorization for some users to access the view but not the underlying tables that define the view.

2.  Aggregate data from a complex query in a view to simplify common data access.

3.  Create views as alternatives to commonly used subqueries to make queries simpler and to provide a degree of reusability.

## Updating Views

In SQLite, like Oracle, PostgreSQL, and SQL Server, views can be **updatable**, which means that you can perform `INSERT`, `UPDATE`, or `DELETE` operations on the view, and these changes will be reflected in the underlying base tables. However, this is not always the case, and there are some restrictions on when a view can be updated.

Views can generally only be used for data retrieval: SELECT but not in INSERT, UPDATE, or DELETE statements. Some views might be updatable if some specific criteria are met and if the database supports the feature.

A view is generally updatable if the following criteria are met:

-   The view must reference a single table.
-   The view must include all non-nullable columns from the base table
-   No aggregations (sum(), avg(), count(), etc.)
-   View does not contain a GROUP BY or HAVING clause
-   View definition does not use subqueries
-   View is not the result of a UNION, ALL, or DISTINCT query
-   The FROM clause in the view definition contains only updatable tables and views and only uses inner joins and no outer joins

Custom update logic on views can be provided using triggers which then update the underlying tables.

Here is an example of an updatable view:

``` sql
CREATE VIEW employee_salary_view AS
SELECT emp_id, emp_name, salary
FROM employees;
```

Since this view only references a single table (`employees`) and does not contain any restrictions or aggregations, it can be updated. For example, we can update an employee's salary through the view:

``` sql
UPDATE employee_salary_view
SET salary = 75000
WHERE emp_id = 1;
```

This operation will directly modify the `salary` column in the underlying `employees` table.

However, if we try to update a view that involves a `JOIN`, SQLite will not allow it, because there’s ambiguity about which table should be updated. For example, trying to update the `employee_department_view` defined earlier would fail.

Let’s consider a view that is not updatable because it includes a `JOIN`:

``` sql
CREATE VIEW employee_salary_department AS
SELECT e.emp_name, d.department_name, e.salary
FROM employees e
JOIN departments d ON e.department_id = d.department_id;
```

Trying to update this view would lead to an error, because it is not clear whether the update should apply to the `employees` table or the `departments` table:

``` sql
UPDATE employee_salary_department
SET salary = 85000
WHERE emp_name = 'John Doe';
```

This will result in an error similar to:

```         
Error: view employee_salary_department is not updatable
```

## Dropping Views

If a view is no longer needed, it can be removed from the database using the `DROP VIEW` statement:

``` sql
DROP VIEW IF EXISTS employee_department_view;
```

This statement removes the view from the database, but it does not affect the underlying tables or data.

## View Materialization

View materialization is an optimization technique used by database management systems (DBMS) to improve the performance of queries that involve complex views. Under normal circumstances, a view is a virtual table, meaning that every time a view is queried, the DBMS must recompute the result by executing the underlying SQL query that defines the view. This is known as **virtual view** processing. For simple views or views that are rarely accessed, this is efficient. However, for complex views or views queried frequently, recomputing the result each time can be inefficient and slow.

**Materialized views** address this performance concern by **precomputing** and **storing** the results of the view. Instead of recalculating the view's result every time it is accessed, the result is stored in a physical table (either as a physical table in storage or as a virtual table in memory), allowing much faster access. Essentially, materializing a view transforms it from a virtual view (which is dynamically generated on-demand) into a persistent, real table containing data whose contents is derived from other tables and views.

Materialization of views happens in specific scenarios, typically driven by the need to optimize query performance. There are two primary ways in which materialized views can be created and managed:

**Manual Materialization (Materialized Views):** In some databases, you explicitly define a materialized view. Unlike regular views, a **materialized view** is a view whose result is stored in the database as a physical table. The result of the query used to define the view is precomputed and saved, and future accesses to the materialized view retrieve the data from this stored result rather than recomputing it.

Example of creating a materialized view (not supported in SQLite but common in other databases like PostgreSQL and Oracle):

``` sql
CREATE MATERIALIZED VIEW employee_summary AS
SELECT department_id, COUNT(emp_id) AS employee_count, AVG(salary) AS avg_salary
FROM employees
GROUP BY department_id;
```

In this case, the view stores the number of employees and the average salary per department. Each time you query this view, the DBMS does not have to recompute the aggregate functions, but simply returns the precomputed data from storage.

**Automatic Materialization (Internal Optimization):** Some advanced database engines automatically materialize certain views or subqueries behind the scenes as part of query optimization. The DBMS may decide to materialize the result of a complex query or a frequently accessed view, especially if recomputation is expensive. This is transparent to the user but can provide significant performance benefits.

The decision to materialize a view in these cases is up to the database engine and is typically based on one or more of these considerations:

-   The complexity of the view (*e.g.*, it involves multiple joins or aggregate functions).
-   The frequency with which the view is queried.
-   The cost of recalculating the view’s result each time.
-   Available system resources like memory and storage.

Once a view is materialized, the next concern is how to **maintain** the correctness of the materialized view, especially when the underlying data changes. There are two common approaches to this:

**Lazy (On-demand) Refresh**: In this approach, the materialized view is updated only when it is queried. If the underlying tables have been modified since the materialized view was last computed, the DBMS will refresh the materialized view just before returning the query result. This method can delay the retrieval of data but avoids unnecessary refreshes when the view is not accessed frequently.

For example, the `employee_summary` materialized view may be refreshed only when a query like `SELECT * FROM employee_summary` is run. If new employees have been added to the `employees` table since the last materialization, the view will be updated at that point.

**Eager (Immediate) Refresh**: In this case, the materialized view is updated immediately whenever the underlying base tables are modified. This ensures that the materialized view is always up-to-date but can incur additional overhead every time there’s an insert, update, or delete operation on the base tables.

For example, if an `INSERT INTO employees` statement adds a new employee, the system automatically updates the corresponding entry in the materialized view `employee_summary`. This ensures the view reflects the most recent data, but it adds overhead to every write operation.

Some systems support hybrid models where materialized views can be **incrementally refreshed**. This means instead of recalculating the entire view from scratch when the data changes, the system computes only the changes and applies them to the materialized view, making the update process more efficient.

### Materialized Views in Practice

While SQLite, a lightweight relational database, does not natively support materialized views (only virtual views), larger DBMSs like Oracle, PostgreSQL, and Microsoft SQL Server have explicit support for materialized views. In these systems, materialized views are often used for:

1.  **Data Warehousing**: Materialized views are ideal for large-scale, complex queries over datasets that don’t change frequently. For instance, in a data warehouse, historical data is often queried for reporting purposes, and materialized views can provide pre-aggregated results for fast retrieval.

2.  **Performance Optimization**: When dealing with complex queries involving multiple joins, aggregations, or subqueries, materialized views allow the DBMS to store the precomputed result, significantly reducing the time required to execute these queries.

3.  **Replication and Distributed Databases**: Materialized views are sometimes used to replicate data across databases or distribute a consistent snapshot of data across systems. They allow for faster access to read-heavy workloads in distributed environments.

### Example of Caching with Materialized Views

In a database that supports materialized views, the following example shows how caching might be used to improve query performance:

``` sql
CREATE MATERIALIZED VIEW sales_summary AS
SELECT region_id, SUM(sales_amount) AS total_sales
FROM sales
GROUP BY region_id;
```

This materialized view stores the total sales per region. Normally, computing this aggregate query over a large sales table could be expensive. By materializing the view, the result is cached in the database, allowing fast access when the view is queried.

Now, if you run a query:

``` sql
SELECT * FROM sales_summary WHERE region_id = 1;
```

Instead of scanning the entire `sales` table, the DBMS simply retrieves the precomputed result from the materialized view, which is much faster. The materialized view remains up-to-date according to its refresh policy (e.g., on-demand or automatically).

### Advantages and Disadvantages of Materialized Views

#### Advantages:

1.  **Performance**: The primary benefit of materialized views is faster query performance. By caching the result of complex queries, materialized views avoid the overhead of recomputing the query every time it is executed.

2.  **Efficient Aggregation**: For queries that involve expensive operations like joins or aggregations (e.g., `GROUP BY`), materialized views store the precomputed results, reducing the load on the system when querying these views.

3.  **Snapshot of Data**: Materialized views provide a consistent snapshot of the data at the time they were last refreshed, which can be useful in data analysis or reporting applications where a "frozen" state of the data is required.

#### Disadvantages:

1.  **Storage Overhead**: Since materialized views physically store data, they require additional storage space. This can be significant for large datasets.

2.  **Maintenance Overhead**: Keeping materialized views up-to-date requires additional processing during `INSERT`, `UPDATE`, or `DELETE` operations on the base tables. In write-heavy environments, this overhead can be substantial, especially for views that need to be refreshed frequently.

3.  **Staleness**: If a materialized view is not refreshed frequently, its data can become stale, meaning it no longer reflects the current state of the base tables. This is particularly a concern for dynamic datasets with frequent changes.

To summarize, view materialization is an essential optimization technique in which the results of complex or frequently queried views are pre-materialized and cached. By storing the result of the view in a physical table, materialized views provide faster query execution at the cost of additional storage and maintenance overhead. The decision of when to materialize views depends on the complexity of the view, the frequency of query execution, and the balance between query performance and data freshness. While SQLite does not support materialized views, many advanced database systems leverage them to enhance performance, particularly in large-scale applications like data warehousing and reporting.

## Summary

Views are an important element in the database architect's toolkit for building scalable and maintainable databases as they allow abstraction and simplification of interactions with database data. They can significantly improve query readability, provide security by restricting access to certain data, and create a level of abstraction from the underlying table structures. However, views do generally not allow updates, and understanding when and how to use them appropriately is key to leveraging their full potential.

## Tutorial

In this short kickstarter lesson, guest speaker Socratica provides an overview of views and how they can be used to abstract tables in a relational databases and provide schema independence for queries.

<iframe width="480" height="270" src="https://www.youtube.com/embed/8jU8SrAPn9c?si=P_PAKrwlv6bUVtKu" title="SQL Views" frameborder="1" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture; web-share" allowfullscreen data-external="1">

</iframe>

## See Also

-   [6.301 Working with Databases in R](http://artificium.us/lessons/06.r/l-6-301-sqlite-from-r/l-6-301.html)

------------------------------------------------------------------------

## Files & Resources

```{r zipFiles, echo=FALSE}
zipName = sprintf("LessonFiles-%s-%s.zip", 
                 params$category,
                 params$number)

textALink = paste0("All Files for Lesson ", 
               params$category,".",params$number)

# downloadFilesLink() is included from _insert2DB.R
knitr::raw_html(downloadFilesLink(".", zipName, textALink))
```

------------------------------------------------------------------------

## References

None.

## Errata

[Let us know](https://form.jotform.com/212187072784157){target="_blank"}.
