Upon completion of this short lesson, you will be able to:
appreciate role of views
create views
use views in queries
describe view materialization
use views for schema abstraction
Introduction
Views are an important mechanism in relational databases for queries. In a nutshell, a view is a virtual table that provides a way to present a subset of data from one or more tables. Unlike regular tables, a view does not store data itself but rather stores a SQL query that defines how the data is presented. The result of the query is generated on-the-fly whenever the view is accessed. Views are useful for several reasons, including simplifying complex queries, enhancing security by restricting access to certain columns or rows, and providing a layer of abstraction over the base tables.
While supported by virtually all databases and being part of the SQL-92 standard, there are some minor differences in restrictions around the use of views, so be sure to consult your database vendor’s documentation.
Why Views?
A well-designed schema-independent database only provides queries through an interface in the form of views and does not allow users or client applications to query the tables in the schema directly. Thus, views provide an abstraction of the actual schema that ensures that schema changes do not affect SQL queries. If the name of a table or the definitions of columns changes, then the database architect updates the view definition and client and user queries against the view continue to function without modification.
In addition, views also help enforce security as the data can only be accessed through views and the data in the actual tables is hidden from all users.
Views can also be used to present aggregate data, such as totals and averages, without exposing the underlying data.
Views vs Tables
A view is a virtual table defined by a SQL SELECT statement and behaves in queries exactly like a table. The content of the view is materialized generally only when the view is involved in a query, although databases may cache views to improve performance. Since a view is not stored, it do not consume storage; only the view definition is stored in the database.
Views for Secure Data Access
A view provides an abstraction of the data in one or more tables and can present the data in a different format, in redacted form, or only select rows and columns depending on access privileges. So, rather than exposes an entire table, different views can present different parts of the data in one or more tables to different users depending on their access privileges and needs.
Restrictions on Views
There are several restrictions on how views can be used:
No Storage: Views do not store any data themselves. They are simply a stored query, and the data is fetched from the underlying tables when the view is queried.
Non-Updatable Views: Not all views can be updated. Views that include JOIN operations, aggregate functions, or DISTINCT clauses are typically not updatable. In such cases, you will need to update the base tables directly.
Performance: Since views are just stored queries, every time a view is queried, the underlying query is re-executed. This can have performance implications, especially if the view is based on complex queries or large tables.
Schema Changes: If the structure of the underlying base tables changes (e.g., a column is dropped), the view may become invalid and need to be redefined.
Creating Views
A view is defined using the CREATE VIEW statement in SQL. This statement specifies the query that generates the view’s result set. Views can be based on one or more tables or other views. The structure of a view is identical to that of a table, in the sense that it has columns and rows, but it is dynamic and updates in real-time as the underlying data in the base tables changes.
Let’s say that email addresses in the Author table of some marketing databases should not be accessible for non-marketing personnel but should at least be partially available for matching. So, rather than providing access to the data in the Author table, we define a view that the non-marketing personnel and applications use.
The view is defined below as the result of a SQL query. Notice the renaming of columns from the original table.
Note that SQLite uses || for string concatenation while MySQL uses the concat() function.
So, in shortm views are used in much the same way as regular tables. You can run SELECT queries against views, join them with other tables or views, and even use them in INSERT, UPDATE, and DELETE operations under certain conditions.
As another example, suppose we have two tables in a SQLite database, employees and departments:
We can define a view that presents the employees along with their department names:
CREATEVIEW employee_department_view ASSELECT e.emp_name, d.department_name, e.salaryFROM employees eJOIN departments d ON e.department_id = d.department_id;
Now, querying the view will display the employee details along with their department names:
SELECT*FROM employee_department_view;
Using a View in a Query
A view can be used anywhere a table can be used in a SQL SELECT statement.
SELECT*FROM AuthorNM;
Common Uses Cases for Views
Define a view that contain select rows and columns and provide authorization for some users to access the view but not the underlying tables that define the view.
Aggregate data from a complex query in a view to simplify common data access.
Create views as alternatives to commonly used subqueries to make queries simpler and to provide a degree of reusability.
Updating Views
In SQLite, like Oracle, PostgreSQL, and SQL Server, views can be updatable, which means that you can perform INSERT, UPDATE, or DELETE operations on the view, and these changes will be reflected in the underlying base tables. However, this is not always the case, and there are some restrictions on when a view can be updated.
Views can generally only be used for data retrieval: SELECT but not in INSERT, UPDATE, or DELETE statements. Some views might be updatable if some specific criteria are met and if the database supports the feature.
A view is generally updatable if the following criteria are met:
The view must reference a single table.
The view must include all non-nullable columns from the base table
No aggregations (sum(), avg(), count(), etc.)
View does not contain a GROUP BY or HAVING clause
View definition does not use subqueries
View is not the result of a UNION, ALL, or DISTINCT query
The FROM clause in the view definition contains only updatable tables and views and only uses inner joins and no outer joins
Custom update logic on views can be provided using triggers which then update the underlying tables.
Since this view only references a single table (employees) and does not contain any restrictions or aggregations, it can be updated. For example, we can update an employee’s salary through the view:
This operation will directly modify the salary column in the underlying employees table.
However, if we try to update a view that involves a JOIN, SQLite will not allow it, because there’s ambiguity about which table should be updated. For example, trying to update the employee_department_view defined earlier would fail.
Let’s consider a view that is not updatable because it includes a JOIN:
CREATEVIEW employee_salary_department ASSELECT e.emp_name, d.department_name, e.salaryFROM employees eJOIN departments d ON e.department_id = d.department_id;
Trying to update this view would lead to an error, because it is not clear whether the update should apply to the employees table or the departments table:
Error: view employee_salary_department is not updatable
Dropping Views
If a view is no longer needed, it can be removed from the database using the DROP VIEW statement:
DROPVIEWIFEXISTS employee_department_view;
This statement removes the view from the database, but it does not affect the underlying tables or data.
View Materialization
View materialization is an optimization technique used by database management systems (DBMS) to improve the performance of queries that involve complex views. Under normal circumstances, a view is a virtual table, meaning that every time a view is queried, the DBMS must recompute the result by executing the underlying SQL query that defines the view. This is known as virtual view processing. For simple views or views that are rarely accessed, this is efficient. However, for complex views or views queried frequently, recomputing the result each time can be inefficient and slow.
Materialized views address this performance concern by precomputing and storing the results of the view. Instead of recalculating the view’s result every time it is accessed, the result is stored in a physical table (either as a physical table in storage or as a virtual table in memory), allowing much faster access. Essentially, materializing a view transforms it from a virtual view (which is dynamically generated on-demand) into a persistent, real table containing data whose contents is derived from other tables and views.
Materialization of views happens in specific scenarios, typically driven by the need to optimize query performance. There are two primary ways in which materialized views can be created and managed:
Manual Materialization (Materialized Views): In some databases, you explicitly define a materialized view. Unlike regular views, a materialized view is a view whose result is stored in the database as a physical table. The result of the query used to define the view is precomputed and saved, and future accesses to the materialized view retrieve the data from this stored result rather than recomputing it.
Example of creating a materialized view (not supported in SQLite but common in other databases like PostgreSQL and Oracle):
CREATEMATERIALIZEDVIEW employee_summary ASSELECT department_id, COUNT(emp_id) AS employee_count, AVG(salary) AS avg_salaryFROM employeesGROUPBY department_id;
In this case, the view stores the number of employees and the average salary per department. Each time you query this view, the DBMS does not have to recompute the aggregate functions, but simply returns the precomputed data from storage.
Automatic Materialization (Internal Optimization): Some advanced database engines automatically materialize certain views or subqueries behind the scenes as part of query optimization. The DBMS may decide to materialize the result of a complex query or a frequently accessed view, especially if recomputation is expensive. This is transparent to the user but can provide significant performance benefits.
The decision to materialize a view in these cases is up to the database engine and is typically based on one or more of these considerations:
The complexity of the view (e.g., it involves multiple joins or aggregate functions).
The frequency with which the view is queried.
The cost of recalculating the view’s result each time.
Available system resources like memory and storage.
Once a view is materialized, the next concern is how to maintain the correctness of the materialized view, especially when the underlying data changes. There are two common approaches to this:
Lazy (On-demand) Refresh: In this approach, the materialized view is updated only when it is queried. If the underlying tables have been modified since the materialized view was last computed, the DBMS will refresh the materialized view just before returning the query result. This method can delay the retrieval of data but avoids unnecessary refreshes when the view is not accessed frequently.
For example, the employee_summary materialized view may be refreshed only when a query like SELECT * FROM employee_summary is run. If new employees have been added to the employees table since the last materialization, the view will be updated at that point.
Eager (Immediate) Refresh: In this case, the materialized view is updated immediately whenever the underlying base tables are modified. This ensures that the materialized view is always up-to-date but can incur additional overhead every time there’s an insert, update, or delete operation on the base tables.
For example, if an INSERT INTO employees statement adds a new employee, the system automatically updates the corresponding entry in the materialized view employee_summary. This ensures the view reflects the most recent data, but it adds overhead to every write operation.
Some systems support hybrid models where materialized views can be incrementally refreshed. This means instead of recalculating the entire view from scratch when the data changes, the system computes only the changes and applies them to the materialized view, making the update process more efficient.
Materialized Views in Practice
While SQLite, a lightweight relational database, does not natively support materialized views (only virtual views), larger DBMSs like Oracle, PostgreSQL, and Microsoft SQL Server have explicit support for materialized views. In these systems, materialized views are often used for:
Data Warehousing: Materialized views are ideal for large-scale, complex queries over datasets that don’t change frequently. For instance, in a data warehouse, historical data is often queried for reporting purposes, and materialized views can provide pre-aggregated results for fast retrieval.
Performance Optimization: When dealing with complex queries involving multiple joins, aggregations, or subqueries, materialized views allow the DBMS to store the precomputed result, significantly reducing the time required to execute these queries.
Replication and Distributed Databases: Materialized views are sometimes used to replicate data across databases or distribute a consistent snapshot of data across systems. They allow for faster access to read-heavy workloads in distributed environments.
Example of Caching with Materialized Views
In a database that supports materialized views, the following example shows how caching might be used to improve query performance:
CREATEMATERIALIZEDVIEW sales_summary ASSELECT region_id, SUM(sales_amount) AS total_salesFROM salesGROUPBY region_id;
This materialized view stores the total sales per region. Normally, computing this aggregate query over a large sales table could be expensive. By materializing the view, the result is cached in the database, allowing fast access when the view is queried.
Now, if you run a query:
SELECT*FROM sales_summary WHERE region_id =1;
Instead of scanning the entire sales table, the DBMS simply retrieves the precomputed result from the materialized view, which is much faster. The materialized view remains up-to-date according to its refresh policy (e.g., on-demand or automatically).
Advantages and Disadvantages of Materialized Views
Advantages:
Performance: The primary benefit of materialized views is faster query performance. By caching the result of complex queries, materialized views avoid the overhead of recomputing the query every time it is executed.
Efficient Aggregation: For queries that involve expensive operations like joins or aggregations (e.g., GROUP BY), materialized views store the precomputed results, reducing the load on the system when querying these views.
Snapshot of Data: Materialized views provide a consistent snapshot of the data at the time they were last refreshed, which can be useful in data analysis or reporting applications where a “frozen” state of the data is required.
Disadvantages:
Storage Overhead: Since materialized views physically store data, they require additional storage space. This can be significant for large datasets.
Maintenance Overhead: Keeping materialized views up-to-date requires additional processing during INSERT, UPDATE, or DELETE operations on the base tables. In write-heavy environments, this overhead can be substantial, especially for views that need to be refreshed frequently.
Staleness: If a materialized view is not refreshed frequently, its data can become stale, meaning it no longer reflects the current state of the base tables. This is particularly a concern for dynamic datasets with frequent changes.
To summarize, view materialization is an essential optimization technique in which the results of complex or frequently queried views are pre-materialized and cached. By storing the result of the view in a physical table, materialized views provide faster query execution at the cost of additional storage and maintenance overhead. The decision of when to materialize views depends on the complexity of the view, the frequency of query execution, and the balance between query performance and data freshness. While SQLite does not support materialized views, many advanced database systems leverage them to enhance performance, particularly in large-scale applications like data warehousing and reporting.
Summary
Views are an important element in the database architect’s toolkit for building scalable and maintainable databases as they allow abstraction and simplification of interactions with database data. They can significantly improve query readability, provide security by restricting access to certain data, and create a level of abstraction from the underlying table structures. However, views do generally not allow updates, and understanding when and how to use them appropriately is key to leveraging their full potential.
Tutorial
In this short kickstarter lesson, guest speaker Socratica provides an overview of views and how they can be used to abstract tables in a relational databases and provide schema independence for queries.