60.611 Essential Guide to Indexing in SQLite

Objectives

Upon completion of this lesson, you will be able to:

create an index in SQLite
delete a created index
understand performance considerations for indexes

Introduction

In this tutorial, you will learn how to use indexes in SQLite to speed up data retrievals (SELECT), improve sorting (ORDER BY), and enforce uniqueness constraints (UNIQUE). An index is an additional data structure stored in the databases that is associated with a single table with the goal to improve the performance of a query. A table may have multiple indexes. Most databases automatically create an index for the primary key, although SQLite does not, while MySQL does.

There are several data structures that may be used for indexing: hashmaps and trees. SQLite uses self-balancing B-trees for the structure of its indexes. A B-tree is a self-balancing tree data structure that maintains its data nodes in sorted order and allows searches in \(O(log_n)\) (logarithmic) time. The B-tree has generally more than 2 children per node. B-trees are well suited for storage systems that read and write relatively large blocks of data and is therefore commonly used in databases and file systems. B-trees were invented in 1972 by Rudolf Bayer and Edward M. McCreight (Bayer & McCreight, 1972). During access operations, the entire index, in the form of a B-tree, must be kept in memory, although the data remains on disk. Thus, access via a B-tree, reduces how much of the actual data must be loaded into memory, which results in less memory being used and far fewer disk accesses (which are comparatively slow, although that distinction is becoming less pronounced as traditional mechanical disks are being replaced with solid-state storage devices

Managing Indexes in SQLite

For demonstration, let’s open an existing SQLite database.

## [1] 0

Creating an Index

Prior to using an index, the index must be explicitly created. We will create an index on the column LastName in the table customers. The full definition of the table is shown below (and is the result of a .schema customer SQLite operation):

CREATE TABLE IF NOT EXISTS "customers"
(
    [CustomerId] INTEGER PRIMARY KEY AUTOINCREMENT NOT NULL,
    [FirstName] NVARCHAR(40)  NOT NULL,
    [LastName] NVARCHAR(20)  NOT NULL,
    [Company] NVARCHAR(80),
    [Address] NVARCHAR(70),
    [City] NVARCHAR(40),
    [State] NVARCHAR(40),
    [Country] NVARCHAR(40),
    [PostalCode] NVARCHAR(10),
    [Phone] NVARCHAR(24),
    [Fax] NVARCHAR(24),
    [Email] NVARCHAR(60)  NOT NULL,
    [SupportRepId] INTEGER,
    FOREIGN KEY ([SupportRepId]) REFERENCES "employees" ([EmployeeId]) 
        ON DELETE NO ACTION ON UPDATE NO ACTION
);

An index is created with the CREATE INDEX operation.

CREATE INDEX custLN 
    ON customers (LastName);

The general form for CREATE INDEX is shown below:

CREATE [UNIQUE] INDEX index_name [ASC|DESC]
ON table_name(column_list);

Indexes are critical for non-prime columns when those columns are frequently in joins or searches. A B-Tree based index makes searching on equality (=) as well as ranges (< and >) significantly quicker. If a search is based on some specific single column, then an index on that column is warranted. However, if a search involves a compound key, i.e., several columns, then the index should be created on these columns – one index for all columns and not separate indexes on each column.

While indexes speed up searching dramatically, they are also useful in detecting duplicate values. The CREATE UNIQUE INDEX statement is an index for that purpose. It generally is implemented with a hashmap or a hashtable rather than a B-Tree.

Indexes are not useful when pattern matching searches are performed with LIKE.

An index is often used when an ORDER BY clause is present in a query as the in-order traversal of a B-Tree results in an ordered set of leaf nodes. When creating an index, each column name or expression can be followed by one of the “ASC” or “DESC” keywords to indicate sort order.

Using an Index

Indexes are only useful when a direct match search is done on an indexed column in the WHERE clause. SQLite, like all other relational databases, automatically uses an index when available and when the use of an index improves performance of a query.

SELECT LastName, FirstName
  FROM customers
 WHERE LastName = 'Johansson'

Table 1: 1 records
LastName	FirstName
Johansson	Joakim

The EXPLAIN QUERY PLAN operation can make the use of indexes explicit. The query plan below shows that the column is searched rather than scanned linearly, and that the index is used.

EXPLAIN QUERY PLAN
SELECT LastName, FirstName
  FROM customers
 WHERE LastName = 'Johansson'

Table 2: 1 records
id	parent	notused	detail
3	0	0	SEARCH customers USING INDEX custLN (LastName=?)

If an approximate search using LIKE is performed, then an index is not useful and the table is scanned row by row. The query plan below shows that the table is scanned and the index is ignored.

EXPLAIN QUERY PLAN
SELECT LastName, FirstName
  FROM customers
 WHERE LastName LIKE 'Johan%'

Table 3: 1 records
id	parent	notused	detail
2	0	0	SCAN customers

Note that SQLite, unlike many other databases, does not automatically create an index for the primary key, but it does build a hashmap on the primary to quickly determine uniqueness.

EXPLAIN QUERY PLAN
SELECT LastName, FirstName
  FROM customers
 WHERE CustomerId = 37

Table 4: 1 records
id	parent	notused	detail
2	0	0	SEARCH customers USING INTEGER PRIMARY KEY (rowid=?)

Deleting an Index

To delete an index, use the DROP INDEX statement.

DROP INDEX custLN;

The general form for CREATE INDEX is shown below:

DROP INDEX [IN EXISTS] INDEX index_name;

Dropping an index is often advantageous when inserting a large number of rows as might be the case during bulk loading of new data or re-building an OLAP database. It is often faster to drop the index, perform the insertions, and then create the index anew.

Listing Indexes

To list all of the created and available indexes for some specific, use one the commands below:

PRAGMA list_indexes('customers')

PRAGMA index_list('customers')

Table 5: 2 records
seq	name	unique	origin	partial
0	custLN	0	c	0
1	IFK_CustomerSupportRepId	0	c	0

Another way to get all indexes from a database is to query from the sqlite_master table:

SELECT
   `type`, 
   `name`, 
   `tbl_name`, 
   `sql`
  FROM sqlite_master
WHERE `type` = 'index';

Table 6: Displaying records 1 - 10
type	name	tbl_name	sql
index	sqlite_autoindex_playlist_track_1	playlist_track	NA
index	IFK_AlbumArtistId	albums	CREATE INDEX [IFK_AlbumArtistId] ON “albums” ([ArtistId])
index	IFK_CustomerSupportRepId	customers	CREATE INDEX [IFK_CustomerSupportRepId] ON “customers” ([SupportRepId])
index	IFK_EmployeeReportsTo	employees	CREATE INDEX [IFK_EmployeeReportsTo] ON “employees” ([ReportsTo])
index	IFK_InvoiceCustomerId	invoices	CREATE INDEX [IFK_InvoiceCustomerId] ON “invoices” ([CustomerId])
index	IFK_InvoiceLineInvoiceId	invoice_items	CREATE INDEX [IFK_InvoiceLineInvoiceId] ON “invoice_items” ([InvoiceId])
index	IFK_InvoiceLineTrackId	invoice_items	CREATE INDEX [IFK_InvoiceLineTrackId] ON “invoice_items” ([TrackId])
index	IFK_PlaylistTrackTrackId	playlist_track	CREATE INDEX [IFK_PlaylistTrackTrackId] ON “playlist_track” ([TrackId])
index	IFK_TrackAlbumId	tracks	CREATE INDEX [IFK_TrackAlbumId] ON “tracks” ([AlbumId])
index	IFK_TrackGenreId	tracks	CREATE INDEX [IFK_TrackGenreId] ON “tracks” ([GenreId])

Considerations

Searching using substring matching or the keyword LIKE cannot take advantage of available indexes and should be avoided as they will involve linear table scans.
When new data is inserted or data is updated or deleted, indexes on the table where the modification occurred must be re-arranged which can be time consuming. The worse case run-time behavior of updating a B-Tree is \(O(n^2)\) which can lead to significant performance reduction.
Consequently, indexes should be dropped prior to loading data in bulk during ETL as they significantly reduce insertion performance. After ETL, indexes must be created again.
Searching on expressions does not use indexes unless a specific index on the expression is created

Experiments

Experiment 1: Searching

This experiment looks at the difference in retrieval performance with and without an index on a large table.

Experiment 2: Insertion

This experiment looks at the performance degradation when inserting new data into a table with an index versus no index.

Experiment 3: Sorting

Indexes also improve sorting, so this experiment will look at that benefit of indexes.

Summary

Indexing is an important performance improvement option for large tables where searches are performed frequently on non-prime attribute and a reduction in write performance is acceptable. SQLite offers two options for indexes: one for efficient searching and another for duplicate detection.

Files & Resources

All Files for Lesson 60.611

References

Bayer, R. and McCreight, E. (1972). Organization and maintenance of large ordered indexes. Acta Informatica, 1(1), 73-189.

SQLite Indexing. SQLite Tutorial. Indexes on Expressions

Errata

Let us know.

60.611
Essential Guide to Indexing in SQLite

Martin Schedlbauer, PhD

2024-02-14

Objectives

Introduction

Managing Indexes in SQLite

Creating an Index

Using an Index

Deleting an Index

Listing Indexes

Considerations

Experiments

Experiment 1: Searching

Experiment 2: Insertion

Experiment 3: Sorting

Summary

Files & Resources

References

Errata

60.611Essential Guide to Indexing in SQLite

Martin Schedlbauer, PhD

2024-02-14

Objectives

Introduction

Managing Indexes in SQLite

Creating an Index

Using an Index

Deleting an Index

Listing Indexes

Considerations

Experiments

Experiment 1: Searching

Experiment 2: Insertion

Experiment 3: Sorting

Summary

Files & Resources

References

Errata

60.611
Essential Guide to Indexing in SQLite