Introduction

The database design process is a structured approach that involves several stages, each crucial for ensuring that the final database system meets the requirements of the users and supports the data needs of the application it underpins. This process can be broadly divided into three main phases: conceptual, logical, and physical modeling. Each phase serves a distinct purpose and builds upon the previous one to progressively refine and detail the database structure and functionality.

Database Design Workflow

The overall database design workflow is illustrated in the flow diagram below:

Explanation of the steps:

  1. Requirements Analysis: The first stage involves gathering and analyzing the requirements. This step is crucial for understanding what data needs to be stored and how it will be used, which informs the entire design process. A common requirements analysis method is use case analysis which also helps bound the scope of the design process.

  2. Conceptual Modeling: Commonly using UML Class Diagrams and/or Structured English, this phase focuses on creating a conceptual model of the domain. It’s a high-level representation, showing entities, relationships, and key attributes without delving into database-specific details. The conceptual model also serves as a visualization of the domain ontology.

  3. Logical Modeling: In this phase, the conceptual model is translated into an Entity-Relationship Diagram (ERD). The ERD provides a more detailed view of how data is interconnected, including primary and foreign keys, entity relationships, and multiplicity.

  4. Physical Modeling: This step converts the logical model into a physical model, specifying the actual structure of the database schema in a relational database. It includes table structures, columns, data types, constraints, and indexes.

  5. Database Implementation: The final step is where the physical model is implemented as a working database. This involves creating the database using a database management system (DBMS), defining tables, relationships, and constraints as per the relational schema, and then populating the database with data.

This workflow outlines a structured and methodical approach to database design, ensuring that the database is well-organized, scalable, and effectively meets the users’ requirements.

Design Workflow Phases

1. Conceptual Modeling

  • Purpose: The goal of conceptual modeling is to establish a high-level representation of the information that the system will manage, without delving into how the data will be stored or implemented technically. This phase focuses on identifying the key entities (or objects) within the domain, the relationships between these entities, and their attributes.
  • Process: Conceptual modeling often involves creating an Entity-Relationship Diagram (ERD) or using Unified Modeling Language (UML) class diagrams to visually represent the entities, relationships, and key attributes. This model is technology-agnostic and designed to be understood by stakeholders with various levels of technical knowledge.
  • Outcome: A conceptual model provides a clear and simplified view of the domain, facilitating communication among stakeholders (including business analysts, domain experts, and end-users) and forming the foundation for the more detailed design phases.

2. Logical Modeling

  • Purpose: The logical modeling phase refines the conceptual model into a more detailed blueprint that specifies the structure of the data as it will be managed within the system. This phase involves defining the exact data structures, including entities, relationships, keys (primary and foreign keys), and attribute data types, while still not being committed to a particular database management system (DBMS).
  • Process: During logical modeling, the conceptual schema is transformed into a logical schema, which includes specifying tables, columns, data types, constraints (e.g., uniqueness, mandatory fields), and the relationships among tables. Normalization is often performed in this phase to reduce data redundancy and improve data integrity.
  • Outcome: The result is a detailed, DBMS-independent model that clearly outlines the structure and constraints of the data, ensuring that it is organized logically and efficiently.

3. Physical Modeling

  • Purpose: Physical modeling takes the logical model and translates it into a design that can be implemented in a specific DBMS. This phase considers the physical storage, indexing strategies, access paths, and optimization techniques that are best suited to the database’s operational requirements.
  • Process: It involves defining physical tables, indexes, partitioning strategies, and other storage parameters. Decisions made in this phase are influenced by factors such as expected load, performance requirements, and the specific features and limitations of the chosen DBMS.
  • Outcome: The physical model is a fully detailed blueprint for the database implementation, including all the necessary scripts and definitions to create the database structure, ready for data to be inserted and used in a real-world environment.

Iterative Nature and Validation

  • Iterative Design: Database design is often iterative, with feedback loops allowing for refinement at each stage. Changes in requirements or discoveries made during later stages of modeling may necessitate revisiting earlier models for adjustments.
  • Validation: At each phase, it’s crucial to validate the models against the system requirements and with stakeholders to ensure the database will meet the intended goals and performance criteria.

Tutorial: ER Diagrams

In this short video tutorial below, Khoury Seattle’s Prof. Chhay provides an overview of data modeling and representing data models visually in an ERD (Entity-Relationship Diagram). In his tutorial he presents the classic (and original) E-R diagramming approach developed by Dr. Peter Chen. He then moves into the common “Crow’s Feet” notation.

In practice, UML Class Diagrams are the preferred way to build data models, although in many organization “Crow’s Feet” notation is still very popular. Thus, we need to study both.

The entity-relationship (E-R) model is the most popular conceptual model used for designing a database. It was originally proposed by Dr. Chen in 1976 as a way to unify the network and relational database views. The E-R model views the real world as a set of basic objects (known as entities), their characteristics (known as attributes), and associations among these objects (known as relationships). The entities, attributes, and relationships are the basic constructs of an E-R model.

The information gathered from the user forms the basis for designing the E-R model. The nouns in the requirements specification document are represented as the entities, the additional nouns that describe the nouns corresponding to entities form the attributes, and the verbs become the relationships among various entities.

Building an E-R model is an important step in conceptual modeling as it presents a simple and easy to understand definition of the information objects of a business domain and thus becomes an effective way for database designers, business analysts, and subject domain experts as well as users to communicate. An E-R model represents the actual information objects in a simple and meaningful way that can be easily mapped into a relational model for implementation in a relational database. Entities correspond to tables, attributes become columns in the tables. Relationships become primary and foreign key links.

An Entity-Relation Diagram (ERD) is any diagram that shows the entities in a domain, their attributes, and their relationships. There are four major visual notations with which to express an ERD: Chen, IE (Crow’s Feet), IDEF1X, and UML Class Diagrams. Commonly, the first three are referred to as ERD notations while the UML Class Diagram is actually an object-oriented notation. Much of the choice of notation depends on what a tool supports. For example, Microsoft Access supports IDEF1X, while TOAD supports IE. On the other hand, LucidChart and Enterprise Architect – both tools support IE and UML.

Summary

In summary, the database design process, through its conceptual, logical, and physical phases, systematically transforms high-level data concepts into a concrete, optimized database structure. This structured approach ensures that the database supports the required data, relationships, and integrity constraints efficiently and effectively, aligning with both the business objectives and technical requirements.

The design of a database starts with a conceptual model that is database and implementation agnostic, moves to a logical model that is database model specific, and then to a physical model that is specific to the capabilities of a database.

See Also

  • 60.108 ┆ Model Relational Data with E-R Diagrams
  • 60.109 ┆ Data Modeling Demonstration with ERD
  • 60.110 ┆ The Relational Model and Relational Schema Design
  • 30.152 ┆ Domain Modeling with UML Class Diagrams

Files & Resources


References

None.

Resources

LucidChart Models

Errata

Let us know.