60.108 Logical Relational Modeling Data with E-R Diagrams

Introduction

Logical data modeling is a key step in the design and development of databases and information systems. It follows conceptual modeling and precedes physical modeling in the database design process. Logical data modeling involves the detailed definition of the data elements, structures, and relationships within a system, without being tied to a specific database management system (DBMS) technology. This modeling phase aims to translate the high-level, technology-agnostic concepts identified during conceptual modeling into a more detailed and structured representation of data, which can later be implemented in a specific DBMS.

Before moving into the rest of this lesson, you may wish to watch the tutorial first rather than after reading the lesson.

Logical Modeling Process

Here’s an overview of the key aspects of logical data modeling:

Key Components

Entities: These represent real-world objects or concepts, similar to classes in conceptual modeling. In logical data modeling, entities are defined with more detail, including attributes, data types, and keys.
Attributes: Detailed characteristics or properties of entities. Each attribute is defined with a specific data type and constraints. Attributes in logical modeling are more granular and precise compared to conceptual modeling.
Primary Keys: Unique identifiers for entities. A primary key’s main role is to enforce entity integrity by uniquely identifying each record in an entity.
Foreign Keys: These establish relationships between entities. A foreign key in one entity points to a primary key in another entity, enforcing referential integrity.
Normalization: A process applied during logical data modeling to reduce data redundancy and improve data integrity. It involves decomposing entities into smaller entities based on rules designed to minimize duplication and potential inconsistencies.
Relationships: Logical data models detail the types of relationships between entities, such as one-to-one, one-to-many, or many-to-many, and define how entities are related and interact with each other.

Process

Translation of Conceptual Model: The process begins with the translation of the high-level conceptual model into a more detailed logical model. This involves defining entities and relationships with greater precision.
Normalization: To ensure data integrity and reduce redundancy, the model is normalized. This often involves creating additional entities and adjusting relationships to adhere to normalization rules (typically up to the third normal form for most applications).
Definition of Attributes: Each entity is detailed with specific attributes, including data types and constraints that were not specified in the conceptual model.
Key Assignment: Primary keys are assigned to uniquely identify records within each entity, and foreign keys are defined to establish relationships between entities.

Purpose and Benefits

Technology Agnostic: Logical data models are designed to be independent of any specific database technology, focusing on the structure of data rather than how it will be physically stored.
Foundation for Physical Modeling: It serves as a blueprint for the physical model, where the logical model’s structures are implemented in a specific DBMS, taking into account the physical storage and performance considerations.
Improves Data Quality: Through normalization and detailed definition of entities and relationships, logical data modeling helps in improving data quality and integrity.
Facilitates Communication: Provides a detailed schema that can be used to communicate the structure and rules of the database among developers, analysts, and stakeholders.

Logical data modeling is a vital step in ensuring that the data strategy aligns with business requirements and objectives, facilitating the development of databases that are structured, efficient, and scalable.

Modeling Approaches

Logical data models can be expressed using a variety of languages or notations, each designed to facilitate the detailed design of database schemas, including entities, relationships, and constraints. These languages help in the translation of a logical data model into a physical database schema that can be implemented using a specific database management system (DBMS). Here are some of the most common languages and notations used for expressing logical data models:

1. SQL Data Definition Language (DDL)

Description: SQL (Structured Query Language) DDL is a standard language used to define and modify database structures. It is used for creating, altering, and deleting database objects such as tables, indexes, and constraints.
Usage: DDL statements like CREATE TABLE, ALTER TABLE, and CREATE INDEX are used to implement the entities, relationships, and other aspects of a logical data model in a relational database.

2. Entity-Relationship Diagrams (ERDs)

Description: ERDs are a graphical representation of entities, their attributes, and the relationships between entities. While not a “language” in the strict sense, ER diagrams are a widely used notation for designing and communicating logical data models.
Usage: ERDs help in visualizing the structure of a database at a logical level, making it easier for stakeholders to understand and review the model before it is implemented in a DBMS.

3. Unified Modeling Language (UML)

Description: UML is a general-purpose, developmental, modeling language in the field of software engineering that is used to provide a standard way to visualize the design of a system. UML includes a variety of diagrams, with class diagrams being particularly useful for expressing logical data models.
Usage: UML class diagrams can represent entities (classes), attributes, relationships (associations), and operations (methods) that may translate into database actions.

4. XML Schema

Description: XML Schema defines the structure of XML documents. Although primarily used for XML document validation, XML Schema can also be used to express the logical structure of data that will be stored in XML format.
Usage: It is used when the data model is intended to be implemented in systems that leverage XML for data interchange or storage.

5. JSON Schema

Description: Similar to XML Schema, JSON Schema is a vocabulary that allows you to annotate and validate JSON documents. It defines the structure and data types of JSON data.
Usage: JSON Schema is used for logical data modeling when the data interchange or storage format is JSON, particularly in web applications and services.

6. Data Modeling Tools Specific Languages

Description: Many data modeling tools offer their own languages or graphical notations for defining logical data models. These tools often generate SQL DDL or other output formats that can be used to implement the model in a database.
Usage: These proprietary languages or notations are used within specific data modeling tools to facilitate the design, documentation, and generation of database schemas.

Each of these languages and notations serves different purposes and may be chosen based on the specific requirements of the project, the type of database being designed (relational, XML, JSON-based, etc.), and the preferences or policies of the organization.

This lesson will explain the ERD approach in more detail and show how to build ER Diagrams to visualize logical data models. While UML Class Diagrams are often preferable for logical modeling as they are also commonly used in the prior conceptual modeling step, many organization use ERDs, so knowing how to read and construct ERDs is an essential skill of the data architect.

Entity-Relationship Diagrams (ERD)

An Entity-Relationship Diagram (ERD) is a graphical representation used in database design to illustrate the relationships and entities within an information system. ERDs help in modeling data structures, thereby providing a clear picture of the types of information stored, how data can be grouped, and how different data groups relate to each other. This modeling tool is essential in the conceptual and logical phases of database design, serving as a blueprint for developing the database schema.

Key Components of an ERD

Entity: An entity represents a real-world object or concept that can be distinctly identified. In an ERD, entities are typically shown as rectangles and represent database tables in the physical database.
Attribute: Attributes are the properties or characteristics of an entity. They are shown as ovals connected to their respective entities, representing the columns or fields of a table in the database.
Relationship: Relationships illustrate how entities interact with each other within the system. They are depicted as diamonds or lines connecting entities and indicate the nature of the association between entities (e.g., one-to-one, one-to-many, many-to-many).
Primary Key: A primary key is a special attribute that uniquely identifies each record in a database table. While not always visually distinct in ERDs, primary keys are crucial for ensuring data integrity.
Foreign Key: A foreign key is an attribute in one table that links to the primary key of another table, establishing a relationship between the two tables. Like primary keys, foreign keys are essential for defining relationships in databases but may not always be visually indicated in ERDs.

Types of Relationships in an ERD

One-to-One (1:1): Indicates that a record in one entity is related to only one record in another entity, and vice versa.
One-to-Many (1:N): Shows that a record in one entity can be associated with multiple records in another entity, but a record in the second entity can only be associated with one record in the first entity.
Many-to-Many (M:N): Signifies that records in one entity can be associated with multiple records in another entity and vice versa. This relationship often requires a join table (or associative entity) to implement in relational databases.

Uses of ERDs

Database Design: ERDs are used to design and structure databases before they are built, helping to identify and organize data requirements and business rules.
Documentation: They serve as documentation tools, making it easier for developers, analysts, and stakeholders to understand the database structure and data relationships.
Problem-Solving: ERDs can be used to analyze and solve data-related problems by visually representing the entities involved and how they interact.

ERDs are fundamental in the database design process, enabling a systematic approach to database development that ensures data consistency, integrity, and efficiency. They provide a visual interpretation that is easier to understand and manipulate, facilitating better communication among team members and stakeholders involved in the system development process.

Elements of an ERD

An Entity-Relationship Diagram (ERD) is a visual representation of entities (data objects) within a domain and the relationships between these entities. ERDs are used in database design and modeling to specify data structures and their connections The primary elements that make up an ERD include:

entities with their attributes, attribute data types, and keys
relationships with multiplicity constraints

The following sections illustrate the Information Engineering (IE) notation for entity-relationship diagrams that is also colloquially referred to as the “Crow’s Foot” or the “Crow’s Feet” notation. Crow’s Foot Notation is a widely used graphical notation for depicting the relationships between entities in an entity-relationship diagram (ERD). This notation is named for its resemblance to a crow’s foot at the end of the lines that denote the “many” sides of relationships, providing a visual way to represent cardinality and connectivity between entities. Crow’s Foot Notation is favored for its intuitive representation of complex database structures, making it easier for designers and developers to understand and communicate database schemas.

Entity

An entity represents a real-world object or concept that can have data stored about it. In database terms, an entity typically translates into a table. Entities are depicted as rectangles in an ERD containing two compartments and up to three columns in the second compartment.

Weak Entity

A weak entity is an entity that cannot be uniquely identified by its attributes alone and typically relies on a foreign key relationship with another, stronger entity.

Attributes

Attributes are the properties or details that describe or qualify an entity. They represent data points that are stored for each entity and usually become columns in a database table.

Simple Attributes

Simple attributes are attributes that have a single value and cannot be decomposed further.

Composite Attributes

Composite attributes are attributes that can be divided into smaller sub-parts, which represent more basic attributes with independent meanings. For example, the attribute “name” for an entity “Student” can generally be further decomposed into first name, last name, and middle name.

Multi-valued Attribute

A multi-valued attribute can have more than one value for a single entity instance. For an entity to be in first normal form (1NF), it cannot have any multi-valued attributes.

Derived Attribute

Derived attributes are attributes whose values can be calculated or derived from other attributes. For example, the attribute “GPA” (Grade Point Average) for the entity “Student” can be calculated from the grades earned by the student in each course in which they were enrolled. Whether a derived attribute is stored or calculated when needed is an implementation detail.

Categorical Attribute

A categorical attributes takes its value from a pre-defined set of values. For example, the values for the attributes “semester” in an entity representing a university course might be drawn from the value set {Fall,Spring}. The values in the value set are domain dependent.

Attribute Data Types

The data type of an attribute is the set of values that the attribute can take on. Generally, a logical data model should include higher level data types, although it often contains data types supported by the database or implementation target. For example, in a conceptual data model we might use TEXT, but in a logical model it would be reasonable to use varchar(64).

Keys

In an Entity-Relationship Diagram (ERD), keys are critical concepts used to ensure the integrity and efficiency of data within a database system. They play a vital role in identifying records uniquely and establishing relationships between entities. There are several kinds of keys in an ERD, each serving a specific purpose:

Primary Key

A primary key is an attribute or a combination of attributes that uniquely identifies each record within an entity (or table). No two records can have the same primary key value, the value must be immutable, and it cannot be null. In an ERD, the first column contains the letters “PK” for any attribute that is part of the primary key.

Foreign Key

A foreign key is an attribute in one entity that links to the primary key of another entity. It is used to establish and enforce a link between the data in two tables. It maintains referential integrity by ensuring that the relationship between entities is consistent. In an ERD, the first column contains the letters “FK” for any attribute that is part of a foreign key.

Composite Key

A composite key is formed by combining two or more attributes to uniquely identify a record in an entity. This is used when no single attribute can uniquely identify records within an entity. It ensures uniqueness where individual attributes do not guarantee uniqueness on their own. A composite key is recognized by more than one attribute having a “PK” designation in the first column.

Candidate Key

A candidate key is an attribute, or a set of attributes, that can qualify as a primary key. That is, it can uniquely identify records in an entity. An entity can have more than one candidate key. From the candidate keys, one is selected as the primary key, while others can serve as alternate keys for unique identification.

Alternate Key

An alternate key is a candidate key that was not chosen to be the primary key. It is also capable of uniquely identifying records within an entity. It provides alternative ways to uniquely identify a record, which can be particularly useful in queries and when the primary key is not known. In an ERD, the first column contains the letters “AK” for any attribute that is part of an alternate key.

Superkey

A superkey is a set of one or more attributes that, taken collectively, can uniquely identify a record in an entity. A superkey may contain attributes that are not necessary for unique identification. It guarantees the uniqueness of records, but is not minimal (it may have excess attributes beyond what is needed to ensure uniqueness). By definition, the primary key is a minimal superkey. We do not show superkeys in an ERD.

Surrogate Key

A surrogate key (also sometimes called a synthetic key) is an artificial (i.e., “made up”) attribute assigned to an entity as its primary key. Surrogate keys have no business meaning and are usually auto-incremented numbers. They are used when a natural primary key is not available or practical. Surrogate keys simplify database design and maintain integrity by providing a unique identifier that is not subject to change.

These keys are fundamental in designing a relational database, ensuring data accuracy, and establishing the relational links between entities in an ERD. By defining and implementing these keys properly, database designers can create efficient, reliable, and scalable data models.

Relationships

Relationships illustrate how entities interact with each other and the nature of their association. They describe how entities share data in the database structure. Relationships are depicted as lines connecting entities. The ends of the lines contain multiplicity adornments. The relationship line can also, optionally, contain a label explaining the nature of the relationship.

Multiplicities

Multiplicity specifies the numerical relationship between instances of one entity to instances of another entity. It indicates the number of instances of one entity that can or must be associated with each instance of another entity. Multiplicity is often indicated by placing numbers or symbols near the entities in a relationship to show one-to-one (1:1), one-to-many (1:N), or many-to-many (M:N) associations. The symbols used in ERDs are shown below.

The relationship line should come out of the FK and go into the PK attributes. A lower bound of 0 indicates that the relationship is optional. Unlike in a UML Class Diagram, there are no other bounds than 0, 1, or many.

Note that an ERD that is mapped to a relational model cannot contain any many-to-many relationships. Any such relationships must be resolved into two separate one-to-many relationships and a junction (also sometimes called a join, association, or associative) entity.

Tutorial

In this video tutorial, Khoury Boston’s Prof. Schedlbauer looks at the foundation of information architecture and database design: conceptual and logical data modeling. It summarizes the concepts introduced before.

Worked Example

Conceptual to Logical Model

The data modeling process and the design of a database starts with a conceptual model. The conceptual model is generally visualized as a UML Class Diagram, but could also be described in Structured English. A UML Class Diagram is typically more accurate and more closely models the domain. UML has substantially more modeling constructs than an ERD. For instance, UML Class Diagrams contain four different kinds of relationships (association, aggregation, composition, and generalization) versus the ERDs sole relationship connector. UML Class Diagrams also model multiplicities more precisely by allowing any kind of lower and any kind of upper bound versus the ERDs lower bounds of 0 or 1 and upper bounds of 1 or many.

Below is a UML Class Diagram representing a partial conceptual domain model showing sales transactions for cars. A car is a kind of vehicle and is purchased in a sale by a buyer.

Each class in the UML Class Diagram becomes an entity in the ERD. We initially will maintain the same relationships and adjust the multiplicities to the correct symbols with the caveat that lower bounds other than 0 or 1 cannot be translated and that upper bounds with specific bounds cannot be represented in an ERD. Aggregation and composition relationships are translated a simple participation connections as they are not representable in an ERD. To translate a generalization, we commonly collapse the generalization hierarchy into a single entity and add all subclass and superclass attributes as attributes of the collapsed entity with a new discriminator column.¹ Many-to-many relationship multiplicities are resolved with a junction entity. Categorical attributes are implemented (most often) through lookup tables. Multi-valued attributes are moved to a separate entity. Finally, we add foreign keys to implement the relationships.

In the ERD above, the attribute “type”of the entity Sale is the subclass name, so for an instance of the class Car it would be the text value “car”. The many-to-many relationship between Buyer and Sale is resolved via the SalesTxn junction entity containing a surrogate key (which is not strictly necessary as the primary key could also be the combination of {bid,sid}, but a single attribute for a key is simpler).

The video below walks through the translation process and the considerations that one often needs to take into account.

Errata: The relationship between “Sale” and “SaleType” should be many-to-one as illustrated above. The diagram in the video has a typographical error and incorrectly depicts the multiplicity.

Translation Process

Translating a conceptual model expressed as a UML Class Diagram to a logical model as a Crow’s Foot ERD involves several steps. This process bridges the gap between abstract conceptualization of the system and its practical database design, ensuring that the theoretical foundations are accurately reflected in the logical structure of the database. Here’s a step-by-step guide to facilitate this translation:

1. Identify Classes and Entities

From UML to ERD: Identify all classes in the UML Class Diagram. Each class typically translates to an entity in the ERD. The class name becomes the entity name.
Attributes: Convert the attributes of each UML class into attributes of the corresponding ERD entity. Pay attention to the data types and constraints (such as mandatory or optional) that might be implied or explicitly stated in the UML diagram.

2. Determine Primary Keys

For Each Entity: Determine the primary key for each entity in the ERD. In the UML diagram, this could be indicated by attributes that uniquely identify instances of a class. If not explicitly defined, choose an attribute (or a combination of attributes) that can serve as a unique identifier.
Use Surrogate Keys if Necessary: If no natural primary key is apparent, consider introducing a surrogate key (an artificial identifier) in the ERD.

3. Translate Relationships

Identify Relationships: Examine the associations between classes in the UML diagram. These will be translated into relationships in the ERD.
Determine Cardinality: For each relationship, determine the cardinality (one-to-one, one-to-many, many-to-many) and optionally the participation (total or partial). Use Crow’s Foot notation to represent these relationships accurately.
Foreign Keys: For one-to-many relationships, identify which entity will hold the foreign key. This is typically on the “many” side of the relationship, referencing the primary key of the “one” side.

4. Handle Special Class Relationships

Aggregation and Composition: These relationships need careful translation. For aggregation, consider whether it necessitates a foreign key in the ERD or a separate associative entity for implementation. For composition, ensure that the lifecycle dependency is noted, potentially affecting how deletion operations are handled.
Generalization and Specialization (Inheritance): Translate inheritance relationships into separate entities, possibly with foreign keys to maintain the relationship, or use a single table with a discriminator column, depending on the specifics of the design and requirements.

5. Convert Multiplicities

Multiplicity to Symbols: Convert the multiplicities defined in UML (e.g., 0..1, 1, 0.., 1..) into the appropriate cardinality symbols in Crow’s Foot notation. Ensure that optional and mandatory participations are correctly represented.

6. Review and Refine

Consistency Check: Review the translated ERD for consistency with the UML Class Diagram. Ensure that all entities, attributes, and relationships are correctly represented and that the model accurately reflects the intended data structure.
Normalization: Apply normalization rules to the ERD to ensure that the database design is efficient, avoids redundancy, and ensures data integrity.

7. Document and Validate

Documentation: Document the logical model, including entity descriptions, attributes, primary keys, foreign keys, and relationships.
Validation: Validate the logical model against requirements. Engage stakeholders to review the ERD, ensuring it meets business and data requirements.

Translating from a UML Class Diagram to a Crow’s Foot ERD involves careful consideration of the nuances between conceptual and logical modeling. This process not only translates the structure and relationships but also adapts them to the conventions and requirements of database design, ensuring that the logical model serves as a robust foundation for the physical database implementation.

There are modeling constructs in a UML Class Diagram that do not translate directly to an ERD, so we may often have to devise implementation constructs or not represent some directly.

Summary

Logical data modeling is essential for database design and schema implementation. The “Crow’s Foot” notation is a common visual notation for entity-relationship diagrams.

Files & Resources

References

None.

Resources

LucidChart Models

Errata

Let us know.

There is an alternative mechanism using foreign keys.↩︎

60.108Logical Relational Modeling Data with E-R Diagrams

Martin Schedlbauer, PhD

2025-02-04