
What is Third Normal Form (3NF)?

Learn how third normal form (3NF) can help you organize your databases more efficiently by removing redundancy and minimizing dependency issues. See how decomposing tables can simplify data management.
Nov 18, 2024  · 9 min read

Imagine working with a massive, unstructured database filled with repeated, redundant information. Every update or deletion becomes a potential disaster, risking errors and inconsistencies. Third normal form (3NF) is a proven database normalization method to avoid this chaos. Implementing 3NF cleans up your data structure, ensuring it’s efficient, organized, and free from unnecessary redundancies.

In this article, we'll explore how 3NF works, why it's valuable, and how you can put it into practice. We’ll also compare 3NF to other forms and learn when to use each. Everyone can benefit from learning more about these structures, but this knowledge is particularly valuable if you are a database designer or data scientist, as it can significantly simplify your work and keep your databases reliable. If you are interested in database design as a whole, take a look at our full Database Design course.

Third Normal Form (3NF) Definition

Third normal form is a key concept of database normalization that removes unwanted dependencies. 3NF builds upon first normal form (1NF) and second normal form (2NF), meaning that it inherits their rules: 1NF requires atomic (indivisible) values in each cell, and 2NF removes partial dependencies on a composite primary key. 3NF takes it further by removing transitive dependencies, a situation where non-key attributes depend indirectly on the primary key.

By focusing on this, 3NF ensures that each non-key column in a table is directly tied to the primary key and nothing else. In more practical terms, 3NF helps minimize redundancy and avoid anomalies when inserting, updating, or deleting data.

In 1971, Edgar F. Codd introduced 3NF to formalize the conditions for achieving a fully normalized database structure. A reformulation by Carlo Zaniolo in 1982 provided a clearer explanation of the difference between the “classic” 3NF and the more restrictive Boyce-Codd normal form (BCNF). Don’t worry too much about BCNF for now; we will come back to it further down.

Understanding the Conditions for Third Normal Form

So, what exactly does it take to achieve 3NF? For a table to qualify, it needs to meet a few conditions:

  • Be in 2NF: This means it’s already atomic, with no repeating groups and no partial dependencies on any composite keys.
  • No transitive dependencies: This rule is key. In a 3NF table, any non-primary-key column must depend solely on the primary key, not indirectly through another non-key column.

Figure: 1NF, 2NF, and 3NF requirements. 3NF is inclusive of 2NF and 1NF. Image by Author

Let’s have a look at what that practically means.

Decomposing Tables to Achieve 3NF

Let’s walk through the process of decomposing tables to reach 3NF. We'll use some sample data from DataCamp courses to illustrate each step.

Step 1: Identify transitive dependencies

To start, we’ll look for any attributes in a table that indirectly depend on the primary key. As a rule of thumb, if a non-key attribute depends on another non-key attribute rather than directly on the primary key, this indicates a transitive dependency. That’s a sign that it might be time to split up your table.

Take a look at the three tables below. Which one has a transitive dependency?

Table 1: Course

Course ID | Course Name | Difficulty
201 | SQL Fundamentals | Beginner
202 | Introduction to Python | Beginner
203 | Understanding Data Science | Intermediate

Table 2: Instructor

Instructor ID | Instructor Name | Expertise
1 | Sarah Johnson | Data Science
2 | Tom Williams | Machine Learning
3 | Emily Brown | Python

Table 3: Enrollments

Enrollment ID | Student Name | Course ID | Course Name
1001 | Alice Smith | 201 | SQL Fundamentals
1002 | Bob Green | 202 | Introduction to Python
1003 | Charlie Blue | 201 | SQL Fundamentals

The answer is… Table 3!

In this table, Course Name depends on Course ID, but not directly on Enrollment ID (the primary key). Because Enrollment ID determines Course ID and Course ID determines Course Name, Course Name is transitively dependent on the primary key.
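
The check described above can be sketched in a few lines of Python. This is only an illustration: the helper name `determines` and the lowercase column names are assumptions, not part of the article's data. Keep in mind that sample rows can disprove a dependency but never prove it; whether a dependency really holds is a business rule.

```python
from collections import defaultdict

# Rows of Table 3 (Enrollments); the dictionary keys are illustrative names.
enrollments = [
    {"enrollment_id": 1001, "student_name": "Alice Smith",
     "course_id": 201, "course_name": "SQL Fundamentals"},
    {"enrollment_id": 1002, "student_name": "Bob Green",
     "course_id": 202, "course_name": "Introduction to Python"},
    {"enrollment_id": 1003, "student_name": "Charlie Blue",
     "course_id": 201, "course_name": "SQL Fundamentals"},
]

def determines(rows, lhs, rhs):
    """Return True if each value of `lhs` maps to exactly one value of `rhs`."""
    seen = defaultdict(set)
    for row in rows:
        seen[row[lhs]].add(row[rhs])
    return all(len(values) == 1 for values in seen.values())

# The key determines Course ID, and Course ID determines Course Name,
# so Course Name reaches the key only through Course ID.
key_to_course = determines(enrollments, "enrollment_id", "course_id")
course_to_name = determines(enrollments, "course_id", "course_name")
# Course ID does not determine Student Name: course 201 has two students.
course_to_student = determines(enrollments, "course_id", "student_name")

print(key_to_course, course_to_name, course_to_student)  # True True False
```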

Step 2: Separate data into new tables

To address the transitive dependency, we’ll split Table 3 into two tables. Each table will focus on directly dependent data.

Revised enrollments table

Enrollment ID | Student Name | Course ID
1001 | Alice Smith | 201
1002 | Bob Green | 202
1003 | Charlie Blue | 201

Courses table

Course ID | Course Name
201 | SQL Fundamentals
202 | Introduction to Python

Now, each table contains only information that directly depends on its primary key: Course ID is the primary key of the Courses table and determines Course Name, and Enrollment ID is the primary key of the Enrollments table. Course ID remains in Enrollments as a foreign key that links each enrollment to its course.

With this decomposition, the tables now meet 3NF requirements, eliminating redundancy and ensuring each table stores only directly relevant information.
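
As a rough sketch, the two tables could be created in SQLite through Python's built-in `sqlite3` module. The table and column names below are assumptions chosen to mirror the tables above. The join at the end shows that the original four-column view can still be rebuilt whenever it is needed.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite only enforces foreign keys when this pragma is switched on.
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE courses (
    course_id   INTEGER PRIMARY KEY,
    course_name TEXT NOT NULL
);
CREATE TABLE enrollments (
    enrollment_id INTEGER PRIMARY KEY,
    student_name  TEXT NOT NULL,
    course_id     INTEGER NOT NULL REFERENCES courses(course_id)
);
""")

conn.executemany(
    "INSERT INTO courses VALUES (?, ?)",
    [(201, "SQL Fundamentals"), (202, "Introduction to Python")],
)
conn.executemany(
    "INSERT INTO enrollments VALUES (?, ?, ?)",
    [(1001, "Alice Smith", 201), (1002, "Bob Green", 202),
     (1003, "Charlie Blue", 201)],
)

# Rebuild the original wide table by joining on the foreign key.
rows = conn.execute("""
    SELECT e.enrollment_id, e.student_name, e.course_id, c.course_name
    FROM enrollments AS e
    JOIN courses AS c ON c.course_id = e.course_id
    ORDER BY e.enrollment_id
""").fetchall()

for row in rows:
    print(row)
```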

If you want to get hands-on and create your own databases, have a look at our Creating PostgreSQL Databases course. If you are a little more advanced, you could try Introduction to Data Modeling in Snowflake, which covers ideas like entity-relationship and dimensional modeling.

Benefits and Limitations of Using Third Normal Form

So, why go through all this effort to reach 3NF? Here are the main perks:

  • Improved Data Integrity: By eliminating transitive dependencies, 3NF helps ensure that updates and deletions don’t lead to conflicting or outdated data across tables.
  • Reduced Redundancy: Less redundancy means your database is easier to maintain, and storage use is reduced.
  • Simpler Data Maintenance: Keeping similar information in dedicated tables makes it easier to update records without tracking down redundant entries.
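
To make the integrity benefit concrete, here is a small sketch using Python's `sqlite3` module. The table name `enrollments_flat` and the new course title "SQL Basics" are hypothetical, introduced only for the demonstration. It compares renaming a course in the original wide table with renaming it in the 3NF layout.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- The wide table from before decomposition (course name repeated per row).
CREATE TABLE enrollments_flat (
    enrollment_id INTEGER PRIMARY KEY,
    student_name  TEXT,
    course_id     INTEGER,
    course_name   TEXT
);
INSERT INTO enrollments_flat VALUES
    (1001, 'Alice Smith', 201, 'SQL Fundamentals'),
    (1002, 'Bob Green', 202, 'Introduction to Python'),
    (1003, 'Charlie Blue', 201, 'SQL Fundamentals');

-- The 3NF Courses table (course name stored once).
CREATE TABLE courses (
    course_id   INTEGER PRIMARY KEY,
    course_name TEXT
);
INSERT INTO courses VALUES
    (201, 'SQL Fundamentals'),
    (202, 'Introduction to Python');
""")

# A rename that touches only one enrollment row leaves course 201 with two names.
conn.execute(
    "UPDATE enrollments_flat SET course_name = 'SQL Basics' WHERE enrollment_id = 1001"
)
flat_names = conn.execute(
    "SELECT COUNT(DISTINCT course_name) FROM enrollments_flat WHERE course_id = 201"
).fetchone()[0]

# In the 3NF layout the name lives in exactly one row, so one UPDATE is enough.
cur = conn.execute(
    "UPDATE courses SET course_name = 'SQL Basics' WHERE course_id = 201"
)
updated_rows = cur.rowcount

print(flat_names)    # 2 (inconsistent names for the same course)
print(updated_rows)  # 1
```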

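That said, while 3NF structures support data accuracy, they can also lead to more segmented data, sometimes making complex queries slower due to additional table joins. In cases where the need for speed trumps the need for normalization, selectively denormalizing some tables may be the more practical option. Stricter forms such as BCNF or 4NF go the other way: they reduce redundancy further, usually at the cost of even more tables.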

Comparison: First, Second, Third, and BC Normal Forms

Let’s look at how the normal forms differ.

Comparison table: first, second, and third normal forms

Here is a comparison table to help you understand the requirements of 1NF, 2NF, and 3NF.

Feature | 1NF | 2NF | 3NF
Atomic data | Yes | Yes | Yes
No partial dependencies | No | Yes | Yes
No transitive dependencies | No | No | Yes

Third normal form vs. Boyce-Codd normal form (BCNF)

BCNF is a “stricter” form of 3NF that eliminates anomalies that can remain when a table has overlapping candidate keys. It can be especially useful in complex cases where 3NF alone doesn’t fully eliminate redundancy. A table is in BCNF when, for every non-trivial functional dependency, the determinant (the left-hand side) is a superkey; 3NF relaxes this by also allowing dependencies whose right-hand side is part of some candidate key. I know that sounds complex, so let’s break it down with an example.

Current structure (in 3NF)

After decomposition to achieve 3NF, we had these two tables:

Enrollments table

Enrollment ID | Student Name | Course ID
1001 | Alice Smith | 201
1002 | Bob Green | 202
1003 | Charlie Blue | 201

Courses table

Course ID | Course Name
201 | SQL Fundamentals
202 | Introduction to Python

In this structure, each table is in 3NF with no transitive dependencies, and data is appropriately normalized.

Introducing a new requirement

Now, let’s add a new attribute to Courses: the Classroom in which each course is held. This new attribute could result in a scenario that requires BCNF.

Updated courses table (3NF)

Course ID | Course Name | Classroom
201 | SQL Fundamentals | Room 101
202 | Introduction to Python | Room 102
203 | Understanding Data Science | Room 101

Here, Course ID is still the primary key, and all other attributes depend directly on it. But let’s assume there’s a new rule that each classroom can hold only one subject at a time. Let’s also suppose the Course Name "SQL Fundamentals" could be offered under different Course IDs (like 201, 204, etc.), if they were scheduled at different times. In that case, each offering of "SQL Fundamentals" would still take place in "Room 101," regardless of the specific Course ID. As a result, Course Name also uniquely determines Classroom.

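This means two attributes now act as determinants:

  1. Course ID, the primary key, which determines Course Name and Classroom.
  2. Course Name, which determines Classroom.

Course Name is not a candidate key here, because the same name can appear under several Course IDs and so cannot identify a single row. That leaves a dependency whose determinant is not a key: Classroom depends on Course Name rather than directly on Course ID. BCNF does not allow this. Strictly speaking, because Classroom is a non-key attribute reached through Course Name, this is also a transitive dependency, so the same decomposition is what a strict reading of 3NF would call for.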

Applying BCNF

To eliminate this dependency issue, we’ll need to further decompose the Courses table into two separate tables that better align with BCNF:

  1. A new Courses table, which only includes the Course ID and Course Name.
  2. A CourseDetails table, which stores the Course Name and Classroom association.

Here’s how this looks:

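Revised courses table (BCNF)

Course ID | Course Name
201 | SQL Fundamentals
202 | Introduction to Python
203 | Understanding Data Science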

CourseDetails table (BCNF)

Course Name | Classroom
SQL Fundamentals | Room 101
Introduction to Python | Room 102
Understanding Data Science | Room 101

With this new structure, each table satisfies BCNF conditions:

  • In the Courses table, Course ID is the primary key, and all attributes depend solely on it.
  • In the CourseDetails table, Course Name is the primary key, and Classroom depends only on Course Name.

This setup removes the dependency on a determinant that is not a key, ensuring a strictly normalized structure.
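
A minimal sketch of this decomposition in SQLite, again through Python's `sqlite3` module, might look as follows. The table and column names are assumptions, and course 204 and "Room 105" are hypothetical values added to mimic the "second offering" scenario described earlier. With Course Name as the primary key of the details table, the database itself rejects a second classroom for the same course name.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE course_details (
    course_name TEXT PRIMARY KEY,   -- one classroom per course name
    classroom   TEXT NOT NULL
);
CREATE TABLE courses (
    course_id   INTEGER PRIMARY KEY,
    course_name TEXT NOT NULL REFERENCES course_details(course_name)
);
INSERT INTO course_details VALUES
    ('SQL Fundamentals', 'Room 101'),
    ('Introduction to Python', 'Room 102'),
    ('Understanding Data Science', 'Room 101');
-- Course 204 is a hypothetical second offering of SQL Fundamentals.
INSERT INTO courses VALUES
    (201, 'SQL Fundamentals'),
    (202, 'Introduction to Python'),
    (203, 'Understanding Data Science'),
    (204, 'SQL Fundamentals');
""")

# Trying to give the same course name a second classroom violates the key.
try:
    conn.execute("INSERT INTO course_details VALUES ('SQL Fundamentals', 'Room 105')")
    second_room_accepted = True
except sqlite3.IntegrityError:
    second_room_accepted = False

# Every offering of SQL Fundamentals resolves to the same classroom.
rooms = conn.execute("""
    SELECT c.course_id, d.classroom
    FROM courses AS c
    JOIN course_details AS d ON d.course_name = c.course_name
    WHERE c.course_name = 'SQL Fundamentals'
    ORDER BY c.course_id
""").fetchall()

print(second_room_accepted)  # False
print(rooms)                 # [(201, 'Room 101'), (204, 'Room 101')]
```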

Conclusion

Third normal form is a valuable tool for database designers aiming to keep data clean, consistent, and free from problematic dependencies. With 3NF, data integrity is enhanced, making management smoother and reducing redundancy. Remember, while 3NF works well in most situations, more complex databases might benefit from additional forms like BCNF or 4NF.

If you found this article helpful, consider taking the next step by earning our SQL Associate Certification. It’s a great way to validate your SQL and database management skills and demonstrate your expertise to potential employers!

Author
Marie Fayard

Senior Software Engineer, Technical Writer and Advisor with a background in physics. Committed to helping early-stage startups reach their potential and making complex concepts accessible to everyone.

Frequently Asked Questions About Third Normal Form

Can 3NF be applied to all types of databases?

While 3NF is effective in relational databases, it may not always be necessary for NoSQL databases, which often prioritize flexibility and scalability over strict normalization. In some cases, a denormalized schema may be preferred for performance reasons, especially when querying large amounts of data quickly.

What are the drawbacks of strictly following 3NF?

Strict adherence to 3NF can sometimes lead to complex schemas with many tables, which may require multiple joins in queries. This can negatively affect performance, especially in large databases or systems with high transaction volumes. In such cases, alternative approaches like selective denormalization may be more practical.

Can 3NF be applied to already existing databases, or do I need to redesign them?

3NF can definitely be applied to existing databases, though it may require significant restructuring. This process, called database refactoring, involves decomposing tables to eliminate redundancies and dependencies. Depending on the size and complexity of the database, this might involve some planning and testing to ensure data integrity and system performance are maintained.

What tools or techniques can help automate the process of achieving 3NF?

There are several database design tools available, like MySQL Workbench, Oracle SQL Developer, and ER/Studio, that help visualize the database schema and identify normalization issues. Some of these tools can suggest or automate steps to achieve 3NF, though human oversight is still important to ensure data integrity and consistency.

What is the difference between a candidate key and a primary key?

A candidate key is a minimal set of attributes that can uniquely identify each row in a table. There can be multiple candidate keys in a table. A primary key, on the other hand, is the specific candidate key chosen by the database designer to uniquely identify rows. Only one primary key is allowed per table, and it cannot have NULL values.
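
One common way to express this distinction in SQL is to declare the chosen key as PRIMARY KEY and any other candidate key as NOT NULL UNIQUE. The sketch below uses SQLite via Python; the `students` table, its `email` column, and the sample addresses are hypothetical and are not part of the article's schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
CREATE TABLE students (
    student_id   INTEGER PRIMARY KEY,   -- candidate key chosen as the primary key
    email        TEXT NOT NULL UNIQUE,  -- another candidate key, kept unique
    student_name TEXT NOT NULL
)
""")
conn.execute("INSERT INTO students VALUES (1, 'alice@example.com', 'Alice Smith')")

def try_insert(row):
    """Return True if the row is accepted, False if a key constraint rejects it."""
    try:
        conn.execute("INSERT INTO students VALUES (?, ?, ?)", row)
        return True
    except sqlite3.IntegrityError:
        return False

results = [
    try_insert((2, "alice@example.com", "Alice S.")),  # duplicate email
    try_insert((1, "bob@example.com", "Bob Green")),   # duplicate student_id
    try_insert((2, "bob@example.com", "Bob Green")),   # both keys are new
]
print(results)  # [False, False, True]
```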

Why do we need Boyce-Codd normal form (BCNF) if a table is already in third normal form (3NF)?

BCNF is stricter than 3NF and addresses anomalies that can remain when a table has several overlapping candidate keys. While 3NF removes transitive dependencies, it may still allow redundancy if a functional dependency has a determinant that is not a superkey. BCNF eliminates this by ensuring all non-trivial functional dependencies have a superkey on the left-hand side.

Can a table have more than one candidate key?

Yes, a table can have multiple candidate keys. Each candidate key is a unique and minimal set of attributes that can identify rows.
