
What is Second Normal Form (2NF)?

Learn how second normal form (2NF) can improve your database design by eliminating partial dependencies.
Jan 6, 2025  · 9 min read

When working with databases, it's common to encounter issues like redundant data and inconsistent updates. Second normal form is a database normalization step that builds on first normal form (1NF) to create cleaner and more efficient tables. 

Understanding 2NF is critical for anyone working in database design or data management, and it lays the foundation for higher normalization forms like third normal form (3NF). In this article, we’ll explore how 2NF works and how to transform tables to meet 2NF requirements, with practical examples. We’ll also talk about the benefits and drawbacks of 2NF, and the use cases it suits best.

Understanding Second Normal Form

Second normal form is a database normalization step focused on eliminating partial dependencies. It was introduced by Edgar F. Codd, the pioneer of relational databases, as part of his work on normalization.  

Before a table can be in 2NF, it must satisfy the rules of first normal form:

  • Atomicity: Each cell must contain a single value (no repeating groups or arrays).
  • Unique rows: The table must have a clear primary key.

2NF goes one step further with an additional rule: eliminate partial dependencies.

A partial dependency occurs when a non-prime attribute (column that isn't part of any candidate key) relies on only part of a composite key instead of the whole key. The 2NF rule ensures that all non-prime attributes are dependent on the entire primary key, not just a part of it. Leaving partial dependencies in a table means that redundant data can creep into the database, leading to inefficiency and potential inconsistencies during updates or deletions.

The theory alone can be a little dry, so let’s look at a practical example.

Below is a Course Enrollment table of DataCamp students.

Student ID | Course ID | Course Name            | Instructor Name
1001       | 201       | SQL Fundamentals       | Ken Smith
1002       | 202       | Introduction to Python | Merlin O’Donnell
1001       | 202       | Introduction to Python | Merlin O’Donnell

Here, the primary key is the composite of Student ID and Course ID. However, the non-prime attributes Course Name and Instructor Name depend only on Course ID, not the entire key. This violates 2NF.
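To see why this matters in practice, here is a minimal sketch of an update anomaly using Python's built-in sqlite3 module. The snake_case table and column names, and the shortened course title "Intro to Python", are our own assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE course_enrollment (
        student_id      INTEGER,
        course_id       INTEGER,
        course_name     TEXT,
        instructor_name TEXT,
        PRIMARY KEY (student_id, course_id)
    )
    """
)
conn.executemany(
    "INSERT INTO course_enrollment VALUES (?, ?, ?, ?)",
    [
        (1001, 201, "SQL Fundamentals", "Ken Smith"),
        (1002, 202, "Introduction to Python", "Merlin O’Donnell"),
        (1001, 202, "Introduction to Python", "Merlin O’Donnell"),
    ],
)

# Rename course 202, but only in the row that belongs to student 1002.
conn.execute(
    "UPDATE course_enrollment SET course_name = ? "
    "WHERE student_id = ? AND course_id = ?",
    ("Intro to Python", 1002, 202),
)

# The same course now carries two different names.
names_for_202 = [
    row[0]
    for row in conn.execute(
        "SELECT DISTINCT course_name FROM course_enrollment "
        "WHERE course_id = 202 ORDER BY course_name"
    )
]
print(names_for_202)
```

Because the course name is stored once per enrollment rather than once per course, nothing stops the two copies from drifting apart.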

Steps for Decomposing Tables to Achieve 2NF

To make sure that a table follows the rules of 2NF, you need to:

  1. Identify All Candidate Keys: Determine the minimal sets of attributes that uniquely identify rows in the table. These are your candidate keys.
  2. Determine Functional Dependencies: Identify all functional dependencies in the table. Specifically, look for dependencies where non-prime attributes (those not part of any candidate key) depend only on a part of a composite key.
  3. Eliminate Partial Dependencies: For each partial dependency:
    • Move the dependent attributes into a new table along with the part of the key they depend on.
    • Ensure the new table has a unique primary key.
  4. Repeat Until No Partial Dependencies Remain: Confirm that every non-prime attribute in all tables is fully dependent on its respective primary key.
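Steps 2 and 3 can be roughly sketched in Python. The helpers `determines` and `find_partial_dependencies` are hypothetical names of our own, and the check only tells you which dependencies hold in the sample rows. A small sample can suggest dependencies that are not true business rules, so treat the output as candidates to confirm.

```python
from itertools import combinations


def determines(rows, determinant, dependent):
    """True if each determinant value maps to exactly one dependent value in rows."""
    seen = {}
    for row in rows:
        lhs = tuple(row[col] for col in determinant)
        rhs = row[dependent]
        if seen.setdefault(lhs, rhs) != rhs:
            return False
    return True


def find_partial_dependencies(rows, key, non_prime):
    """Return (key_subset, attribute) pairs where part of the key determines the attribute."""
    found = []
    for size in range(1, len(key)):  # proper, non-empty subsets only
        for subset in combinations(key, size):
            for attr in non_prime:
                if determines(rows, subset, attr):
                    found.append((subset, attr))
    return found


rows = [
    {"student_id": 1001, "course_id": 201,
     "course_name": "SQL Fundamentals", "instructor_name": "Ken Smith"},
    {"student_id": 1002, "course_id": 202,
     "course_name": "Introduction to Python", "instructor_name": "Merlin O’Donnell"},
    {"student_id": 1001, "course_id": 202,
     "course_name": "Introduction to Python", "instructor_name": "Merlin O’Donnell"},
]

partial = find_partial_dependencies(
    rows,
    key=("student_id", "course_id"),
    non_prime=("course_name", "instructor_name"),
)
for subset, attr in partial:
    print(subset, "->", attr)
```

Each pair it reports is a candidate for step 3: the attribute moves to a new table keyed by that subset.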

Examples of Second Normal Form in Practice

Let's now look at two examples.

Example 1: Course enrollment table

Earlier, we saw the following course enrollment table:

Student ID | Course ID | Course Name            | Instructor Name
1001       | 201       | SQL Fundamentals       | Ken Smith
1002       | 202       | Introduction to Python | Merlin O’Donnell
1001       | 202       | Introduction to Python | Merlin O’Donnell

Let’s follow the steps we outlined in the previous section.

1. Identify our candidate key

In this case, the candidate key is a composite key of Student ID and Course ID. This unique combination identifies each row in the table.

2. Determine our functional dependencies

Course Name and Instructor Name depend on Course ID, not the full composite key (Student ID, Course ID). This is a partial dependency because these attributes depend on only part of the composite key.

3. Eliminate partial dependencies

We need to move the attributes that depend on only part of the key (Course Name and Instructor Name) to a new table that is based solely on Course ID.

After decomposition, our new tables look like this:

Course enrollment table

Student ID | Course ID
1001       | 201
1002       | 202
1001       | 202

Course details table

Course ID | Course Name            | Instructor Name
201       | SQL Fundamentals       | Ken Smith
202       | Introduction to Python | Merlin O’Donnell
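As a rough sketch, the two tables could be declared like this with Python's built-in sqlite3 module. The snake_case names, the column types, and the foreign key constraint are our own assumptions and not part of the example above. Joining the tables on Course ID gives back the rows of the original table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves foreign keys off by default
conn.executescript(
    """
    CREATE TABLE course_details (
        course_id       INTEGER PRIMARY KEY,
        course_name     TEXT NOT NULL,
        instructor_name TEXT NOT NULL
    );
    CREATE TABLE course_enrollment (
        student_id INTEGER NOT NULL,
        course_id  INTEGER NOT NULL REFERENCES course_details (course_id),
        PRIMARY KEY (student_id, course_id)
    );
    """
)
conn.executemany(
    "INSERT INTO course_details VALUES (?, ?, ?)",
    [
        (201, "SQL Fundamentals", "Ken Smith"),
        (202, "Introduction to Python", "Merlin O’Donnell"),
    ],
)
conn.executemany(
    "INSERT INTO course_enrollment VALUES (?, ?)",
    [(1001, 201), (1002, 202), (1001, 202)],
)

# Joining on course_id rebuilds the original, wider rows on demand.
rebuilt = conn.execute(
    """
    SELECT e.student_id, e.course_id, c.course_name, c.instructor_name
    FROM course_enrollment AS e
    JOIN course_details AS c ON c.course_id = e.course_id
    ORDER BY e.student_id, e.course_id
    """
).fetchall()
for row in rebuilt:
    print(row)
```

Each course name and instructor is now stored once, so a rename touches a single row in course_details.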

If you want to get hands-on and create your own databases, have a look at our PostgreSQL course. If you are a little more advanced, you could try this Introduction to Data Modeling in Snowflake, which covers ideas like entity-relationship and dimensional modeling.

Example 2: Orders table

We will start with this Orders table. Try to follow the steps we outlined above and decompose this table yourself! 

Order ID | Product ID | Order Date | Product Name | Supplier Name
1        | 201        | 2024-11-01 | Laptop       | TechSupply
1        | 202        | 2024-11-01 | Mouse        | TechSupply
2        | 201        | 2024-11-02 | Laptop       | TechSupply
3        | 203        | 2024-11-03 | Keyboard     | KeyMasters

1. Identify our candidate key

The Order ID and Product ID combination uniquely identifies each row, making (Order ID, Product ID) a composite candidate key. No single column can uniquely identify rows because:

  • Order ID alone isn't unique, as multiple products can be part of the same order.
  • Product ID alone isn't unique, as the same product can appear in different orders.

This means that (Order ID, Product ID) is also our primary key.

2. Determine our functional dependencies

Order Date depends on Order ID (not on the full composite key). This is a partial dependency.

Product Name and Supplier Name depend on Product ID (not on the full composite key). These are also partial dependencies.

3. Eliminate partial dependencies

We need to split the table into smaller tables, each addressing one logical dependency. 

First, we’ll create a table for order information, which contains information specific to Order ID.

Orders Table

Order ID | Order Date
1        | 2024-11-01
2        | 2024-11-02
3        | 2024-11-03

Then, we create a table that contains information specific to Product ID.

Products Table

Product ID | Product Name | Supplier Name
201        | Laptop       | TechSupply
202        | Mouse        | TechSupply
203        | Keyboard     | KeyMasters

The original table is now reduced to just the composite key and the relationships between orders and products.

Order ID | Product ID
1        | 201
1        | 202
2        | 201
3        | 203

Now, our database is in 2NF because 1) all partial dependencies have been eliminated, and 2) non-prime attributes depend entirely on their respective primary keys.
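One way to convince yourself that nothing was lost in the split is to rebuild the original rows from the three smaller tables. The sketch below does this in plain Python. The variable name `order_items` for the remaining order and product pairs is our own label and is not used elsewhere in this article.

```python
# Rows of the original table:
# (order_id, product_id, order_date, product_name, supplier_name)
original = {
    (1, 201, "2024-11-01", "Laptop", "TechSupply"),
    (1, 202, "2024-11-01", "Mouse", "TechSupply"),
    (2, 201, "2024-11-02", "Laptop", "TechSupply"),
    (3, 203, "2024-11-03", "Keyboard", "KeyMasters"),
}

# Project the original rows onto the three decomposed tables.
orders = {order_id: order_date for order_id, _, order_date, _, _ in original}
products = {
    product_id: (name, supplier) for _, product_id, _, name, supplier in original
}
order_items = {(order_id, product_id) for order_id, product_id, _, _, _ in original}

# Join the pieces back together on the shared key columns.
rebuilt = {
    (order_id, product_id, orders[order_id], *products[product_id])
    for order_id, product_id in order_items
}

print(rebuilt == original)
```

If the comparison came back False, the decomposition would have dropped or invented rows, which usually means a column was moved to a table keyed by the wrong part of the composite key.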

When to Implement Second Normal Form

So, why should you refactor your database to 2NF? Is it sufficient on its own or should you take one step further and aim for 3NF? 

Benefits and limitations of second normal form

Second normal form offers several advantages, making it a useful step in the database normalization process:

  • Enhanced data integrity: By eliminating partial dependencies, 2NF minimizes insertion, update, and deletion anomalies, leading to a more reliable database.
  • Reduction of redundancy: 2NF decreases data repetition, optimizing storage usage and simplifying data maintenance.
  • Improved data structure: It lays the groundwork for further normalization, like progression to third normal form, by creating a cleaner and more efficient database design.

But it does come with some limitations:

  • Increased complexity: Decomposing tables to meet 2NF can make the design process more complex, particularly when dealing with composite keys and dependencies.
  • Additional joins: Splitting tables may require more joins in queries, potentially impacting performance in systems with large datasets or complex queries - more on that below.
  • Residual redundancy: While 2NF reduces partial dependencies, it does not address transitive dependencies, leaving some redundancy until addressed in 3NF.

Performance considerations with second normal form

Decomposing tables to eliminate partial dependencies can directly impact database performance. On one hand, achieving 2NF reduces data redundancy and improves consistency, leading to fewer anomalies during insert, update, or delete operations. On the other hand, normalization can increase the number of tables, which means additional joins are necessary when retrieving related data. This could impact query performance in large datasets.

To keep your normalized database performant, follow these best practices:

  • Indexing: Use indexes to speed up joins between decomposed tables.
  • Query optimization: Optimize queries to minimize the cost of additional joins.
  • Hybrid approach: Combine normalization with denormalization in areas where performance is important, such as reporting tables.
  • Regular monitoring: Continuously evaluate your database performance with profiling tools to catch potential issues early.
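To illustrate the indexing advice, here is a small sketch using SQLite through Python's sqlite3 module. The table name `order_items` and the index name `idx_order_items_product` are hypothetical, and other database engines expose their query plans differently, so read this as one possible way to check that an index is picked up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE order_items (
        order_id   INTEGER NOT NULL,
        product_id INTEGER NOT NULL,
        PRIMARY KEY (order_id, product_id)
    );
    -- The composite primary key already supports lookups that start from order_id.
    -- This extra index supports lookups and joins that start from product_id.
    CREATE INDEX idx_order_items_product ON order_items (product_id);
    """
)

# Ask SQLite how it would run a lookup by product_id.
plan = conn.execute(
    "EXPLAIN QUERY PLAN SELECT order_id FROM order_items WHERE product_id = ?",
    (201,),
).fetchall()
plan_details = [row[3] for row in plan]  # the last column holds the readable plan step
for detail in plan_details:
    print(detail)
```

Checking the plan before and after adding an index is a cheap way to confirm the index helps the joins you actually run.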

Is 2NF just a transitional step to achieve third normal form?

In most cases, database designers strive to achieve third normal form due to its ability to reduce redundancy further and improve overall data integrity. However, achieving 3NF often involves additional work, such as creating more tables and relationships, which can introduce complexity and performance trade-offs in query execution.

There are cases where using second normal form by itself can be sufficient. If simplicity and quick implementation are priorities, such as in small-scale projects, prototyping, or situations where data redundancy is minimal, 2NF can suffice. For example, in systems where all attributes are already fully dependent on a simple primary key, achieving 2NF might fulfill the primary goal of reducing partial dependency, without the need for further normalization.

Moving beyond second normal form: toward third normal form

If you want to normalize your database further, you can keep refactoring your tables to reach third normal form.

3NF builds on 2NF by addressing transitive dependencies – situations where non-key attributes depend on other non-key attributes rather than the primary key. This progression ensures that each attribute is directly dependent on the primary key and nothing else.

For example, in a table tracking course enrollments:

  • 2NF: Ensures that attributes like the course name and student name depend entirely on their respective primary keys (e.g., Student ID and Course ID). This eliminates partial dependencies, where non-key attributes rely only on part of the composite key.
  • 3NF: Ensures that attributes like instructor details or department information are stored in separate tables, eliminating transitive dependencies.
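A transitive dependency can be spotted in data much like a partial one. In the sketch below, the rows are hypothetical: the department values and course 203 are invented purely for illustration. The table has a single-column key (the course ID), so it is already in 2NF, yet the department still follows from the instructor.

```python
from collections import defaultdict

# Hypothetical rows: (course_id, instructor_name, department).
# The departments and course 203 are made up for this illustration.
courses = [
    (201, "Ken Smith", "Data Engineering"),
    (202, "Merlin O’Donnell", "Programming"),
    (203, "Merlin O’Donnell", "Programming"),
]

departments_by_instructor = defaultdict(set)
for _, instructor, department in courses:
    departments_by_instructor[instructor].add(department)

# If every instructor maps to a single department, department depends on
# instructor_name, which is itself a non-key attribute: a transitive dependency.
transitive = all(
    len(departments) == 1 for departments in departments_by_instructor.values()
)
print(transitive)
```

In 3NF, the instructor and department pair would move to its own table, leaving only the instructor reference in the course table.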

3NF is ideal for more complex systems where data integrity and efficiency are paramount, especially as the volume of data grows. Check out our What is third normal form? article if you want to learn more about 3NF and its more restrictive form, BCNF.

Conclusion

Second normal form is an essential step in database normalization, bridging the gap between 1NF and higher forms like 3NF. By removing partial dependencies, 2NF reduces redundancy and improves the reliability of your data. While it can add some complexity, the benefits of improved data integrity and simplified maintenance make it a critical part of effective database design.

If you're ready to take your skills further, explore our Database Design course to deepen your understanding of normalization techniques and their practical applications. You can also validate your SQL and database management skills and demonstrate your expertise to potential employers with our SQL Associate Certification!

Lastly, if you are a decision-maker in a business and you know that you have work to do in creating cleaner, more efficient databases, consider requesting a DataCamp for Business demo. We can help transform your team's capabilities so that you can create scalable database systems that drive business efficiency and innovation. We can even create tailored learning paths and custom tracks.

Author
Marie Fayard

Senior Software Engineer, Technical Writer and Advisor with a background in physics. Committed to helping early-stage startups reach their potential and making complex concepts accessible to everyone.

2NF FAQs

Can second normal form (2NF) be applied to databases without composite keys?

Yes! A table in 1NF whose candidate keys are all single columns is automatically in 2NF: with no composite key, there is no part of a key for a non-prime attribute to depend on.

Does achieving 2NF guarantee the elimination of all redundancy in a database?

No. While 2NF eliminates partial dependencies, it doesn’t address transitive dependencies. For that, you need to progress to third normal form (3NF).

Are there situations where a database in 1NF is sufficient without progressing to 2NF?

Yes. In very simple systems with minimal data redundancy or in prototypes, 1NF might suffice. I would generally not recommend stopping at 1NF unless you have a very specific and justifiable reason.

What’s the difference between 2NF and third normal form (3NF)?

2NF removes partial dependencies, ensuring attributes depend on the entire key. 3NF goes further by removing transitive dependencies, ensuring that attributes depend directly on the primary key and not on other non-key attributes.

How do I check if my existing database is already in 2NF?

You can check by reviewing all tables with composite keys and verifying that all non-prime attributes depend on the entire composite key, not just a part of it.
