Course
Imagine a hospital database where patient allergies can't be left blank or a financial system where transaction amounts must be positive numbers. In these scenarios and countless others, we rely on integrity constraints to ensure our data remains accurate, consistent, and reliable.
In SQL, integrity constraints are rules that we impose on our database tables to maintain data quality. They help prevent errors, enforce business rules, and ensure our data reflects the real-world entities and relationships it represents.
In this article, we'll delve into the essential types of integrity constraints in SQL, providing clear explanations and practical examples to illustrate their use in a PostgreSQL database. While we'll use PostgreSQL syntax, the concepts easily translate to other SQL dialects.
If you want to learn more about SQL, check out this list of SQL courses.
What Are Integrity Constraints in SQL?
Consider a scenario where we have a table that stores user information for a web application. Certain data, like a user's age, might be optional because it doesn’t hinder users from accessing our application. However, having a password for each user is essential for login purposes. To address this, we would implement an integrity constraint on the password column of our user’s table to guarantee that every entry in this table includes a password.
In essence, integrity constraints are crucial for:
- Preventing missing data.
- Ensuring all data adheres to the expected types and value ranges.
- Maintaining proper links between data across different tables.
In this article, we'll explore the following essential integrity constraints in SQL:
PRIMARY KEY
: Uniquely identifies each record in a table.NOT NULL
: Ensures a column cannot contain NULL values.UNIQUE
: Ensures that all values in a column or group of columns are unique.DEFAULT
: Provides a default value for a column when none is specified.CHECK
: Ensures that all values in a column satisfy a specific condition.FOREIGN KEY
: Establishes relationships between tables by referencing a primary key in another table.
Study Case: University Database
Let's consider a relational database for a university. This database contains three tables: students
, courses
, and enrollments
.
Students table
The students
table holds information on all university students.
student_id
: The identifier of the studentfirst_name
: The first name of the student.last_name
: The last name of the student.email
: The email address of the student.major
: The student’s major.enrollment_year
: The year in which the student enrolled.
Courses table
The courses
table holds information on the courses available at the university.
course_id
: The course identifier.course_name
: The name of the course.department
: The course department.
Enrollments table
The enrollments
table stores information on which students are enrolled in which courses.
student_id
: The identifier of the student enrolled in the course.course_id
: The identifier of the course.year
: The enrollment year.grade
: The student's grade in this course.is_passing_grade
: A boolean indicating if the grade is passing.
Throughout this article, we’ll use this example database and show several ways to enforce data integrity. We’ll use PostgreSQL syntax in our queries. However, the concepts easily translate to other flavors of SQL.
PRIMARY KEY Constraint
The university aims to identify each student uniquely. Utilizing first and last names for this purpose isn't advisable due to the possibility of name duplication among students. Likewise, relying on email addresses is not ideal since students may change their emails.
The common solution for ensuring unique identification is to assign a unique identifier to each student, which we store in the student_id
column. We can apply a PRIMARY KEY
constraint to the student_id
column to guarantee that each student has a unique identifier.
This constraint is defined in the CREATE TABLE
command, following the specification of the column data type:
CREATE TABLE students (
student_id INT PRIMARY KEY,
first_name TEXT,
last_name TEXT,
email TEXT,
major TEXT,
enrollment_year INT
);
The above query creates the student table with the six columns we mentioned above.
The PRIMARY KEY
constraint ensures that:
- Each student has a
student_id
. - Each
student_id
is unique.
PRIMARY KEY constraint on multiple columns
In certain scenarios, we need to utilize multiple columns to uniquely identify each row. Take, for instance, the enrollments
table. A student can enroll in multiple courses, resulting in several rows sharing the same student_id
. Similarly, a course can have multiple students enrolled, leading to numerous rows with the same course_id
.
Given that no single field can uniquely identify a row, each enrollment record is determined using a combination of student_id
, course_id
, and year
.
When multiple columns are involved, the PRIMARY KEY
constraint is specified at the end of the CREATE TABLE
command.
CREATE TABLE enrollments (
student_id INT,
course_id INT,
year INT,
grade INT,
is_passing_grade BOOLEAN,
PRIMARY KEY (student_id, course_id, year)
);
Integrity constraints after table creation
There are two ways to add integrity constraints. We just learned how to do it when the table is created.
Imagine the table already exists, but you forgot to specify the PRIMARY KEY
constraint. You can also define it after the table is created using the ALTER TABLE
command, like so:
ALTER TABLE enrollments
ADD CONSTRAINT enroll_pk
PRIMARY KEY (student_id, course_id, year);
In the ALTER TABLE
query, we named the constraint enroll_pk
(standing for enrollment primary key). This name can be any identifier of your choosing, but it’s recommended to select a name that succinctly conveys the purpose of the constraint.
It’s best practice to name integrity constraints as it provides several benefits:
- It allows for easier reference, especially when you need to modify or drop the constraint in the future.
- It supports the management and organization of constraints, particularly in databases with a large number of constraints.
NOT NULL Constraint
The university wants to ensure that every student's name and email are recorded in the database. They don’t want staff to forget to input any of those fields accidentally.
To overcome this, we can use NOT NULL
constraints on these three columns when creating the table:
CREATE TABLE students (
student_id INT,
first_name TEXT NOT NULL,
last_name TEXT NOT NULL,
email TEXT NOT NULL,
major TEXT,
enrollment_year INT
);
The above query uses the NOT NULL
constraints to ensure that the columns first_name
, last_name
, and email
cannot have NULL
(undefined) values.
Adding a NOT NULL
constraint to an existing table can be done using the ALTER TABLE
command. The syntax for adding NOT NULL
constraints is as follows:
ALTER TABLE students
ALTER COLUMN first_name SET NOT NULL,
ALTER COLUMN last_name SET NOT NULL,
ALTER COLUMN email SET NOT NULL;
UNIQUE Constraint
Email addresses are inherently unique to each individual. In situations where we’re sure that a field should never have duplicate values, it’s good practice to enforce this at a database level. This helps prevent mistakes and ensure data integrity.
Adding this constraint is done in the same way as the others.
CREATE TABLE students (
...
email TEXT UNIQUE,
...
);
However, in our case, we also want to enforce the NOT NULL
constraint. We can enforce multiple constraints on a single column by space separating them:
CREATE TABLE students (
...
email TEXT UNIQUE NOT NULL,
...
);
Note that the order doesn’t matter, we could also have used NOT NULL UNIQUE
.
Adding a UNIQUE
constraint to an existing table is done as follows:
ALTER TABLE students
ADD CONSTRAINT unique_emails UNIQUE (email);
UNIQUE constraint on multiple columns
Suppose we want to ensure that each department's course name is unique in the courses
table. In this case, the course_name
and department
columns together should be unique.
When multiple columns are involved, the constraint is added at the end of the CREATE TABLE
command:
CREATE TABLE courses (
course_id INT,
course_name TEXT,
department TEXT,
UNIQUE (course_name, department)
);
Alternatively, we can add the constraint by altering an existing table. In this case, we provide a tuple with the names of the columns:
ALTER TABLE courses
ADD CONSTRAINT unique_course_name_department
UNIQUE (course_name, department);
NOT NULL UNIQUE vs PRIMARY KEY constraints
We’ve learned that a PRIMARY KEY
constraint enforces both unicity and that there are no missing values. So you might be wondering what’s the difference between:
course_id INT PRIMARY KEY
course_id INT UNIQUE NOT NULL
The difference between NOT NULL UNIQUE
and PRIMARY KEY
on a table is their intended purpose and usage.
While both enforce uniqueness and non-nullability on a column(s), a table can only have one PRIMARY KEY
, which is intended to uniquely identify each record in the table.
On the other hand, the combination NOT NULL UNIQUE
constraint can be applied to additional columns to enforce a unique value across each row, without null values, serving to maintain data integrity for specific business rules. A table can have any number of NOT NULL UNIQUE
constraints.
The existence of both allows for greater flexibility in database design, enabling multiple ways to enforce uniqueness and data integrity while distinguishing between the primary identifier of a record and other important, unique attributes within a table.
DEFAULT Constraint
Students may need some time after university registration to select their major. The university would like the value of the major column to be the string ’Undeclared’ for students who haven’t selected their major yet.
To do so, we can set a default value of this column using the DEFAULT
constraint. We can alter the students table like so:
ALTER TABLE students
ALTER COLUMN major SET DEFAULT 'Undeclared';
If, instead, we want to set a DEFAULT
constraint when the table is created, we can do it by declaring it after the column data type:
CREATE TABLE students (
...
major TEXT DEFAULT 'Undeclared',
...
);
CHECK Constraint
In this particular university, grades go from 0 to 100. Without any constraint, the grade
column from the enrollment
table accepts any integer value. We can fix that by using a CHECK
constraint to enforce the values to be between 0 and 100.
ALTER TABLE enrollments
ADD CONSTRAINT grade_range CHECK (grade BETWEEN 0 AND 100);
In general, CHECK
constraints enable us to validate specific conditions we want the data to satisfy. These are important to ensure data consistency and integrity.
A CHECK
constraint can involve more than one column. Let’s use it to ensure that the grade
and is_passing_grade
have consistent values. Say a grade is passing if its value is at least 60. Then we can ensure that is_passing_grade
is TRUE
if and only if the grade is at least 60. Let’s do it in the table creation to show how CHECK
constraints are declared in the CREATE TABLE
command:
CREATE TABLE enrollments (
...
grade INT,
is_passing_grade BOOLEAN,
CONSTRAINT grade_check CHECK (grade BETWEEN 0 AND 100),
CONSTRAINT is_passing_grade CHECK (
(grade >= 60 AND is_passing_grade = TRUE) OR
(grade < 60 AND is_passing_grade = FALSE)
)
);
There is a problem with the above constraint. When a student enrolls in a course, they won’t have a grade yet. So we should allow NULL
values on the grade. When doing so, we also need to update the passing grade constraint to be NULL
when the grade is still undefined. Here’s how to update the constraint to take these into account:
CREATE TABLE enrollments (
...
grade INT NULL DEFAULT NULL,
is_passing_grade BOOLEAN NULL DEFAULT NULL,
CONSTRAINT grade_check CHECK (grade BETWEEN 0 AND 100),
CONSTRAINT is_passing_grade CHECK (
(grade IS NULL AND is_passing_grade IS NULL) OR
(grade >= 60 AND is_passing_grade = TRUE) OR
(grade < 60 AND is_passing_grade = FALSE)
),
...
);
Note that on both the grade
and is_passing_grade
columns we added NULL
on the data type and as the default value. The purpose of these is simply to increase readability.
Let’s now see what conditions we can enforce with a CHECK
constraint.
Range conditions
We can ensure that values in a column fall within a specific range.
ALTER TABLE table_name
ADD CONSTRAINT constraint_name
CHECK (column_name BETWEEN min_value AND max_value);
List conditions
We can validate that a column's value matches one in a list of specific values.
ALTER TABLE table_name
ADD CONSTRAINT constraint_name
CHECK (column_name IN ('Value1', 'Value2', 'Value3'));
Comparison conditions
We can compare values in a column to ensure they meet a specified condition (greater than, less than, etc.).
ALTER TABLE table_name
ADD CONSTRAINT constraint_name
CHECK (column_name > some_value);
Pattern matching conditions
We can use pattern matching (e.g., with LIKE
or SIMILAR TO
) to validate text data.
ALTER TABLE table_name
ADD CONSTRAINT constraint_name
CHECK (column_name LIKE 'pattern');
Logical conditions
We can allow multiple conditions using logical operators (AND
, OR
, NOT
).
ALTER TABLE table_name
ADD CONSTRAINT constraint_name
CHECK (condition1 AND condition2 OR condition3);
Composite conditions
Applies a condition over multiple columns.
ALTER TABLE table_name
ADD CONSTRAINT constraint_name
CHECK (column1 + column2 < some_value);
FOREIGN KEY Constraint
Foreign key constraints are used to link the columns of two tables together, ensuring the referential integrity of the data. In essence, a foreign key in one table points to a primary key in another table, indicating that the rows in these two tables are related. This ensures that you can't have a row in a table with a foreign key that doesn't correspond to any row in the linked table with the primary key.
In our example, each record in the enrollments table refers to a student and a course through the student_id
and course_id
columns, respectively. Without any constraint, there’s nothing to ensure that the values of these identifiers in the enrollments table match existing entries in the students and courses tables.
Here’s how we ensure this when creating the enrollments
table:
CREATE TABLE enrollments (
student_id INT,
course_id INT,
...
FOREIGN KEY (student_id) REFERENCES Students(student_id),
FOREIGN KEY (course_id) REFERENCES Courses(course_id)
);
Note that for these to be foreign keys, they must be primary keys in the students
and courses
tables, respectively.
Just as with other constraints, we could also declare these after table creation using the ALTER TABLE
command:
ALTER TABLE enrollments
ADD CONSTRAINT fk_student_id
FOREIGN KEY (student_id)
REFERENCES students(student_id);
ALTER TABLE enrollments
ADD CONSTRAINT fk_course_id
FOREIGN KEY (course_id)
REFERENCES courses(course_id);
Putting It All Together
Throughout this article, we showcased several integrity constraints and how to use them to improve a university database. Here’s the final version of the CREATE TABLE
command, which combines everything we’ve learned.
First, we'll define the students
table:
CREATE TABLE students (
student_id INT PRIMARY KEY,
first_name TEXT NOT NULL,
last_name TEXT NOT NULL,
email TEXT UNIQUE NOT NULL,
major TEXT DEFAULT 'Undeclared',
enrollment_year INT,
CONSTRAINT year_check CHECK (enrollment_year >= 1900),
CHECK (major IN (
'Undeclared',
'Computer Science',
'Mathematics',
'Biology',
'Physics',
'Chemistry',
'Biochemistry'
))
);
Next, let's define the courses
table:
CREATE TABLE courses (
course_id INT PRIMARY KEY,
course_name TEXT NOT NULL,
department TEXT NOT NULL,
UNIQUE (course_name, department),
CHECK (department IN (
'Physics & Mathematics',
'Sciences'
))
);
Finally, we'll define the enrollments
table, establishing the relationships between students and courses:
CREATE TABLE enrollments (
student_id INT,
course_id INT,
year INT CHECK (year >= 1900),
grade INT NULL DEFAULT NULL,
is_passing_grade BOOLEAN NULL DEFAULT NULL,
CONSTRAINT grade_check CHECK (grade BETWEEN 0 AND 100),
CONSTRAINT is_passing_grade CHECK (
(grade IS NULL AND is_passing_grade IS NULL) OR
(grade >= 60 AND is_passing_grade = TRUE) OR
(grade < 60 AND is_passing_grade = FALSE)
),
FOREIGN KEY (student_id) REFERENCES students(student_id),
FOREIGN KEY (course_id) REFERENCES courses(course_id),
PRIMARY KEY (student_id, course_id, year)
);
We’ve added a few constraints in this final example. A university has a known set of departments and majors. Therefore, the set of values that the major
and department
columns can have is finite and known in advance. In these situations, using a CHECK
constraint is recommended to ensure that the columns can only take values from that known set of values.
Conclusion
In this article, we explored the different types of integrity constraints in SQL and how to implement them using PostgreSQL. We covered primary keys, NOT NULL
constraints, UNIQUE
constraints, DEFAULT
constraints, CHECK
constraints, and FOREIGN KEY
constraints, providing practical examples for each.
By understanding these concepts, we can ensure the accuracy, consistency, and reliability of our data.
If you want to learn more about organizing data efficiently, check out this course on Database Design.
Learn SQL with these courses!
Course
Joining Data in SQL
Course
Data-Driven Decision Making in SQL
tutorial
SQL Triggers: A Guide for Developers
Oluseye Jeremiah
13 min
tutorial
SQL Query Examples and Tutorial
tutorial
SQL Commands for Data Scientists
Emiko Sano
12 min
tutorial
Set Operators in SQL: A Comprehensive Guide
tutorial
Joins in SQL Tutorial
DataCamp Team
5 min
tutorial
SQL OR Operator: A Guide with Examples
Allan Ouko
8 min