Course notes: Creating PostgreSQL Databases

add text here

In the SQL cell below:

Set the source to 'Course Databases' to query databases used in SQL courses.
Set the source to 'DataFrames and CSVs' to query this course's CSV files (if any) with SQL
Write SQL! Note that you cannot run queries that modify the database.

Concepts about Database and Schema in PostgreSQL

Database

Definition: A database in PostgreSQL is a collection of schemas, which in turn contain tables, views, functions, and other objects.
Creation: Databases are created using the CREATE DATABASE command.
Connection: To interact with a database, you must connect to it using a client or application.
Isolation: Each database is isolated from others, meaning objects in one database are not directly accessible from another.

Schema

Definition: A schema is a logical container within a database that holds tables, views, indexes, sequences, data types, functions, and other objects.
Namespace: Schemas provide a namespace to organize database objects, allowing the same object name to be used in different schemas.
Default Schema: PostgreSQL includes a default schema named public. Unless specified otherwise, new objects are created in this schema.
Creation: Schemas are created using the CREATE SCHEMA command.
Usage: Objects within a schema are referenced using the schema_name.object_name notation.

Example Commands

Creating a Database

CREATE DATABASE my_database;

CREATE SCHEMA new_schema;

Important Data Types in PostgreSQL

PostgreSQL supports a wide range of data types to accommodate various kinds of data. Here are some of the most commonly used data types:

Numeric Types

smallint: 2-byte integer, range -32,768 to 32,767.
integer: 4-byte integer, range -2,147,483,648 to 2,147,483,647.
bigint: 8-byte integer, range -9,223,372,036,854,775,808 to 9,223,372,036,854,775,807.
decimal: Variable precision numeric type.
numeric: Similar to decimal, used for exact numeric values.
real: 4-byte floating point number.
double precision: 8-byte floating point number.
serial: Auto-incrementing 4-byte integer.
bigserial: Auto-incrementing 8-byte integer.

Character Types

char(n): Fixed-length character type.
varchar(n): Variable-length character type with a limit.
text: Variable-length character type with no specific limit.

Date/Time Types

timestamp: Date and time (without time zone).
timestamptz: Date and time (with time zone).
date: Calendar date (year, month, day).
time: Time of day (without time zone).
timetz: Time of day (with time zone).
interval: Time span.

Boolean Type

boolean: Logical Boolean (true/false).

UUID Type

uuid: Universally unique identifier.

Array Type

array: Variable-length array of any data type.

JSON Types

json: Textual JSON data.
jsonb: Binary JSON data, more efficient for processing.

Special Types

bytea: Binary data ("byte array").
inet: IPv4 or IPv6 host address.
cidr: IPv4 or IPv6 network.
macaddr: MAC address.

Example Commands

Creating a Table with Various Data Types

CREATE TABLE example_table (
    id serial PRIMARY KEY,
    name varchar(100),
    age integer,
    salary numeric(10, 2),
    created_at timestamptz DEFAULT current_timestamp,
    is_active boolean,
    data jsonb
);

Categorical Data in PostgreSQL

In PostgreSQL, categorical data can be represented using various data types such as ENUM, TEXT, or VARCHAR. Here are some examples of how to use these data types to handle categorical data:

Using ENUM

The ENUM type is useful for representing a fixed set of values. For example, you can define an ENUM type for days of the week:

CREATE TYPE day_of_week AS ENUM ('Monday', 'Tuesday', 'Wednesday', 'Thursday', 'Friday', 'Saturday', 'Sunday');

CREATE TABLE schedule (
    id SERIAL PRIMARY KEY,
    day day_of_week NOT NULL
);

Using Boolean Data

In PostgreSQL, Boolean data is represented using the BOOLEAN data type. This type can store TRUE, FALSE, and NULL values. Here is an example of how to use the BOOLEAN data type in a table:

CREATE TABLE users (
    id SERIAL PRIMARY KEY,
    username VARCHAR(50) NOT NULL,
    is_active BOOLEAN NOT NULL DEFAULT TRUE
);

-- Insert some example data
INSERT INTO users (username, is_active) VALUES ('alice', TRUE);
INSERT INTO users (username, is_active) VALUES ('bob', FALSE);
INSERT INTO users (username) VALUES ('charlie');  -- Defaults to TRUE

-- Query the table
SELECT * FROM users;

Temporal Data Types in PostgreSQL

PostgreSQL provides several data types to handle temporal data, such as dates, times, and timestamps. Here are some examples of how to use these data types:

DATE

The DATE type stores the calendar date (year, month, day) without any time information.

CREATE TABLE events (
    id SERIAL PRIMARY KEY,
    event_name VARCHAR(100) NOT NULL,
    event_date DATE NOT NULL
);

-- Insert some example data
INSERT INTO events (event_name, event_date) VALUES ('Conference', '2023-10-15');
INSERT INTO events (event_name, event_date) VALUES ('Meeting', '2023-11-01');

-- Query the table
SELECT * FROM events;

The Importance of Data Normalization

Data normalization is a crucial process in database design that involves organizing the columns (attributes) and tables (relations) of a database to reduce data redundancy and improve data integrity. Here are some key reasons why data normalization is important:

1. Reduces Data Redundancy

Normalization eliminates duplicate data by ensuring that each piece of information is stored only once. This reduces the amount of storage space required and minimizes the risk of data inconsistencies.

2. Improves Data Integrity

By organizing data into related tables and enforcing relationships through foreign keys, normalization ensures that data remains accurate and consistent. This helps maintain the integrity of the database.

3. Simplifies Data Maintenance

Normalized databases are easier to maintain because updates, deletions, and insertions are performed in a single place. This reduces the complexity of database operations and minimizes the risk of anomalies.

4. Enhances Query Performance

While normalization can sometimes lead to more complex queries, it often results in more efficient data retrieval. Properly indexed normalized tables can improve query performance by reducing the amount of data that needs to be scanned.

5. Facilitates Scalability

A well-normalized database design can handle growth more effectively. As the database scales, normalized structures help ensure that performance remains stable and that the database can accommodate increasing amounts of data.

Normal Forms

Normalization is typically achieved through a series of steps known as normal forms. The most commonly used normal forms are:

First Normal Form (1NF): Ensures that the table has a primary key and that all columns contain atomic (indivisible) values.
Second Normal Form (2NF): Builds on 1NF by ensuring that all non-key attributes are fully functionally dependent on the primary key.
Third Normal Form (3NF): Builds on 2NF by ensuring that all non-key attributes are not only fully functionally dependent on the primary key but also independent of each other.

Example

Consider a table that stores information about students and their courses:

StudentID	StudentName	CourseID	CourseName
1	Alice	101	Math
2	Bob	102	Science
1	Alice	103	History

This table is not normalized because it contains redundant data (e.g., Alice's name is repeated). By normalizing it, we can create two separate tables:

Students Table:

StudentID	StudentName
1	Alice
2	Bob

Courses Table:

CourseID	CourseName
101	Math
102	Science
103	History

Enrollments Table:

StudentID	CourseID
1	101
2	102
1	103

By normalizing the data, we have eliminated redundancy and improved the structure of the database.

In summary, data normalization is essential for creating efficient, reliable, and scalable databases. It helps reduce redundancy, improve data integrity, simplify maintenance, enhance query performance, and facilitate scalability.

Creating a Table in 1NF (First Normal Form)

First Normal Form (1NF) is the basic level of database normalization. It requires that the values in each column of a table are atomic (indivisible). This means that each column should contain only one value per row, and each row should be unique.

Steps to Create a Table in 1NF

Identify the Entities and Attributes: Determine the entities (tables) and their attributes (columns) that need to be stored in the database.
Ensure Atomicity: Make sure that each column contains only atomic values. This means that each cell in the table should hold a single value, not a set of values or a list.
Eliminate Repeating Groups: Ensure that there are no repeating groups of columns. Each attribute should be dependent on the primary key.
Define the Primary Key: Choose a primary key for the table that uniquely identifies each row.

Example

Let's consider an example where we have a table that stores information about students and their courses. Initially, the table might look like this:

StudentID	StudentName	Courses
1	Alice	Math, Science
2	Bob	English, History
3	Charlie	Math, English

Step 1: Identify the Entities and Attributes

Entities: Students Attributes: StudentID, StudentName, Courses

Step 2: Ensure Atomicity

The Courses column contains multiple values, which violates 1NF. We need to split these values into separate rows.

Step 3: Eliminate Repeating Groups

We will create a new table to store the courses for each student.

Step 4: Define the Primary Key

We will use StudentID as the primary key for the Students table and a composite key (StudentID, Course) for the StudentCourses table.

Final Tables in 1NF

Students Table:

StudentID	StudentName
1	Alice
2	Bob
3	Charlie

StudentCourses Table:

StudentID	Course
1	Math
1	Science
2	English
2	History
3	Math
3	English

By following these steps, we have ensured that our tables are in First Normal Form (1NF). Each column contains atomic values, and there are no repeating groups.

Creating a Table in 2NF (Second Normal Form)

To create a table in 2NF, we need to ensure that the table is already in 1NF (First Normal Form) and that it meets the criteria for 2NF. Here are the steps to achieve this:

Step 1: Ensure the Table is in 1NF

1NF requires that:

The table has a primary key.
All columns contain atomic (indivisible) values.
Each column contains values of a single type.
Each column contains unique values or allows NULLs.

Example of a table in 1NF:

CREATE TABLE Orders (
    OrderID INT PRIMARY KEY,
    CustomerID INT,
    OrderDate DATE,
    ProductID INT,
    Quantity INT,
    Price DECIMAL(10, 2)
);

Creating a Table in 3NF (Third Normal Form)

To create a table in 3NF, we need to ensure that the table is already in 2NF (Second Normal Form) and that it meets the criteria for 3NF. Here are the steps to achieve this:

Step 1: Ensure the Table is in 2NF

2NF requires that:

The table is in 1NF.
All non-key attributes are fully functional dependent on the primary key.

Example of a table in 2NF:

CREATE TABLE Orders (
    OrderID INT PRIMARY KEY,
    CustomerID INT,
    OrderDate DATE
);

CREATE TABLE OrderDetails (
    OrderDetailID INT PRIMARY KEY,
    OrderID INT,
    ProductID INT,
    Quantity INT,
    Price DECIMAL(10, 2),
    FOREIGN KEY (OrderID) REFERENCES Orders(OrderID)
);

Roles and Privileges in PostgreSQL

In PostgreSQL, roles are used to manage database access permissions. Roles can represent a single user or a group of users. Privileges are the permissions granted to roles to perform certain actions on database objects.

Main Concepts

Roles:
- User Roles: Represent individual users. Created using the CREATE ROLE or CREATE USER command.
- Group Roles: Represent groups of users. Can be used to manage permissions for multiple users at once.
Example:
```
-- Create a user role
CREATE ROLE alice WITH LOGIN PASSWORD 'password';

-- Create a group role
CREATE ROLE developers;
```

-- Alter the role to set a new password for Alice.

ALTER ROLE alice WITH PASSWORD 'new_secure_password';

-- Create a user role for John
CREATE ROLE john WITH LOGIN PASSWORD 'john_password';

-- Grant SELECT, INSERT, and UPDATE privileges on the 'students' table to John
GRANT SELECT, INSERT, UPDATE ON students TO john;

‌
‌
‌

Course notes: Creating PostgreSQL Databases

.mfe-app-workspace-kj242g{position:absolute;top:-8px;}.mfe-app-workspace-11ezf91{display:inline-block;}.mfe-app-workspace-11ezf91:hover .Anchor__copyLink{visibility:visible;}Concepts about Database and Schema in PostgreSQL

Database

Schema

Example Commands

Creating a Database

Important Data Types in PostgreSQL

Numeric Types

Character Types

Date/Time Types

Boolean Type

UUID Type

Array Type

JSON Types

Special Types

Example Commands

Creating a Table with Various Data Types

Categorical Data in PostgreSQL

Using ENUM

Using Boolean Data

Temporal Data Types in PostgreSQL

DATE

The Importance of Data Normalization

1. Reduces Data Redundancy

2. Improves Data Integrity

3. Simplifies Data Maintenance

4. Enhances Query Performance

5. Facilitates Scalability

Normal Forms

Example

Creating a Table in 1NF (First Normal Form)

Steps to Create a Table in 1NF

Example

Step 1: Identify the Entities and Attributes

Step 2: Ensure Atomicity

Step 3: Eliminate Repeating Groups

Step 4: Define the Primary Key

Final Tables in 1NF

Creating a Table in 2NF (Second Normal Form)

Step 1: Ensure the Table is in 1NF

Creating a Table in 3NF (Third Normal Form)

Step 1: Ensure the Table is in 2NF

Roles and Privileges in PostgreSQL

Main Concepts

-- Alter the role to set a new password for Alice.

Concepts about Database and Schema in PostgreSQL