Skip to content

add text here

In the SQL cell below:

  • Set the source to 'Course Databases' to query databases used in SQL courses.
  • Set the source to 'DataFrames and CSVs' to query this course's CSV files (if any) with SQL
  • Write SQL! Note that you cannot run queries that modify the database.

Concepts about Database and Schema in PostgreSQL

Database

  • Definition: A database in PostgreSQL is a collection of schemas, which in turn contain tables, views, functions, and other objects.
  • Creation: Databases are created using the CREATE DATABASE command.
  • Connection: To interact with a database, you must connect to it using a client or application.
  • Isolation: Each database is isolated from others, meaning objects in one database are not directly accessible from another.

Schema

  • Definition: A schema is a logical container within a database that holds tables, views, indexes, sequences, data types, functions, and other objects.
  • Namespace: Schemas provide a namespace to organize database objects, allowing the same object name to be used in different schemas.
  • Default Schema: PostgreSQL includes a default schema named public. Unless specified otherwise, new objects are created in this schema.
  • Creation: Schemas are created using the CREATE SCHEMA command.
  • Usage: Objects within a schema are referenced using the schema_name.object_name notation.

Example Commands

Creating a Database
CREATE DATABASE my_database;

CREATE SCHEMA new_schema;

Important Data Types in PostgreSQL

PostgreSQL supports a wide range of data types to accommodate various kinds of data. Here are some of the most commonly used data types:

Numeric Types

  • smallint: 2-byte integer, range -32,768 to 32,767.
  • integer: 4-byte integer, range -2,147,483,648 to 2,147,483,647.
  • bigint: 8-byte integer, range -9,223,372,036,854,775,808 to 9,223,372,036,854,775,807.
  • decimal: Variable precision numeric type.
  • numeric: Similar to decimal, used for exact numeric values.
  • real: 4-byte floating point number.
  • double precision: 8-byte floating point number.
  • serial: Auto-incrementing 4-byte integer.
  • bigserial: Auto-incrementing 8-byte integer.

Character Types

  • char(n): Fixed-length character type.
  • varchar(n): Variable-length character type with a limit.
  • text: Variable-length character type with no specific limit.

Date/Time Types

  • timestamp: Date and time (without time zone).
  • timestamptz: Date and time (with time zone).
  • date: Calendar date (year, month, day).
  • time: Time of day (without time zone).
  • timetz: Time of day (with time zone).
  • interval: Time span.

Boolean Type

  • boolean: Logical Boolean (true/false).

UUID Type

  • uuid: Universally unique identifier.

Array Type

  • array: Variable-length array of any data type.

JSON Types

  • json: Textual JSON data.
  • jsonb: Binary JSON data, more efficient for processing.

Special Types

  • bytea: Binary data ("byte array").
  • inet: IPv4 or IPv6 host address.
  • cidr: IPv4 or IPv6 network.
  • macaddr: MAC address.

Example Commands

Creating a Table with Various Data Types
CREATE TABLE example_table ( id serial PRIMARY KEY, name varchar(100), age integer, salary numeric(10, 2), created_at timestamptz DEFAULT current_timestamp, is_active boolean, data jsonb );

Categorical Data in PostgreSQL

In PostgreSQL, categorical data can be represented using various data types such as ENUM, TEXT, or VARCHAR. Here are some examples of how to use these data types to handle categorical data:

Using ENUM

The ENUM type is useful for representing a fixed set of values. For example, you can define an ENUM type for days of the week:

CREATE TYPE day_of_week AS ENUM ('Monday', 'Tuesday', 'Wednesday', 'Thursday', 'Friday', 'Saturday', 'Sunday'); CREATE TABLE schedule ( id SERIAL PRIMARY KEY, day day_of_week NOT NULL );

Using Boolean Data

In PostgreSQL, Boolean data is represented using the BOOLEAN data type. This type can store TRUE, FALSE, and NULL values. Here is an example of how to use the BOOLEAN data type in a table:

CREATE TABLE users ( id SERIAL PRIMARY KEY, username VARCHAR(50) NOT NULL, is_active BOOLEAN NOT NULL DEFAULT TRUE ); -- Insert some example data INSERT INTO users (username, is_active) VALUES ('alice', TRUE); INSERT INTO users (username, is_active) VALUES ('bob', FALSE); INSERT INTO users (username) VALUES ('charlie'); -- Defaults to TRUE -- Query the table SELECT * FROM users;

Temporal Data Types in PostgreSQL

PostgreSQL provides several data types to handle temporal data, such as dates, times, and timestamps. Here are some examples of how to use these data types:

DATE

The DATE type stores the calendar date (year, month, day) without any time information.

CREATE TABLE events ( id SERIAL PRIMARY KEY, event_name VARCHAR(100) NOT NULL, event_date DATE NOT NULL ); -- Insert some example data INSERT INTO events (event_name, event_date) VALUES ('Conference', '2023-10-15'); INSERT INTO events (event_name, event_date) VALUES ('Meeting', '2023-11-01'); -- Query the table SELECT * FROM events;

The Importance of Data Normalization

Data normalization is a crucial process in database design that involves organizing the columns (attributes) and tables (relations) of a database to reduce data redundancy and improve data integrity. Here are some key reasons why data normalization is important:

1. Reduces Data Redundancy

Normalization eliminates duplicate data by ensuring that each piece of information is stored only once. This reduces the amount of storage space required and minimizes the risk of data inconsistencies.

2. Improves Data Integrity

By organizing data into related tables and enforcing relationships through foreign keys, normalization ensures that data remains accurate and consistent. This helps maintain the integrity of the database.

3. Simplifies Data Maintenance

Normalized databases are easier to maintain because updates, deletions, and insertions are performed in a single place. This reduces the complexity of database operations and minimizes the risk of anomalies.

4. Enhances Query Performance

While normalization can sometimes lead to more complex queries, it often results in more efficient data retrieval. Properly indexed normalized tables can improve query performance by reducing the amount of data that needs to be scanned.

5. Facilitates Scalability

A well-normalized database design can handle growth more effectively. As the database scales, normalized structures help ensure that performance remains stable and that the database can accommodate increasing amounts of data.

Normal Forms

Normalization is typically achieved through a series of steps known as normal forms. The most commonly used normal forms are:

  • First Normal Form (1NF): Ensures that the table has a primary key and that all columns contain atomic (indivisible) values.
  • Second Normal Form (2NF): Builds on 1NF by ensuring that all non-key attributes are fully functionally dependent on the primary key.
  • Third Normal Form (3NF): Builds on 2NF by ensuring that all non-key attributes are not only fully functionally dependent on the primary key but also independent of each other.

Example

Consider a table that stores information about students and their courses:

StudentIDStudentNameCourseIDCourseName
1Alice101Math
2Bob102Science
1Alice103History

This table is not normalized because it contains redundant data (e.g., Alice's name is repeated). By normalizing it, we can create two separate tables:

Students Table:

StudentIDStudentName
1Alice
2Bob

Courses Table:

CourseIDCourseName
101Math
102Science
103History

Enrollments Table:

StudentIDCourseID
1101
2102
1103

By normalizing the data, we have eliminated redundancy and improved the structure of the database.

In summary, data normalization is essential for creating efficient, reliable, and scalable databases. It helps reduce redundancy, improve data integrity, simplify maintenance, enhance query performance, and facilitate scalability.

Creating a Table in 1NF (First Normal Form)

First Normal Form (1NF) is the basic level of database normalization. It requires that the values in each column of a table are atomic (indivisible). This means that each column should contain only one value per row, and each row should be unique.

Steps to Create a Table in 1NF

  1. Identify the Entities and Attributes: Determine the entities (tables) and their attributes (columns) that need to be stored in the database.

  2. Ensure Atomicity: Make sure that each column contains only atomic values. This means that each cell in the table should hold a single value, not a set of values or a list.

  3. Eliminate Repeating Groups: Ensure that there are no repeating groups of columns. Each attribute should be dependent on the primary key.

  4. Define the Primary Key: Choose a primary key for the table that uniquely identifies each row.

Example

Let's consider an example where we have a table that stores information about students and their courses. Initially, the table might look like this:

StudentIDStudentNameCourses
1AliceMath, Science
2BobEnglish, History
3CharlieMath, English

Step 1: Identify the Entities and Attributes

Entities: Students Attributes: StudentID, StudentName, Courses

Step 2: Ensure Atomicity

The Courses column contains multiple values, which violates 1NF. We need to split these values into separate rows.

Step 3: Eliminate Repeating Groups

We will create a new table to store the courses for each student.

Step 4: Define the Primary Key

We will use StudentID as the primary key for the Students table and a composite key (StudentID, Course) for the StudentCourses table.

Final Tables in 1NF

Students Table:

StudentIDStudentName
1Alice
2Bob
3Charlie

StudentCourses Table:

StudentIDCourse
1Math
1Science
2English
2History
3Math
3English

By following these steps, we have ensured that our tables are in First Normal Form (1NF). Each column contains atomic values, and there are no repeating groups.

Creating a Table in 2NF (Second Normal Form)

To create a table in 2NF, we need to ensure that the table is already in 1NF (First Normal Form) and that it meets the criteria for 2NF. Here are the steps to achieve this:

Step 1: Ensure the Table is in 1NF

1NF requires that:

  • The table has a primary key.
  • All columns contain atomic (indivisible) values.
  • Each column contains values of a single type.
  • Each column contains unique values or allows NULLs.

Example of a table in 1NF:

CREATE TABLE Orders ( OrderID INT PRIMARY KEY, CustomerID INT, OrderDate DATE, ProductID INT, Quantity INT, Price DECIMAL(10, 2) );

Creating a Table in 3NF (Third Normal Form)

To create a table in 3NF, we need to ensure that the table is already in 2NF (Second Normal Form) and that it meets the criteria for 3NF. Here are the steps to achieve this:

Step 1: Ensure the Table is in 2NF

2NF requires that:

  • The table is in 1NF.
  • All non-key attributes are fully functional dependent on the primary key.

Example of a table in 2NF:

CREATE TABLE Orders ( OrderID INT PRIMARY KEY, CustomerID INT, OrderDate DATE ); CREATE TABLE OrderDetails ( OrderDetailID INT PRIMARY KEY, OrderID INT, ProductID INT, Quantity INT, Price DECIMAL(10, 2), FOREIGN KEY (OrderID) REFERENCES Orders(OrderID) );

Roles and Privileges in PostgreSQL

In PostgreSQL, roles are used to manage database access permissions. Roles can represent a single user or a group of users. Privileges are the permissions granted to roles to perform certain actions on database objects.

Main Concepts
  1. Roles:

    • User Roles: Represent individual users. Created using the CREATE ROLE or CREATE USER command.
    • Group Roles: Represent groups of users. Can be used to manage permissions for multiple users at once.

    Example:

    -- Create a user role CREATE ROLE alice WITH LOGIN PASSWORD 'password'; -- Create a group role CREATE ROLE developers;

-- Alter the role to set a new password for Alice.

ALTER ROLE alice WITH PASSWORD 'new_secure_password';

-- Create a user role for John CREATE ROLE john WITH LOGIN PASSWORD 'john_password'; -- Grant SELECT, INSERT, and UPDATE privileges on the 'students' table to John GRANT SELECT, INSERT, UPDATE ON students TO john;