add text here
In the SQL cell below:
- Set the source to 'Course Databases' to query databases used in SQL courses.
- Set the source to 'DataFrames and CSVs' to query this course's CSV files (if any) with SQL
- Write SQL! Note that you cannot run queries that modify the database.
Concepts about Database and Schema in PostgreSQL
Database
- Definition: A database in PostgreSQL is a collection of schemas, which in turn contain tables, views, functions, and other objects.
- Creation: Databases are created using the
CREATE DATABASEcommand. - Connection: To interact with a database, you must connect to it using a client or application.
- Isolation: Each database is isolated from others, meaning objects in one database are not directly accessible from another.
Schema
- Definition: A schema is a logical container within a database that holds tables, views, indexes, sequences, data types, functions, and other objects.
- Namespace: Schemas provide a namespace to organize database objects, allowing the same object name to be used in different schemas.
- Default Schema: PostgreSQL includes a default schema named
public. Unless specified otherwise, new objects are created in this schema. - Creation: Schemas are created using the
CREATE SCHEMAcommand. - Usage: Objects within a schema are referenced using the
schema_name.object_namenotation.
Example Commands
Creating a Database
CREATE DATABASE my_database;
CREATE SCHEMA new_schema;
Important Data Types in PostgreSQL
PostgreSQL supports a wide range of data types to accommodate various kinds of data. Here are some of the most commonly used data types:
Numeric Types
smallint: 2-byte integer, range -32,768 to 32,767.integer: 4-byte integer, range -2,147,483,648 to 2,147,483,647.bigint: 8-byte integer, range -9,223,372,036,854,775,808 to 9,223,372,036,854,775,807.decimal: Variable precision numeric type.numeric: Similar todecimal, used for exact numeric values.real: 4-byte floating point number.double precision: 8-byte floating point number.serial: Auto-incrementing 4-byte integer.bigserial: Auto-incrementing 8-byte integer.
Character Types
char(n): Fixed-length character type.varchar(n): Variable-length character type with a limit.text: Variable-length character type with no specific limit.
Date/Time Types
timestamp: Date and time (without time zone).timestamptz: Date and time (with time zone).date: Calendar date (year, month, day).time: Time of day (without time zone).timetz: Time of day (with time zone).interval: Time span.
Boolean Type
boolean: Logical Boolean (true/false).
UUID Type
uuid: Universally unique identifier.
Array Type
array: Variable-length array of any data type.
JSON Types
json: Textual JSON data.jsonb: Binary JSON data, more efficient for processing.
Special Types
bytea: Binary data ("byte array").inet: IPv4 or IPv6 host address.cidr: IPv4 or IPv6 network.macaddr: MAC address.
Example Commands
Creating a Table with Various Data Types
CREATE TABLE example_table ( id serial PRIMARY KEY, name varchar(100), age integer, salary numeric(10, 2), created_at timestamptz DEFAULT current_timestamp, is_active boolean, data jsonb );
Categorical Data in PostgreSQL
In PostgreSQL, categorical data can be represented using various data types such as ENUM, TEXT, or VARCHAR. Here are some examples of how to use these data types to handle categorical data:
Using ENUM
The ENUM type is useful for representing a fixed set of values. For example, you can define an ENUM type for days of the week:
CREATE TYPE day_of_week AS ENUM ('Monday', 'Tuesday', 'Wednesday', 'Thursday', 'Friday', 'Saturday', 'Sunday'); CREATE TABLE schedule ( id SERIAL PRIMARY KEY, day day_of_week NOT NULL );
Using Boolean Data
In PostgreSQL, Boolean data is represented using the BOOLEAN data type. This type can store TRUE, FALSE, and NULL values. Here is an example of how to use the BOOLEAN data type in a table:
CREATE TABLE users ( id SERIAL PRIMARY KEY, username VARCHAR(50) NOT NULL, is_active BOOLEAN NOT NULL DEFAULT TRUE ); -- Insert some example data INSERT INTO users (username, is_active) VALUES ('alice', TRUE); INSERT INTO users (username, is_active) VALUES ('bob', FALSE); INSERT INTO users (username) VALUES ('charlie'); -- Defaults to TRUE -- Query the table SELECT * FROM users;
Temporal Data Types in PostgreSQL
PostgreSQL provides several data types to handle temporal data, such as dates, times, and timestamps. Here are some examples of how to use these data types:
DATE
The DATE type stores the calendar date (year, month, day) without any time information.
CREATE TABLE events ( id SERIAL PRIMARY KEY, event_name VARCHAR(100) NOT NULL, event_date DATE NOT NULL ); -- Insert some example data INSERT INTO events (event_name, event_date) VALUES ('Conference', '2023-10-15'); INSERT INTO events (event_name, event_date) VALUES ('Meeting', '2023-11-01'); -- Query the table SELECT * FROM events;
The Importance of Data Normalization
Data normalization is a crucial process in database design that involves organizing the columns (attributes) and tables (relations) of a database to reduce data redundancy and improve data integrity. Here are some key reasons why data normalization is important:
1. Reduces Data Redundancy
Normalization eliminates duplicate data by ensuring that each piece of information is stored only once. This reduces the amount of storage space required and minimizes the risk of data inconsistencies.
2. Improves Data Integrity
By organizing data into related tables and enforcing relationships through foreign keys, normalization ensures that data remains accurate and consistent. This helps maintain the integrity of the database.
3. Simplifies Data Maintenance
Normalized databases are easier to maintain because updates, deletions, and insertions are performed in a single place. This reduces the complexity of database operations and minimizes the risk of anomalies.
4. Enhances Query Performance
While normalization can sometimes lead to more complex queries, it often results in more efficient data retrieval. Properly indexed normalized tables can improve query performance by reducing the amount of data that needs to be scanned.
5. Facilitates Scalability
A well-normalized database design can handle growth more effectively. As the database scales, normalized structures help ensure that performance remains stable and that the database can accommodate increasing amounts of data.
Normal Forms
Normalization is typically achieved through a series of steps known as normal forms. The most commonly used normal forms are:
- First Normal Form (1NF): Ensures that the table has a primary key and that all columns contain atomic (indivisible) values.
- Second Normal Form (2NF): Builds on 1NF by ensuring that all non-key attributes are fully functionally dependent on the primary key.
- Third Normal Form (3NF): Builds on 2NF by ensuring that all non-key attributes are not only fully functionally dependent on the primary key but also independent of each other.
Example
Consider a table that stores information about students and their courses:
| StudentID | StudentName | CourseID | CourseName |
|---|---|---|---|
| 1 | Alice | 101 | Math |
| 2 | Bob | 102 | Science |
| 1 | Alice | 103 | History |
This table is not normalized because it contains redundant data (e.g., Alice's name is repeated). By normalizing it, we can create two separate tables:
Students Table:
| StudentID | StudentName |
|---|---|
| 1 | Alice |
| 2 | Bob |
Courses Table:
| CourseID | CourseName |
|---|---|
| 101 | Math |
| 102 | Science |
| 103 | History |
Enrollments Table:
| StudentID | CourseID |
|---|---|
| 1 | 101 |
| 2 | 102 |
| 1 | 103 |
By normalizing the data, we have eliminated redundancy and improved the structure of the database.
In summary, data normalization is essential for creating efficient, reliable, and scalable databases. It helps reduce redundancy, improve data integrity, simplify maintenance, enhance query performance, and facilitate scalability.
Creating a Table in 1NF (First Normal Form)
First Normal Form (1NF) is the basic level of database normalization. It requires that the values in each column of a table are atomic (indivisible). This means that each column should contain only one value per row, and each row should be unique.
Steps to Create a Table in 1NF
-
Identify the Entities and Attributes: Determine the entities (tables) and their attributes (columns) that need to be stored in the database.
-
Ensure Atomicity: Make sure that each column contains only atomic values. This means that each cell in the table should hold a single value, not a set of values or a list.
-
Eliminate Repeating Groups: Ensure that there are no repeating groups of columns. Each attribute should be dependent on the primary key.
-
Define the Primary Key: Choose a primary key for the table that uniquely identifies each row.
Example
Let's consider an example where we have a table that stores information about students and their courses. Initially, the table might look like this:
| StudentID | StudentName | Courses |
|---|---|---|
| 1 | Alice | Math, Science |
| 2 | Bob | English, History |
| 3 | Charlie | Math, English |
Step 1: Identify the Entities and Attributes
Entities: Students Attributes: StudentID, StudentName, Courses
Step 2: Ensure Atomicity
The Courses column contains multiple values, which violates 1NF. We need to split these values into separate rows.
Step 3: Eliminate Repeating Groups
We will create a new table to store the courses for each student.
Step 4: Define the Primary Key
We will use StudentID as the primary key for the Students table and a composite key (StudentID, Course) for the StudentCourses table.
Final Tables in 1NF
Students Table:
| StudentID | StudentName |
|---|---|
| 1 | Alice |
| 2 | Bob |
| 3 | Charlie |
StudentCourses Table:
| StudentID | Course |
|---|---|
| 1 | Math |
| 1 | Science |
| 2 | English |
| 2 | History |
| 3 | Math |
| 3 | English |
By following these steps, we have ensured that our tables are in First Normal Form (1NF). Each column contains atomic values, and there are no repeating groups.
Creating a Table in 2NF (Second Normal Form)
To create a table in 2NF, we need to ensure that the table is already in 1NF (First Normal Form) and that it meets the criteria for 2NF. Here are the steps to achieve this:
Step 1: Ensure the Table is in 1NF
1NF requires that:
- The table has a primary key.
- All columns contain atomic (indivisible) values.
- Each column contains values of a single type.
- Each column contains unique values or allows NULLs.
Example of a table in 1NF:
CREATE TABLE Orders ( OrderID INT PRIMARY KEY, CustomerID INT, OrderDate DATE, ProductID INT, Quantity INT, Price DECIMAL(10, 2) );
Creating a Table in 3NF (Third Normal Form)
To create a table in 3NF, we need to ensure that the table is already in 2NF (Second Normal Form) and that it meets the criteria for 3NF. Here are the steps to achieve this:
Step 1: Ensure the Table is in 2NF
2NF requires that:
- The table is in 1NF.
- All non-key attributes are fully functional dependent on the primary key.
Example of a table in 2NF:
CREATE TABLE Orders ( OrderID INT PRIMARY KEY, CustomerID INT, OrderDate DATE ); CREATE TABLE OrderDetails ( OrderDetailID INT PRIMARY KEY, OrderID INT, ProductID INT, Quantity INT, Price DECIMAL(10, 2), FOREIGN KEY (OrderID) REFERENCES Orders(OrderID) );
Roles and Privileges in PostgreSQL
In PostgreSQL, roles are used to manage database access permissions. Roles can represent a single user or a group of users. Privileges are the permissions granted to roles to perform certain actions on database objects.
Main Concepts
-
Roles:
- User Roles: Represent individual users. Created using the
CREATE ROLEorCREATE USERcommand. - Group Roles: Represent groups of users. Can be used to manage permissions for multiple users at once.
Example:
-- Create a user role CREATE ROLE alice WITH LOGIN PASSWORD 'password'; -- Create a group role CREATE ROLE developers; - User Roles: Represent individual users. Created using the
-- Alter the role to set a new password for Alice.
ALTER ROLE alice WITH PASSWORD 'new_secure_password';
-- Create a user role for John CREATE ROLE john WITH LOGIN PASSWORD 'john_password'; -- Grant SELECT, INSERT, and UPDATE privileges on the 'students' table to John GRANT SELECT, INSERT, UPDATE ON students TO john;