Skip to content
EDA with PostgreSQL: Cinema dataset
Exploratory data analysis with PostgreSQL
Date: 18.6.2023
Data: Cinema dataset by Datacamp. It contains 5 tables:
- descriptions (columns: description, length, rating),
- films (columns: id, title, release_year, country, duration, language, certification, gross, budget),
- people (columns: id, name, birthdate, deathdate),
- reviews (columns: id, film_id, num_user, num_critic, imdb_score, num_votes, facebook_likes)
- roles (columns: id, film_id, person_id, role).
Exploratory data analysis steps:
- Import
- Tidy
- Explore & Understand: Transform
- Explore & Understand: Model
- Explore & Understand: Visualise
- Communicate
Step 1. Import
1. Modify tables
Create table (if needed)
CREATE TABLE IF NOT EXISTS cinema.films (
id SERIAL PRIMARY KEY,
title VARCHAR(255) NOT NULL,
release_year INTEGER NOT NULL,
country VARCHAR(255) NOT NULL,
duration INTEGER NOT NULL,
language VARCHAR(255) NOT NULL,
certification VARCHAR(255) NOT NULL,
gross NUMERIC(10,2) NOT NULL,
budget NUMERIC(10,2) NOT NULL
);
Add column
ALTER TABLE cinema.films
ADD COLUMN shooting_location VARCHAR(255);
Change data type of the column
ALTER TABLE cinema.films
ALTER COLUMN budget TYPE TEXT;
2. Exploring tables
- descriptions table
DataFrameas
descriptions
variable
SELECT *
FROM cinema.descriptions;- films table
DataFrameas
films
variable
SELECT *
FROM cinema.films;- people table
DataFrameas
people
variable
SELECT *
FROM cinema.people;- reviews table
DataFrameas
reviews
variable
SELECT *
FROM cinema.reviews;- roles table
DataFrameas
roles
variable
SELECT *
FROM cinema.roles;