Skip to content

Exploratory data analysis with PostgreSQL

Date: 18.6.2023

Data: Cinema dataset by Datacamp. It contains 5 tables:

  1. descriptions (columns: description, length, rating),
  2. films (columns: id, title, release_year, country, duration, language, certification, gross, budget),
  3. people (columns: id, name, birthdate, deathdate),
  4. reviews (columns: id, film_id, num_user, num_critic, imdb_score, num_votes, facebook_likes)
  5. roles (columns: id, film_id, person_id, role).

Exploratory data analysis steps:

  1. Import
  2. Tidy
  3. Explore & Understand: Transform
  4. Explore & Understand: Model
  5. Explore & Understand: Visualise
  6. Communicate

Step 1. Import

1. Modify tables

Create table (if needed)

CREATE TABLE IF NOT EXISTS cinema.films (

id SERIAL PRIMARY KEY,

title VARCHAR(255) NOT NULL,

release_year INTEGER NOT NULL,

country VARCHAR(255) NOT NULL,

duration INTEGER NOT NULL,

language VARCHAR(255) NOT NULL,

certification VARCHAR(255) NOT NULL,

gross NUMERIC(10,2) NOT NULL,

budget NUMERIC(10,2) NOT NULL

);

Add column

ALTER TABLE cinema.films

ADD COLUMN shooting_location VARCHAR(255);

Change data type of the column

ALTER TABLE cinema.films

ALTER COLUMN budget TYPE TEXT;

2. Exploring tables

  1. descriptions table
Spinner
DataFrameas
descriptions
variable
SELECT *
FROM cinema.descriptions;
  1. films table
Spinner
DataFrameas
films
variable
SELECT *
FROM cinema.films;
  1. people table
Spinner
DataFrameas
people
variable
SELECT *
FROM cinema.people;
  1. reviews table
Spinner
DataFrameas
reviews
variable
SELECT *
FROM cinema.reviews;
  1. roles table
Spinner
DataFrameas
roles
variable
SELECT *
FROM cinema.roles;