Project: Analyzing Students' Mental Health

Does going to university in a different country affect your mental health? A Japanese international university surveyed its students in 2018 and published a study the following year that was approved by several ethical and regulatory boards.

The study found that international students have a higher risk of mental health difficulties than the general population, and that social connectedness (belonging to a social group) and acculturative stress (stress associated with joining a new culture) are predictive of depression.

Explore the students data using PostgreSQL to find out if you would come to a similar conclusion for international students and see if the length of stay is a contributing factor.

Here is a data description of the columns you may find helpful.

Field Name	Description
`inter_dom`	Types of students (international or domestic)
`japanese_cate`	Japanese language proficiency
`english_cate`	English language proficiency
`academic`	Current academic level (undergraduate or graduate)
`age`	Current age of student
`stay`	Current length of stay in years
`todep`	Total score of depression (PHQ-9 test)
`tosc`	Total score of social connectedness (SCS test)
`toas`	Total score of acculturative stress (ASISS test)

EXPLORE THE DATA IN THE SOURCE TABLE

The goal of this process is to:
1. Familiarize oneself with the data types, column headings, and so on
2. Check the dataset for missing values and inconsistencies before performing: a. Data cleaning and transformation b. Exploratory Data Analysis (EDA) c. Data analysis and visualization

DataFrameas

df

variable

SELECT *
FROM public.students;

DataFrameas

df18

variable

-- Data Types

SELECT * FROM information_schema.columns WHERE table_name = 'students';

DataFrameas

df15

variable

-- Using a Common Table Expression (CTE) to work with the data without creating a permanent object (table)

WITH students_info_staging AS (
  SELECT * FROM public.students
)
SELECT * FROM students_info_staging;

-- Data Cleaning Function & Techniques

-- To address missing values, to standardize the dataset, to deal with whitespaces and so on

-- 1. Identify and remove duplicates -- 2. Null values or blank values -- 3. Standardize the data -- 4. Remove unnecessary rows and columns

DataFrameas

df16

variable

-- Checking duplicates
-- Query 1: Identifying duplicates in specific columns using a CTE

WITH students_info_staging AS (
    SELECT
        inter_dom,
        region,
        academic,
        stay,
        gender,
        ROW_NUMBER() OVER(PARTITION BY inter_dom, region, academic, stay, gender ORDER BY inter_dom) as rn
    FROM public.students
)
SELECT
    inter_dom,
    region,
    academic,
    stay,
    gender
FROM students_info_staging
WHERE rn = 1;

DataFrameas

df8

variable

-- Query 2: Identify Duplicates Using SELECT * and a CTE

WITH students_info_staging AS (
    SELECT
        *,
        ROW_NUMBER() OVER(PARTITION BY inter_dom, region, academic, stay, gender ORDER BY inter_dom) as rn
    FROM public.students
)
SELECT *
FROM students_info_staging
WHERE rn > 1;

REMOVING DUPLICATES FROM THE SOURCE TABLE

Since I am working with a read-only file, I cannot remove duplicates from the source table.
The DELETE command is a write operation, which is not allowed.
Instead, I will use a query that returns only the unique rows.
This will effectively "remove" the duplicates from the final result set without changing the underlying data.

DataFrameas

df17

variable

-- Query to Select Only Unique Rows

WITH students_info_staging AS (
    SELECT
        *,
        ROW_NUMBER() OVER(PARTITION BY inter_dom, region, academic, stay, gender ORDER BY inter_dom) as rn
    FROM public.students
)
SELECT
    *
FROM students_info_staging
WHERE rn = 1;

IDENTIFYING AND HANDLING NULL/BLANK VALUES

Since the underlying data is a read-only file, I identified and handled null or blank values without changing the source data.
I identified and excluded rows with nulls using a WHERE clause and CTE to perform this process.

DataFrameas

df19

variable

-- Using a WHERE clause and CTE

WITH students_info_staging AS (
    SELECT
        inter_dom,
        region,
        academic,
        stay,
        gender
    FROM public.students
)
SELECT *
FROM students_info_staging
WHERE
    inter_dom IS NOT NULL AND inter_dom != '' AND
    region IS NOT NULL AND region != '' AND
    academic IS NOT NULL AND academic != '' AND
    stay IS NOT NULL AND stay > 0 AND 
    gender IS NOT NULL AND gender != '';

EXPLANATORY DATA ANALYSIS

Count the frequency of each distribution
Calculate summary statistics such as mean, median, and standard deviation.

‌
‌
‌

Project: Analyzing Students' Mental Health

.mfe-app-workspace-kj242g{position:absolute;top:-8px;}.mfe-app-workspace-11ezf91{display:inline-block;}.mfe-app-workspace-11ezf91:hover .Anchor__copyLink{visibility:visible;}IDENTIFYING AND HANDLING NULL/BLANK VALUES

IDENTIFYING AND HANDLING NULL/BLANK VALUES