Skip to content

Does attending college abroad have an impact on your mental health? After surveying its students in 2018, a Japanese international university released a research the following year that received approval from multiple regulatory and ethical bodies.

According to the study, depression is predicted by social connection (belonging to a social group) and acculturative stress (stress related to assimilating into a new culture), and overseas students are more likely than the general population to experience mental health issues.

By exploring the student data using PostgreSQL, the objective was to understand if there is a link between social connection and acculturative stress on depression levels, and if the duration of their stay at the University plays a role in their overall declining mental health.

# Let's first load and inspect the uploaded CSV file to understand its structure and contents.

import pandas as pd

# Load the CSV file
file_path = 'students.csv'  
students = pd.read_csv(file_path)

# Display the first few rows to understand the structure
students.head()

Below are some descriptions of columns within the students dataset that may provide useful context for this investigation.

Field NameDescription
inter_domTypes of students (international or domestic)
japanese_cateJapanese language proficiency
english_cateEnglish language proficiency
academicCurrent academic level (undergraduate or graduate)
ageCurrent age of student
stayCurrent length of stay in years
todepTotal score of depression (PHQ-9 test)
toscTotal score of social connectedness (SCS test)
toasTotal score of acculturative stress (ASISS test)
  1. Exploring and understanding the data:

To begin, I conducted a count of total number of records, and grouped the records to differentiate between international and domestic students alike and analyse their respective quantities.

Spinner
DataFrameas
count_records
variable
-- Count the number of records in the dataset
SELECT 
	COUNT(*) AS total_records
FROM 
	students;
Spinner
DataFrameas
summary_phq
variable
-- How many international and domestic students are there?
SELECT 
	inter_dom,
	COUNT(*) AS count_inter_dom
FROM 
	students
WHERE 
	inter_dom IS NOT NULL
GROUP BY 
	inter_dom
ORDER BY 
	count_inter_dom DESC;
  1. Now we want to understand and query the summary statistics of diagnostic scores for all students

Here we will be retreiving summary statistics for each diagnostic test (PHQ-9, SCS & ASISS) utilising aggregate functions. These functions will help to understand the different tests and what the scores mean.

Spinner
DataFrameas
df1
variable
-- Summary statistics of diagnostic scores for international students
SELECT 
	MIN(todep) min_phq,
	MAX(todep) AS max_phq,
	ROUND(AVG(todep),2) AS avg_phq,
	MIN(tosc) min_scs,
	MAX(tosc) AS max_scs,
	ROUND(AVG(tosc),2) AS avg_scs,
	MIN(toas) min_as,
	MAX(toas) AS max_as,
	ROUND(AVG(toas),2) AS avg_as
FROM students
WHERE inter_dom = 'Inter';
  1. Filter the data for context on international students

Here I wanted to understand more about the students to better analyse the data and provide relevant recommendations, taking into account demographics such as age, nationality, gender & education.

Spinner
DataFrameas
df4
variable
-- Understand the distribution of students across age and region
SELECT 
    age, 
    region AS nationality, 
    COUNT(*) AS student_count
FROM 
    students
WHERE 
    inter_dom = 'Inter'
GROUP BY 
    age, region
ORDER BY 
    student_count DESC;
Spinner
DataFrameas
df5
variable
-- To calculate the proportion of each gender among international students with average diagnostics
SELECT 
    gender, 
    COUNT(*) AS gender_count,
    ROUND(COUNT(*) * 100.0 / (SELECT COUNT(*) FROM students WHERE inter_dom = 'Inter'), 2) AS gender_percentage,
	ROUND(AVG(todep),2) AS average_dep,
	ROUND(AVG(tosc),2) AS average_scs,
	ROUND(AVG(toas),2) AS average_as
FROM 
    students
WHERE 
    inter_dom = 'Inter'
GROUP BY 
    gender
ORDER BY 
    gender_percentage DESC;
Spinner
DataFrameas
df6
variable
-- To analyse living situations with average diagnostics
SELECT 
    alone_bi AS lives_alone, 
    COUNT(*) AS student_count,
    ROUND(COUNT(*) * 100.0 / (SELECT COUNT(*) FROM students WHERE inter_dom = 'Inter'), 2) AS percentage,
	ROUND(AVG(todep),2) AS average_dep,
	ROUND(AVG(tosc),2) AS average_scs,
	ROUND(AVG(toas),2) AS average_as
FROM 
    students
WHERE 
    inter_dom = 'Inter'
GROUP BY 
    alone_bi;

This information provides important context for further analysis to understand what influences depression levels and helps to prevent bias in our analysis. It will also assist in informing us when deciding on actionable strategies to combat mental health in international students.

Key Insights:

  1. The majority of students in the dataset are aged 18-21, and are likely navigating adult challenges alongside cultural adjustment.
  2. Most students come from SEA (South East Asia), providing context when designing mental health support programs.
  3. Nearly 65% of students in the dataset are Female, highlighting the need to consider gender-specific mental health concerns.
  4. 27% of students live alone, a factor strongly associated with social isolation and higher depression levels.