
Does going to university in a different country affect your mental health? A Japanese international university surveyed its students in 2018 and published a study the following year that was approved by several ethical and regulatory boards.

The study found that international students have a higher risk of mental health difficulties than the general population, and that social connectedness (belonging to a social group) and acculturative stress (stress associated with joining a new culture) are predictive of depression.

In this notebook, I explore the students dataset using PostgreSQL to see whether I reach a similar conclusion for international students, and whether length of stay is a contributing factor.

Here is a data description of the columns that may be helpful for the analysis.

| Field Name | Description |
| --- | --- |
| inter_dom | Types of students (international or domestic) |
| japanese_cate | Japanese language proficiency |
| english_cate | English language proficiency |
| academic | Current academic level (undergraduate or graduate) |
| age | Current age of student |
| stay | Current length of stay in years |
| todep | Total score of depression (PHQ-9 test) |
| tosc | Total score of social connectedness (SCS test) |
| toas | Total score of acculturative stress (ASISS test) |

-- Run this code to save the CSV file as students
SELECT * 
FROM 'students.csv';

The following are the exploratory steps.

I start by counting all of the records in the dataset.

SELECT COUNT(*)
FROM 'students.csv';

Then, I count the records per student type to see how they are categorized.

SELECT inter_dom, COUNT(inter_dom) AS count_inter_dom, COUNT(*) AS total_records
FROM 'students.csv'
GROUP BY inter_dom;

201 records are international and 67 are domestic. However, the table also has 18 records with a NULL student type.
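The two count columns differ for the NULL group because COUNT(inter_dom) skips NULL values while COUNT(*) counts every row, so the 18 NULL records only show up in total_records. Here is a minimal sketch of that behaviour, assuming Python's built-in sqlite3 module and a made-up five-row table (not the survey data):

```python
import sqlite3

# Toy stand-in for the students table; these rows are invented for illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE students (inter_dom TEXT)")
conn.executemany(
    "INSERT INTO students (inter_dom) VALUES (?)",
    [("Inter",), ("Inter",), ("Dom",), (None,), (None,)],
)

# Same shape as the query above: COUNT(column) ignores NULLs, COUNT(*) does not.
rows = conn.execute(
    """
    SELECT inter_dom,
           COUNT(inter_dom) AS count_inter_dom,
           COUNT(*) AS total_records
    FROM students
    GROUP BY inter_dom
    """
).fetchall()

# Map each student type to (non-NULL count, total rows).
counts = {student_type: (non_null, total) for student_type, non_null, total in rows}
for student_type, (non_null, total) in counts.items():
    print(student_type, non_null, total)
```

In the toy table the NULL group has a non-NULL count of 0 but a row count of 2, which mirrors what happens with the 18 NULL records in the real data.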

Next, I will filter the data to see how it differs between the student types.

-- International students
SELECT *
FROM 'students.csv'
WHERE inter_dom = 'Inter';

-- Domestic students
SELECT *
FROM 'students.csv'
WHERE inter_dom = 'Dom';

-- Records with no student type
SELECT *
FROM 'students.csv'
WHERE inter_dom IS NULL;

As we can see, the international students include both undergraduates and graduates, with ages ranging from 17 to over 30. The domestic students, on the other hand, are all undergraduates, and all of them are under 30.

The records with a NULL student type clearly contain no useful information.
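Rather than scanning the filtered tables by eye, the same comparison can be summarised with one grouped query. The sketch below assumes Python's built-in sqlite3 module and a few invented rows. The academic values 'Under' and 'Grad' are placeholders, since the actual codes used in the dataset are not shown here.

```python
import sqlite3

# Invented rows for illustration only; not taken from students.csv.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE students (inter_dom TEXT, academic TEXT, age INTEGER)")
conn.executemany(
    "INSERT INTO students VALUES (?, ?, ?)",
    [
        ("Inter", "Under", 17),
        ("Inter", "Grad", 31),
        ("Inter", "Under", 22),
        ("Dom", "Under", 19),
        ("Dom", "Under", 24),
    ],
)

# One row per student type: age range and number of distinct academic levels.
summary = conn.execute(
    """
    SELECT inter_dom,
           MIN(age) AS min_age,
           MAX(age) AS max_age,
           COUNT(DISTINCT academic) AS academic_levels
    FROM students
    WHERE inter_dom IS NOT NULL
    GROUP BY inter_dom
    ORDER BY inter_dom
    """
).fetchall()

for row in summary:
    print(row)
```

The WHERE clause drops the NULL records up front, so they do not form a group of their own in the summary.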

I am going to find the summary statistics of the diagnostic tests for all students.

-- Summary statistics of the three test scores across all students.
-- The column names of a UNION come from the first SELECT.
SELECT 'Average' AS statistic,
       ROUND(AVG(todep), 2) AS "Score of depression (PHQ-9 test)",
       ROUND(AVG(tosc), 2) AS "Score of social connectedness (SCS test)",
       ROUND(AVG(toas), 2) AS "Score of acculturative stress (ASISS test)"
FROM 'students.csv'
UNION ALL
SELECT 'Standard deviation', ROUND(STDDEV(todep), 2), ROUND(STDDEV(tosc), 2), ROUND(STDDEV(toas), 2)
FROM 'students.csv'
UNION ALL
SELECT 'Max', MAX(todep), MAX(tosc), MAX(toas)
FROM 'students.csv'
UNION ALL
SELECT 'Min', MIN(todep), MIN(tosc), MIN(toas)
FROM 'students.csv';
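
One detail worth keeping in mind when reading the standard deviation row: in PostgreSQL, STDDEV is an alias for STDDEV_SAMP, so it returns the sample standard deviation (dividing by n - 1) rather than the population one. Here is a small sketch of the difference, using Python's statistics module on made-up scores rather than the survey data:

```python
from statistics import mean, pstdev, stdev

# Invented scores for illustration only; not taken from students.csv.
scores = [2, 4, 4, 4, 5, 5, 7, 9]

average = round(mean(scores), 2)
sample_sd = round(stdev(scores), 2)       # divides by n - 1, like STDDEV / STDDEV_SAMP
population_sd = round(pstdev(scores), 2)  # divides by n, like STDDEV_POP

print(average, sample_sd, population_sd)
```

With a few hundred records the two values are close, but the distinction matters when comparing against figures computed with a different tool.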

Next, I am going to repeat the summary statistics, this time for international students only.