
Does going to university in a different country affect your mental health? A Japanese international university surveyed its students in 2018 and published a study the following year that was approved by several ethical and regulatory boards.

The study found that international students have a higher risk of mental health difficulties than the general population, and that social connectedness (belonging to a social group) and acculturative stress (stress associated with joining a new culture) are predictive of depression.

Explore the students data using PostgreSQL to find out if you would come to a similar conclusion for international students and see if the length of stay is a contributing factor.

Here is a data description of the columns you may find helpful.

| Field Name | Description |
|---|---|
| inter_dom | Types of students (international or domestic) |
| japanese_cate | Japanese language proficiency |
| english_cate | English language proficiency |
| academic | Current academic level (undergraduate or graduate) |
| age | Current age of student |
| stay | Current length of stay in years |
| todep | Total score of depression (PHQ-9 test) |
| tosc | Total score of social connectedness (SCS test) |
| toas | Total score of acculturative stress (ASISS test) |
```sql
-- Run this code to view the data in students
SELECT *
FROM students;
```

Introduction

Hi! My name is Nathany. From this point on, the code and the structure of the content are entirely my own work. This is a SQL project notebook, so it is intentionally in Code mode, but you can switch to Report mode using the option in the top-right corner. The charts will be empty unless you are able to run the notebook code.

Query Context

The SQL query calculates the following for international students (inter_dom = 'Inter') grouped by their length of stay (stay) in years:

  • count_int: Number of students for each length of stay.
  • average_phq: Average score of depression (PHQ-9 test).
  • average_scs: Average score of social connectedness (SCS test).
  • average_as: Average score of acculturative stress (ASISS test).

The results are sorted by stay in descending order (longest to shortest).

Below is the SQL code I developed for the project. The comments explain each step and the rationale behind it, so you don't need to keep referring back to the instructions, and they spell out the acronyms used. If you are in Report mode, click "View code" to the right of the block, or switch to "Code" mode in the top-right corner.

```sql
SELECT
    stay, -- Current length of stay in years
    COUNT(*) AS count_int, -- Number of international students for each length of stay
    ROUND(AVG(todep), 2) AS average_phq, -- Average of the Total score of depression (PHQ-9 test)
    ROUND(AVG(tosc), 2) AS average_scs, -- Average of the Total score of social connectedness (SCS test)
    ROUND(AVG(toas), 2) AS average_as -- Average of the Total score of acculturative stress (ASISS test)
FROM
    students -- Table with the survey results
WHERE
    inter_dom = 'Inter' -- Only international students
GROUP BY
    stay -- Group by the length of stay in years
ORDER BY
    stay DESC; -- Sort from the longest to the shortest length of stay
```

📊 Below is the result of the query, for when running the notebook is not possible:

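| stay | count_int | average_phq | average_scs | average_as |
|---|---|---|---|---|
| 10 | 1 | 13 | 32 | 50 |
| 8 | 1 | 10 | 44 | 65 |
| 7 | 1 | 4 | 48 | 45 |
| 6 | 3 | 6 | 38 | 58.67 |
| 5 | 1 | 0 | 34 | 91 |
| 4 | 14 | 8.57 | 33.93 | 87.71 |
| 3 | 46 | 9.09 | 37.13 | 78 |
| 2 | 39 | 8.28 | 37.08 | 77.67 |
| 1 | 95 | 7.48 | 38.11 | 72.8 |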

Sample Size Dilemma

Most international students in the sample have stayed four years or less (194 of the 201 in the table above), likely because they complete their studies, drop out, or leave for other reasons. This pattern is expected, but the sharp drop in head count after Year 4 makes the later rows unreliable: averages and other statistics computed from so few students say little about the group. For example:

  • Beyond Year 4, each remaining stay length has only one student, except Year 6, which has three (7 students in total).
  • With samples this small, the averages are easily skewed and reflect individual cases rather than general trends.

This lack of representation in later years makes it harder to draw conclusions that apply to most students. It shows the need to handle and interpret this kind of data carefully.
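To put the head counts in proportion, the sketch below computes each stay length's share of the international students and the running (cumulative) share. It is a minimal, self-contained example under stated assumptions: instead of querying `students`, it copies the `stay` and `count_int` values from the result table above into a hypothetical CTE named `per_stay`, and it relies on PostgreSQL window-function syntax (`SUM(...) OVER (...)`).

```sql
-- Hypothetical CTE: stay and count_int copied from the query result table above,
-- not read from the students table.
WITH per_stay (stay, count_int) AS (
    VALUES (10, 1), (8, 1), (7, 1), (6, 3), (5, 1),
           (4, 14), (3, 46), (2, 39), (1, 95)
)
SELECT
    stay,
    count_int,
    -- Share of all international students at this length of stay
    ROUND(100.0 * count_int / SUM(count_int) OVER (), 2) AS pct_of_total,
    -- Running share, accumulated from the shortest stay upwards
    ROUND(100.0 * SUM(count_int) OVER (ORDER BY stay) / SUM(count_int) OVER (), 2) AS cumulative_pct
FROM per_stay
ORDER BY stay;
```

With these counts, the cumulative share reaches 96.52% at Year 4 (194 of 201 students), so every row beyond that describes the remaining 7 students, about 3.5% of the sample. On the live table, the CTE could be replaced by the grouped query from the project.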

Alternatives

  1. Maintain the Current Sample: Keep the dataset as is, including the stay field without modifications.
  2. Refine the Sample: Create a new dataset that excludes stay > 4.
  3. Aggregate Long-Term Stay Data: Create a new dataset where stay > 4 is combined into a broader category (e.g., "Long-Term Stay: 5+ Years").

Notes on Alternative 1

Alternative 1 (Maintain the Current Sample) was not chosen because it keeps the unreliable rows for students with stay > 4. With only 1 to 3 students per stay length, those averages are unrepresentative of the larger group and can lead to misleading conclusions about trends.

Notes on Alternative 2 ✅

This is the alternative chosen for the analysis. Limiting the sample to stays of 1 to 4 years focuses on a period of transition and adaptation, marked by challenges such as culture shock and academic pressure that can negatively affect mental health. One might expect the scores to improve as students settle in, but the data instead shows a worsening pattern across this period: depression and acculturative stress scores rise with length of stay, while social connectedness scores fall.

Notes on Alternative 3

Aggregating the stays above 4 years suggests a trend entirely different from Alternative 2, with better outcomes on all three scores: lower depression and acculturative stress, and higher social connectedness. However, this category contains only 7 students (one per stay length, except Year 6, which has three), which limits the reliability of these conclusions. One hypothesis is that the students who stay longer, though few in number, are those who adapted successfully or developed stronger coping mechanisms, which would explain the more favorable averages.
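Alternative 3 can be checked without re-running the grouped query: the average of a merged category equals the count-weighted mean of its per-stay averages. The sketch below assumes the rounded values from the result table above, loaded into a hypothetical `per_stay` CTE, and labels the merged category `5+ years`. Because the inputs are already rounded to two decimals, the output may differ in the last digit from a query on the raw `students` table.

```sql
-- Hypothetical CTE: per-stay results copied from the query result table above.
WITH per_stay (stay, count_int, average_phq, average_scs, average_as) AS (
    VALUES
        (10, 1, 13.00, 32.00, 50.00),
        (8, 1, 10.00, 44.00, 65.00),
        (7, 1, 4.00, 48.00, 45.00),
        (6, 3, 6.00, 38.00, 58.67),
        (5, 1, 0.00, 34.00, 91.00),
        (4, 14, 8.57, 33.93, 87.71),
        (3, 46, 9.09, 37.13, 78.00),
        (2, 39, 8.28, 37.08, 77.67),
        (1, 95, 7.48, 38.11, 72.80)
)
SELECT
    -- Alternative 3: merge every stay above 4 years into one category
    CASE WHEN stay > 4 THEN '5+ years' ELSE '1-4 years' END AS stay_group,
    SUM(count_int) AS count_int,
    -- Weighting each per-stay average by its head count recovers the group average
    ROUND(SUM(count_int * average_phq) / SUM(count_int), 2) AS average_phq,
    ROUND(SUM(count_int * average_scs) / SUM(count_int), 2) AS average_scs,
    ROUND(SUM(count_int * average_as) / SUM(count_int), 2) AS average_as
FROM per_stay
GROUP BY stay_group
ORDER BY stay_group DESC;
```

With these inputs, the `5+ years` group (7 students) comes out at 6.43 for PHQ-9, 38.86 for SCS, and 61.00 for ASISS, compared with 8.10, 37.37, and 76.09 for the 194 students in the `1-4 years` group. On the raw table, the equivalent would be to group by the same `CASE` expression and keep the `AVG` columns from the project query.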

Analysis

Analysis (Alternative 2)

  • Depression (PHQ-9 Test): Average scores increase with length of stay.
    • Scores rise consistently from 7.48 in Year 1 to a peak of 9.09 in Year 3, before declining slightly to 8.57 in Year 4.
    • Despite this decline, the Year 4 score remains higher than in Years 1 and 2.
  • Social Connectedness (SCS Test): Average scores decrease with length of stay.
    • Scores drop from 38.11 in Year 1 to 33.93 in Year 4. Years 2 and 3 are nearly flat (37.08 and 37.13), and the largest fall comes between Years 3 and 4, indicating a decline in social connectedness among international students who have stayed longer.
  • Acculturative Stress (ASISS Test): Average scores increase substantially with length of stay.
    • Stress scores rise from 72.8 in Year 1 to 87.71 in Year 4, with the largest jump between Years 3 and 4, suggesting escalating challenges in cultural and environmental adaptation.
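The year-to-year movements described above can be quantified with the `LAG` window function, which reads the value from the previous row in a given order. This is a minimal sketch that assumes the Year 1 to 4 averages from the result table above, copied into a hypothetical `per_stay` CTE; with the data loaded, the CTE could instead be replaced by the project's `stay <= 4` query.

```sql
-- Hypothetical CTE: Year 1 to 4 averages copied from the query result table above.
WITH per_stay (stay, average_phq, average_scs, average_as) AS (
    VALUES (1, 7.48, 38.11, 72.80),
           (2, 8.28, 37.08, 77.67),
           (3, 9.09, 37.13, 78.00),
           (4, 8.57, 33.93, 87.71)
)
SELECT
    stay,
    -- LAG(...) OVER (ORDER BY stay) returns the previous stay's value (NULL for Year 1)
    ROUND(average_phq - LAG(average_phq) OVER (ORDER BY stay), 2) AS change_phq,
    ROUND(average_scs - LAG(average_scs) OVER (ORDER BY stay), 2) AS change_scs,
    ROUND(average_as - LAG(average_as) OVER (ORDER BY stay), 2) AS change_as
FROM per_stay
ORDER BY stay;
```

Year 1 has no predecessor, so its changes are NULL. The remaining rows show PHQ-9 rising by 0.80 and 0.81 before falling by 0.52 in Year 4, SCS moving by -1.03, +0.05, and -3.20, and ASISS rising by 4.87, 0.33, and 9.71, so the largest shifts in both connectedness and stress occur between Years 3 and 4.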
```sql
SELECT
    stay, -- Current length of stay in years
    COUNT(*) AS count_int, -- Number of international students for each length of stay
    ROUND(AVG(todep), 2) AS average_phq, -- Average of the Total score of depression (PHQ-9 test)
    ROUND(AVG(tosc), 2) AS average_scs, -- Average of the Total score of social connectedness (SCS test)
    ROUND(AVG(toas), 2) AS average_as -- Average of the Total score of acculturative stress (ASISS test)
FROM
    students -- Table with the survey results
WHERE
    inter_dom = 'Inter' AND stay <= 4 -- Only international students with a stay of 4 years or less
GROUP BY
    stay -- Group by the length of stay in years
ORDER BY
    stay DESC; -- Sort from the longest to the shortest length of stay
```