
Does going to university in a different country affect your mental health? A Japanese international university surveyed its students in 2018 and published a study the following year that was approved by several ethical and regulatory boards.

The study found that international students have a higher risk of mental health difficulties than the general population, and that social connectedness (belonging to a social group) and acculturative stress (stress associated with joining a new culture) are predictive of depression.

Explore the students data using PostgreSQL to find out if you would come to a similar conclusion for international students and see if the length of stay is a contributing factor.

Here is a data description of the columns you may find helpful.

| Field Name | Description |
| --- | --- |
| inter_dom | Types of students (international or domestic) |
| japanese_cate | Japanese language proficiency |
| english_cate | English language proficiency |
| academic | Current academic level (undergraduate or graduate) |
| age | Current age of student |
| stay | Current length of stay in years |
| todep | Total score of depression (PHQ-9 test) |
| tosc | Total score of social connectedness (SCS test) |
| toas | Total score of acculturative stress (ASISS test) |
-- Run this code to save the CSV file as students
SELECT
	* 
FROM
	'students.csv' AS students;
-- This query simply counts the number of records/rows in the dataset.
SELECT
	COUNT(*) AS total_records
FROM
	students;

The query returns 286 records. Note that the row index displayed in the output starts at 0, which does not affect this count, because COUNT(*) counts every row. However, several rows contain null values, so fewer than 286 students were actually included in this study. The next queries give more insight into the sample population.
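
One way to see how many rows actually carry data is to compare COUNT(*), which counts every row, with COUNT(column), which counts only non-null values in that column. The query below is a minimal sketch that assumes the students table created in the first cell; the column aliases are illustrative names, not part of the dataset.

```sql
-- COUNT(*) counts all rows; COUNT(column) skips rows where that column is null.
SELECT
	COUNT(*) AS total_rows,
	COUNT(inter_dom) AS non_null_inter_dom,
	COUNT(todep) AS non_null_todep,
	COUNT(tosc) AS non_null_tosc,
	COUNT(toas) AS non_null_toas
FROM
	students;
```

Any column whose non-null count is lower than total_rows has missing values.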

-- This query returns the count of each student type: "Inter" for international and "Dom" for
-- domestic.
SELECT
	inter_dom,
	COUNT(inter_dom) AS count_inter_dom 
FROM
	students
GROUP BY
	inter_dom;

Based on the above table, there are 201 international students and 67 domestic students. Because COUNT(inter_dom) ignores null values, the count for the null group is 0. We know that there are 286 total rows, so 286 - 201 - 67 = 18 rows have no student type recorded. To be thorough, let's run a quick query to count the null records.

-- This query counts the rows where inter_dom is null; the "Inter" and "Dom" groups report 0.
SELECT
	inter_dom,
	SUM(CASE WHEN inter_dom IS NULL THEN 1 ELSE 0 END) AS count_inter_dom_nulls
FROM
	students
GROUP BY
	inter_dom;
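
The two grouped queries above can also be combined into one. GROUP BY places null values in their own group, and COUNT(*) counts rows rather than non-null values, so the null group reports its row count directly. This is a sketch against the same students table; count_rows is an illustrative alias.

```sql
-- COUNT(*) counts rows per group, including the group where inter_dom is null.
SELECT
	inter_dom,
	COUNT(*) AS count_rows
FROM
	students
GROUP BY
	inter_dom;
```

If the arithmetic above holds, the three groups should add up to the 286 total rows.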

Now let's get to know the data better by looking at the rows for each student type, starting with international students.

SELECT
	*
FROM
	students
WHERE 
	inter_dom = 'Inter';

Next, let's look at domestic students.

SELECT
	*
FROM
	students
WHERE 
	inter_dom = 'Dom';

There aren't any big takeaways from glancing at the rows for each student type, so let's dive in further. We can calculate summary statistics using aggregate functions to compare how each group scored on depression, social connectedness, and acculturative stress.

-- Summary statistics per student type for depression (PHQ-9), social
-- connectedness (SCS), and acculturative stress (ASISS). Rows with a null
-- student type are excluded.
SELECT
	inter_dom,
	MIN(todep) AS min_phq,
	MAX(todep) AS max_phq,
	ROUND(AVG(todep), 2) AS avg_phq,
	MIN(tosc) AS min_scs,
	MAX(tosc) AS max_scs,
	ROUND(AVG(tosc), 2) AS avg_scs,
	MIN(toas) AS min_toas,
	MAX(toas) AS max_toas,
	ROUND(AVG(toas), 2) AS avg_toas
FROM
	students
WHERE
	inter_dom IN ('Inter', 'Dom')
GROUP BY
	inter_dom;
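
To address the question of whether length of stay is a contributing factor, one possible next step is to restrict the data to international students and group the average scores by stay. The query below is only a sketch of that idea: it assumes the same students table and that ROUND(AVG(...), 2) behaves as in the query above. The aliases count_students, avg_phq, avg_scs, and avg_toas are illustrative.

```sql
-- Average scores for international students, grouped by length of stay in years.
SELECT
	stay,
	COUNT(*) AS count_students,
	ROUND(AVG(todep), 2) AS avg_phq,
	ROUND(AVG(tosc), 2) AS avg_scs,
	ROUND(AVG(toas), 2) AS avg_toas
FROM
	students
WHERE
	inter_dom = 'Inter'
GROUP BY
	stay
ORDER BY
	stay DESC;
```

Including count_students alongside the averages helps when reading the result, since an average for a stay length with only a few students is less reliable than one based on many.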