Skip to content

Analyzing Students' Mental Health in SQL

In this live code-along, you'll perform exploratory data analysis on a dataset around mental health of domestic and international students. You'll perform SQL querying to look at how social connectedness and cultural issues affect mental health. Finally, you'll visualize the results of your analysis using the Python Plotly package.

The Data

This survey was conducted in 2018 at an international Japanese university and the associated study was published in 2019. It was approved by several ethical and regulatory boards.

The study found that international students have a higher risk of mental health difficulties compared to the general population, and that social connectedness and acculturative stress are predictive of depression.

Social connectedness: measure of belonging to a social group or network.

Acculturative stress: stress associated with learning about and intergrating into a new culture.

See paper for more info, including data description.

Link to the data.

Create database connection

  • Databases > Connect Database > PostgreSQL
  • Database connection name: Live Training Student MH
  • Port: 5432
  • Hostname: workspacedemodb.datacamp.com
  • Database: students
  • Username: students_codealong
  • Password: students_codealong

Inspect the Data

Our data is in one table that includes all of the survey data. There are 50 fields and, according to the paper, 268 records. Each row is a student.

You can check the schema on the left.

  1. Check if the data has 268 records.
Spinner
DataFrameas
df
variable
-- Count the number of records in the data

SELECT COUNT(*) AS total_records
FROM students;
  1. Inspect the dataset to see what the fields look like.
Spinner
DataFrameas
df
variable
-- Inspect the data and limit the output to 5 records

SELECT *
FROM public.students
LIMIT(5);
  1. How many international and domestic students are in the data set?
Spinner
DataFrameas
df
variable
-- Count the number of international and domestic students

SELECT inter_dom,
		COUNT(inter_dom) AS count_inter_dom
FROM students
GROUP BY inter_dom;	
  1. Look into the 18 unassigned rows to understand what they could be.
Spinner
DataFrameas
df
variable
-- Query the data to see all records where inter_dom is neither 'Dom' nor 'Inter'

SELECT *
FROM public.students
WHERE inter_dom NOT LIKE 'D%' AND inter_dom NOT LIKE 'I%';
  1. Where are the international students from?
Spinner
DataFrameas
df
variable
-- See what Region international students are from

SELECT region, COUNT(inter_dom) AS count_inter_dom
FROM public.students
WHERE inter_dom = 'Inter'
GROUP BY region;

Understanding the scores

  1. Find the minimum, maximum, and average of each of the diagnostic tests (PHQ-9, SCS, ASISS). This information is in the paper, but it's good practice to look this up yourself during analysis.