Does going to university in a different country affect your mental health? A Japanese international university surveyed its students in 2018 and published a study the following year that was approved by several ethical and regulatory boards.
The study found that international students have a higher risk of mental health difficulties than the general population, and that social connectedness (belonging to a social group) and acculturative stress (stress associated with joining a new culture) are predictive of depression.
Explore the students data using PostgreSQL to find out whether you would come to a similar conclusion for international students, and whether the length of stay is a contributing factor.
Here is a data description of the columns you may find helpful.
| Field Name | Description |
|---|---|
| inter_dom | Types of students (international or domestic) |
| japanese_cate | Japanese language proficiency |
| english_cate | English language proficiency |
| academic | Current academic level (undergraduate or graduate) |
| age | Current age of student |
| stay | Current length of stay in years |
| todep | Total score of depression (PHQ-9 test) |
| tosc | Total score of social connectedness (SCS test) |
| toas | Total score of acculturative stress (ASISS test) |
Displaying all available columns and the first five rows
Number of Columns: 50
Which are:
inter_dom, region, gender, academic, age, age_cate, stay, stay_cate, japanese, japanese_cate, english, english_cate, intimate, religion, suicide, dep, deptype, todep, depsev, tosc, apd, ahome, aph, afear, acs, aguilt, amiscell, toas, partner, friends, parents, relative, profess, phone, doctor, reli, alone, others, internet, partner_bi, friends_bi, parents_bi, relative_bi, professional_bi, phone_bi, doctor_bi, religion_bi, alone_bi, others_bi, internet_bi
SELECT * FROM students LIMIT 5;
Number of students surveyed: 286
SELECT COUNT(*) FROM students;
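The column count can be cross-checked against the database catalog instead of being counted by hand. This is a minimal sketch, assuming the table is named students and that only one table of that name is visible (the schema is not stated in the source):

```sql
-- Count the columns PostgreSQL reports for the students table.
-- Assumption: a single table named students exists; add a
-- table_schema filter if several schemas contain one.
SELECT COUNT(*) AS column_count
FROM information_schema.columns
WHERE table_name = 'students';
```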
A.1. Explaining what we know about the columns
- Without further documentation, we can only speculate about what the data means or infer a meaning from summary statistics.
A.1.1. Column: inter_dom
Theory of Column: This data pertains to Domestic (Dom) and International (Inter) students. The column includes three categories: Dom, Inter, and NULL. It can be inferred that there are 201 International students, 67 Domestic students, and 18 NULL or empty values.
| index | inter_dom |
|---|---|
| 0 | Inter |
| 1 | Dom |
| 2 | NULL |
SELECT DISTINCT inter_dom FROM students ORDER BY inter_dom DESC NULLS LAST;
| index | inter_dom | count |
|---|---|---|
| 0 | Inter | 201 |
| 1 | Dom | 67 |
| 2 | NULL | 18 |
SELECT inter_dom, COUNT(*) AS count FROM students GROUP BY inter_dom ORDER BY inter_dom DESC NULLS LAST;
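One detail matters when counting a column that contains NULLs: COUNT(column) skips NULL values, while COUNT(*) counts rows. The sketch below illustrates the difference with a few hypothetical sample rows in an inline VALUES list (not the survey data), so it runs without the students table:

```sql
-- Hypothetical sample rows, not the survey data.
WITH demo(inter_dom) AS (
    VALUES ('Inter'), ('Inter'), ('Dom'), (NULL)
)
SELECT
    inter_dom,
    COUNT(*)         AS row_count,       -- counts every row in the group
    COUNT(inter_dom) AS non_null_count   -- skips NULL values
FROM demo
GROUP BY inter_dom
ORDER BY inter_dom DESC NULLS LAST;      -- DESC puts NULLs first by default
```

For the NULL group, row_count is 1 but non_null_count is 0, so a count of missing values needs COUNT(*).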
A.1.2. Column: region
Theory of Column: This data relates to the students' region of origin: South East Asia (SEA), South Asia (SA), other locations (Others), Japan (JAP), and East Asia (EA). The column contains six categories: SEA, SA, Others, JAP, EA, and NULL. It can be inferred that there are 122 students from South East Asia, 18 from South Asia, 11 from other locations, 69 from Japan, 48 from East Asia, and 18 NULL or empty values.
| index | region |
|---|---|
| 0 | SEA |
| 1 | SA |
| 2 | Others |
| 3 | JAP |
| 4 | EA |
| 5 | NULL |
SELECT DISTINCT region FROM students ORDER BY region DESC NULLS LAST;
| index | region | count |
|---|---|---|
| 0 | SEA | 122 |
| 1 | SA | 18 |
| 2 | Others | 11 |
| 3 | JAP | 69 |
| 4 | EA | 48 |
| 5 | NULL | 18 |
SELECT region, COUNT(*) AS count FROM students GROUP BY region ORDER BY region DESC NULLS LAST;
A.1.3. Column: gender
Theory of Column: This data pertains to Male and Female students. It can be inferred that there are 170 Female students, 98 Male students, and 18 NULL or empty values.
| index | gender |
|---|---|
| 0 | Male |
| 1 | Female |
| 2 | NULL |
SELECT DISTINCT gender FROM students ORDER BY gender DESC NULLS LAST;
| index | gender | count |
|---|---|---|
| 0 | Male | 98 |
| 1 | Female | 170 |
| 2 | NULL | 18 |
SELECT gender, COUNT(*) AS count FROM students GROUP BY gender ORDER BY gender DESC NULLS LAST;
A.1.4. Column: academic
Theory of Column: This data pertains to Undergraduate (Under) and Graduate (Grad) students. It can be inferred that there are 201 Undergraduate, 67 Graduate students, and 18 NULL or empty values.
| index | academic |
|---|---|
| 0 | Under |
| 1 | Grad |
| 2 | NULL |
SELECT DISTINCT academic FROM students ORDER BY academic DESC NULLS LAST;
| index | academic | count |
|---|---|---|
| 0 | Under | 201 |
| 1 | Grad | 67 |
| 2 | NULL | 18 |
SELECT academic, COUNT(*) AS count FROM students GROUP BY academic ORDER BY academic DESC NULLS LAST;
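All four columns profiled so far report 18 NULL values, which suggests, but does not prove, that the same 18 rows are empty across the board. One way to check is to count the rows where every one of these columns is NULL at once. This is a sketch against the same students table:

```sql
-- Count rows where all four profiled columns are NULL together.
SELECT COUNT(*) AS fully_empty_rows
FROM students
WHERE inter_dom IS NULL
  AND region    IS NULL
  AND gender    IS NULL
  AND academic  IS NULL;
```

If the result equals the per-column NULL count, the missing values belong to the same blank rows rather than being scattered across different students.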
B. Instructions:
- Explore and analyze the students data to see how the length of stay (stay) impacts the average mental health diagnostic scores of the international students present in the study.
- Return a table with nine rows and five columns.
- The five columns should be aliased as: stay, count_int, average_phq, average_scs, and average_as, in that order.
- The average columns should contain the average of the todep (PHQ-9 test), tosc (SCS test), and toas (ASISS test) columns for each length of stay, rounded to two decimal places.
- The count_int column should be the number of international students for each length of stay.
- Sort the results by the length of stay in descending order.
- Note: Creating new cells in the workbook will rename the DataFrame. Make sure that your final solution uses the name df.
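Taken together, the requirements above can be sketched as a single query. This is one possible shape, not necessarily the intended solution. It assumes international students are the rows where inter_dom = 'Inter', and it casts the averages to numeric because PostgreSQL's two-argument ROUND only accepts numeric (the column types are not stated in the source):

```sql
-- Sketch only: assumes 'Inter' marks international students and that
-- the score columns hold numbers. The ::numeric cast guards against
-- double precision columns, which ROUND(x, 2) does not accept.
SELECT
    stay,
    COUNT(*)                      AS count_int,
    ROUND(AVG(todep)::numeric, 2) AS average_phq,
    ROUND(AVG(tosc)::numeric, 2)  AS average_scs,
    ROUND(AVG(toas)::numeric, 2)  AS average_as
FROM students
WHERE inter_dom = 'Inter'
GROUP BY stay
ORDER BY stay DESC;
```

Whether this returns exactly nine rows depends on how many distinct stay values the international students have. If any of those rows have a NULL stay, they would form an extra group.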
C. Working through the columns, field by field.
- The columns are built in this order: stay, count_int, average_phq, average_scs, and average_as.
C.1. Column: stay
-- Distinct lengths of stay of at least one year, longest first
SELECT
    stay
FROM students
WHERE stay >= 1  -- a NULL stay fails this comparison, so it is excluded too
GROUP BY stay
ORDER BY stay DESC
LIMIT 9;
C.2. Column: count_int
- The count_int column should be the number of international students for each length of stay.
-- Number of international students for each length of stay, longest first
SELECT
    stay,
    COUNT(inter_dom) AS count_int
FROM students
WHERE stay IS NOT NULL
    AND inter_dom = 'Inter'
GROUP BY stay
ORDER BY stay DESC
LIMIT 9;
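LIMIT 9 caps the result at nine rows but does not guarantee that exactly nine lengths of stay exist. A quick check, under the same assumption that 'Inter' marks international students, is to count the distinct stay values directly:

```sql
-- How many distinct lengths of stay do international students have?
SELECT COUNT(DISTINCT stay) AS distinct_stays
FROM students
WHERE inter_dom = 'Inter'
  AND stay IS NOT NULL;
```

If this returns nine, the LIMIT clause is not removing any rows and could be dropped.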