Student Happiness Survey Data
Explore the student happiness survey dataset and publish your findings.
# Load packages
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
color = sns.color_palette()
Load your data
# Read the survey data from a .csv file, using response_id as the index
df = pd.read_csv('survey.csv', index_col = 'response_id')
df.head()
response_id | Career | Citizenship | Nationality | Year since Matriculation | Year of Study | Primary Programme | Gender | Department | Housing Type | Q1-How many events have you Volunteered in ? | Q2-How many events have you Participated in ? | Q3-How many activities are you Interested in ? | Q4-How many activities are you Passionate about ? | Q5-What are your levels of stress ? | Q6-How Satisfied You are with your Student Life ? | Q7-How much effort do you make to interact with others ? | Q8-About How events are you aware about ? | Q9-What is an ideal student life ? |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
1 | UGRD | Foreigner | Indonesia | 2 | 2 | Bachelor of Science | F | School of Science | Residences | 0 | 1 | 3 | 1 | 1 | 2 | 2.0 | 2.0 | NaN |
2 | UGRD | Country Citizen | Singapore | 1 | 1 | Bachelor of Engineering | F | School of Engineering | Out of Campus | 0 | 1 | 2 | 3 | 1 | 2 | 2.0 | 3.0 | Friends+CCas+good result |
3 | UGRD | Foreigner | Malaysia | 2 | 2 | Bachelor of Science | M | School of Science | Halls | 3 | 1 | 1 | 5 | 2 | 2 | 2.0 | 2.0 | just want everything to go smooth. serious |
4 | UGRD | Foreigner | Malaysia | 2 | 2 | Bachelor of Engineering | M | School of Engineering | Halls | 3 | 4 | 3 | 3 | 7 | 1 | 1.0 | 1.0 | NaN |
5 | UGRD | Foreigner | Viet Nam | 3 | 3 | Bachelor of Engineering | F | School of Engineering | Out of Campus | 4 | 3 | 4 | 5 | 4 | 2 | 2.0 | 2.0 | a mixture of both academic and non-academic |
Understand your variables
# Rename the nine survey question columns (Q1 to Q9) to shorter names
to_rename = list(df.columns[9:])
acronyms = ['Volunteer', 'Participate', 'Interest', 'Passion', 'Stress', 'Satisfaction', 'Interaction', 'Events', 'Ideal']
mapping = dict(zip(to_rename, acronyms))
df = df.rename(columns = mapping)
df.columns
Index(['Career', 'Citizenship', 'Nationality', 'Year since Matriculation',
'Year of Study', 'Primary Programme', 'Gender', 'Department',
'Housing Type', 'Volunteer', 'Participate', 'Interest', 'Passion',
'Stress', 'Satisfaction', 'Interaction', 'Events', 'Ideal'],
dtype='object')
# Summarize each variable: name, number of unique values, and the values themselves
variables = pd.DataFrame(columns=['Variable','Number of unique values','Values'])
for i, var in enumerate(df.columns):
    variables.loc[i] = [var, df[var].nunique(), df[var].unique().tolist()]
variables
 | Variable | Number of unique values | Values |
---|---|---|---|
0 | Career | 3 | [UGRD, GRAD, NGRD] |
1 | Citizenship | 3 | [Foreigner, Country Citizen, Permanent Resident] |
2 | Nationality | 31 | [Indonesia, Singapore, Malaysia, Viet Nam, Hon... |
3 | Year since Matriculation | 6 | [2, 1, 3, 4, 5, 6] |
4 | Year of Study | 5 | [2, 1, 3, 4, 5] |
5 | Primary Programme | 68 | [Bachelor of Science, Bachelor of Engineering,... |
6 | Gender | 2 | [F, M] |
7 | Department | 21 | [School of Science, School of Engineering, Sch... |
8 | Housing Type | 4 | [Residences, Out of Campus, Halls, Residential... |
9 | Volunteer | 12 | [0, 3, 4, 2, 1, 5, 10, 6, 8, 7, 9, 11] |
10 | Participate | 6 | [1, 4, 3, 2, 0, 5] |
11 | Interest | 8 | [3, 2, 1, 4, 5, 6, 7, 8] |
12 | Passion | 12 | [1, 3, 5, 2, 4, 7, 6, 8, 10, 9, 0, 11] |
13 | Stress | 10 | [1, 2, 7, 4, 3, 5, 6, 8, 9, 0] |
14 | Satisfaction | 4 | [2, 1, 3, 0] |
15 | Interaction | 4 | [2.0, 1.0, 3.0, 0.0, nan] |
16 | Events | 4 | [2.0, 3.0, 1.0, 4.0, nan] |
17 | Ideal | 2245 | [nan, Friends+CCas+good result, just want ever... |
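The table above shows `nan` among the values of Interaction, Events, and Ideal, while `nunique()` leaves missing values out of its count. Before analyzing those columns it can help to see how much is missing. The sketch below is one possible way to do that; `missing_summary` is a hypothetical helper name, and the toy frame uses made-up values standing in for the survey data.

```python
import numpy as np
import pandas as pd

def missing_summary(frame: pd.DataFrame) -> pd.DataFrame:
    """Count and percentage of missing values per column, largest first."""
    counts = frame.isna().sum()
    summary = pd.DataFrame({
        'Missing': counts,
        'Percent': (counts / len(frame) * 100).round(1),
    })
    # Only report columns that actually have missing values
    return summary[summary['Missing'] > 0].sort_values('Missing', ascending=False)

# Toy stand-in for the survey data (values are made up for illustration)
toy = pd.DataFrame({
    'Stress': [1, 2, 7, 4],
    'Interaction': [2.0, np.nan, 1.0, 3.0],
    'Ideal': [np.nan, 'Friends+CCas+good result', np.nan, np.nan],
})
print(missing_summary(toy))
```

With the real data loaded, the same helper could be called as `missing_summary(df)`.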
Identify what variables are worth analyzing further
Create a heatmap to identify correlations
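# Generate correlation matrix from the numeric columns only
# (recent pandas versions raise an error if text columns are passed to corr)
corr = df.select_dtypes(include='number').corr(method='pearson')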
# Generate a mask for the upper triangle
mask = np.triu(np.ones_like(corr, dtype=bool))
# Set up the matplotlib figure
fig, ax = plt.subplots(figsize=(11, 9)) # Set figure size
# Generate a custom diverging colormap
cmap = sns.diverging_palette(230, 20, as_cmap=True)
# Draw the heatmap with the mask
sns.heatmap(corr,
            mask = mask,
            cmap = cmap,
            vmax = 1, # Set scale max value
            vmin = -1, # Set scale min value
            center = 0, # Set value at the center of the colormap
            square = True, # Ensure perfect squares
            linewidths = 1.5, # Set linewidth between squares
            cbar_kws = {"shrink": .8}, # Set size of color bar
            annot = True # Include values within squares
            );
plt.xticks(rotation=90) # Rotate x labels
plt.title('Diagonal Correlation Plot', size=20, y=1.05); # Set plot title and position
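A heatmap is easy to scan, but a ranked list of the strongest pairs can make it clearer which variables to analyze further. The sketch below is one way to get that list; `top_correlations` is a hypothetical helper, not part of the template, and the toy frame contains made-up values. It keeps only numeric columns, takes the lower triangle so each pair appears once, and sorts by absolute value.

```python
import numpy as np
import pandas as pd

def top_correlations(frame: pd.DataFrame, n: int = 5) -> pd.Series:
    """Return the n column pairs with the largest absolute Pearson correlation."""
    corr = frame.select_dtypes(include='number').corr(method='pearson')
    # Keep the strict lower triangle so each pair appears once and the diagonal is dropped
    lower = np.tril(np.ones(corr.shape, dtype=bool), k=-1)
    pairs = corr.where(lower).stack().dropna()
    # Sort by magnitude so strong negative correlations rank alongside positive ones
    return pairs.sort_values(key=lambda s: s.abs(), ascending=False).head(n)

# Toy stand-in for the survey data (values are made up for illustration)
toy = pd.DataFrame({
    'Gender': ['F', 'M', 'F', 'M', 'F'],  # non-numeric, ignored by the helper
    'Stress': [1, 2, 3, 4, 5],
    'Satisfaction': [5, 4, 3, 2, 1],
    'Events': [1, 3, 2, 5, 4],
})
print(top_correlations(toy, n=3))
```

On the survey data this could be called as `top_correlations(df)`. Correlation only describes linear association, so a high value is a prompt for a closer look, not evidence of cause.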
Answer interesting questions:
Now you get to explore this exciting dataset! Can't think of where to start? No worries, we've got you covered. Try your hand at these questions:
- Are international students happier than domestic students?
- How does the number of events students attend influence their stress levels?
- Does the type of housing influence stress levels, passion and happiness?
# Start coding
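As a possible starting point for the first question, the sketch below compares a numeric survey answer across the categories of another column. `compare_groups` is a hypothetical helper and the toy frame holds made-up values; only the column names and category labels come from the dataset. The template does not say which direction the Satisfaction scale runs, so check that before reading a higher mean as "happier".

```python
import pandas as pd

def compare_groups(frame: pd.DataFrame, group_col: str, value_col: str) -> pd.DataFrame:
    """Summarize value_col within each category of group_col, highest mean first."""
    return (frame.groupby(group_col)[value_col]
                 .agg(['count', 'mean', 'median'])
                 .sort_values('mean', ascending=False))

# Toy stand-in for the survey data (values are made up for illustration)
toy = pd.DataFrame({
    'Citizenship': ['Foreigner', 'Country Citizen', 'Foreigner',
                    'Country Citizen', 'Permanent Resident'],
    'Satisfaction': [2, 1, 3, 2, 2],
})
print(compare_groups(toy, 'Citizenship', 'Satisfaction'))
```

The same helper could be reused for the housing question, for example `compare_groups(df, 'Housing Type', 'Stress')`. Group means alone do not show whether a difference is meaningful, so it is worth checking the group sizes in the `count` column as well.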