Reducing the number of high fatality accidents
📖 Background
You work for the road safety team within the Department for Transport and are looking into how they can reduce the number of major incidents. The safety team classes major incidents as fatal accidents involving 3+ casualties. They are trying to learn more about the characteristics of these major incidents so they can brainstorm interventions that could lower the number of deaths. They have asked for your assistance with answering a number of questions.
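The safety team's definition turns directly into a boolean filter: an accident is major only when both conditions hold. A minimal sketch on made-up rows, assuming the STATS19 coding where `accident_severity == 1` means Fatal (the lookup file is the authority on the codes):

```python
import pandas as pd

# Made-up rows with only the two columns the definition needs.
# Assumed severity codes: 1 = Fatal, 2 = Serious, 3 = Slight.
toy = pd.DataFrame({
    "accident_severity":    [1, 1, 2, 1, 3],
    "number_of_casualties": [3, 1, 5, 4, 3],
})

# Major incident = fatal AND three or more casualties.
is_major = (toy["accident_severity"] == 1) & (toy["number_of_casualties"] >= 3)
print(is_major.tolist())  # → [True, False, False, True, False]
```

Note that `>= 3` includes accidents with exactly three casualties; `> 3` would silently exclude them.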
💾 The data
The reporting department have been collecting data on every accident that is reported. They've included this along with a lookup file for 2020's accidents.
Published by the Department for Transport. https://data.gov.uk/dataset/road-accidents-safety-data Contains public sector information licensed under the Open Government Licence v3.0.
import pandas as pd
# Load the 2020 accident records and check columns, dtypes and missing values
accidents = pd.read_csv(r'./data/accident-data.csv')
#accidents.head()
accidents.info()
# Load the lookup table that maps each coded field to its human-readable label
lookup = pd.read_csv(r'./data/road-safety-lookups.csv')
#lookup.head(20)
lookup.info()
#lookup.describe()
#lookup.isna().sum()
# Inspect the codes and labels of a single field, e.g. accident_severity;
# the same check applies to every field mapped further below
#lookup[lookup['field name'] == 'accident_severity'][['code/format', 'label']]
#lookup['field name'].value_counts()
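The lookup file is in long format (one row per field and code), so labelling a coded column means filtering one field and turning its rows into a `{code: label}` dict. A minimal sketch on a hypothetical excerpt shaped like `road-safety-lookups.csv`; `lookup_toy` and `code_to_label` are illustrative names, and the day codes assume the STATS19 convention 1 = Sunday ... 7 = Saturday:

```python
import pandas as pd

# Hypothetical excerpt with the same three columns as the real lookup file.
lookup_toy = pd.DataFrame({
    "field name": ["day_of_week", "day_of_week", "day_of_week", "urban_or_rural_area"],
    "code/format": ["1", "2", "7", "1"],
    "label": ["Sunday", "Monday", "Saturday", "Urban"],
})

def code_to_label(lookup_df, field):
    """Build a {code: label} dict for one field of a long-format lookup table."""
    rows = lookup_df[lookup_df["field name"] == field]
    return dict(zip(rows["code/format"].astype(int), rows["label"].astype(str)))

weekday_map = code_to_label(lookup_toy, "day_of_week")
days = pd.Series([1, 7, 2, 1])
print(days.map(weekday_map).tolist())  # → ['Sunday', 'Saturday', 'Monday', 'Sunday']
```

`Series.map` turns codes missing from the dict into NaN, whereas `Series.replace` (used by `addLookupField` below) leaves them unchanged; `map` makes unmapped codes easier to spot.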
💪 Competition challenge
Create a report that covers the following:
- What time of day and day of the week do most major incidents happen?
- Are there any patterns in the time of day/day of the week when major incidents occur?
- What characteristics stand out in major incidents compared with other accidents?
- On what areas would you recommend the planning team focus their brainstorming efforts to reduce major incidents?
🧑‍⚖️ Judging criteria
CATEGORY | WEIGHTING
---|---
Recommendations | 35%
Storytelling | 30%
Visualizations | 25%
Votes | 10%
✅ Checklist before publishing into the competition
- Rename your workspace to make it descriptive of your work. N.B. you should leave the notebook name as notebook.ipynb.
- Remove redundant cells like the judging criteria so the workbook is focused on your story.
- Make sure the workbook reads well and explains how you found your insights.
- Check that all the cells run without error.
⌛️ Time is ticking. Good luck!
import pandas as pd
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt
from datetime import datetime
# User-defined helper functions
def addLookupField(df, lk_fieldname, lk_newfieldname):
    '''Get the code-to-label mapping for lk_fieldname from the global `lookup`
    DataFrame and attach the labels to df as a new lk_ column.'''
    categ = lookup[lookup['field name'] == lk_fieldname][['code/format', 'label']]
    # Build a {code: label} dict, e.g. {1: 'Fatal', 2: 'Serious', 3: 'Slight'}
    mapping_lbl = dict(zip(categ['code/format'].astype(int), categ['label'].astype(str)))
    # Write to df (not the global acc_copy) so the helper works on any frame;
    # codes missing from the lookup are left unchanged by replace()
    df[lk_newfieldname] = df[lk_fieldname].replace(mapping_lbl)
    return df
accidents = pd.read_csv(r'./data/accident-data.csv')
lookup = pd.read_csv(r'./data/road-safety-lookups.csv')
#get a safe copy of original's dataframe
acc_copy = accidents.copy(deep=True)
# Add the label of every coded field as a new lk_ column
lookup_fields = {
    'accident_severity': 'lk_severity',
    'day_of_week': 'lk_weekday',
    'junction_detail': 'lk_junction_detail',
    'weather_conditions': 'lk_weather_conditions',
    'special_conditions_at_site': 'lk_special_conditions_at_site',
    'carriageway_hazards': 'lk_carriageway_hazards',
    'road_surface_conditions': 'lk_road_surface_conditions',
    'road_type': 'lk_road_type',
    'pedestrian_crossing_physical_facilities': 'lk_pedestrian_crossing_physical_facilities',
    'second_road_class': 'lk_second_road_class',
    'junction_control': 'lk_junction_control',
    'first_road_class': 'lk_first_road_class',
    'light_conditions': 'lk_light_conditions',
    'pedestrian_crossing_human_control': 'lk_pedestrian_crossing_human_control',
    'urban_or_rural_area': 'lk_urban_or_rural_area',
    #'speed_limit': 'lk_speed_limit',
}
for field, new_field in lookup_fields.items():
    acc_copy = addLookupField(acc_copy, field, new_field)
# Derive the time fields used in the analysis below
# dayfirst=True assumes UK-style dd/mm/yyyy dates; without it, ambiguous dates such as 04/02 are read month-first
acc_copy['etime'] = pd.to_datetime(acc_copy['date'] + ' ' + acc_copy['time'], dayfirst=True)
acc_copy['htime'] = acc_copy['etime'].dt.strftime('%H')
acc_copy['month'] = acc_copy['etime'].dt.strftime('%B')
acc_copy['nmonth'] = acc_copy['etime'].dt.strftime('%m')
acc_copy['smonth'] = acc_copy['etime'].dt.strftime('%b')
# Major incident = fatal accident (accident_severity code 1) with 3+ casualties,
# as defined by the safety team
is_major = (acc_copy['number_of_casualties'] >= 3) & (acc_copy['accident_severity'] == 1)
# .copy() avoids SettingWithCopyWarning when adding the xtype column
acc_copy_major = acc_copy[is_major].copy()
acc_copy_major['xtype'] = 'major'
# Every other accident forms the comparison group
acc_copy_others = acc_copy[~is_major].copy()
acc_copy_others['xtype'] = 'other'
acc_copy_full = pd.concat([acc_copy_major, acc_copy_others])
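Stacking both groups with an `xtype` flag turns the question of which characteristics stand out into a comparison of proportions rather than raw counts, which matters because major incidents are a small minority of all accidents. A minimal sketch on made-up rows; the column name `lk_light_conditions` mirrors the one created above, but the labels here are invented:

```python
import pandas as pd

# Made-up rows standing in for acc_copy_full: a group flag plus one categorical column.
toy = pd.DataFrame({
    "xtype": ["major", "major", "major", "other", "other", "other", "other"],
    "lk_light_conditions": ["Darkness", "Darkness", "Daylight",
                            "Daylight", "Daylight", "Daylight", "Darkness"],
})

# normalize="index" converts counts into within-group shares (each row sums to 1),
# so a small group and a large group can be compared side by side.
shares = pd.crosstab(toy["xtype"], toy["lk_light_conditions"], normalize="index")
print(shares.round(2))
```

Here two thirds of the toy major incidents happen in darkness against a quarter of the other accidents; on the real data the same call can be repeated for each `lk_` column.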
Time of day and day of the week when most major incidents happen
The heatmap below counts major incidents by hour of the day and day of the week. Some patterns stand out: most major incidents happen in the afternoon and towards the weekend, especially on Friday, Saturday and Sunday evenings.
This may be related to the hours when most people are on the move, driving from one place to another to go out for leisure.
One possible recommendation is to set up traffic checkpoints at strategic locations on these days and hours, such as roads into and out of cities, major motorway exits and city entertainment areas, and to include driver alcohol testing at these checkpoints.
# Count major incidents per (day of week, hour). crosstab fills empty cells with 0
# and keeps integer counts, which the heatmap's fmt="d" requires.
# The numeric day code is the outer index level so rows follow weekday order, not alphabetical order.
acc_major_by_day_hour = pd.crosstab(index=[acc_copy_major['day_of_week'], acc_copy_major['lk_weekday']],
                                    columns=acc_copy_major['htime'])
# Heatmap of time of day and day of the week when major incidents happen
fig, ax = plt.subplots()
fig.set_figwidth(20.0)
d = sns.heatmap(data=acc_major_by_day_hour, annot=True, fmt="d", cmap="OrRd", cbar=True,
                linewidths=.5, annot_kws={"size": 12}, ax=ax)
d.set_title('Time of day and day of the week when most major incidents happen', fontsize=20)
d.set_xlabel("Hour of the day", fontsize=14)
d.set_ylabel("Day of the week", fontsize=14)
plt.show()
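The day-by-hour count table can be sketched on made-up rows to show two details that matter for the heatmap: (day, hour) combinations with no incidents must appear as integer 0 for `fmt="d"` to work, and putting the numeric day code first keeps the rows in weekday order. The codes below assume the STATS19 convention 1 = Sunday ... 7 = Saturday; check the lookup file before relying on it.

```python
import pandas as pd

# Made-up incidents: day code, day name and two-digit hour string,
# mirroring the day_of_week, lk_weekday and htime columns.
toy = pd.DataFrame({
    "day_of_week": [7, 7, 1, 6, 7],
    "lk_weekday":  ["Saturday", "Saturday", "Sunday", "Friday", "Saturday"],
    "htime":       ["17", "22", "17", "22", "17"],
})

# crosstab counts rows per (day, hour) pair and fills missing pairs with 0,
# so the result is an integer table ready for heatmap(annot=True, fmt="d").
grid = pd.crosstab(index=[toy["day_of_week"], toy["lk_weekday"]], columns=toy["htime"])
print(grid)
```

Rows come out as Sunday, Friday, Saturday (sorted by code), whereas indexing on the day name alone would sort them alphabetically.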
r = acc_copy_major.groupby(['day_of_week','lk_weekday','nmonth','smonth'],as_index=False)\
.agg({'accident_index':'count'}).sort_values(by='nmonth',ascending=False)
# fillna(0).astype(int): month/day pairs with no major incidents would otherwise be NaN,
# which cannot be annotated with fmt="d"
acc_major_by_month_day = pd.crosstab(index=[r['nmonth'], r['smonth']], columns=[r['day_of_week'], r['lk_weekday']],
                                     values=r['accident_index'], aggfunc='sum').fillna(0).astype(int)
# Heatmap of month and day of the week when major incidents happen
fig, ax = plt.subplots()
fig.set_figwidth(10.0)
d = sns.heatmap(data=acc_major_by_month_day, annot=True, fmt="d", cmap="YlGnBu", cbar=True,
                linewidths=.5, annot_kws={"size": 12}, ax=ax)
d.set_title('Month and day of the week when most major incidents happen', fontsize=20)
d.set_xlabel("Day of the week", fontsize=14)
d.set_ylabel("Month", fontsize=14)
plt.show()
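Each cell of the month by day-of-week heatmap sums over either four or five calendar days, so part of the variation between cells is a calendar effect rather than a difference in risk. A hedged sketch of the denominator, assuming the accident file covers calendar year 2020 only:

```python
import pandas as pd

# Every calendar day of 2020 (a leap year).
days = pd.Series(pd.date_range("2020-01-01", "2020-12-31", freq="D"))

# Number of Mondays, Tuesdays, ... in each month.
n_weekdays = pd.crosstab(days.dt.month, days.dt.day_name())
print(n_weekdays.loc[2, ["Saturday", "Sunday"]])  # February 2020: 5 Saturdays, 4 Sundays
```

After aligning the row and column labels with the heatmap table, dividing the incident counts by this table would give major incidents per calendar day.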
Months in which most major incidents with 3+ casualties occur
The chart below shows how major incidents are distributed across the year. The first months of the year appear to stand out compared with the rest of the year, and the period from summer to mid-autumn also appears to be a high season for major incidents.
# Bar plot of major incidents per month, in calendar order
acc_on_month = acc_copy_major.groupby(['smonth','nmonth','month'],as_index=False)\
.agg({'accident_index':'count'}).sort_values(by='nmonth',ascending=True)
#acc_on_month
e = sns.barplot(data=acc_on_month,x='smonth',y='accident_index',palette='autumn')
e.axes.set_title('Major incidents by Month',fontsize=14)
e.set_ylabel('No. of major incidents 3+ casualties',fontsize=12)
e.set_xlabel('Month',fontsize=12)
plt.show()
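Sorting by the zero-padded month number (`nmonth`) rather than the month abbreviation is what keeps the bars in calendar order. A small sketch on made-up dates showing the difference:

```python
import pandas as pd

# Made-up incident dates; strftime mirrors how nmonth and smonth are built.
dates = pd.Series(pd.to_datetime(["2020-08-03", "2020-01-15", "2020-04-20", "2020-01-30"]))
toy = pd.DataFrame({"nmonth": dates.dt.strftime("%m"), "smonth": dates.dt.strftime("%b")})

# groupby sorts its keys: month names sort alphabetically,
# while zero-padded month numbers ('01' < '04' < '08') sort in calendar order.
by_name = toy.groupby("smonth").size()
by_number = toy.groupby(["nmonth", "smonth"]).size()

print(list(by_name.index))                               # → ['Apr', 'Aug', 'Jan']
print(list(by_number.index.get_level_values("smonth")))  # → ['Jan', 'Apr', 'Aug']
```

The zero padding matters: as plain strings, '10' would sort before '2'.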