Skip to content
Competition - high fatality accidents-MRC
0
  • AI Chat
  • Code
  • Report
  • Reducing the number of high fatality accidents

    📖 Background

    You work for the road safety team within the department of transport and are looking into how they can reduce the number of major incidents. The safety team classes major incidents as fatal accidents involving 3+ casualties. They are trying to learn more about the characteristics of these major incidents so they can brainstorm interventions that could lower the number of deaths. They have asked for your assistance with answering a number of questions.

    💾 The data

    The reporting department have been collecting data on every accident that is reported. They've included this along with a lookup file for 2020's accidents.

    Published by the department for transport. https://data.gov.uk/dataset/road-accidents-safety-data Contains public sector information licensed under the Open Government Licence v3.0.

    import pandas as pd
    accidents = pd.read_csv(r'./data/accident-data.csv')
    #accidents.head()
    accidents.info()
    
    #acc_copy['accident_severity'].value_counts()
    lookup = pd.read_csv(r'./data/road-safety-lookups.csv')
    #lookup.head(20)
    lookup.info()
    #lookup.describe
    
    #lookup['code/format'].value_counts()
    #lookup.isna().sum()
    
    #lookup[lookup['field name'].isin(['accident_severity'])][['code/format','label']]
    #lookup[lookup['field name'].isin(['day_of_week'])]
    #lookup[lookup['field name'].isin(['day_of_week'])][['code/format','label']]
    #lookup[lookup['field name'].isin(['urban_or_rural_area'])]
    #lookup[lookup['field name'].isin(['pedestrian_crossing_human_control'])]
    #lookup[lookup['field name'].isin(['light_conditions'])]
    #lookup[lookup['field name'].isin(['first_road_class'])]
    #lookup[lookup['field name'].isin(['junction_control'])]
    #lookup[lookup['field name'].isin(['second_road_class'])]
    #lookup[lookup['field name'].isin(['pedestrian_crossing_physical_facilities'])]
    #lookup[lookup['field name'].isin(['road_type'])]
    #lookup[lookup['field name'].isin(['road_surface_conditions'])]
    #lookup[lookup['field name'].isin(['carriageway_hazards'])]
    #lookup[lookup['field name'].isin(['special_conditions_at_site'])]
    #lookup[lookup['field name'].isin(['weather_conditions'])]
    #lookup[lookup['field name'].isin(['junction_detail'])]
    
    #print(lookup[lookup['field name'].isin(['second_road_number'])]['label'])
    
    #lookup['field name'].value_counts()

    💪 Competition challenge

    Create a report that covers the following:

    1. What time of day and day of the week do most major incidents happen?
    2. Are there any patterns in the time of day/ day of the week when major incidents occur?
    3. What characteristics stand out in major incidents compared with other accidents?
    4. On what areas would you recommend the planning team focus their brainstorming efforts to reduce major incidents?

    🧑‍⚖️ Judging criteria

    CATEGORYWEIGHTINGDETAILS
    Recommendations35%
    • Clarity of recommendations - how clear and well presented the recommendation is.
    • Quality of recommendations - are appropriate analytical techniques used & are the conclusions valid?
    • Number of relevant insights found for the target audience.
    Storytelling30%
    • How well the data and insights are connected to the recommendation.
    • How the narrative and whole report connects together.
    • Balancing making the report in depth enough but also concise.
    Visualizations25%
    • Appropriateness of visualization used.
    • Clarity of insight from visualization.
    Votes10%
    • Up voting - most upvoted entries get the most points.

    ✅ Checklist before publishing into the competition

    • Rename your workspace to make it descriptive of your work. N.B. you should leave the notebook name as notebook.ipynb.
    • Remove redundant cells like the judging criteria so the workbook is focused on your story.
    • Make sure the workbook reads well and explains how you found your insights.
    • Check that all the cells run without error.

    ⌛️ Time is ticking. Good luck!

    import pandas as pd
    import numpy as np
    import seaborn as sns  
    import matplotlib.pyplot as plt
    from datetime import datetime
    
    
    #My User Defined Functions
    def addLookupField(df,lk_fieldname,lk_newfieldname):
        '''Gets corresponding mappings from lookup Dataframe and attach its description as a new lk_ field '''
        categ= lookup[lookup['field name'].isin([lk_fieldname])][['code/format','label']]
        categ['code/format'] = categ['code/format'].astype(int)
        categ['label'] = categ['label'].astype(str)
    
        categ_ind = categ.set_index('code/format')
        mapping_lbl = categ_ind['label'].to_dict()
    
        acc_copy[lk_newfieldname] = acc_copy[lk_fieldname].replace(mapping_lbl)
        return df
    
    
    
    accidents = pd.read_csv(r'./data/accident-data.csv')
    lookup = pd.read_csv(r'./data/road-safety-lookups.csv')
    
    #get a safe copy of original's dataframe
    acc_copy = accidents.copy(deep=True)
    
    #get all description lookup as new fields
    acc_copy = addLookupField(acc_copy,'accident_severity','lk_severity')
    acc_copy = addLookupField(acc_copy,'day_of_week','lk_weekday')
    acc_copy = addLookupField(acc_copy,'junction_detail','lk_junction_detail')
    acc_copy = addLookupField(acc_copy,'weather_conditions','lk_weather_conditions')
    acc_copy = addLookupField(acc_copy,'special_conditions_at_site','lk_special_conditions_at_site')
    acc_copy = addLookupField(acc_copy,'carriageway_hazards','lk_carriageway_hazards')
    acc_copy = addLookupField(acc_copy,'road_surface_conditions','lk_road_surface_conditions')
    acc_copy = addLookupField(acc_copy,'road_type','lk_road_type')
    acc_copy = addLookupField(acc_copy,'pedestrian_crossing_physical_facilities','lk_pedestrian_crossing_physical_facilities')
    acc_copy = addLookupField(acc_copy,'second_road_class','lk_second_road_class')
    acc_copy = addLookupField(acc_copy,'junction_control','lk_junction_control')
    acc_copy = addLookupField(acc_copy,'first_road_class','lk_first_road_class')
    acc_copy = addLookupField(acc_copy,'light_conditions','lk_light_conditions')
    acc_copy = addLookupField(acc_copy,'pedestrian_crossing_human_control','lk_pedestrian_crossing_human_control')
    acc_copy = addLookupField(acc_copy,'urban_or_rural_area','lk_urban_or_rural_area')
    #acc_copy = addLookupField(acc_copy,'speed_limit','lk_speed_limit')
    
    #add to Dataframe all related time fields that are relevant data to analyze further on
    acc_copy['etime'] = pd.to_datetime(acc_copy['date']+' '+acc_copy['time'])
    acc_copy['htime'] = acc_copy['etime'].dt.strftime('%H')
    acc_copy['month'] = acc_copy['etime'].dt.strftime('%B')
    acc_copy['nmonth'] = acc_copy['etime'].dt.strftime('%m')
    acc_copy['smonth'] = acc_copy['etime'].dt.strftime('%b')
    
    #acc_copy major when 3+ casualties occured
    #acc_copy_major = acc_copy[(acc_copy['number_of_casualties']>=3) & (acc_copy['lk_severity']=='Fatal')]
    acc_copy_major = acc_copy[(acc_copy['number_of_casualties']>3)]
    
    acc_copy_major['xtype'] = 'major'
    
    #set_acc_ind = set(acc_copy_major['accident_index'])
    #inc = acc_copy['accident_index'].isin(set_acc_ind)
    #acc_copy_others = acc_copy[~inc]
    acc_copy_others = acc_copy[(acc_copy['number_of_casualties']<=3)]
    
    acc_copy_others['xtype'] = 'other'
    
    acc_copy_full= pd.concat([acc_copy_major,acc_copy_others])
    

    Time of day and Day of the week most major incidents happen

    Following displays count of those accidents with casualties >=3 by Hour of the day and Day of the week, some patterns can be noticed, as during afternoon hours is when most major incidents happen and also during weekends, specially on Friday, Saturday and Sunday evenings.
    Thus could be related to hours when most people move around and go out for fun, moving from one location to another using their vehicles.
    Perhaps a good recommendation is to setup transit controls on strategic areas in these days/hours, and locations such as in/outs from the cities or major highway exits or even city entertainment areas, and include driver alcohol testing in these controls.

    r = acc_copy_major.groupby(['day_of_week','lk_weekday','htime'],as_index=False)\
    .agg({'accident_index':'count'})
    
    acc_major_by_day_hour = pd.crosstab(index=r['lk_weekday'],columns=r['htime'],values=r['accident_index'],\
                                        aggfunc='sum').fillna(0,downcast='infer')
    
    #Heatmap of Time of day and Day of the week most major incidents happen
    fig, ax = plt.subplots()
    fig.set_figwidth(20.0)
    d = sns.heatmap(data=acc_major_by_day_hour,annot=True, fmt="d",cmap="OrRd", cbar=True, \
                linewidths=.5,annot_kws={"size": 12})
    d.axes.set_title('Time of day and Day of the week most major incidents happen',fontsize=20)
    d.set_xlabel("Hour of the Day",fontsize=14)
    d.set_ylabel("Day of the Week",fontsize=14)
    plt.show(d)
    r = acc_copy_major.groupby(['day_of_week','lk_weekday','nmonth','smonth'],as_index=False)\
    .agg({'accident_index':'count'}).sort_values(by='nmonth',ascending=False)
    
    acc_major_by_month_day = pd.crosstab(index=[r['nmonth'],r['smonth']],columns=[r['day_of_week'],r['lk_weekday']],values=r['accident_index'],\
                                        aggfunc='sum')
    #acc_major_by_month_day
    
    #Heatmap of Time of day and Day of the week most major incidents happen
    fig, ax = plt.subplots()
    fig.set_figwidth(10.0)
    d = sns.heatmap(data=acc_major_by_month_day,annot=True, fmt="d",cmap="YlGnBu", cbar=True, \
                linewidths=.5,annot_kws={"size": 12})
    d.axes.set_title('Month and Day of the week most major incidents happen',fontsize=20)
    d.set_xlabel("Day of the week",fontsize=14)
    d.set_ylabel("Month",fontsize=14)
    plt.show(d)

    Months where most major incidents with 3+ casualties occur

    Following is how major accidents happened throughout the year, seems first months of the year are significative when comparing to rest of the year, and during Summer till mid Autumn seems to be very high season of major incidents too.

    #Barplot on Months where most Fatal casualties do occurr
    acc_on_month = acc_copy_major.groupby(['smonth','nmonth','month'],as_index=False)\
    .agg({'accident_index':'count'}).sort_values(by='nmonth',ascending=True)
    #acc_on_month
    
    e = sns.barplot(data=acc_on_month,x='smonth',y='accident_index',palette='autumn')
    e.axes.set_title('Major incidents by Month',fontsize=14)
    e.set_ylabel('No. of major incidents 3+ casualties',fontsize=12)
    e.set_xlabel('Month',fontsize=12)
    plt.show()