Safety first: An analysis of UK major accidents in 2020 to reduce road fatalities

    📖 BACKGROUND

    We work for the road safety team within the Department for Transport, which is looking into how to reduce the number of serious accidents.

    It is important to note that the safety team defines serious accidents as fatal accidents involving 3+ casualties.

    The department is trying to learn more about the characteristics of these accidents, so they can brainstorm interventions that could lower the number of deaths.

    They have asked for our assistance with answering a number of questions.

    💾 THE DATA

    We have two sources of information available:

    • A dataset containing data on every reported accident. This dataset is published by the UK Department for Transport and is publicly available.
    • A lookup file for the 2020 accidents. It describes each column of the accidents dataset and helps us interpret the accident data correctly.

    📌 PROBLEM STATEMENT

    Our goal for this project is to create a report that answers the following questions:

    1. What time of day and day of the week do most serious accidents happen?
    2. Are there any patterns in the time of day / day of the week when serious accidents occur?
    3. What characteristics stand out in serious accidents compared with other accidents?
    4. On what areas would you recommend the planning team focus their brainstorming efforts to reduce serious accidents?

    Through data cleaning, analysis and visualization we will answer these questions to help our stakeholder improve road safety. Understanding when and how fatal accidents tend to happen is the first step towards taking action to prevent them and save lives.

    📚 LOAD PACKAGES

    Let's start by loading all the necessary Python packages.

    # Import necessary libraries
    import pandas as pd
    import numpy as np
    from matplotlib import pyplot as plt
    import plotly.express as px
    import seaborn as sns 
    from matplotlib import rcParams
    from datetime import time, timedelta, datetime
    from time import mktime
    import plotly.graph_objects as go

    🗓 LOAD DATAFRAMES

    We then load the available datasets. This is what the first dataframe, which holds the recorded accidents, looks like.

    # Accidents dataset
    accidents = pd.read_csv(r'./data/accident-data.csv')
    accidents.head()

    And this is the lookup file, which holds the descriptions of the accidents' fields.

    # Lookup dataset
    lookup = pd.read_csv(r'./data/road-safety-lookups.csv')
    lookup.head()
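The lookup file is what lets us turn the numeric codes in the accidents data into readable labels. The following is a minimal sketch of that decoding step on hand-made stand-ins, not on the real files. The column names `field name`, `code/format` and `label`, the helper `decode_column`, and the example rows are all assumptions made for illustration; check `lookup.columns` for the actual names.

```python
import pandas as pd

# Toy stand-ins for the real files; column names and rows are assumptions.
toy_lookup = pd.DataFrame({
    "field name": ["accident_severity"] * 3 + ["day_of_week"] * 2,
    "code/format": [1, 2, 3, 1, 2],
    "label": ["Fatal", "Serious", "Slight", "Sunday", "Monday"],
})
toy_accidents = pd.DataFrame({"accident_severity": [3, 1, 2, 3]})


def decode_column(df, lookup, column):
    """Map the codes in `column` to their labels using the lookup rows for that field."""
    rows = lookup[lookup["field name"] == column]
    mapping = dict(zip(rows["code/format"], rows["label"]))
    return df[column].map(mapping)


severity_labels = decode_column(toy_accidents, toy_lookup, "accident_severity")
print(severity_labels.tolist())
```

In the real lookup file the code column may be read as text, in which case the codes would need casting to the same type as the accidents column before mapping.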

    ⌛️ EXPLORATORY DATA ANALYSIS

    To identify effective strategies for increasing road safety and reducing the number of serious accidents, it is important to understand the circumstances in which they occur. Before diving deeper into the analysis, however, the available data has to be examined and, if necessary, cleaned.

    # Quick glimpse at the data
    accidents.info()
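The `info()` summary lists non-null counts, but a direct count of missing values per column is often easier to read. Here is a minimal sketch on a toy frame. The rows are invented for illustration; on the real data the same two lines would be applied to `accidents`.

```python
import numpy as np
import pandas as pd

# Toy frame standing in for `accidents`; the values are invented.
toy = pd.DataFrame({
    "accident_index": ["a1", "a2", "a3", "a4"],
    "longitude": [-0.12, np.nan, -1.55, -2.24],
    "latitude": [51.50, np.nan, 53.80, np.nan],
})

# Count missing values per column and keep only the columns that have any.
null_counts = toy.isna().sum()
null_counts = null_counts[null_counts > 0].sort_values(ascending=False)
print(null_counts)
```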

    We have a dataframe of 27 columns and more than 91K rows. Almost none of the values are null, except for a few data points in the longitude and latitude columns. Now let's check whether there are duplicates among the accidents. To do this we will use the accidents' unique reference number.

    # Check for duplicates
    dups = accidents[accidents.duplicated(['accident_index'])]
    print(len(dups))

    There are no duplicates, so we can say that there were a total of 91199 recorded road accidents in the UK. The next question is: what period does our dataset cover? Let's quickly explore the accident_year column.

    # Look into accidents' year
    accidents.accident_year.value_counts()

    This means that all accidents in our dataset happened during the year 2020.

    We know that the transport department wants to focus on serious accidents, and that a serious accident is defined as a fatal accident involving 3+ casualties. Therefore, for the purposes of this analysis, we will focus our work on this specific subset.

    # Select only fatal accidents (accident_severity == 1, 'Fatal' in the lookup file) with 3 or more casualties
    serious = accidents.loc[(accidents['accident_severity'] == 1) & (accidents['number_of_casualties'] >= 3)]
    serious.head()
    
    # How many serious accidents in 2020?
    print(f'In 2020, there were a total of {serious.shape[0]} serious accidents in the UK.')
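Since the later questions compare serious accidents with all other accidents, it can help to flag rows rather than slice them out, so that both groups stay in one frame. The sketch below shows this on invented toy data. It assumes that `accident_severity` code 1 means "Fatal" (worth confirming in the lookup file), and the names `FATAL_CODE` and `is_serious` are hypothetical.

```python
import pandas as pd

# Toy frame standing in for `accidents`; the values are invented.
toy = pd.DataFrame({
    "accident_severity": [1, 1, 2, 3, 1, 3],
    "number_of_casualties": [3, 1, 4, 3, 5, 1],
})

FATAL_CODE = 1  # assumption: code 1 means "Fatal" in the lookup file

# Flag rows that meet both parts of the definition: fatal AND 3+ casualties.
toy["is_serious"] = (
    (toy["accident_severity"] == FATAL_CODE)
    & (toy["number_of_casualties"] >= 3)
)

n_serious = int(toy["is_serious"].sum())
share = toy["is_serious"].mean()
print(f"{n_serious} of {len(toy)} accidents are serious ({share:.1%})")
```

With the flag in place, a call such as `groupby("is_serious")` can compare any other column across the two groups.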