Project: Analyzing Flight Delays and Cancellations
    A prominent airline company in the Pacific Northwest has accumulated extensive data on flights and weather patterns and needs to understand the factors influencing departure delays and cancellations, to the benefit of both airlines and passengers. As the data analyst on the team, you decide to take on this analytical project.

    The aviation industry is dynamic, and many variables affect flight operations. To keep your findings relevant and applicable, you choose to focus solely on flights from the 'pnwflights2022' datasets from the ModernDive team, exported as CSV files. These datasets provide comprehensive information on flights departing in the first half of 2022 from the region's two major airports, SEA (Seattle-Tacoma International Airport) and PDX (Portland International Airport):

    • flights2022.csv contains information about each flight, including:
        Variable     Description
        dep_time     Departure time (in the format hhmm), where NA corresponds to a cancelled flight
        dep_delay    Departure delay, in minutes (negative for early)
        origin       Origin airport where the flight starts (IATA code)
        airline      Carrier/airline name
        dest         Destination airport where the flight lands (IATA code)
    • flights_weather2022.csv contains the same flight information, plus weather conditions such as:
        Variable     Description
        visib        Visibility (in miles)
        wind_gust    Wind gust speed (in mph)
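A cancelled flight is encoded as a missing dep_time, so cancellations can be counted per group without a separate flag column. The following minimal sketch runs on a hypothetical five-row sample. Its values are invented for illustration, and only the column names follow the dataset description above. It mirrors the route aggregation in the analysis code, but counts cancellations by summing a boolean isna() column instead of applying a per-group lambda. The two approaches should give the same counts.

```python
import numpy as np
import pandas as pd

# Hypothetical sample rows: values are invented, column names follow the dataset description
sample = pd.DataFrame({
    "dep_time": [705.0, np.nan, 910.0, 1342.0, 1520.0],
    "dep_delay": [5.0, np.nan, 15.0, -3.0, 7.0],
    "origin": ["SEA", "SEA", "SEA", "PDX", "PDX"],
    "dest": ["ANC", "ANC", "ANC", "SFO", "SFO"],
})

# Route label in the form ORIGIN-DEST, built the same way as in the analysis
sample["route"] = sample["origin"] + "-" + sample["dest"]

# A missing dep_time marks a cancelled flight
sample["cancelled"] = sample["dep_time"].isna()

# mean() skips NaN delays, so cancelled flights do not affect mean_dep_delay;
# summing the boolean column counts the cancellations in each group
summary = sample.groupby("route").agg(
    mean_dep_delay=("dep_delay", "mean"),
    total_cancellations=("cancelled", "sum"),
).reset_index()

print(summary)
```

In this sample, the SEA-ANC route has one cancellation and a mean delay computed only from its two departed flights.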
    # Import required libraries
    import pandas as pd
    import matplotlib.pyplot as plt
    
    # Load the flight records and the same flights joined with weather observations
    flights2022 = pd.read_csv('flights2022.csv')
    flights_weather2022 = pd.read_csv('flights_weather2022.csv')
    
    # Build a route label in the form ORIGIN-DEST
    flights2022['route'] = flights2022['origin'] + '-' + flights2022['dest']
    
    # Mean departure delay and number of cancellations (missing dep_time) per route
    routes_delays_cancels = flights2022.groupby('route').agg(
        mean_dep_delay=('dep_delay', 'mean'),
        total_cancellations=('dep_time', lambda x: x.isna().sum())
    ).reset_index()
    
    # Nine routes with the longest mean delays and nine with the most cancellations
    top_routes_by_delay = routes_delays_cancels.sort_values("mean_dep_delay", ascending=False).head(9)
    
    top_routes_by_cancellations = routes_delays_cancels.sort_values("total_cancellations", ascending=False).head(9)
    
    # Bar chart of the nine routes with the most cancellations
    top9_route_cancels_bar, ax = plt.subplots()
    
    ax.bar(top_routes_by_cancellations["route"], top_routes_by_cancellations["total_cancellations"])
    
    ax.set_xlabel("Route")
    ax.set_ylabel("Total Cancellations")
    ax.set_title("Routes with Highest Number of Cancellations")
    
    # Rotate the route labels so they do not overlap
    ax.tick_params(axis="x", labelrotation=90)
    
    plt.show()
    
    # Mean departure delay and number of cancellations per airline
    airlines_delays_cancels = flights2022.groupby("airline").agg(
        mean_dep_delay=("dep_delay", "mean"),
        total_cancellations=("dep_time", lambda x: x.isna().sum())
    ).reset_index()
    
    # Nine airlines with the longest mean delays and nine with the most cancellations
    top_airlines_by_delay = airlines_delays_cancels.sort_values("mean_dep_delay", ascending=False).head(9)
    
    top_airlines_by_cancellations = airlines_delays_cancels.sort_values("total_cancellations", ascending=False).head(9)
    
    # Bar chart of the nine airlines with the longest mean departure delays
    top9_airline_delays_bar, ax = plt.subplots()
    ax.bar(top_airlines_by_delay["airline"], top_airlines_by_delay["mean_dep_delay"])
    
    ax.set_xlabel("Airline")
    ax.set_ylabel("Mean Departure Delay (minutes)")
    ax.set_title("Airlines with Highest Mean Departure Delays")
    
    # Rotate the airline names so they do not overlap
    ax.tick_params(axis="x", labelrotation=75)
    
    plt.show()
    
    
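    # Split flights by wind gust speed; missing wind_gust values (NaN) fail the
    # >= 10 comparison and therefore land in the "< 10 mph" group
    flights_weather2022["group"] = flights_weather2022["wind_gust"].apply(lambda x: ">= 10 mph" if x >= 10 else "< 10 mph")
    
    # Mean departure delay for each gust group at each origin airport
    wind_grouped_data = flights_weather2022.groupby(["group", "origin"]).agg(
        mean_dep_delay=("dep_delay", "mean")
    )
    
    wind_grouped_data.head()
    
    # Author's yes/no conclusion, read from wind_grouped_data, on whether stronger
    # gusts (>= 10 mph) coincide with longer mean departure delays
    wind_response = True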
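The wind_response value above is set by hand after reading the grouped means. It could instead be derived from those means. The following minimal sketch does that on a hypothetical sample with the same origin, wind_gust and dep_delay columns; the values are invented for illustration. It makes two assumptions beyond the original code:

- It uses numpy.select so that flights with no recorded gust get their own "unknown" label. Otherwise they would fall silently into the "< 10 mph" group.
- The derived value is True only if the >= 10 mph group has the longer mean delay at every origin airport.

```python
import numpy as np
import pandas as pd

# Hypothetical sample; only the column names follow flights_weather2022.csv
weather = pd.DataFrame({
    "origin": ["SEA", "SEA", "SEA", "PDX", "PDX", "PDX"],
    "wind_gust": [12.0, 5.0, np.nan, 20.0, 8.0, 3.0],
    "dep_delay": [25.0, 4.0, 9.0, 18.0, 2.0, 6.0],
})

# Label each flight by gust strength; NaN fails both tests and becomes "unknown"
weather["group"] = np.select(
    [weather["wind_gust"] >= 10, weather["wind_gust"] < 10],
    [">= 10 mph", "< 10 mph"],
    default="unknown",
)

# Mean departure delay per gust group (rows) and origin airport (columns),
# excluding flights whose gust speed is unknown
means = (
    weather[weather["group"] != "unknown"]
    .groupby(["group", "origin"])["dep_delay"]
    .mean()
    .unstack("origin")
)

# Assumed rule: True only if the windier group is delayed longer at every airport
wind_response = bool((means.loc[">= 10 mph"] > means.loc["< 10 mph"]).all())

print(means)
print(wind_response)
```

Keeping the unknown-gust flights apart changes which rows enter each mean. This matters if many flights lack a wind_gust reading, which is worth checking on the real data before relying on either grouping.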