A leading aviation company with a significant presence in New York City has launched a data analysis project to identify trends in flight durations. The initiative draws on flight schedule and operational data, with the goal of optimizing flight times and improving the travel experience for passengers. As the head data analyst, you have access to datasets from the 'nycflights2022' collection produced by the ModernDive team. They record flights departing from the major New York City airports, JFK (John F. Kennedy International Airport), LGA (LaGuardia Airport), and EWR (Newark Liberty International Airport), during the second half of 2022, and cover departure and arrival times, flight paths, and airline specifics:
flights2022-h2.csv
contains information about each flight, including:
Variable | Description |
---|---|
carrier | Airline carrier code |
origin | Origin airport (IATA code) |
dest | Destination airport (IATA code) |
air_time | Duration of the flight in air, in minutes |
airlines.csv
contains information about each airline:
Variable | Description |
---|---|
carrier | Airline carrier code |
name | Full name of the airline |
airports.csv
provides details of airports:
Variable | Description |
---|---|
faa | FAA code of the airport |
name | Full name of the airport |
Answer the following questions:
- Which airline and airport pair receives the most flights from NYC, and what is the average duration of that flight? Save your answer as a data frame called frequent with one row and a minimum of two columns: airline_name, airport_name.
- Find the airport that has the longest average flight duration (in hours) from NYC. What is the name of this airport? Save your answer as a data frame called longest with one row and a minimum of two columns: airline_name, airport_name.
- Which airport is the least frequented destination for flights departing from JFK? Save your answer as a character string called least.
# Import required packages
library(dplyr)
library(readr)
# Load the data
flights <- read_csv("flights2022-h2.csv")
airlines <- read_csv("airlines.csv")
airports <- read_csv("airports.csv")
# Start your code here!
head(flights)
head(airlines)
head(airports)
# Question 1
frequent <- flights %>%
  group_by(carrier, dest) %>%
  summarise(
    flight_count = n(),
    # Average air time, converted from minutes to hours
    avg_duration = round(mean(air_time, na.rm = TRUE) / 60, 2),
    .groups = "drop"
  ) %>%
  arrange(desc(flight_count)) %>%
  slice(1) %>%
  inner_join(airlines, by = "carrier") %>%
  inner_join(airports, by = c("dest" = "faa")) %>%
  # Both lookup tables have a `name` column: name.x is the airline, name.y the airport
  select(airline_name = name.x, airport_name = name.y, flight_count, avg_duration)
frequent
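The `name.x` / `name.y` suffixes above depend on the order of the two joins. A minimal sketch of a more explicit variant renames each `name` column before joining. The toy tables and their values (`toy_counts`, `toy_airlines`, `toy_airports`, codes such as `T1` and `AAA`) are hypothetical stand-ins, not rows from the real CSV files.

```r
library(dplyr)

# Hypothetical toy tables; only the column names mirror the real files
toy_counts   <- tibble(carrier = "T1", dest = "AAA", flight_count = 3L)
toy_airlines <- tibble(carrier = c("T1", "T2"),
                       name    = c("Toy Airline 1", "Toy Airline 2"))
toy_airports <- tibble(faa  = c("AAA", "BBB"),
                       name = c("Toy Airport A", "Toy Airport B"))

# Rename each lookup's `name` column up front so no .x/.y suffixes appear
named <- toy_counts %>%
  inner_join(rename(toy_airlines, airline_name = name), by = "carrier") %>%
  inner_join(rename(toy_airports, airport_name = name), by = c("dest" = "faa")) %>%
  select(airline_name, airport_name, flight_count)

named
```

With this pattern the final `select()` no longer needs to know which table was joined first.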
# Question 2
longest <- flights %>%
  filter(!is.na(air_time)) %>%
  # Grouped per carrier-destination pair so that airline_name is available
  group_by(dest, carrier) %>%
  summarise(
    flight_count = n(),
    # Average air time, converted from minutes to hours
    avg_duration = mean(air_time) / 60,
    .groups = "drop"
  ) %>%
  arrange(desc(avg_duration)) %>%
  slice(1) %>%
  left_join(airlines, by = "carrier") %>%
  left_join(airports, by = c("dest" = "faa")) %>%
  # name.x is the airline name, name.y the airport name
  select(airline_name = name.x, airport_name = name.y, flight_count, avg_duration)
longest
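Question 2 asks about the airport alone, while the code above averages per carrier and destination pair. If a per-airport average is wanted instead, one possible sketch groups by `dest` only. The toy data below is hypothetical and exists only to make the example self-contained.

```r
library(dplyr)

# Hypothetical toy data; column names follow flights2022-h2.csv and airports.csv
toy_flights <- tibble(
  carrier  = c("T1", "T1", "T2", "T2"),
  dest     = c("AAA", "BBB", "BBB", "AAA"),
  air_time = c(120, 300, 360, NA)   # minutes; NA marks a missing air time
)
toy_airports <- tibble(faa  = c("AAA", "BBB"),
                       name = c("Toy Airport A", "Toy Airport B"))

longest_by_airport <- toy_flights %>%
  filter(!is.na(air_time)) %>%
  group_by(dest) %>%                                   # airport only, all carriers pooled
  summarise(avg_duration = mean(air_time) / 60, .groups = "drop") %>%
  slice_max(avg_duration, n = 1, with_ties = FALSE) %>%
  left_join(toy_airports, by = c("dest" = "faa")) %>%
  select(airport_name = name, avg_duration)

longest_by_airport
```

Pooling carriers can pick a different airport than the per-pair version when one airline flies an unusually long routing to a destination.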
# Question 3
least <- flights %>%
  filter(origin == "JFK") %>%
  group_by(dest) %>%
  summarise(flight_count = n(), .groups = "drop") %>%
  arrange(flight_count) %>%
  slice(1) %>%
  left_join(airports, by = c("dest" = "faa")) %>%
  # pull() returns the name as a character string, as the question requires
  pull(name)
least
# Result recorded in the original notebook: "Eagle County Regional Airport"
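Note that `slice(1)` keeps a single row even when several destinations share the lowest count. As a hedged sketch, `slice_min()` with `with_ties = TRUE` would surface every tied destination. The toy tables below are hypothetical and do not reflect the real data.

```r
library(dplyr)

# Hypothetical toy data; column names follow flights2022-h2.csv and airports.csv
toy_flights <- tibble(
  origin = c("JFK", "JFK", "JFK", "JFK", "LGA"),
  dest   = c("AAA", "AAA", "BBB", "CCC", "BBB")
)
toy_airports <- tibble(
  faa  = c("AAA", "BBB", "CCC"),
  name = c("Toy Airport A", "Toy Airport B", "Toy Airport C")
)

least_tied <- toy_flights %>%
  filter(origin == "JFK") %>%
  count(dest, name = "flight_count") %>%               # flights per destination
  slice_min(flight_count, n = 1, with_ties = TRUE) %>% # keep all destinations tied for fewest
  left_join(toy_airports, by = c("dest" = "faa")) %>%
  pull(name)

least_tied
```

If the result has more than one element, the question has no unique answer and the choice between the tied airports should be made explicitly.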