Used RStudio to analyze Federal Aviation Administration (FAA) and Bureau of Transportation Statistics (BTS) data.
To assess the “winners” and “losers” among United States airports, several criteria were evaluated. Total enplanements, total airport operations, and air carrier operations were used because together these three metrics indicate an airport’s capacity and ability to handle air traffic. Total enplanements is the total number of passengers boarding aircraft at the airport. Total airport operations is the total number of takeoffs and landings at the airport. Finally, air carrier operations counts the takeoffs and landings flown by commercial air carrier aircraft, and so reflects the level of airline activity at an airport. Several plots analyzing these variables are used in the following section to identify the winning and losing airports.
plot(filtered_large_hubs$T_ENPL, filtered_large_hubs$T_AOPS,
main = "Plot of Total Airport Operations vs Total Enplanements (Large Hubs)",
xlab = "Total Enplanements",
ylab = "Total Airport Operations",
pch = 19)
# Zoomed-in plots; base plot() draws directly and returns NULL, so the result is not assigned
plot(filtered_large_hubs$T_ENPL, filtered_large_hubs$T_AOPS,
main = "Zoomed Plot of Loser",
xlab = "Total Enplanements",
ylab = "Total Airport Operations",
pch = 19,
cex = 1.5,
xlim = c(2700000,15000000),
ylim = c(100000,400000))
nashville <- filtered_large_hubs[filtered_large_hubs$APORT_NAME %in% c("NASHVILLE INTL"), ]
text(nashville$T_ENPL, nashville$T_AOPS,
labels = nashville$APORT_NAME,
pos = 1,
cex = 0.7,
col = "red")
plot(filtered_large_hubs$T_ENPL, filtered_large_hubs$T_AOPS,
main = "Zoomed Plot of Winner",
xlab = "Total Enplanements",
ylab = "Total Airport Operations",
pch = 19,
cex = 1.5,
xlim = c(20000000,68000000),
ylim = c(500000,1000000))
atlanta <- filtered_large_hubs[filtered_large_hubs$APORT_NAME %in% c("HARTSFIELD - JACKSON ATLANTA INTL"), ]
text(atlanta$T_ENPL, atlanta$T_AOPS,
labels = atlanta$APORT_NAME,
pos = 3,
cex = 0.7,
col = "red")
Figure 1: Total Airport Operations vs Total Enplanements
Figure 2: Zoomed-in views showing the 'loser' and the 'winner'
The scatter plot (figure 1) shows total airport operations versus total enplanements and reveals that, in general, airports with more flights also carry more passenger traffic. The airports positioned higher on the plot can be seen as “winners” as they are likely generating more revenue from passengers and airlines. Conversely, the airports appearing lower on the plot can be interpreted as “losers” since they likely generate less revenue due to fewer flights and passengers. Zoomed-in versions (figure 2) of the original scatter plot reveal Nashville as the “loser” and Atlanta as the “winner” among large hub airports.
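The visual reading above can be cross-checked numerically. The sketch below is only an illustration: it uses a small invented data frame (`toy_hubs`, a hypothetical name, with made-up values rather than FAA figures) that reuses the column names of `filtered_large_hubs`, and it relies on base R only. It computes the correlation between the two metrics and picks out the airports at each end of the enplanement range.

```r
# Toy stand-in for filtered_large_hubs; all values are invented for illustration.
toy_hubs <- data.frame(
  APORT_NAME = c("AIRPORT A", "AIRPORT B", "AIRPORT C", "AIRPORT D"),
  T_ENPL = c(3000000, 12000000, 25000000, 50000000),
  T_AOPS = c(150000, 300000, 450000, 800000)
)

# Strength of the linear relationship between enplanements and operations.
enpl_ops_cor <- cor(toy_hubs$T_ENPL, toy_hubs$T_AOPS)

# Airports at the top and bottom of the enplanement range.
top_airport <- toy_hubs$APORT_NAME[which.max(toy_hubs$T_ENPL)]
bottom_airport <- toy_hubs$APORT_NAME[which.min(toy_hubs$T_ENPL)]

print(enpl_ops_cor)
print(top_airport)
print(bottom_airport)
```

Applied to the real data, the same three lines would replace `toy_hubs` with `filtered_large_hubs`.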
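library(dplyr) # provides %>%, filter(), arrange(), mutate(), case_when()
# Keep the airports with the highest air carrier operations, sorted ascending
large_highest_ac <- filtered_large_hubs %>%
filter(AC > 15000000) %>%
arrange(AC)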
bar_heights <- barplot(large_highest_ac$AC,
main = "Airports by Highest AC Values (Large Hubs)",
xlab = "Air Carrier Operations Value",
ylab = "Airports",
col = "blue",
border = "white",
horiz = TRUE,
las = 1,
cex.names = 0.6,
names.arg = NULL)
text(x = large_highest_ac$AC,
y = bar_heights,
labels = large_highest_ac$APORT_NAME,
col = "white",
pos = 2,
cex = 0.6)
Figure 3: Airports by highest air carrier values (large hubs)
Hartsfield–Jackson Atlanta International Airport not only has the highest total airport operations and total enplanements in the scatter plot, but also has the highest air carrier operations value (figure 3). This reflects the high volume of commercial airline activity that passes through the airport each day: Atlanta handles more commercial airline flights than its competitors, further demonstrating that it is a 'winner' in terms of the economics and efficiency of U.S. airports. Major hubs such as Los Angeles (LAX), Dallas-Fort Worth (DFW), and John F. Kennedy (JFK) also report strong total airport operations values, consistent with their role as international flight hubs.
# Figure 4 needs the low end of the ranking, not the high end used for Figure 3.
# The cutoff of 10 airports is an assumption; the original threshold is not given.
large_lowest_ac <- filtered_large_hubs %>%
slice_min(AC, n = 10) %>%
arrange(AC)
bar_heights <- barplot(large_lowest_ac$AC,
main = "Airports by Lowest AC Values (Large Hubs)",
xlab = "Air Carrier Operations Value",
ylab = "Airports",
col = "blue",
border = "white",
horiz = TRUE,
las = 1,
cex.names = 0.6,
names.arg = NULL)
text(x = large_lowest_ac$AC,
y = bar_heights,
labels = large_lowest_ac$APORT_NAME,
col = "white",
pos = 2,
cex = 0.6)
Figure 4: Airports by lowest air carrier values (large hubs)
In contrast, Nashville International Airport (BNA) shows the lowest air carrier operations value in the large hub category (figure 4). Airports such as Austin-Bergstrom (AUS) and Ronald Reagan Washington National Airport (DCA) also fall on the lower end of the spectrum. These airports tend to serve more domestic than international flights, resulting in lower total operations. In addition to having low passenger numbers, Nashville also handles fewer commercial flights and thus likely generates less revenue overall, which supports classifying Nashville as a "loser." Increased investment in terminal capacity and new routes could allow Nashville to move up the large hub air carrier ranking.
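The rankings behind figures 3 and 4 can also be produced without a fixed numeric cutoff. The following is a minimal sketch in base R, assuming a data frame with the `APORT_NAME` and `AC` columns used above; the rows in `toy_ac` and the choice of `n` are invented placeholders, not FAA values.

```r
# Invented example rows; only the column names match the analysis above.
toy_ac <- data.frame(
  APORT_NAME = c("AIRPORT A", "AIRPORT B", "AIRPORT C", "AIRPORT D", "AIRPORT E"),
  AC = c(120000, 640000, 90000, 410000, 300000)
)

# Sort once by air carrier operations, lowest first.
ranked <- toy_ac[order(toy_ac$AC), ]

# The first n rows are the lowest values, the last n rows the highest.
n <- 2
lowest_ac <- head(ranked, n)
highest_ac <- tail(ranked, n)

print(lowest_ac$APORT_NAME)
print(highest_ac$APORT_NAME)
```

Sorting once and taking both ends keeps the two bar charts consistent with each other, since they are drawn from the same ordering.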
delay_summary <- data.frame(
Year = rep(c(2013, 2023), each = 4),
Airport = rep(c("ATL", "BNA", "PVD", "CHA"), times = 2),
Departure_Delay = c(
mean(atlanta_delayed_departures$DEP_DELAY_NEW, na.rm = TRUE),
mean(nashville_delayed_departures_2013$DEP_DELAY_NEW, na.rm = TRUE),
mean(pvd_delayed_departures_2013$DEP_DELAY_NEW, na.rm = TRUE),
mean(lovell_delayed_departures_2013$DEP_DELAY_NEW, na.rm = TRUE),
mean(atlanta_delayed_departures_2023$DEP_DELAY_NEW, na.rm = TRUE),
mean(nashville_delayed_departures_2023$DEP_DELAY_NEW, na.rm = TRUE),
mean(pvd_delayed_departures_2023$DEP_DELAY_NEW, na.rm = TRUE),
mean(lovell_delayed_departures_2023$DEP_DELAY_NEW, na.rm = TRUE)
),
Arrival_Delay = c(
mean(atlanta_delayed_arrivals$ARR_DELAY_NEW, na.rm = TRUE),
mean(nashville_delayed_arrivals_2013$ARR_DELAY_NEW, na.rm = TRUE),
mean(pvd_delayed_arrivals_2013$ARR_DELAY_NEW, na.rm = TRUE),
mean(lovell_delayed_arrivals_2013$ARR_DELAY_NEW, na.rm = TRUE),
mean(atlanta_delayed_arrivals_2023$ARR_DELAY_NEW, na.rm = TRUE),
mean(nashville_delayed_arrivals_2023$ARR_DELAY_NEW, na.rm = TRUE),
mean(pvd_delayed_arrivals_2023$ARR_DELAY_NEW, na.rm = TRUE),
mean(lovell_delayed_arrivals_2023$ARR_DELAY_NEW, na.rm = TRUE)
)
)
delay_summary_filtered <- delay_summary %>%
filter(Year %in% c(2013, 2023)) %>%
mutate(Airport = case_when(
Airport == "ATL" ~ "Atlanta (ATL) (Large Hub Winner)",
Airport == "BNA" ~ "Nashville (BNA) (Large Hub Loser)",
Airport == "PVD" ~ "Providence (PVD) (Small Hub Winner)", # Replaced LGB with PVD
Airport == "CHA" ~ "Lovell Field (CHA) (Small Hub Loser)",
TRUE ~ Airport
))
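# Reshape arrival and departure delays into long format for plotting
library(tidyr) # provides pivot_longer()
# Start from delay_summary_filtered so the legend shows the relabeled airport names
delay_summary_type <- delay_summary_filtered %>%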
pivot_longer(cols = c("Departure_Delay", "Arrival_Delay"),
names_to = "Delay_Type",
values_to = "Delay")
delay_summary_type <- delay_summary_type %>%
mutate(Delay_Type = case_when(
Delay_Type == "Departure_Delay" ~ "Departure Delay",
Delay_Type == "Arrival_Delay" ~ "Arrival Delay",
TRUE ~ Delay_Type
))
library(ggplot2)
ggplot(delay_summary_type, aes(x = factor(Year), y = Delay, color = Airport,
linetype = Delay_Type,
group = interaction(Airport, Delay_Type))) +
geom_line(linewidth = 1) + # linewidth replaces size for lines in ggplot2 >= 3.4.0
geom_point(size = 3) +
labs(title = "Average Departure & Arrival Delays (2013 vs. 2023)",
x = "Year",
y = "Average Delay (minutes)",
linetype = "Delay Type",
color = "Airport Name") +
scale_linetype_manual(values = c("solid", "dashed")) +
guides(linetype = guide_legend(order = 1), color = guide_legend(order = 2)) +
theme_minimal()
Figure 5: Delay times for both departures and arrivals
Average delay times for both departures and arrivals increased at all four airports between 2013 and 2023 (figure 5). Atlanta was identified as the large hub winner, and Providence the small hub winner. Atlanta's departure delays rose sharply (34 minutes on average in 2013 versus 46.5 minutes in 2023), whereas its arrival delays rose more modestly. Delays in both cases may be attributed to the capacity issues that Atlanta has been facing for years. For the small hub winner, Providence, departure delays also rose sharply, from 25 minutes to 49 minutes, while arrival delays rose more modestly, from 29 minutes to 34 minutes. Part of the cause for these increasing delays in Providence may be the deteriorating state of Runway 16/34, which is currently undergoing rehabilitation.
The small and large hub losers did not see notably larger increases in delay times between 2013 and 2023. In fact, two of the three major delay spikes came from the winner hubs. Among the losers, only Chattanooga, TN (small hub) saw a major spike, with departure delays increasing from 36 minutes on average in 2013 to 49 minutes on average in 2023. Nashville, TN, the large hub loser, saw smaller increases in both departure and arrival delays, with departure delays increasing from 28 minutes to 34 minutes and arrival delays increasing from 30 minutes to 38 minutes. Both Tennessee airports may see substantially higher traffic volume in the coming years as data center business moves into the region.
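The comparison above can be made explicit with a short calculation. The sketch below takes only the average delays quoted in the text (minutes, 2013 versus 2023) and computes the absolute and percentage change for each; the data frame name `delay_change` and its column names are chosen here for illustration and are not part of the original analysis.

```r
# Average delays in minutes, as quoted in the text (2013 vs. 2023).
delay_change <- data.frame(
  Airport = c("ATL", "PVD", "PVD", "CHA", "BNA", "BNA"),
  Delay_Type = c("Departure", "Departure", "Arrival",
                 "Departure", "Departure", "Arrival"),
  Delay_2013 = c(34, 25, 29, 36, 28, 30),
  Delay_2023 = c(46.5, 49, 34, 49, 34, 38)
)

# Absolute change in minutes and relative change in percent.
delay_change$Change_Min <- delay_change$Delay_2023 - delay_change$Delay_2013
delay_change$Change_Pct <- round(100 * delay_change$Change_Min / delay_change$Delay_2013, 1)

# Largest increases first.
delay_change <- delay_change[order(-delay_change$Change_Min), ]
print(delay_change)
```

On these figures the three largest increases are the Providence, Chattanooga, and Atlanta departure delays, which matches the observation that two of the three major spikes belong to winner hubs.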
Group assignment completed by Chloe Robinson, Ryan Swett, and Joshua Grossman
Data sources:
Federal Aviation Administration (FAA). Terminal Area Forecast (TAF) System. Retrieved from http://aspm.faa.gov/main/taf.asp
U.S. Department of Transportation, Bureau of Transportation Statistics (BTS) – Airport Enplanement Data.