Exploratory Data Analysis using Airbnb Data
Hello, everyone. My name is Ariel, and thank you for taking the time to look at my exploratory data analysis of Airbnb listings in New York City. Below, you will see an analysis aimed at uncovering some trends within the attached Airbnb dataset.
About the Data
Within this code chunk, I imported the Airbnb dataset and stored it as ‘listings_in_NYC’. Using the glimpse function, we can see that this dataset has over 48,000 rows and 16 columns. Most of the variables in this dataset are stored as doubles, while 5 variables are character data types.
library(tidyverse)
listings_in_NYC <- readr::read_csv('data/AB_NYC_2019.csv')
glimpse(listings_in_NYC)
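As a cross-check on what glimpse reports, the dimensions and the column types can also be tallied in code. The sketch below is only an illustration: `toy_listings` is a hypothetical stand-in with invented values so that the example runs without the CSV file. Note that `typeof` reports the storage type, so a date column would be counted as a double here.

```r
library(tibble)

# Hypothetical stand-in for listings_in_NYC; all values are invented.
toy_listings <- tibble(
  id = c(1, 2, 3),
  name = c("Cozy room", "Sunny loft", "Quiet studio"),
  neighbourhood_group = c("Brooklyn", "Manhattan", "Queens"),
  price = c(80, 150, 60)
)

# Number of rows, then number of columns.
dim(toy_listings)

# Tally how many columns share each storage type.
type_counts <- table(vapply(toy_listings, typeof, character(1)))
type_counts
```

Swapping `toy_listings` for `listings_in_NYC` should give the same kind of tally for the real data.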
Lollipop Chart
The code chunk below counts the Airbnb listings in NYC. Instead of simply counting the total number of rows in the dataset, I grouped the data by neighborhood group (also referred to as boroughs). According to the lollipop chart, the borough with the most listings was Manhattan, with 21,661 listings. After viewing the chart, I began to wonder how the different room types are distributed across the city. Upon further inspection, nearly 61 percent of the listings in Manhattan were for an entire home or apartment. In addition, almost 37 percent of the listings in Manhattan were for a private room, while about 2 percent were for a shared room. In 2019, the neighborhood group with the fewest listings was Staten Island, with 373 listings. A little under 3 percent of the listings in Staten Island were shared rooms, while 47 percent were for an entire apartment or house. About half of Staten Island's Airbnb listings in 2019 were for a private room.
# Count listings per borough and sort the boroughs by that count.
listings_in_NYC %>%
  group_by(neighbourhood_group) %>%
  summarize(n = n()) %>%
  arrange(n) %>%
  # Fix the factor levels in sorted order so ggplot keeps this ordering.
  mutate(neighbourhood_group = factor(neighbourhood_group, levels = neighbourhood_group)) %>%
  ggplot(aes(x = neighbourhood_group, y = n)) +
  # Stem of each lollipop, drawn from the count down to zero.
  geom_segment(aes(xend = neighbourhood_group, yend = 0)) +
  geom_point(size = 2, color = "Dark Green") +
  labs(
    title = 'Number of Airbnb listings in the Boroughs of New York City',
    subtitle = 'Listings in 2019',
    caption = 'Source: NYC Airbnb Listings (2019)',
    x = 'New York City Borough',
    y = 'Number of Airbnb listings'
  ) +
  theme_classic() +
  coord_flip()
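The room type percentages quoted above are not produced by the chart code itself. One way they could be computed is sketched below. This is an assumption about the approach, not necessarily the calculation used for this analysis. `toy_listings` and `room_type_share` are hypothetical names, and the rows are invented so the example runs on its own.

```r
library(dplyr)
library(tibble)

# Hypothetical stand-in for listings_in_NYC; all rows are invented.
toy_listings <- tibble(
  neighbourhood_group = c(rep("Manhattan", 5), rep("Staten Island", 4)),
  room_type = c(
    "Entire home/apt", "Entire home/apt", "Entire home/apt",
    "Private room", "Shared room",
    "Entire home/apt", "Private room", "Private room", "Private room"
  )
)

# Count listings per borough and room type, then express each count
# as a percentage of that borough's total.
room_type_share <- toy_listings %>%
  count(neighbourhood_group, room_type) %>%
  group_by(neighbourhood_group) %>%
  mutate(pct = round(100 * n / sum(n), 1)) %>%
  ungroup()

room_type_share
```

Grouping by borough before dividing is what makes the percentages sum to 100 within each borough rather than across the whole city.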
Density plot with outliers
The density plot below explores the price distribution of NYC Airbnb listings. The distribution is heavily right-skewed, which suggests that a few outliers within the dataset are drastically affecting the way the data is displayed. The most expensive listings are “Furnished room in Astoria apartment” (located in Astoria, Queens), “Luxury 1 bedroom apt. -stunning Manhattan views” (located in Greenpoint, Brooklyn), and “1-BR Lincoln Center” (on the Upper West Side of Manhattan). Each of these listings costs 10,000 dollars a night. In contrast, the cheapest Airbnb listings run for 10 dollars a night. These more affordable listings are located throughout NYC.
# Density of the nightly price across all listings, outliers included.
listings_in_NYC %>%
  ggplot(aes(x = price)) +
  geom_density(fill = "dark green", color = "black", alpha = 0.8) +
  labs(
    title = 'Price distribution of NYC Airbnb listings',
    subtitle = 'NYC listings in 2019: All outliers included',
    caption = 'Source: NYC Airbnb Listings (2019)',
    x = 'United States Dollar',
    y = 'Distribution'
  ) +
  theme_classic()
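The most and least expensive listings mentioned above can be pulled out of a data frame with `slice_max` and `slice_min` from dplyr. The sketch below shows the idea on a hypothetical `toy_listings` table with invented names and prices. It is one possible way to find the extremes, not necessarily how they were found for this analysis.

```r
library(dplyr)
library(tibble)

# Hypothetical stand-in for listings_in_NYC; all values are invented.
toy_listings <- tibble(
  name = c("Listing A", "Listing B", "Listing C", "Listing D", "Listing E"),
  neighbourhood_group = c("Queens", "Brooklyn", "Manhattan", "Bronx", "Brooklyn"),
  price = c(10000, 10000, 120, 10, 10)
)

# with_ties = TRUE keeps every listing that shares the extreme price.
most_expensive <- toy_listings %>% slice_max(price, n = 1, with_ties = TRUE)
cheapest <- toy_listings %>% slice_min(price, n = 1, with_ties = TRUE)

most_expensive
cheapest
```

Keeping ties matters here because several listings can share the same maximum or minimum price.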
Density plot without outliers
In order to have a less skewed density plot, the third chart, titled “Filtered Airbnb price distribution in New York City”, removes all outliers. Here an outlier is any price more than 1.5 times the interquartile range below the first quartile or above the third quartile. Once the outliers are removed, the new density plot shows that most of the 2019 Airbnb listings in NYC were renting for under 200 dollars a night.
# 1.5 * IQR rule: flag prices far below the first quartile or far above the third.
iqr <- quantile(listings_in_NYC$price, 0.75) - quantile(listings_in_NYC$price, 0.25)
lower_threshold <- quantile(listings_in_NYC$price, 0.25) - 1.5 * iqr
upper__threshold <- quantile(listings_in_NYC$price, 0.75) + 1.5 * iqr
# Keep only the listings whose price falls between the two thresholds.
listings_in_NYC %>%
  filter(!(price < lower_threshold | price > upper__threshold)) %>%
  ggplot(aes(x = price)) +
  geom_density(fill = "dark green", color = "black", alpha = 0.8) +
  labs(
    title = 'Filtered Airbnb price distribution in New York City',
    subtitle = 'NYC listings in 2019: All outliers removed',
    caption = 'Source: NYC Airbnb Listings (2019)',
    x = 'United States Dollar',
    y = 'Distribution'
  ) +
  theme_classic()
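The thresholds computed above follow the common 1.5 × IQR rule. To make the arithmetic easier to follow, here is a small worked sketch on an invented price vector. `iqr_fences` and `toy_prices` are hypothetical names introduced only for this illustration.

```r
# Return the lower and upper fences of the 1.5 * IQR rule for a numeric vector.
iqr_fences <- function(x, k = 1.5) {
  q <- quantile(x, c(0.25, 0.75), na.rm = TRUE, names = FALSE)
  spread <- q[2] - q[1]
  c(lower = q[1] - k * spread, upper = q[2] + k * spread)
}

# Invented prices with one extreme value.
toy_prices <- c(40, 60, 80, 100, 120, 140, 160, 5000)

fences <- iqr_fences(toy_prices)
fences

# Keep only the prices that fall inside the fences.
kept <- toy_prices[toy_prices >= fences[["lower"]] & toy_prices <= fences[["upper"]]]
kept
```

With these toy values the quartiles are 75 and 145, so the fences are -30 and 250 and only the 5000 value is dropped. Because prices cannot be negative, a negative lower fence like this one never removes anything, so the filter only acts on the expensive end.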
Box plot
The box plot below uses the same filtered data as the density plot above. In 2019, Manhattan had the highest median Airbnb price at 135 dollars a night, followed by Brooklyn at 90 dollars a night. The median listing prices in Queens and Staten Island were 75 dollars and 74 dollars, respectively. The lowest median price in NYC belonged to the Bronx, at 65 dollars a night.
# Box plot of price per borough, using the same outlier filter as the density plot.
# No group_by() is needed: the x aesthetic already splits the data by borough.
listings_in_NYC %>%
  filter(!(price < lower_threshold | price > upper__threshold)) %>%
  select(price, neighbourhood_group) %>%
  ggplot(aes(x = neighbourhood_group, y = price)) +
  geom_boxplot(col = "#006400") +
  labs(
    title = 'Distribution of Airbnb prices by Boroughs of New York City',
    subtitle = 'Listings in 2019',
    caption = 'Source: NYC Airbnb Listings (2019): All outliers removed',
    x = 'New York City Boroughs',
    y = 'United States Dollar'
  ) +
  theme_classic()
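The borough medians quoted above can be read off the box plot, but they could also be computed directly. The sketch below shows one way to do that with `group_by` and `summarize`. `toy_listings` and `median_by_borough` are hypothetical names, and the prices are invented so the example runs on its own.

```r
library(dplyr)
library(tibble)

# Hypothetical stand-in for the filtered listings; all values are invented.
toy_listings <- tibble(
  neighbourhood_group = c("Manhattan", "Manhattan", "Manhattan", "Bronx", "Bronx", "Bronx"),
  price = c(100, 150, 200, 50, 60, 90)
)

# Median price and listing count per borough, most expensive borough first.
median_by_borough <- toy_listings %>%
  group_by(neighbourhood_group) %>%
  summarize(median_price = median(price), n = n(), .groups = "drop") %>%
  arrange(desc(median_price))

median_by_borough
```

Applying the same outlier filter before the `group_by` step would keep the numbers consistent with the box plot.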
Concluding remarks
This exploratory data analysis set out to find trends within the Airbnb listings in New York City during 2019. The dataset contained over 48,000 listings and 16 variables. We learned that, of all the boroughs in NYC, Manhattan had the most listings at 21,661. With the help of density plots, we learned that a few outliers stretch the spread of the price variable, causing the data to be skewed. However, once outliers were removed, we learned that most NYC Airbnb listings rented for 200 dollars or less in 2019. Lastly, we learned that the median price of a listing is not the same across the 5 boroughs. The Bronx usually has cheaper listings, with a median price of 65 dollars a night. Thank you for taking the time to review my analysis.