Toronto, Ontario 🌆. The Queen City. The 6ix.
Known for its vibrant arts scene, diverse culture, stunning skyline, and bustling neighborhoods, Toronto is a city that never sleeps. However, as with any major urban center, it faces its share of challenges. One growing concern for Torontonians is the rising number of bicycle thefts.
You have been invited to assist the Toronto Police Service by analyzing and visualizing data to uncover patterns in theft activity. Your findings and visual insights will provide crucial information that can help allocate resources more effectively and develop strategies to combat bike thefts, ensuring a safer city for all cyclists.
The Data
The dataset used for analyzing bike thefts is titled Cleaned_Bicycle_Thefts_Open_Data.csv in the data folder. This dataset contains essential information regarding bicycle theft incidents in a given city. Below are the details of each column in the dataset:
| Column | Description |
|---|---|
| date | The date when the bike theft occurred, formatted as YYYY/MM/DD. |
| quarter | A quarter represents one-fourth (1/4) of a year, equating to three months. |
| day_of_week | The day of the week when the theft took place (e.g., Monday, Tuesday). |
| neighborhood | The neighborhood where the theft occurred, based on the city's 140 social planning neighborhoods. |
| bike_cost | The reported cost of the stolen bike, specified in the local currency. |
| location | The specific location type of the theft, such as Residential Structures, Commercial Areas, Public Spaces, etc. |
| long | The longitude of the center of the neighborhood. |
| lat | The latitude of the center of the neighborhood. |
This dataset provides a comprehensive view of bike thefts, including when and where they occur, the financial impact, and other spatial and temporal factors. By analyzing this data, you can gain valuable insights into patterns and trends, which can inform strategies to mitigate bike thefts and enhance urban planning.
## Load tidyverse package
library(tidyverse)
library(lubridate)
## Read `bike_data`
bike_data <- read_csv("data/Cleaned_Bicycle_Thefts_Open_Data.csv")
## Take a glance at the `bike_data`
head(bike_data)
- Which quarter (i.e., "Q1", "Q2", "Q3", or "Q4") has the highest and lowest number of stolen bikes? Store your findings as string variables high and low.
Since quarters are stored as dates in January, April, July, and October, I recode them to the labels Q1, Q2, Q3, and Q4.
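The recoding described above can also be sketched with `lubridate::quarter()`, which returns the quarter number (1 to 4) of a date. This is a minimal illustration on hypothetical toy data, not the project dataset; the names `toy_quarters`, `quarter_label`, and `quarter_counts` are assumptions made up for this example.

```r
library(dplyr)
library(lubridate)

# Hypothetical toy data: each theft carries a date that falls in some quarter
toy_quarters <- tibble(
  quarter = as.Date(c("2020-01-01", "2020-04-01", "2020-07-01",
                      "2020-10-01", "2021-07-01"))
)

# lubridate::quarter() gives 1-4; prefix it with "Q" to build the label
toy_quarters <- toy_quarters %>%
  mutate(quarter_label = paste0("Q", lubridate::quarter(quarter)))

# Count thefts per label, most frequent first
quarter_counts <- toy_quarters %>% count(quarter_label, sort = TRUE)
quarter_counts
```

Compared with an explicit `case_when()` on the month, this approach also labels dates that do not fall on the first month of a quarter.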
# Start coding here
# Extract the month of each quarter date and map it to a quarter label
bike_data <- bike_data %>%
  mutate(quarter_month = month(quarter)) %>%
  mutate(quarter_q = case_when(quarter_month == 1 ~ "Q1",
                               quarter_month == 4 ~ "Q2",
                               quarter_month == 7 ~ "Q3",
                               quarter_month == 10 ~ "Q4"))
# Quarter with the most thefts
highest_theft <- bike_data %>% count(quarter_q) %>% arrange(desc(n))
high <- head(highest_theft$quarter_q, n = 1)
high
# Quarter with the fewest thefts
lowest_theft <- bike_data %>% count(quarter_q) %>% arrange(n)
low <- head(lowest_theft$quarter_q, n = 1)
low
#create a time series plot
time_series_data<- bike_data %>% group_by(quarter) %>% count(quarter)
head(time_series_data)
ggplot(time_series_data, aes(x = quarter, y = n)) +
  geom_line() +
  coord_fixed(ratio = 0.5) +
  xlab("Yearly quarters") +
  ylab("Number of bike thefts") +
  scale_x_date(date_breaks = "3 months", date_labels = "%Y-%m") +
  theme(axis.text.x = element_text(angle = 45, hjust = 1))
- What are the most frequent locations (e.g., Residential, Commercial Areas) for bike thefts in Toronto, and what proportion of all thefts do they account for (rounded to one decimal place)? Store your findings as a string variable, location, and a numerical variable, percentage.
To do this, group by location type and sort the counts in descending order.
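The counting and sorting step above can be sketched on a small, made-up sample. The data and the names `toy_thefts`, `location_share`, `share_pct`, `top_location`, and `top_share` are hypothetical and only illustrate the idea; the share is expressed as a percentage of all rows, which is one reasonable reading of the question.

```r
library(dplyr)

# Hypothetical toy data standing in for the location column
toy_thefts <- tibble(
  location = c("Residential Structures", "Residential Structures",
               "Residential Structures", "Commercial Areas",
               "Commercial Areas", "Public Spaces")
)

# Count thefts per location type (largest first) and convert to a percentage
location_share <- toy_thefts %>%
  count(location, sort = TRUE) %>%
  mutate(share_pct = round(100 * n / sum(n), 1))

# The first row is the most frequent location type
top_location <- location_share$location[1]
top_share <- location_share$share_pct[1]
location_share
```

Computing the share inside `mutate()` keeps the count and the percentage in one table, which is convenient when the same table feeds a chart afterwards.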
location_data <- bike_data %>% count(location) %>% arrange(desc(n))
total_theft <- sum(location_data$n)
# Share of all thefts per location type, as a percentage rounded to one decimal place
location_data$percentage <- round(100 * location_data$n / total_theft, digits = 1)
head(location_data)
location <- as.character(head(location_data$location, n = 1))
percentage <- as.numeric(head(location_data$percentage, n = 1))
location
percentage
#create a pie chart
ggplot(location_data, aes(x = "", y = percentage, fill = location)) +
  geom_col(width = 1) +
  coord_polar(theta = "y") +
  geom_text(aes(label = percentage), position = position_stack(vjust = 0.5)) +
  theme_minimal() +
  ylab("Percentage of theft per location type") +
  labs(fill = "Location type")
- In which region of Toronto is the median value of stolen bikes the highest? Store your findings as a string variable, region (region can be a real region name or a region code, i.e., '1', '2', '3', ...).
Calculate the median value of stolen bikes per region (here, the neighborhood) and pick the neighborhood with the highest median.
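As a rough sketch of this step, the example below computes a per-group median on hypothetical toy data and selects the top group with `slice_max()`. The names `toy_costs`, `median_by_area`, `n_thefts`, and `top_area` are assumptions for illustration only.

```r
library(dplyr)

# Hypothetical toy data: three neighborhoods, one missing cost
toy_costs <- tibble(
  neighborhood = c("A", "A", "A", "B", "B", "C"),
  bike_cost = c(500, 700, NA, 1200, 800, 300)
)

# Median cost per neighborhood, ignoring missing values;
# n_thefts counts all rows, including those with a missing cost
median_by_area <- toy_costs %>%
  group_by(neighborhood) %>%
  summarize(median_value = median(bike_cost, na.rm = TRUE),
            n_thefts = n()) %>%
  arrange(desc(median_value))

# Keep every neighborhood tied for the highest median
top_area <- median_by_area %>%
  slice_max(median_value, n = 1, with_ties = TRUE)
top_area
```

Keeping the row count next to the median makes it easier to judge whether a high median rests on only a handful of thefts, and `with_ties = TRUE` avoids silently dropping a tied neighborhood.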
# Median bike cost per neighborhood
bike_cost_data <- bike_data %>%
  group_by(neighborhood) %>%
  summarize(median_value = median(bike_cost, na.rm = TRUE))
# One row of coordinates per neighborhood
bike_data_long_lat <- bike_data %>%
  select(neighborhood, long, lat) %>%
  distinct()
bike_cost_data <- right_join(bike_cost_data, bike_data_long_lat, by = "neighborhood")
# Neighborhood with the highest median value
region <- bike_cost_data$neighborhood[which.max(bike_cost_data$median_value)]
region
# create plot with label only on the neighborhood with the highest median value
ggplot(bike_cost_data, aes(x = long, y = lat, color = median_value)) +
  geom_point(size = 4) +
  xlab("Longitude") +
  ylab("Latitude") +
  geom_text(data = subset(bike_cost_data, neighborhood == region), aes(label = neighborhood), vjust = -1)
4. What course of action would you recommend to the police station based on your findings? Store your recommendation as a string variable, action.
action <- "The police should increase surveillance in neighborhood 41 during the third quarter of the year, from July through September."