Toronto, Ontario 🌆. The Queen City. The 6ix.

Known for its vibrant arts scene, diverse culture, stunning skyline, and bustling neighborhoods, Toronto is a city that never sleeps. However, as with any major urban center, it faces its share of challenges. One growing concern for Torontonians is the rising number of bicycle thefts.

You have been invited to assist the Toronto Police Service by analyzing and visualizing data to uncover patterns in theft activity. Your findings and visual insights will provide crucial information that can help allocate resources more effectively and develop strategies to combat bike thefts, ensuring a safer city for all cyclists.

The Data

The dataset used for analyzing bike thefts is titled Cleaned_Bicycle_Thefts_Open_Data.csv in the data folder. This dataset contains essential information regarding bicycle theft incidents in a given city. Below are the details of each column in the dataset:

| Column | Description |
| --- | --- |
| date | The date when the bike theft occurred, formatted as YYYY/MM/DD. |
| quarter | A quarter represents one-fourth (1/4) of a year, equating to three months. |
| day_of_week | The day of the week when the theft took place (e.g., Monday, Tuesday). |
| neighborhood | The neighborhood where the theft occurred, based on the city's 140 social planning neighborhoods. |
| bike_cost | The reported cost of the stolen bike, specified in the local currency. |
| location | The specific location type of the theft, such as Residential Structures, Commercial Areas, Public Spaces, etc. |
| long | The longitude of the center of the neighborhood. |
| lat | The latitude of the center of the neighborhood. |

This dataset provides a comprehensive view of bike thefts, including when and where they occur, the financial impact, and other spatial and temporal factors. By analyzing this data, you can gain valuable insights into patterns and trends, which can inform strategies to mitigate bike thefts and enhance urban planning.

## Load the tidyverse and lubridate packages
library(tidyverse)
library(lubridate)
## Read the CSV file into `bike_data`
bike_data <- read_csv("data/Cleaned_Bicycle_Thefts_Open_Data.csv")
## Take a glance at `bike_data`
head(bike_data)
1. Which quarter ("Q1", "Q2", "Q3", or "Q4") has the highest and lowest number of stolen bikes? Store your findings as string variables high and low.

Because each quarter is stored as the date of its first month (January, April, July, or October), I recode those months as Q1, Q2, Q3, and Q4.

# Start coding here
# Map the first month of each quarter to its quarter label
bike_data <- bike_data %>%
  mutate(quarter_month = month(quarter)) %>%
  mutate(quarter_q = case_when(quarter_month == 1 ~ "Q1",
                               quarter_month == 4 ~ "Q2",
                               quarter_month == 7 ~ "Q3",
                               quarter_month == 10 ~ "Q4"))

# Count thefts per quarter; after sorting, the first row is the extreme value
highest_theft <- bike_data %>% count(quarter_q) %>% arrange(desc(n))
high <- head(highest_theft$quarter_q, n = 1)
high

lowest_theft <- bike_data %>% count(quarter_q) %>% arrange(n)
low <- head(lowest_theft$quarter_q, n = 1)
low
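
The month-to-quarter recoding can also be cross-checked with lubridate's `quarter()` function. The sketch below is only an illustration: the `quarter_start` dates are made up, assuming the `quarter` column holds the first day of each quarter as described above.

```r
library(lubridate)

# Hypothetical quarter-start dates (assumption: not taken from the real dataset)
quarter_start <- as.Date(c("2019-01-01", "2019-04-01", "2019-07-01", "2019-10-01"))

# quarter() returns the quarter number 1-4; paste0() turns it into a "Q1".."Q4" label
quarter_label <- paste0("Q", quarter(quarter_start))
quarter_label
```

If the labels produced this way agree with `quarter_q`, the `case_when()` mapping covers every date in the column.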
# Create a time series plot of thefts per quarter
time_series_data <- bike_data %>% count(quarter)
head(time_series_data)

ggplot(time_series_data, aes(x = quarter, y = n)) +
  geom_line() +
  coord_fixed(ratio = 0.5) +
  xlab("Yearly quarters") +
  ylab("Number of bike thefts") +
  scale_x_date(date_breaks = "3 months", date_labels = "%Y-%m") +
  theme(axis.text.x = element_text(angle = 45, hjust = 1))
2. What is the most frequent location type (e.g., Residential Structures, Commercial Areas) for bike thefts in Toronto, and what percentage of all thefts does it account for (rounded to one decimal place)? Store your findings as a string variable, location, and a numerical variable, percentage.

To do this, count the thefts per location type and sort the counts in descending order.

location_data <- bike_data %>% count(location) %>% arrange(desc(n))
total_theft <- sum(location_data$n)
# Express each share as a percentage (0-100) before rounding to one decimal place
location_data$percentage <- round(100 * location_data$n / total_theft, digits = 1)
head(location_data)

location<- as.character(head(location_data$location, n=1))
percentage<- as.numeric(head(location_data$percentage, n=1))

location
percentage
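
As a sanity check on the share calculation, here is a minimal sketch on toy data. The location labels and counts are invented for illustration only, and the share is expressed on a 0-100 scale so that rounding to one decimal place keeps useful detail.

```r
library(dplyr)

# Hypothetical location types (assumption: not the real dataset)
toy <- tibble(location = c("Residential", "Residential", "Residential",
                           "Commercial", "Commercial", "Public"))

toy_share <- toy %>%
  count(location, sort = TRUE) %>%                  # one row per location, largest first
  mutate(percentage = round(100 * n / sum(n), 1))   # share of all rows, in percent

toy_share$location[1]    # most frequent location type
toy_share$percentage[1]  # its share of all thefts, in percent
```

The three percentages add up to 100, apart from rounding, which is a quick way to confirm the denominator is the total count.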
# Create a pie chart of theft share per location type

ggplot(location_data, aes(x = "", y = percentage, fill = location)) +
  geom_col(width = 1) +
  coord_polar(theta = "y") +
  geom_text(aes(label = percentage), position = position_stack(vjust = 0.5)) +
  theme_minimal() +
  ylab("Percentage of theft per location type") +
  labs(fill = "Location type")
3. In which region of Toronto is the median value of stolen bikes the highest? Store your findings as a string variable, region (region can be a real region name or a region code, i.e., '1', '2', '3', ...).

Calculate the median value of stolen bikes per region (neighborhood), then pick the region with the highest median.

bike_cost_data <- bike_data %>% 
  group_by(neighborhood) %>% 
  summarize(median_value = median(bike_cost, na.rm = TRUE)) 

bike_data_long_lat <- bike_data %>% 
  select(neighborhood, long, lat) %>% 
  distinct()

bike_cost_data <- right_join(bike_cost_data, bike_data_long_lat, by = "neighborhood")

region<- bike_cost_data$neighborhood[which.max(bike_cost_data$median_value)]
region
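
The toy example below sketches the same group-wise median and shows why the median is used rather than the mean. All neighborhoods and costs are hypothetical. `slice_max()` is used here as an alternative to `which.max()`, and unlike `which.max()` it returns every tied row by default.

```r
library(dplyr)

# Hypothetical costs for three made-up neighborhoods (assumption: illustration only)
toy <- tibble(
  neighborhood = c("A", "A", "A", "B", "B", "C", "C", "C"),
  bike_cost    = c(500, 700, NA, 1200, 1000, 300, 400, 9000)
)

toy_median <- toy %>%
  group_by(neighborhood) %>%
  summarize(median_value = median(bike_cost, na.rm = TRUE))  # NA costs are ignored

# Keep the row with the largest median and extract the neighborhood name
top_region <- toy_median %>%
  slice_max(median_value, n = 1) %>%
  pull(neighborhood)
top_region
```

Neighborhood C contains a single very expensive bike that would dominate a mean, but its median stays low, so B is selected.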
# create plot with label only on the neighborhood with the highest median value

ggplot(bike_cost_data, aes(x=long, y=lat, color=median_value)) + 
  geom_point(size=4) + 
  xlab("Longitude") + 
  ylab("Latitude") + 
  geom_text(data = subset(bike_cost_data, neighborhood == region), aes(label=neighborhood), vjust=-1)

4. What course of action would you recommend to the police station based on your findings? Store your recommendation as a string variable, action.

action <- c("The police should increase surveillance in neighborhood 41 during the 3rd quarter of the year, from July through September")