While the rate of fatal road accidents has been decreasing steadily since the 1980s, this reduction has stagnated over the past ten years. Coupled with the increase in the number of miles driven in the nation, the total number of traffic-related fatalities has now reached a ten-year high and is rapidly increasing. Looking at the demographics of traffic accident victims for each US state, we find a lot of variation between states. We now want to understand whether there are patterns in this variation in order to derive suggestions for a policy action plan. In particular, instead of implementing a costly nationwide plan, we want to focus on groups of states with similar profiles. How can we find such groups in a statistically sound way and communicate the result effectively?
1. The raw data files and their format
2. Read in and get an overview of the data
3. Create a textual and a graphical summary of the data
4. Quantify the association of features and accidents
5. Fit a multivariate linear regression
6. Perform PCA on standardized data
7. Visualize the first two principal components
8. Find clusters of similar states in the data
9. KMeans to visualize clusters in the PCA scatter plot
10. Visualize the feature differences between the clusters
11. Compute the number of accidents within each cluster
12. Make a decision when there is no clear right choice
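The core of the outline (standardize the features, run PCA, then group the states with k-means) can be sketched roughly as follows. This is only an illustration under stated assumptions: the data are random placeholder values rather than the project's accident data, the 51-by-3 shape and the choice of three clusters are arbitrary, and the helper names `standardize`, `pca_2d`, and `kmeans` are hypothetical. It also uses a plain NumPy version of Lloyd's algorithm in place of a library KMeans implementation, so that the example has a single dependency.

```python
import numpy as np


def standardize(X):
    """Scale each column to zero mean and unit variance."""
    return (X - X.mean(axis=0)) / X.std(axis=0)


def pca_2d(X_scaled):
    """Project standardized data onto its first two principal components."""
    # Rows of vt are the principal axes, ordered by decreasing singular value.
    _, _, vt = np.linalg.svd(X_scaled, full_matrices=False)
    return X_scaled @ vt[:2].T


def kmeans(X, k, n_iter=100, seed=0):
    """Plain Lloyd's algorithm; returns one cluster label per row of X."""
    rng = np.random.default_rng(seed)
    # Start from k distinct rows chosen at random.
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        # Distance from every row to every center, shape (n_rows, k).
        dists = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Move each center to the mean of its members (keep it if empty).
        new_centers = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels


# Placeholder data: 51 rows and 3 feature columns of random numbers.
# These are NOT real accident statistics; the shape is an assumption.
rng = np.random.default_rng(42)
X = rng.normal(size=(51, 3))

X_scaled = standardize(X)
components = pca_2d(X_scaled)      # coordinates for a 2D scatter plot
labels = kmeans(X_scaled, k=3)     # k=3 is an arbitrary illustrative choice
```

The `components` array gives the two coordinates for a scatter plot, and `labels` can be used to color the points by cluster. With real data, the number of clusters would need to be chosen by inspecting the data rather than fixed in advance.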
PhD Candidate at University of Toronto
Joel is a PhD student in Biomedical Engineering at the University of Toronto, where he uses computational and experimental approaches to better understand fundamental stem cell decisions. Outside school, he enjoys playing ice hockey, eating and making food, being in nature, and figuring out how he can maximize the time he spends inside vim.