Analyzing Crime in LA
🌇🚔 Background
Los Angeles, California 😎. The City of Angels. Tinseltown. The Entertainment Capital of the World! Known for its warm weather, palm trees, sprawling coastline, and Hollywood, along with producing some of the most iconic films and songs!
However, as with any highly populated city, it isn't always glamorous, and there can be a large volume of crime. That's where you can help!
You have been asked to support the Los Angeles Police Department (LAPD) by analyzing their crime data to identify patterns in criminal behavior. They plan to use your insights to allocate resources effectively to tackle various crimes in different areas.
You are free to use any methodologies that you like in order to produce your insights.
The Data
They have provided you with a single dataset to use. A summary and preview are provided below.
The data is publicly available here.
👮♀️ crimes.csv
Column | Description |
---|---|
'DR_NO' | Division of Records Number: Official file number made up of a 2-digit year, area ID, and 5 digits. |
'Date Rptd' | Date reported - MM/DD/YYYY. |
'DATE OCC' | Date of occurrence - MM/DD/YYYY. |
'TIME OCC' | Time of occurrence, in 24-hour military time. |
'AREA' | The LAPD has 21 Community Police Stations referred to as Geographic Areas within the department. These Geographic Areas are sequentially numbered from 1-21. |
'AREA NAME' | The 21 Geographic Areas or Patrol Divisions are also given a name designation that references a landmark or the surrounding community that it is responsible for. For example, the 77th Street Division is located at the intersection of South Broadway and 77th Street, serving neighborhoods in South Los Angeles. |
'Rpt Dist No' | A four-digit code that represents a sub-area within a Geographic Area. All crime records reference the "RD" that it occurred in for statistical comparisons. Find LAPD Reporting Districts on the LA City GeoHub at http://geohub.lacity.org/datasets/c4f83909b81d4786aa8ba8a74ab |
'Crm Cd' | Crime code for the offence committed. |
'Crm Cd Desc' | Definition of the crime. |
'Vict Age' | Victim Age (years) |
'Vict Sex' | Victim's sex: F : Female, M : Male, X : Unknown. |
'Vict Descent' | Victim's descent. |
'Premis Cd' | Code for the type of structure, vehicle, or location where the crime took place. |
'Premis Desc' | Definition of the 'Premis Cd' . |
'Weapon Used Cd' | The type of weapon used in the crime. |
'Weapon Desc' | Description of the weapon used (if applicable). |
'Status Desc' | Crime status. |
'Crm Cd 1' | Indicates the crime committed. Crime Code 1 is the primary and most serious one. Crime Code 2, 3, and 4 are respectively less serious offenses. Lower crime class numbers are more serious. |
'Crm Cd 2' | May contain a code for an additional crime, less serious than Crime Code 1. |
'Crm Cd 3' | May contain a code for an additional crime, less serious than Crime Code 1. |
'Crm Cd 4' | May contain a code for an additional crime, less serious than Crime Code 1. |
'LOCATION' | Street address of the crime. |
'Cross Street' | Cross street of the rounded address. |
'LAT' | Latitude of the crime location. |
'LON' | Longitude of the crime location. |
import pandas as pd
crimes = pd.read_csv("data/crimes.csv")
crimes.head(20)
Note:
To ensure the best user experience, we currently discourage using Folium and Bokeh in Workspace notebooks.
Project Overview
In this project, we will analyze crime data to gain insights into various aspects of criminal activities in a given region. We will use Python and several data analysis libraries to perform this analysis.
Python Modules Used
- Pandas
- NumPy
- Matplotlib
- Seaborn
Research Questions
We aim to answer the following questions during our analysis:
- Question 1: What are the most common types of crimes in Los Angeles?
- Question 2: What is the distribution of victim ages in Los Angeles crimes?
- Question 3: Which age group is most frequently victimized in crimes, and which is least?
- Question 4: Is there a relationship between the victim's age and the severity of the crime committed? (For the top 10 crimes)
- Question 5: What is the gender distribution of crime victims in Los Angeles?
- Question 6: Do the top 10 crimes in Los Angeles tend to have more male or female victims?
- Question 7: How does the victim's descent influence their likelihood of becoming a victim, and are there disparities in victimization rates among different descent groups?
- Question 8: Which are the top 5 victim descent groups most commonly victimized in the top 3 crimes?
- Question 9: What are the most common premises where crimes occur in Los Angeles?
- Question 10: Do the top 3 premises tend to be associated with particular crimes among the top 3 crime types?
- Question 11: During which hours of the day do crimes most frequently occur in Los Angeles?
- Question 12: What are the peak hours for the top 3 types of crimes in Los Angeles?
- Question 12.1: What are the peak time segments for the top 3 types of crimes in Los Angeles, when the day is divided into 6 time parts?
- Question 13: What is the spatial distribution of the top 3 crimes in Los Angeles, and are there any noticeable geographic patterns?
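As a rough illustration of how two of these questions could be approached, the sketch below ranks crime descriptions by frequency (in the spirit of Question 1) and assigns each record to one of six time parts (in the spirit of Question 12.1). It runs on a tiny made-up sample rather than the real `crimes.csv`. It assumes 'TIME OCC' is read as an integer in HHMM form, and the six equal 4-hour segments and the `time_part` column name are my own assumptions, since the notebook does not define them.

```python
import pandas as pd

# Tiny made-up sample standing in for crimes.csv; values are illustrative only.
sample = pd.DataFrame({
    "Crm Cd Desc": [
        "VEHICLE - STOLEN", "BURGLARY", "VEHICLE - STOLEN",
        "BATTERY - SIMPLE ASSAULT", "VEHICLE - STOLEN", "BURGLARY",
    ],
    "TIME OCC": [30, 815, 1200, 1545, 2010, 2359],  # assumed integer HHMM
})

# Question 1 style: count how often each crime description appears.
top_crimes = sample["Crm Cd Desc"].value_counts().head(3)

# Question 12.1 style: derive the hour, then bin it into six 4-hour parts.
# The segment boundaries and labels are an assumption, not from the source.
hour = sample["TIME OCC"] // 100
edges = [0, 4, 8, 12, 16, 20, 24]
labels = ["00-04", "04-08", "08-12", "12-16", "16-20", "20-24"]
sample["time_part"] = pd.cut(hour, bins=edges, labels=labels, right=False)

print(top_crimes)
print(sample[["TIME OCC", "time_part"]])
```

Using `right=False` makes each bin include its lower edge, so a crime at 08:15 falls into "08-12" rather than "04-08".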
import pandas as pd
crimes = pd.read_csv("data/crimes.csv")
# Display the first 5 rows of the dataset
crimes.head(5)
# Display the number of rows and columns
crimes.shape
# Show the data type of each column
crimes.dtypes
- I may convert the 'DATE OCC', 'Date Rptd', and 'TIME OCC' columns to more suitable data types.
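One possible way to do this conversion is sketched below on a small made-up sample. It assumes the date strings match the MM/DD/YYYY layout from the data dictionary exactly, and that 'TIME OCC' may have been read as an integer (which loses leading zeros). The `HOUR OCC` column is a hypothetical helper, not part of the dataset.

```python
import pandas as pd

# Tiny made-up sample standing in for crimes.csv; values are illustrative only.
sample = pd.DataFrame({
    "Date Rptd": ["01/05/2023", "02/10/2023"],
    "DATE OCC": ["01/03/2023", "02/09/2023"],
    "TIME OCC": [30, 2145],  # integers lose the leading zeros of "0030"
})

# Parse both date columns, assuming the MM/DD/YYYY format from the table.
for col in ["Date Rptd", "DATE OCC"]:
    sample[col] = pd.to_datetime(sample[col], format="%m/%d/%Y")

# Restore a 4-character HHMM string, then derive the hour of day.
sample["TIME OCC"] = sample["TIME OCC"].astype(str).str.zfill(4)
sample["HOUR OCC"] = sample["TIME OCC"].str[:2].astype(int)  # hypothetical helper column

print(sample.dtypes)
```

If the real file stores dates with a time component or in another layout, the `format` argument would need to change accordingly.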
# Display a concise summary with non-null counts and data types
crimes.info()
- I will drop the rows with null values in these columns: 'Vict Sex', 'Vict Descent', 'Premis Cd', 'Premis Desc', 'Crm Cd 1'.
- I will drop the 'Crm Cd 2', 'Crm Cd 3', and 'Crm Cd 4' columns.
- If I have time, I will analyze the weapon columns specifically.
- I think it is also a good idea to look at the 'Cross Street' column.
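The cleaning plan in these notes could look roughly like the sketch below. It uses a small made-up sample, and it reads the first note as "remove rows with missing values in those columns", which is my interpretation. The names `required`, `secondary`, and `cleaned` are illustrative.

```python
import numpy as np
import pandas as pd

# Tiny made-up sample standing in for crimes.csv; values are illustrative only.
sample = pd.DataFrame({
    "Vict Sex": ["F", None, "M"],
    "Vict Descent": ["H", "W", "B"],
    "Premis Cd": [101.0, 102.0, np.nan],
    "Premis Desc": ["STREET", "SIDEWALK", None],
    "Crm Cd 1": [510.0, 330.0, 624.0],
    "Crm Cd 2": [np.nan, 998.0, np.nan],
    "Crm Cd 3": [np.nan, np.nan, np.nan],
    "Crm Cd 4": [np.nan, np.nan, np.nan],
})

# Columns whose missing values should cause a row to be dropped.
required = ["Vict Sex", "Vict Descent", "Premis Cd", "Premis Desc", "Crm Cd 1"]
# Secondary crime-code columns to remove entirely.
secondary = ["Crm Cd 2", "Crm Cd 3", "Crm Cd 4"]

cleaned = (
    sample.drop(columns=secondary)   # remove the secondary code columns
    .dropna(subset=required)         # keep only rows complete in the required columns
    .reset_index(drop=True)
)

print(cleaned)
```

Dropping the secondary columns first keeps their nulls from mattering, while `subset=` limits the row filter to the columns named in the notes.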
# Show descriptive statistics for each numeric column
crimes.describe()
# Show the number of unique values in each column
crimes.nunique()