Los Angeles, California 😎. The City of Angels. Tinseltown. The Entertainment Capital of the World!
The city is known for its warm weather, palm trees, sprawling coastline, and Hollywood, and for producing some of the most iconic films and songs. However, as with any highly populated city, it isn't always glamorous, and there can be a large volume of crime. That's where you can help!
You have been asked to support the Los Angeles Police Department (LAPD) by analyzing crime data to identify patterns in criminal behavior. They plan to use your insights to allocate resources effectively to tackle various crimes in different areas.
The Data
They have provided you with a single dataset to use. A summary and preview are provided below.
It is a modified version of the original data, which is publicly available from Los Angeles Open Data.
crimes.csv
Column | Description |
---|---|
'DR_NO' | Division of Records Number: Official file number made up of a 2-digit year, area ID, and 5 digits. |
'Date Rptd' | Date reported - MM/DD/YYYY. |
'DATE OCC' | Date of occurrence - MM/DD/YYYY. |
'TIME OCC' | Time of occurrence in 24-hour military time. |
'AREA NAME' | The 21 Geographic Areas or Patrol Divisions are also given a name designation that references a landmark or the surrounding community that it is responsible for. For example, the 77th Street Division is located at the intersection of South Broadway and 77th Street, serving neighborhoods in South Los Angeles. |
'Crm Cd Desc' | Indicates the crime committed. |
'Vict Age' | Victim's age in years. |
'Vict Sex' | Victim's sex: F : Female, M : Male, X : Unknown. |
'Vict Descent' | Victim's descent. |
'Weapon Desc' | Description of the weapon used (if applicable). |
'Status Desc' | Crime status. |
'LOCATION' | Street address of the crime. |
# Import necessary libraries
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
# Load the crimes dataset from a CSV file
# Parse the reporting and occurrence dates as datetime objects
# Ensure the time of occurrence is read as a string to preserve leading zeros
crimes = pd.read_csv("crimes.csv", parse_dates=["Date Rptd", "DATE OCC"], dtype={"TIME OCC": str})
# Display the first five rows
crimes.head()
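The hour extraction later in the notebook relies on every 'TIME OCC' value having the four-character HHMM shape. A minimal sketch, using a small made-up Series rather than the real column, of how left-padding with `str.zfill` would guard against values that lost their leading zeros:

```python
import pandas as pd

# Hypothetical sample values; the real column comes from crimes.csv
times = pd.Series(["0030", "1200", "2359", "5"])

# Left-pad to four characters so every value has the HHMM shape
padded = times.str.zfill(4)

# The first two characters are then the hour
hours = padded.str[:2].astype(int)
print(hours.tolist())
```

If the column is already zero-padded, as reading it with `dtype=str` is meant to ensure, the padding step changes nothing.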
Which hour has the highest frequency of crimes?
# Calculate the frequency of crimes for each hour by counting occurrences in 'TIME OCC'
crimes['Hour OCC'] = crimes['TIME OCC'].str[:2].astype(int)
crime_frequency = crimes['Hour OCC'].value_counts()
# Display the hour with the highest number of crimes and the count of crimes in that hour
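# Use .iloc[0] for the top count: the index holds integer hours, so [0] would look up hour 0 instead
print(f"Peak crime hour is: {crime_frequency.index[0]}, with {crime_frequency.iloc[0]} crimes reported")
# Plot the number of crimes for each hour of the day
sns.countplot(x='Hour OCC', data=crimes, color='r')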
# Add a title to the plot
plt.title('Distribution of Crimes by Time of Occurrence')
# Add labels to the axes
plt.xlabel('Time of Occurrence')
plt.ylabel('Number of Crimes')
# Show the plot
plt.show()
peak_crime_hour = 12
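Rather than typing the peak hour by hand, it could be read from the counts directly. The sketch below uses invented hours, not the real dataset, and also shows why positional and label-based lookups differ when the index of the counts is made of integers:

```python
import pandas as pd

# Hypothetical hours of occurrence, not the real dataset
hours = pd.Series([12, 12, 12, 0, 0, 18, 23])

# value_counts sorts by count, largest first; the index holds the hours
counts = hours.value_counts()

# idxmax returns the index label (the hour) with the largest count
peak_hour = int(counts.idxmax())

# .iloc[0] is positional: the count of the most frequent hour
peak_count = int(counts.iloc[0])

# counts[0] is label-based here: the count for hour 0
label_zero_count = int(counts[0])

print(peak_hour, peak_count, label_zero_count)
```

On the real data, the same pattern would be `crime_frequency.idxmax()`, assuming `crime_frequency` is the Series of hourly counts built earlier.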
Which area has the largest frequency of night crimes (crimes committed between 10pm and 3:59am)?
# Define the hours considered as night hours for the analysis
night_hours = [22, 23, 0, 1, 2, 3]
# Filter the crimes dataframe to only include crimes occurring during the defined night hours
# 'Hour OCC' holds integers, so compare against integers rather than zero-padded strings
night_hours_crimes = crimes[crimes['Hour OCC'].isin(night_hours)]
# Count the frequency of crimes by area during the night hours
peak_night_crime_location = night_hours_crimes.groupby('AREA NAME').size().sort_values(ascending=False)
# Display the area with the highest frequency of night crimes and the count of crimes
print(f"{peak_night_crime_location.index[0]} has the highest frequency of night crimes, with {peak_night_crime_location.iloc[0]} crimes reported")
peak_night_crime_location = 'Central'
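One detail worth checking in a night-hours filter is that the values in the hour list have the same type as the 'Hour OCC' column. The following sketch uses a tiny invented DataFrame (the area names and hours are made up) to compare an integer match with a match against zero-padded strings:

```python
import pandas as pd

# Hypothetical toy data, not the real crimes dataset
toy = pd.DataFrame({
    "AREA NAME": ["Central", "Central", "Hollywood", "Hollywood", "Central"],
    "Hour OCC": [23, 1, 12, 22, 15],
})

# Integer hours match an integer column directly
night_hours = [22, 23, 0, 1, 2, 3]
int_matches = int(toy["Hour OCC"].isin(night_hours).sum())

# Casting 1 to a string gives "1", which never equals "01"
padded_hours = ["22", "23", "00", "01", "02", "03"]
str_matches = int(toy["Hour OCC"].astype(str).isin(padded_hours).sum())

# Count night crimes per area and take the area with the most
night = toy[toy["Hour OCC"].isin(night_hours)]
night_counts = night["AREA NAME"].value_counts()
top_area = night_counts.idxmax()

print(int_matches, str_matches, top_area)
```

In this toy example the string comparison silently drops the crime at hour 1, so the two approaches disagree on the number of night crimes.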
How many crimes were committed against victims in each age group?
# Define the age bins for categorizing victim ages
age_bins = [0, 17, 25, 34, 44, 54, 64, np.inf]
# Define the labels for each age bin
age_labels = ["0-17", "18-25", "26-34", "35-44", "45-54", "55-64", "65+"]
# Categorize victims' ages into defined bins and create a new column for these categories
crimes['Victim_ages'] = pd.cut(crimes['Vict Age'], bins=age_bins, labels=age_labels)
# Count the number of crimes for each age category
victim_ages = crimes['Victim_ages'].value_counts()
# Display the count of crimes against victims in each age group
victim_ages
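To see how `pd.cut` treats the bin edges, the sketch below applies the same bins and labels to a handful of made-up ages. With the default `right=True`, each bin includes its right edge and excludes its left edge, so an age of exactly 0 falls outside the first bin and becomes missing:

```python
import numpy as np
import pandas as pd

age_bins = [0, 17, 25, 34, 44, 54, 64, np.inf]
age_labels = ["0-17", "18-25", "26-34", "35-44", "45-54", "55-64", "65+"]

# Hypothetical ages chosen to sit on or next to the bin edges
sample_ages = pd.Series([0, 1, 17, 18, 25, 26, 64, 65, 99])

# Default right=True: intervals are (0, 17], (17, 25], ..., (64, inf]
groups = pd.cut(sample_ages, bins=age_bins, labels=age_labels)

# Pair each age with its assigned group for inspection
print(list(zip(sample_ages.tolist(), groups.astype(object).tolist())))

# sort_index orders the counts by age group rather than by frequency
ordered_counts = groups.value_counts().sort_index()
print(ordered_counts)
```

Whether dropping ages of 0 is desirable depends on how the dataset encodes unknown ages, which the data description does not state. Passing `include_lowest=True` to `pd.cut` would keep them in the first bin.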
# Create a count plot for crimes by victim age group
sns.countplot(x='Victim_ages', data=crimes, palette='viridis')
# Set the title of the plot with font size and weight
plt.title('Crimes by Victim Age Group', fontsize=14, fontweight='bold')
# Set the x-axis label with font weight
plt.xlabel('Age Groups', fontweight='bold')
# Set the y-axis label with font weight
plt.ylabel('Number of Crimes', fontweight='bold')
# Display the plot
plt.show()