
Los Angeles, California 😎. The City of Angels. Tinseltown. The Entertainment Capital of the World!

The city is known for its warm weather, palm trees, sprawling coastline, and Hollywood, and for producing some of the most iconic films and songs. However, as with any highly populated city, it isn't always glamorous, and there can be a large volume of crime. That's where you can help!

You have been asked to support the Los Angeles Police Department (LAPD) by analyzing crime data to identify patterns in criminal behavior. They plan to use your insights to allocate resources effectively to tackle various crimes in different areas.

The Data

They have provided you with a single dataset to use. A summary and preview are provided below.

It is a modified version of the original data, which is publicly available from Los Angeles Open Data.

crimes.csv

Column | Description
'DR_NO' | Division of Records Number: Official file number made up of a 2-digit year, area ID, and 5 digits.
'Date Rptd' | Date reported - MM/DD/YYYY.
'DATE OCC' | Date of occurrence - MM/DD/YYYY.
'TIME OCC' | In 24-hour military time.
'AREA NAME' | The 21 Geographic Areas or Patrol Divisions are also given a name designation that references a landmark or the surrounding community that it is responsible for. For example, the 77th Street Division is located at the intersection of South Broadway and 77th Street, serving neighborhoods in South Los Angeles.
'Crm Cd Desc' | Indicates the crime committed.
'Vict Age' | Victim's age in years.
'Vict Sex' | Victim's sex: F: Female, M: Male, X: Unknown.
'Vict Descent' | Victim's descent:
  • A - Other Asian
  • B - Black
  • C - Chinese
  • D - Cambodian
  • F - Filipino
  • G - Guamanian
  • H - Hispanic/Latin/Mexican
  • I - American Indian/Alaskan Native
  • J - Japanese
  • K - Korean
  • L - Laotian
  • O - Other
  • P - Pacific Islander
  • S - Samoan
  • U - Hawaiian
  • V - Vietnamese
  • W - White
  • X - Unknown
  • Z - Asian Indian
'Weapon Desc' | Description of the weapon used (if applicable).
'Status Desc' | Crime status.
'LOCATION' | Street address of the crime.
# Re-run this cell
# Import required libraries
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
crimes = pd.read_csv("crimes.csv", parse_dates=["Date Rptd", "DATE OCC"], dtype={"TIME OCC": str})
crimes.head()
# Begin the analysis with EDA

import pandas as pd
from IPython.display import display


def prelim_eda(df=None):
    """
    Performs a preliminary exploratory data analysis. No distribution plots or bar plots are included.

    Parameters:
    df (DataFrame): dataframe to be explored

    Returns:
    None
    """
    if df is None:
        # Print the message so the function always returns None, as documented
        print("Please pass a valid DataFrame to prelim_eda")
        return None

    display(df.head(10))
    print("")
    print(df.columns)
    print("")
    df.info()  # info() prints its own output and returns None
    print("")
    print(df.describe())
    print("")
    # List the unique values of every column except datetimes and the record ID
    for col in df.columns:
        is_datetime = pd.api.types.is_datetime64_any_dtype(df[col])
        if not is_datetime and col != "DR_NO":
            print(f"column: {col}\n", df[col].unique(), sep="")
            print("number of unique = ", df[col].nunique())
            print("")

    return None


# 'crimes' is the DataFrame loaded from crimes.csv above
prelim_eda(crimes)
print("")

Perform some cleanup work:

crimes.columns = ['DR_NO', 'Date_Rptd', 'Date_Occurred', 'Time_Occurred', 'Patrol_Area',
       'Crime_Desc', 'Victim_Age', 'Victim_Sex', 'Victim_Race', 'Weapon_Desc',
       'Status_Desc', 'Crime_Location']

crimes.info()
print("")

There are several observations that can be derived:

  1. Time_Occurred is stored as a string rather than a time type; it should be combined with Date_Occurred into a single datetime column
  2. There are null values in Victim_Sex, Victim_Race, and Weapon_Desc
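The second observation can be checked directly by counting missing values per column. The following is a minimal sketch on a small made-up frame; `sample` and its values are hypothetical stand-ins for `crimes`, not data from crimes.csv.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for `crimes`; the real frame is loaded from crimes.csv
sample = pd.DataFrame({
    "Victim_Sex": ["F", "M", np.nan, "X"],
    "Victim_Race": ["H", np.nan, np.nan, "W"],
    "Weapon_Desc": [np.nan, np.nan, np.nan, "HAND GUN"],
    "Victim_Age": [34, 27, 0, 61],
})

# Count nulls per column and keep only the columns that have any
null_counts = sample.isna().sum()
null_counts = null_counts[null_counts > 0].sort_values(ascending=False)
print(null_counts)
```

Running the same two lines on `crimes` should list only the columns that need a fill strategy.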
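# Replace null values in the 'Victim_Sex' and 'Weapon_Desc' columns with 'unknown'
# ('Victim_Race' nulls are filled after its mapping further down).
# Assign the result back instead of calling fillna(inplace=True) on a column,
# which may not modify the DataFrame in recent pandas versions.
crimes['Victim_Sex'] = crimes['Victim_Sex'].fillna('unknown')
crimes['Weapon_Desc'] = crimes['Weapon_Desc'].fillna('unknown')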


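# Map the values in the 'Victim_Sex' column to the desired categories.
# The data dictionary above defines only F, M, and X. The code 'H' is not
# defined there, so it is labeled as undocumented rather than given a guessed meaning.
# Note: any code missing from this mapping becomes NaN after .map().
sex_mapping = {
    "F": "Female",
    "M": "Male",
    "X": "Unspecified",
    "H": "Undocumented",
    "unknown": "Unknown",
}
crimes['Victim_Sex'] = crimes['Victim_Sex'].map(sex_mapping)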

# Display the unique values of Victim_Sex to confirm the mapping
print(crimes['Victim_Sex'].unique())
print("")


# Convert the Victim_Sex column to the category dtype
crimes["Victim_Sex"] = crimes["Victim_Sex"].astype("category")

# Display the unique values again to confirm the conversion
print(crimes['Victim_Sex'].unique())
print("")

# Display crimes info
crimes.info()
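Because `Series.map` with a dictionary turns every code that is missing from the dictionary into NaN, it can help to check mapping coverage before relying on the result. The sketch below uses hypothetical codes (the "-" value is made up for illustration) and assumes that unmapped codes should be treated as unknown, which is a choice rather than something the data dictionary prescribes.

```python
import pandas as pd

# Hypothetical codes; "-" stands in for any code missing from the mapping
codes = pd.Series(["F", "M", "X", "-", "F"])
mapping = {"F": "Female", "M": "Male", "X": "Unknown"}

# Codes present in the data but absent from the mapping
unmapped = sorted(set(codes.dropna().unique()) - set(mapping))

mapped = codes.map(mapping)        # unmapped codes become NaN here
mapped = mapped.fillna("Unknown")  # assumption: treat them as unknown

print(unmapped)
print(mapped.tolist())
```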

Now, map the Victim_Race codes into more explicit categories. Victim's descent:

  • A - Other Asian
  • B - Black
  • C - Chinese
  • D - Cambodian
  • F - Filipino
  • G - Guamanian
  • H - Hispanic/Latin/Mexican
  • I - American Indian/Alaskan Native
  • J - Japanese
  • K - Korean
  • L - Laotian
  • O - Other
  • P - Pacific Islander
  • S - Samoan
  • U - Hawaiian
  • V - Vietnamese
  • W - White
  • X - Unknown
  • Z - Asian Indian

# Set up the race mapping
race_mapping = {
    "A": "Other_Asian",
    "B": "Black",
    "C": "Chinese",
    "D": "Cambodian",
    "F": "Filipino",
    "G": "Guamanian",
    "H": "Hispanic_Latin_Mexican",
    "I": "American_Indian_Alaskan_Native",
    "J": "Japanese",
    "K": "Korean",
    "L": "Laotian",
    "O": "Other",
    "P": "Pacific_Islander",
    "S": "Samoan",
    "U": "Hawaiian",
    "V": "Vietnamese",
    "W": "White",
    "X": "Unknown",
    "Z": "Asian_Indian",
}

# Map the codes, then label nulls and any unmapped codes as 'Unknown'.
# The result is assigned back rather than using fillna(inplace=True) on a column.
crimes['Victim_Race'] = crimes['Victim_Race'].map(race_mapping)
crimes['Victim_Race'] = crimes['Victim_Race'].fillna('Unknown')

# Convert the Victim_Race column to the category dtype
crimes["Victim_Race"] = crimes["Victim_Race"].astype("category")

print("********************************************")
# display the unique categories for Victim_Race
print(crimes["Victim_Race"].unique())
print("")

# Display crimes info
crimes.info()


What about duplicated lines of data?

# Explore duplicate lines

# Find fully duplicated rows in the crimes dataframe
duplicated_lines = crimes[crimes.duplicated()]

# Display duplicated rows (display() shows the output even when this is not the last line of a cell)
display(duplicated_lines)

# Thus, there are no duplicated lines of data

# Find rows whose DR_NO repeats an earlier record number
duplicated_record_numbers = crimes[crimes["DR_NO"].duplicated()]

# Display duplicated record numbers
display(duplicated_record_numbers)
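Note that `duplicated()` flags only the second and later occurrences by default. If repeated record numbers need to be inspected side by side, `keep=False` returns every copy. A small sketch with made-up rows (the `sample` frame is hypothetical):

```python
import pandas as pd

# Hypothetical rows; record 102 appears twice
sample = pd.DataFrame({
    "DR_NO": [101, 102, 102, 103],
    "Crime_Desc": ["BURGLARY", "ROBBERY", "ROBBERY", "ARSON"],
})

# Number of rows that exactly repeat an earlier row
n_duplicate_rows = int(sample.duplicated().sum())

# All rows that share a DR_NO with another row, including the first occurrence
repeated_ids = sample[sample["DR_NO"].duplicated(keep=False)]

print(n_duplicate_rows)
print(repeated_ids)
```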
# Convert the Time_Occurred column to a datetime format
# Assuming Time_Occurred is in HHMM format (e.g., 1330 for 1:30 PM)

# First, ensure Time_Occurred is a string
crimes['Time_Occurred'] = crimes['Time_Occurred'].astype(str)

# Pad Time_Occurred with leading zeros if necessary
crimes['Time_Occurred'] = crimes['Time_Occurred'].str.zfill(4)

# Create a new column with the combined Date_Occurred and Time_Occurred
date_part = crimes['Date_Occurred'].dt.strftime('%Y-%m-%d')
time_part = crimes['Time_Occurred'].str[:2] + ':' + crimes['Time_Occurred'].str[2:]
crimes['DateTime_Occurred'] = pd.to_datetime(date_part + ' ' + time_part, format='%Y-%m-%d %H:%M')

# Display the crimes dataframe
display(crimes.head())
print("")
print(crimes.columns)

# Reorder the columns of the crimes DataFrame
new_column_order = [
    "DR_NO", "Date_Rptd", "Date_Occurred", "Time_Occurred", 
    "DateTime_Occurred", "Patrol_Area", "Crime_Desc", 
    "Victim_Age", "Victim_Sex", "Victim_Race", 
    "Weapon_Desc", "Status_Desc", "Crime_Location"
]
crimes = crimes[new_column_order]
print("")
print(crimes.columns)



print("")
display(crimes.head())
print("")
crimes.info()
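The conversion above raises an error if any Time_Occurred value is not a valid HHMM time. Whether crimes.csv contains such values is not established here; the sketch below only shows one possible way to tolerate them, by passing `errors="coerce"` so that invalid entries become NaT. The `sample` frame and its deliberately invalid "2460" entry are hypothetical.

```python
import pandas as pd

# Hypothetical rows; "2460" is deliberately invalid (minute 60 does not exist)
sample = pd.DataFrame({
    "Date_Occurred": pd.to_datetime(["2022-01-05", "2022-01-06", "2022-01-07"]),
    "Time_Occurred": ["1330", "45", "2460"],
})

# Pad to four digits, then insert a colon between hours and minutes
hhmm = sample["Time_Occurred"].str.zfill(4)
time_part = hhmm.str[:2] + ":" + hhmm.str[2:]
date_part = sample["Date_Occurred"].dt.strftime("%Y-%m-%d")

# errors="coerce" converts unparseable values to NaT instead of raising
sample["DateTime_Occurred"] = pd.to_datetime(
    date_part + " " + time_part, format="%Y-%m-%d %H:%M", errors="coerce"
)

print(sample)
print("Unparseable rows:", int(sample["DateTime_Occurred"].isna().sum()))
```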

At this stage, the data has been explored and cleaned to a preliminary extent. We are now ready to consider several questions:

Here are the project directions taken from the associate data scientist career tracks page:

  1. Find out when and where crime is most likely to occur in LA.
  2. Find the related types of crime commonly committed in LA.
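For the first direction, a simple starting point is to count crimes by hour and by patrol area with `value_counts`. This is only a sketch on made-up rows: the `sample` frame and the `Hour_Occurred` column are hypothetical, and it assumes Time_Occurred is a zero-padded HHMM string as prepared earlier.

```python
import pandas as pd

# Hypothetical rows using the renamed columns from the cleanup step
sample = pd.DataFrame({
    "Time_Occurred": ["1200", "1215", "0030", "1245", "2310"],
    "Patrol_Area": ["Central", "Central", "Hollywood", "77th Street", "Central"],
})

# The first two characters of a zero-padded HHMM string are the hour
sample["Hour_Occurred"] = sample["Time_Occurred"].str[:2].astype(int)

# Most frequent hour and most frequent patrol area
peak_hour = sample["Hour_Occurred"].value_counts().idxmax()
top_area = sample["Patrol_Area"].value_counts().idxmax()

print(peak_hour, top_area)
```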

Here are my questions:

  • First, look at the total number of crimes.
  • What percentage of crimes are violent crimes?
  • Look at the distribution of crimes over the timeframe of the data (counts).
  • Look at how the distribution of crime across the categories of sex, race, and age varies with time.
  • Look at the distribution of the difference between date reported and date occurred. Also plot this difference as a function of time. Has this difference increased or decreased over time?
  • Look at scatter plots of Victim_Age vs Victim_Sex.
  • Look at scatter plots of Victim_Age vs Victim_Race.
  • Explore the heatmap of age, sex, and race.

Can you find a relationship between crime description and victim race, age, sex?
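The question about the gap between the reported date and the occurrence date can be approached by subtracting the two datetime columns. A minimal sketch on made-up dates follows; `sample`, `Report_Delay_Days`, and `monthly_delay` are hypothetical names introduced for illustration.

```python
import pandas as pd

# Hypothetical rows using the renamed date columns
sample = pd.DataFrame({
    "Date_Rptd": pd.to_datetime(["2022-03-02", "2022-03-10", "2022-04-01"]),
    "Date_Occurred": pd.to_datetime(["2022-03-01", "2022-03-03", "2022-04-01"]),
})

# Reporting delay in whole days
sample["Report_Delay_Days"] = (sample["Date_Rptd"] - sample["Date_Occurred"]).dt.days

# Average delay per month of occurrence, to see whether it changes over time
monthly_delay = (
    sample.groupby(sample["Date_Occurred"].dt.to_period("M"))["Report_Delay_Days"].mean()
)

print(sample["Report_Delay_Days"].tolist())
print(monthly_delay)
```

Plotting `monthly_delay` for the full dataset would then show whether the delay has grown or shrunk over the data's timeframe.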

  1. Against what groups are the largest percentage of crimes committed? Include the categories of age, sex and race.

  2. On what date of the year are the most crimes committed, on average over the data time frame?

  3. On what date are the fewest crimes committed?

  4. What is the average number of crimes committed during each season of the year? Create a new column with a season category.

  5. What is the average number of crimes committed during the Christmas season (Nov 23 to Jan 4)?

  6. At what time of day are the most crimes committed? The fewest? To answer this question, create new categories of time of day:
    • "Late_Night" - 12am until 3am
    • "Early_Morning" - 3am until 6am
    • "Morning" - 6am until 9am
    • "Late_Morning" - 9am until Noon
    • "Early_Afternoon" - Noon until 3pm
    • "Late_Afternoon" - 3pm until 6pm
    • "Early_Night" - 6pm until 9pm
    • "Night" - 9pm until 12am

  7. Are there exact locations where crimes are repeated?

  8. What patrol areas have the top 3 average crime rates?

  9. What weapons are most commonly used to commit crimes?

  10. What percentage of crimes are committed with no weapons?
    With weapons?
    Unknown?
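The time-of-day categories in question 6 can be built with `pd.cut` on the hour of occurrence. The sketch below is one possible approach, not a finished answer: `hours` is a made-up series, and it assumes each category includes its start hour and excludes its end hour (so 3am falls in "Early_Morning").

```python
import pandas as pd

# Hypothetical hours of occurrence (0-23)
hours = pd.Series([0, 2, 3, 8, 11, 12, 17, 20, 23])

# Bin edges in hours; right=False makes each bin [start, end)
bins = [0, 3, 6, 9, 12, 15, 18, 21, 24]
labels = [
    "Late_Night", "Early_Morning", "Morning", "Late_Morning",
    "Early_Afternoon", "Late_Afternoon", "Early_Night", "Night",
]

time_of_day = pd.cut(hours, bins=bins, labels=labels, right=False)

print(time_of_day.value_counts().sort_index())
```

With the real data, the hour could be taken from `crimes["DateTime_Occurred"].dt.hour`, and `value_counts()` on the resulting column would show the busiest and quietest periods.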


Continue Exploring