To Live and Analyze Crime in LA

    🌇🚔 Background

    Los Angeles, California 😎. The City of Angels. Tinseltown. The Entertainment Capital of the World! Known for its warm weather, palm trees, sprawling coastline, and Hollywood, along with producing some of the most iconic films and songs!

    However, as with any highly populated city, it isn't always glamorous and there can be a large volume of crime. That's where you can help!

    You have been asked to support the Los Angeles Police Department (LAPD) by analyzing their crime data to identify patterns in criminal behavior. They plan to use your insights to allocate resources effectively to tackle various crimes in different areas.

    You are free to use any methodologies that you like in order to produce your insights.

    The Data

    They have provided you with a single dataset to use. A summary and preview are provided below.

    The data is publicly available here.

    👮‍♀️ crimes.csv

    'DR_NO': Division of Records Number. Official file number made up of a 2-digit year, area ID, and 5 digits.
    'Date Rptd': Date reported - MM/DD/YYYY.
    'DATE OCC': Date of occurrence - MM/DD/YYYY.
    'TIME OCC': Time of occurrence, in 24-hour military time.
    'AREA': The LAPD has 21 Community Police Stations referred to as Geographic Areas within the department. These Geographic Areas are sequentially numbered from 1-21.
    'AREA NAME': The 21 Geographic Areas or Patrol Divisions are also given a name designation that references a landmark or the surrounding community that it is responsible for. For example, 77th Street Division is located at the intersection of South Broadway and 77th Street, serving neighborhoods in South Los Angeles.
    'Rpt Dist No': A four-digit code that represents a sub-area within a Geographic Area. All crime records reference the "RD" that it occurred in for statistical comparisons. Find LAPD Reporting Districts on the LA City GeoHub at
    'Crm Cd': Crime code for the offense committed.
    'Crm Cd Desc': Definition of the crime.
    'Vict Age': Victim's age (years).
    'Vict Sex': Victim's sex: F: Female, M: Male, X: Unknown.
    'Vict Descent': Victim's descent:
    • A - Other Asian
    • B - Black
    • C - Chinese
    • D - Cambodian
    • F - Filipino
    • G - Guamanian
    • H - Hispanic/Latin/Mexican
    • I - American Indian/Alaskan Native
    • J - Japanese
    • K - Korean
    • L - Laotian
    • O - Other
    • P - Pacific Islander
    • S - Samoan
    • U - Hawaiian
    • V - Vietnamese
    • W - White
    • X - Unknown
    • Z - Asian Indian
    'Premis Cd': Code for the type of structure, vehicle, or location where the crime took place.
    'Premis Desc': Definition of the 'Premis Cd'.
    'Weapon Used Cd': The type of weapon used in the crime.
    'Weapon Desc': Description of the weapon used (if applicable).
    'Status Desc': Crime status.
    'Crm Cd 1': Indicates the crime committed. Crime Code 1 is the primary and most serious one. Crime Codes 2, 3, and 4 are respectively less serious offenses. Lower crime class numbers are more serious.
    'Crm Cd 2': May contain a code for an additional crime, less serious than Crime Code 1.
    'Crm Cd 3': May contain a code for an additional crime, less serious than Crime Code 1.
    'Crm Cd 4': May contain a code for an additional crime, less serious than Crime Code 1.
    'LOCATION': Street address of the crime.
    'Cross Street': Cross street of the rounded address.
    'LAT': Latitude of the crime location.
    'LON': Longitude of the crime location.
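
The 'DR_NO' description above (2-digit year, area ID, 5 digits) suggests the number can be split back into its parts. The following is a minimal sketch, not part of the original notebook. It assumes the area ID takes exactly two digits, so the full number is nine digits wide, and that leading zeros were lost if the column was read as an integer. The helper name `split_dr_no` and the sample values are hypothetical.

```python
import pandas as pd


def split_dr_no(dr_no: pd.Series) -> pd.DataFrame:
    """Split DR_NO into year, area ID, and sequence number.

    Assumption: layout is YY + AA + NNNNN (nine digits in total).
    """
    # Restore leading zeros that are dropped when DR_NO is stored as an integer
    s = dr_no.astype(str).str.zfill(9)
    return pd.DataFrame(
        {
            "dr_year": s.str[:2].astype(int),   # 2-digit year
            "dr_area": s.str[2:4].astype(int),  # area ID (assumed 2 digits)
            "dr_seq": s.str[4:].astype(int),    # trailing 5 digits
        },
        index=dr_no.index,
    )


# Invented sample values, used only to illustrate the layout
sample = pd.Series([190326475, 10304468])
parts = split_dr_no(sample)
print(parts)
```

If the assumption holds, `dr_area` should agree with the 'AREA' column, which gives a cheap consistency check on the file.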

    💪 The Challenge

    • Use your skills to produce insights about crimes in Los Angeles.
    • Examples could include examining how crime varies by area, crime type, victim age, time of day, and victim descent.
    • You could build machine learning models to predict criminal activities, such as when a crime may occur, what type of crime, or where, based on features in the dataset.
    • You may also wish to visualize the distribution of crimes on a map.
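
One of the questions listed above, how crime varies by area and time of day, can be approached with a simple cross-tabulation. The sketch below is illustrative only: it uses a small invented frame rather than crimes.csv, and it assumes an `Hour` column (0-23) has already been derived from 'TIME OCC'.

```python
import pandas as pd

# Toy records standing in for the real data; all values are invented
toy = pd.DataFrame(
    {
        "AREA NAME": ["Central", "Central", "Central",
                      "Hollywood", "Hollywood", "Hollywood"],
        "Hour": [23, 23, 9, 14, 14, 2],
    }
)

# Rows are areas, columns are hours of the day, cells are incident counts
area_by_hour = pd.crosstab(toy["AREA NAME"], toy["Hour"])

# Hour with the most incidents in each area
peak_hour = area_by_hour.idxmax(axis=1)

print(area_by_hour)
print(peak_hour)
```

The same table can be passed to a heatmap to show at a glance when each area is busiest.
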
    import pandas as pd
    import numpy as np
    import matplotlib.pyplot as plt
    import seaborn as sns
    crimes = pd.read_csv("data/crimes.csv")


    # Lookup tables for the coded victim columns
    mapping_sex = {
        'M': 'male',
        'F': 'female',
        'X': 'unknown'
    }
    mapping_descent = {
        'A': 'Other Asian',
        'B': 'Black',
        'C': 'Chinese',
        'D': 'Cambodian',
        'F': 'Filipino',
        'G': 'Guamanian',
        'H': 'Hispanic/Latin/Mexican',
        'I': 'American Indian',
        'J': 'Japanese',
        'K': 'Korean',
        'L': 'Laotian',
        'O': 'Other',
        'P': 'Pacific Islander',
        'S': 'Samoan',
        'U': 'Hawaiian',
        'V': 'Vietnamese',
        'W': 'White',
        'X': 'Unknown',
        'Z': 'Asian Indian'
    }
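    # Work on a copy so the raw data stays untouched
    df = crimes.copy()
    # Fixing Vict Descent: '-' is a placeholder, not a valid descent code
    df['Vict Descent'] = df['Vict Descent'].mask(df['Vict Descent'] == '-')
    # Fixing Vict Age column: keep only ages strictly between 0 and 100
    df['Vict Age'] = df['Vict Age'].where((df['Vict Age'] > 0) & (df['Vict Age'] < 100))
    # Ten-year bins; the last edge is 100 so that ages 91-99 are binned too
    bins = np.arange(0, 101, 10)
    df['Vict Agebin'] = pd.cut(df['Vict Age'], bins)
    # Store bin labels as strings such as "(10, 20)"; missing ages become the string "nan"
    df['Vict Agebin'] = df['Vict Agebin'].astype(str)
    df['Vict Agebin'] = df['Vict Agebin'].str.replace("]", ")", regex=False)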
    # Fixing Vict Sex column (codes missing from mapping_sex become NaN)
    df['Vict Sex'] = df['Vict Sex'].map(mapping_sex)
    # Remove the time component
    df['DATE OCC'] = df['DATE OCC'].str.split(' ').str[0]
    # Convert to datetime; each format string must match the raw text in that column
    df['Date Rptd'] = pd.to_datetime(df['Date Rptd'], format="%Y-%m-%d")
    df['DATE OCC'] = pd.to_datetime(df['DATE OCC'], format="%m/%d/%Y")
    # Extract month and year of the report date
    df['Month'] = df['Date Rptd'].dt.month
    df['Year'] = df['Date Rptd'].dt.year
    # Fix time column
    # Add leading zeros if length is less than 4
    df['TIME OCC'] = df['TIME OCC'].astype(str).str.zfill(4)
    # Extract hour
    df['Hour'] = pd.to_datetime(df['TIME OCC'], format='%H%M').dt.hour
    # Remove the zero-coordinate locations (valid LA points have LAT > 0 and LON < 0)
    df['LAT'] = df['LAT'].where(df['LAT'] > 0)
    df['LON'] = df['LON'].where(df['LON'] < 0)
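
The 'TIME OCC' step above pads the value to four characters because military time read as an integer loses its leading zeros (00:05 is stored as 5). The sketch below illustrates that step on a few invented values. It also shows integer division by 100 as a possible shortcut, which is an assumption on my part and not something the notebook uses.

```python
import pandas as pd

# Invented examples of military time as it appears when read as integers
raw = pd.Series([5, 930, 1430, 2359], name="TIME OCC")

# Same approach as the cleaning code: pad to HHMM, then parse
padded = raw.astype(str).str.zfill(4)
hours = pd.to_datetime(padded, format="%H%M").dt.hour

# Possible shortcut (assumes every value is a valid HHMM integer)
hours_int = raw // 100

print(padded.tolist())
print(hours.tolist())
print(hours_int.tolist())
```

The datetime route raises an error on invalid times such as 2460, whereas integer division silently accepts them, so the parsing approach doubles as a validity check.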