Animal_Shelter_data

What is a database?

Databases store and organize data electronically. The data is housed in objects called tables, with data organized into rows and columns (or records and fields).

A relational database defines relationships between tables of data inside a database.

Although they can look similar to a spreadsheet, databases can store more data, more securely, and allow multiple users to access the same database.

Our data

This project will use data from the Austin Animal Shelter. In this case, the data is saved to this DataLab workbook as a CSV file, but you'll query it just like you would a database!

Here is copy of the data scription from the Animal Shelter website and what you'd expect to find in the data:

Column Name	Description	Data Type
`animal_id`	Identifier	Text
`name`	Animal name, if provided	Text
`datetime`	Date of outcome	Floating Timestamp
`monthyear`	Month and year of outcome	Floating Timestamp
`date_of_birth`	Animal's date of birth	Floating Timestamp
`outcome_type`	Details of the outcome	Text
`outcome_subtype`	Additional outcome details	Text
`animal_type`	Type of animal	Text
`sex_upon_outcome`	Details on whether the animal was neutered or intact	Text
`age_upon_outcome`	Age of animal at time of outcome	Text
`breed`	Animal breed	Text
`color`	Animal color	Text

DataFrameas

df13

variable

SELECT 
    animal_type,
    EXTRACT(YEAR FROM datetime) AS adoption_year,
    EXTRACT(MONTH FROM datetime) AS adoption_month,
    COUNT(*) AS total_adoptions
FROM 'animal_shelter.csv'
WHERE outcome_type = 'Adoption'
GROUP BY animal_type, adoption_year, adoption_month
ORDER BY adoption_year, adoption_month, animal_type;

DataFrameas

df15

variable

WITH AdoptionStats AS (
    SELECT 
        animal_type,
        EXTRACT(YEAR FROM datetime) AS adoption_year,
        EXTRACT(MONTH FROM datetime) AS adoption_month,
        COUNT(*) AS total_adoptions
    FROM 'animal_shelter.csv'
    WHERE outcome_type = 'Adoption'
    GROUP BY animal_type, adoption_year, adoption_month
),
MonthlyStats AS (
    SELECT 
        adoption_year,
        adoption_month,
        SUM(total_adoptions) AS monthly_total_adoptions
    FROM AdoptionStats
    GROUP BY adoption_year, adoption_month
    ORDER BY monthly_total_adoptions DESC
    LIMIT 5
)
SELECT 
    m.adoption_year,
    m.adoption_month,
    m.monthly_total_adoptions,
    a.animal_type,
    a.total_adoptions AS highest_animal_adoptions
FROM MonthlyStats AS m
JOIN AdoptionStats AS a
ON m.adoption_year = a.adoption_year
   AND m.adoption_month = a.adoption_month
ORDER BY a.total_adoptions DESC
LIMIT 5;

Step 1: Explore the data

DataFrameas

df14

variable

SELECT * 
FROM 'animal_shelter.csv'
LIMIT 5;

Before you dig into the data, it's always a good idea to explore it to see if it is as expected. Does the data match what was outlined above? Try querying the data to view a snapshot of the first 10 rows.

DataFrameas

animal_shelter

variable

SELECT COUNT(outcome_type) AS total_outcome_type, datetime, outcome_type
FROM 'animal_shelter.csv'
GROUP BY datetime, outcome_type

It's also helpful to understand how big the dataset is. How may rows (or records) are there in this data? Query the file to find out.

DataFrameas

animal_shelter_2

variable

SELECT COUNT(*) AS total_rows
FROM 'animal_shelter.csv';

Your turn! Can you think of another way to count the rows (or records)?

DataFrameas

animal_shelter_3

variable

WITH AdoptionStats AS (
    SELECT 
        animal_type,
        sex_upon_outcome,
        COUNT(*) AS total_outcomes,
        SUM(CASE WHEN outcome_type = 'Adoption' THEN 1 ELSE 0 END) AS adoption_count
    FROM 'animal_shelter.csv'
    GROUP BY animal_type, sex_upon_outcome
)
SELECT 
    animal_type,
    sex_upon_outcome,
    total_outcomes,
    adoption_count,
    (adoption_count * 1.0 / total_outcomes) AS adoption_probability
FROM AdoptionStats
ORDER BY adoption_probability DESC;

‌
‌
‌

Animal_Shelter_data

.mfe-app-workspace-kj242g{position:absolute;top:-8px;}.mfe-app-workspace-11ezf91{display:inline-block;}.mfe-app-workspace-11ezf91:hover .Anchor__copyLink{visibility:visible;}What is a database?

Our data

Step 1: Explore the data

What is a database?