Photo credit: SeventyFour
What is a database?
Databases store and organize data electronically. The data is housed in objects called tables, with data organized into rows and columns (or records and fields).
A relational database defines relationships between tables of data inside a database.
Although they can look similar to a spreadsheet, databases can store more data, more securely, and allow multiple users to access the same database.
Our data
This project will use data from the Austin Animal Shelter. In this case, the data is saved to this DataLab workbook as a CSV file, but you'll query it just like you would a database!
Here is copy of the data scription from the Animal Shelter website and what you'd expect to find in the data:
| Column Name | Description | Data Type |
|---|---|---|
animal_id | Identifier | Text |
name | Animal name, if provided | Text |
datetime | Date of outcome | Floating Timestamp |
monthyear | Month and year of outcome | Floating Timestamp |
date_of_birth | Animal's date of birth | Floating Timestamp |
outcome_type | Details of the outcome | Text |
outcome_subtype | Additional outcome details | Text |
animal_type | Type of animal | Text |
sex_upon_outcome | Details on whether the animal was neutered or intact | Text |
age_upon_outcome | Age of animal at time of outcome | Text |
breed | Animal breed | Text |
color | Animal color | Text |
Step 1: Explore the data
Before you dig into the data, it's always a good idea to explore it to see if it is as expected. Does the data match what was outlined above? Try querying the data to view a snapshot of the first 10 rows.
SELECT *
FROM 'animal_shelter.csv'
limit 10;It's also helpful to understand how big the dataset is. How may rows (or records) are there in this data? Query the file to find out.
SELECT COUNT (*) AS total_rows
FROM 'animal_shelter.csv'Your turn! Can you think of another way to count the rows (or records)?
SELECT COUNT (animal_type)
FROM 'animal_shelter.csv'Step 2: Get deeper and answer questions about the data
Let's explore the data further. You're probably curious to know what animals came through this shelter so let's start there. How many different animal types has this shelter taken in?
SELECT COUNT (DISTINCT(animal_type)) AS unique_type_count
FROM 'animal_shelter.csv'