Skip to content

                                                                                             Photo credit: SeventyFour

What is a database?

Databases store and organize data electronically. The data is housed in objects called tables, with data organized into rows and columns (or records and fields).

A relational database defines relationships between tables of data inside a database.

Although they can look similar to a spreadsheet, databases can store more data, more securely, and allow multiple users to access the same database.

Our data

This project will use data from the Austin Animal Shelter. In this case, the data is saved to this DataLab workbook as a CSV file, but you'll query it just like you would a database!

Here is copy of the data scription from the Animal Shelter website and what you'd expect to find in the data:

Column NameDescriptionData Type
animal_idIdentifierText
nameAnimal name, if providedText
datetimeDate of outcomeFloating Timestamp
monthyearMonth and year of outcomeFloating Timestamp
date_of_birthAnimal's date of birthFloating Timestamp
outcome_typeDetails of the outcomeText
outcome_subtypeAdditional outcome detailsText
animal_typeType of animalText
sex_upon_outcomeDetails on whether the animal was neutered or intactText
age_upon_outcomeAge of animal at time of outcomeText
breedAnimal breedText
colorAnimal colorText

Step 1: Explore the data

Before you dig into the data, it's always a good idea to explore it to see if it is as expected. Does the data match what was outlined above? Try querying the data to view a snapshot of the first 10 rows.

Spinner
DataFrameas
animal_shelter
variable
SELECT *
FROM 'animal_shelter.csv'
limit 10;

It's also helpful to understand how big the dataset is. How may rows (or records) are there in this data? Query the file to find out.

Spinner
DataFrameas
animal_shelter_2
variable
SELECT COUNT (*) AS total_rows
FROM 'animal_shelter.csv'

Your turn! Can you think of another way to count the rows (or records)?

Spinner
DataFrameas
animal_shelter_3
variable
SELECT COUNT (animal_type)
FROM 'animal_shelter.csv'

Step 2: Get deeper and answer questions about the data

Let's explore the data further. You're probably curious to know what animals came through this shelter so let's start there. How many different animal types has this shelter taken in?

Spinner
DataFrameas
animal_shelter_4
variable
SELECT COUNT (DISTINCT(animal_type)) AS unique_type_count
FROM 'animal_shelter.csv'