Skip to content

Exploring Astronaut Activities in SQL

Welcome to your webinar workspace! You can follow along as we analyze the data in a SQL database and visualize the results.

To set up your integration, create a PostgreSQL integration with the following credentials:

  • Integration Name: Astronaut Codealong
  • Hostname: workspacedemodb.datacamp.com
  • Database: astronauts
  • Username: astronauts
  • Password: astronauts

Source of the data

Exploring our data

Let's start by looking at the table we will be working with.

Let's inspect the purpose column in greater detail.

Spinner
DataFrameas
df
variable
SELECT DISTINCT vehicle,
COUNT(vehicle)
FROM evas
GROUP BY vehicle
ORDER BY COUNT(vehicle) DESC
LIMIT 10;

What are the most common types of EVAs?

Let's start to get a rough idea of the most popular types of EVAs astronauts take by using CASE expressions.

Spinner
DataFrameas
df
variable
SELECT 
    purpose,
    CASE WHEN purpose ILIKE '%photo%' THEN 1 ELSE 0 END AS photography,
    CASE WHEN purpose ILIKE '%collect%' THEN 1 ELSE 0 END AS collection,
    CASE WHEN purpose ILIKE '%construction%' OR purpose ILIKE '%install%' OR purpose ILIKE '%assembl%' THEN 1 ELSE 0 END AS install,
    CASE WHEN purpose ILIKE '%fix%' OR purpose ILIKE '%repair%' OR purpose ILIKE '%service%' THEN 1 ELSE 0 END AS repair
FROM
    evas

We are now ready to build this into a final query!

Spinner
DataFrameas
df
variable
WITH purposes AS (    SELECT 
    purpose,
    CASE WHEN purpose ILIKE '%photo%' THEN 1 ELSE 0 END AS photo,
    CASE WHEN purpose ILIKE '%collect%' THEN 1 ELSE 0 END AS collection,
    CASE WHEN purpose ILIKE '%construction%' OR purpose ILIKE '%install%' OR purpose ILIKE '%assembl%' THEN 1 ELSE 0 END AS installation,
    CASE WHEN purpose ILIKE '%fix%' OR purpose ILIKE '%repair%' OR purpose ILIKE '%service%' THEN 1 ELSE 0 END AS repair
FROM
    evas) 
    
    SELECT 
        SUM(photo) AS count,
        'photo' AS type
    FROM 
        purposes
    UNION
    
    SELECT 
        SUM(collection) AS count,
        'collection' AS type
    FROM
        purposes
    UNION
    
    SELECT 
        SUM(installation) AS count,
        'installation' AS type
    FROM 
        purposes
    UNION
    
    SELECT 
        SUM(repair) AS count,
        'repair' AS type
    FROM 
        purposes
    ORDER BY count DESC

How much material has been extracted from EVAs?

Skimming through the purpose column, we also saw numerous references to extracting rock/dust or geological material. In this case, it will be difficult to extract the total quantity across the columns. Regular expressions to the rescue!

We will define a pattern to extract the total pounds extracted per EVA, and then sum them up. Let's first do a sense check of the data.

Spinner
DataFrameas
df
variable
SELECT 
    purpose
FROM 
    evas
WHERE
    purpose ILIKE '%geologic%' OR 
    purpose ILIKE '%rock%'

Okay, we now know that the format of the pounds extracted is always number lbs of rock/geologic. We can construct a pattern to detect this and extract the number!

To do so, we will make use of:

  • \d+ to match one or more digits.
  • \.? to match zero or more periods.\
  • * to match zero or more digits following the optional decimal place.
  • () to specify we only want the digits.
  • \s to match the whitespace (i.e., spaces).
  • [] and | to specify we either want to match "geologic" or "rock".

Let's put this into action, using SUBSTRING() to extract our pattern!

Spinner
DataFrameas
df
variable
SELECT 
    purpose,
    SUBSTRING(purpose, '(\d+\.?\d*) lb of ((rock)|(geologic))') AS weight
FROM
    evas

Now we can use a CTE to calculate the total amount!

Spinner
DataFrameas
df
variable
WITH weights AS (SELECT 
    purpose,
    SUBSTRING(purpose, '(\d+\.?\d*) lb of ((rock)|(geologic))') AS weight
FROM
    evas)
    
SELECT 
    SUM(weight::numeric)
FROM
    weights