Skip to content
New Workbook
Sign up
Exploring Astronaut Activities in SQL - Solution

Exploring Astronaut Activities in SQL

Welcome to your webinar workspace! You can follow along as we analyze the data in a SQL database and visualize the results.

To set up your integration, create a PostgreSQL integration with the following credentials:

  • Integration Name: Astronaut Codealong
  • Hostname: workspacedemodb.datacamp.com
  • Database: astronauts
  • Username: astronauts
  • Password: astronauts

Source of the data

Note: You will need to set up your own integration to run the cells in this workspace!

Exploring our data

Let's start by looking at the table we will be working with.

Spinner
DataFrameavailable as
df
variable
SELECT
	*
FROM evas

Let's inspect the purpose column in greater detail.

Spinner
DataFrameavailable as
df
variable
SELECT
	purpose
FROM evas

What are the most common types of EVAs?

Using CASE expressions, let's start to get a rough idea of the most popular types of EVAs.

Spinner
DataFrameavailable as
df
variable
 SELECT
    purpose,
    CASE WHEN purpose ILIKE '%photo%' THEN 1 ELSE 0 END AS photography,
    CASE WHEN purpose ILIKE '%collect%' THEN 1 ELSE 0 END AS collection,        
    CASE WHEN purpose ILIKE '%construct%' OR purpose ILIKE '%install%' OR purpose ILIKE '%assembl%' THEN 1 ELSE 0 END AS installation,
    CASE WHEN purpose ILIKE '%recover%' OR purpose ILIKE '%retrieve%' THEN 1 ELSE 0 END AS retrieval,
    CASE WHEN purpose ILIKE '%jettison%' THEN 1 ELSE 0 END AS jettison,
    CASE WHEN purpose ILIKE '%replace%' OR purpose ILIKE '%fix%' OR purpose ILIKE '%repair%' OR purpose ILIKE '%servic%' THEN 1 ELSE 0 END AS repair
FROM evas

We are now ready to build this into a final query!

Spinner
DataFrameavailable as
df
variable
WITH purposes AS ( 
 	SELECT
        purpose,
        CASE WHEN purpose ILIKE '%photo%' THEN 1 ELSE 0 END AS photography,
        CASE WHEN purpose ILIKE '%collect%' THEN 1 ELSE 0 END AS collection,
        CASE WHEN purpose ILIKE '%construct%' OR purpose ILIKE '%install%' OR purpose ILIKE '%assembl%' THEN 1 ELSE 0 END AS installation,
        CASE WHEN purpose ILIKE '%recover%' OR purpose ILIKE '%retrieve%' THEN 1 ELSE 0 END AS retrieval,
        CASE WHEN purpose ILIKE '%jettison%' THEN 1 ELSE 0 END AS jettison,
        CASE WHEN purpose ILIKE '%replace%' OR purpose ILIKE '%fix%' OR purpose ILIKE '%repair%' OR purpose ILIKE '%servic%' THEN 1 ELSE 0 END AS repair
    FROM evas
)

SELECT 
	SUM(photography) AS count,
    'photography' AS type
FROM purposes
UNION
SELECT 
	SUM(collection) AS count,
    'collection' AS type
FROM purposes
UNION
SELECT 
	SUM(retrieval) AS count,
    'retrieval' AS type
FROM purposes
UNION
SELECT 
	SUM(jettison) AS count,
    'jettison' AS type
FROM purposes
UNION
SELECT 
	SUM(repair) AS count,
    'repair' AS type
FROM purposes
UNION
SELECT 
	SUM(installation) AS count,
    'installation' AS type
FROM purposes
ORDER BY count
Current Type: Bar
Current X-axis: count
Current Y-axis: type
Current Color: None

EVAs by purpose

How much material has been extracted from EVAs?

We also saw numerous references to extracting rock/dust or geological material when skimming through the purpose column. In this case, extracting the total quantity across the columns will be difficult. Regular expressions to the rescue!

We will define a pattern to extract the total pounds collected per EVA, and then sum them up. Let's first do a sense check of the data.

Spinner
DataFrameavailable as
df
variable
SELECT
	purpose
FROM evas
WHERE purpose ILIKE '%geologic%' OR purpose ILIKE '%rock%'

Okay, we now know that the format of the pounds extracted is always "number lbs of rock/geologic". We can construct a pattern to detect this and extract the number!

To do so, we will make use of:

  • \d+ to match one or more digits.
  • \.? to match zero or more periods.\
  • * to match zero or more digits following the optional decimal place.
  • () to specify we only want the digits.
  • \s to match the whitespace (i.e., spaces).
  • | to specify we either want to match "geologic" or "rock".

Let's put this into action, using SUBSTRING() to extract our pattern!

Spinner
DataFrameavailable as
df
variable
SELECT
	purpose,
	SUBSTRING(purpose, '(\d+\.?\d*)\slb\sof\s((rock)|(geologic))') AS weight
FROM evas
WHERE purpose ILIKE '%rock%' OR purpose ILIKE '%geologic%'
Spinner
DataFrameavailable as
df
variable
WITH weights AS (
    SELECT
        purpose,
        SUBSTRING(purpose, '(\d+\.?\d+)\slb\sof\s[rock|geologic]')::NUMERIC AS weight
    FROM evas
    WHERE purpose ILIKE '%rock%' OR purpose ILIKE '%geologic%'
)

SELECT SUM(weight)
FROM weights