Skip to content

Exploring Astronaut Activities in SQL

Welcome to your webinar workspace! You can follow along as we analyze the data in a SQL database and visualize the results.

To set up your integration, create a PostgreSQL integration with the following credentials:

  • Integration Name: Astronaut Codealong
  • Hostname: workspacedemodb.datacamp.com
  • Database: astronauts
  • Username: astronauts
  • Password: astronauts

Source of the data

Exploring our data

Let's start by looking at the table we will be working with.

Spinner
DataFrameas
df
variable
SELECT *
FROM dfastronauts
LIMIT 10;

Let's inspect the purpose column in greater detail.

Spinner
DataFrameas
df
variable
Run cancelled

What are the most common types of EVAs?

Let's start to get a rough idea of the most popular types of EVAs astronauts take by using CASE expressions.

Spinner
DataFrameas
df
variable
Run cancelled

We are now ready to build this into a final query!

Spinner
DataFrameas
df
variable

How much material has been extracted from EVAs?

Skimming through the purpose column, we also saw numerous references to extracting rock/dust or geological material. In this case, it will be difficult to extract the total quantity across the columns. Regular expressions to the rescue!

We will define a pattern to extract the total pounds extracted per EVA, and then sum them up. Let's first do a sense check of the data.

Spinner
DataFrameas
df
variable

Okay, we now know that the format of the pounds extracted is always number lbs of rock/geologic. We can construct a pattern to detect this and extract the number!

To do so, we will make use of:

  • \d+ to match one or more digits.
  • \.? to match zero or more periods.\
  • * to match zero or more digits following the optional decimal place.
  • () to specify we only want the digits.
  • \s to match the whitespace (i.e., spaces).
  • [] and | to specify we either want to match "geologic" or "rock".

Let's put this into action, using SUBSTRING() to extract our pattern!

Spinner
DataFrameas
df
variable

Now we can use a CTE to calculate the total amount!

Spinner
DataFrameas
df
variable
Run cancelled