Exploring Astronaut Activities in SQL
Welcome to your webinar workspace! You can follow along as we analyze the data in a SQL database and visualize the results.
To set up your integration, create a PostgreSQL integration with the following credentials:
- Integration Name: Astronaut Codealong
- Hostname: workspacedemodb.datacamp.com
- Database: astronauts
- Username: astronauts
- Password: astronauts
Exploring our data
Let's start by looking at the table we will be working with.
SELECT *
FROM dfastronauts
LIMIT 10;
Let's inspect the purpose
column in greater detail.
What are the most common types of EVAs?
Let's start to get a rough idea of the most popular types of EVAs astronauts take by using CASE
expressions.
We are now ready to build this into a final query!
How much material has been extracted from EVAs?
Skimming through the purpose
column, we also saw numerous references to extracting rock/dust or geological material. In this case, it will be difficult to extract the total quantity across the columns. Regular expressions to the rescue!
We will define a pattern to extract the total pounds extracted per EVA, and then sum them up. Let's first do a sense check of the data.
Okay, we now know that the format of the pounds extracted is always number lbs of rock/geologic
. We can construct a pattern to detect this and extract the number!
To do so, we will make use of:
\d+
to match one or more digits.\.?
to match zero or more periods.\*
to match zero or more digits following the optional decimal place.()
to specify we only want the digits.\s
to match the whitespace (i.e., spaces).[]
and|
to specify we either want to match "geologic" or "rock".
Let's put this into action, using SUBSTRING()
to extract our pattern!
Now we can use a CTE to calculate the total amount!