Skip to content

1. Classic American names

Lots of name tags piled on top of each other.

Photo by Travis Wise on Wikimedia.

How have American baby name tastes changed since 1920? Which names have remained popular for over 100 years, and how do those names compare to more recent top baby names? These are considerations for many new parents, but the skills we'll practice while answering these queries are broadly applicable. After all, understanding trends and popularity is important for many businesses, too!

We'll be working with data provided by the United States Social Security Administration, which lists first names along with the number and sex of babies they were given to in each year. For processing speed purposes, we've limited the dataset to first names which were given to over 5,000 American babies in a given year. Our data spans 101 years, from 1920 through 2020.

baby_names

columntypemeaning
yearintyear
first_namevarcharfirst name
sexvarcharsex of babies given first_name
numintnumber of babies of sex given first_name in that year

Let's get oriented to American baby name tastes by looking at the names that have stood the test of time!

Spinner
DataFrameas
baby_names
variable
-- BABY NAMES
SELECT *
FROM public.baby_names
LIMIT 5;
Spinner
DataFrameas
df1
variable
SELECT column_name, data_type
FROM information_schema.columns
WHERE table_name = 'baby_names';
Spinner
DataFrameas
df
variable
-- Find names that have been given to over 5,000 babies of either sex every year for the 101 years from 1920 through 2020; recall that names only show up in our dataset when at least 5,000 babies have been given that name in a year.

-- Select first names and the total babies with that first_name
-- Group by first_name and filter for those names that appear in all 101 years
-- Order by the total number of babies with that first_name, descending

SELECT first_name, SUM(num)
FROM public.baby_names
WHERE num > 5000
GROUP BY public.baby_names.first_name
HAVING COUNT(year) = 101
ORDER BY SUM(num) DESC;

2. Timeless or trendy?

Wow, it looks like there are a lot of timeless traditionally male names! Elizabeth is holding her own for the female names, too.

Now, let's broaden our understanding of the dataset by looking at all names. We'll attempt to capture the type of popularity that each name in the dataset enjoyed. Was the name classic and popular across many years or trendy, only popular for a few years? Let's find out.

Spinner
DataFrameas
df2
variable
-- Classify each name's popularity according to the number of years that the name appears in the dataset.

SELECT first_name,
       SUM(num) AS total_babies,
	   CASE 
	      WHEN COUNT(year) > 80 THEN 'Classic'
		  WHEN COUNT(year) > 50 THEN 'Semi-classic'
		  WHEN COUNT(year) > 20 THEN 'Semi-trendy'
		  WHEN COUNT(year) > 0 THEN 'Trendy'
		  ELSE 'Invalid'
		END AS popularity_type
	   
FROM baby_names
GROUP BY first_name
ORDER BY first_name
LIMIT 10;

3. Top-ranked female names since 1920

Did you find your favorite American celebrity's name on the popularity chart? Was it classic or trendy? How do you think the name Henry did? What about Jaxon?

Since we didn't get many traditionally female names in our classic American names search in the first task, let's limit our search to names which were given to female babies.

We can use this opportunity to practice window functions by assigning a rank to female names based on the number of babies that have ever been given that name. What are the top-ranked female names since 1920?

Spinner
DataFrameas
df3
variable
-- Let's take a look at the ten highest-ranked American female names in our dataset.
SELECT RANK() OVER (ORDER BY SUM(num) DESC) AS name_rank,
       first_name,
       SUM(num) 

FROM baby_names
WHERE sex = 'F'
GROUP BY first_name
ORDER BY name_rank
LIMIT 10;

4. Picking a baby name

Perhaps a friend has heard of our work analyzing baby names and would like help choosing a name for her baby, a girl. She doesn't like any of the top-ranked names we found in the previous task.

She's set on a traditionally female name ending in the letter 'a' since she's heard that vowels in baby names are trendy. She's also looking for a name that has been popular in the years since 2015.

Let's see what we can do to find some options for this friend!

Spinner
DataFrameas
df4
variable
-- She's set on a traditionally female name ending in the letter 'a' since she's heard that vowels in baby names are trendy. She's also looking for a name that has been popular in the years since 2015.

SELECT first_name,
       SUM (num)
FROM baby_names
WHERE first_name LIKE '%a'
      AND year > 2015
	  AND sex = 'F'
GROUP BY first_name
ORDER BY SUM(num) DESC
LIMIT 5;

5. The Olivia expansion

Based on the results in the previous task, we can see that Olivia is the most popular female name ending in 'A' since 2015. When did the name Olivia become so popular?

Let's explore the rise of the name Olivia with the help of a window function.

Spinner
DataFrameas
df5
variable
SELECT year, first_name, num,
    SUM(num) OVER (ORDER BY year) AS cumulative_olivias
FROM baby_names
WHERE first_name = 'Olivia'
ORDER BY year
LIMIT 5;

6. Many males with the same name

Wow, Olivia has had a meteoric rise! Let's take a look at traditionally male names now. We saw in the first task that there are nine traditionally male names given to at least 5,000 babies every single year in our 101-year dataset! Those names are classics, but showing up in the dataset every year doesn't necessarily mean that the timeless names were the most popular. Let's explore popular male names a little further.

In the next two tasks, we will build up to listing every year along with the most popular male name in that year. This presents a common problem: how do we find the greatest X in a group? Or, in the context of this problem, how do we find the male name given to the highest number of babies in a year?

In SQL, one approach is to use a subquery. We can first write a query that selects the year and the maximum num of babies given any single male name in that year. For example, in 1989, the male name given to the highest number of babies was given to 65,339 babies. We'll write this query in this task. In the next task, we can use the code from this task as a subquery to look up the first_name that was given to 65,339 babies in 1989… as well as the top male first name for all other years!

Spinner
DataFrameas
df6
variable
-- Write a query that selects the year and the maximum num of babies given any male name in that year.
SELECT year, 
       first_name,
       num AS max_num
FROM public.baby_names
WHERE sex = 'M' AND (year, num) IN (
    SELECT year, max(num)
    FROM public.baby_names
    WHERE sex = 'M'
    GROUP BY year
)
LIMIT 5;

7. Top male names over the years

In the previous task, we found the maximum number of babies given any one male name in each year. Incredibly, the most popular name each year varied from being given to less than 20,000 babies to being given to more than 90,000!

In this task, we find out what that top male name is for each year in our dataset.