Skip to content

How have American baby name tastes changed since 1920? Which names have remained popular for over 100 years, and how do those names compare to more recent top baby names? These are considerations for many new parents, but the skills you'll practice while answering these queries are broadly applicable. After all, understanding trends and popularity is important for many businesses, too!

You'll be working with data provided by the United States Social Security Administration, which lists first names along with the number and sex of babies they were given to in each year. For processing speed purposes, the dataset is limited to first names which were given to over 5,000 American babies in a given year. The data spans 101 years, from 1920 through 2020.

The Data

baby_names

columntypedescription
yearintyear
first_namevarcharfirst name
sexvarcharsex of babies given first_name
numintnumber of babies of sex given first_name in that year
Spinner
DataFrameas
usa_baby_names
variable
-- Run this code to view the data in baby_names
SELECT *
FROM baby_names
LIMIT 5;
Spinner
DataFrameas
name_types
variable
-- List the overall top five names in alphabetical order and find out if each name is "Classic" or "Trendy."
-- FILTER first_name, sum, popularity_type
-- LIMIT 5
-- ORDER BY first_name ASC
-- GROUP BY first_name

SELECT first_name, SUM(num), 
	CASE WHEN COUNT(first_name) >=50 THEN 'Classic'
	ELSE 'Trendy' END AS popularity_type
FROM baby_names
GROUP BY first_name
ORDER BY first_name ASC
LIMIT 5;

-- So 4 out of the first 5 names alphabetically are trendy. Only Aaron is considered a classic.
Spinner
DataFrameas
top_20
variable
-- What were the top 20 male names overall, and how did the name Paul rank?
-- LIMIT 20
-- WHERE sex is male
-- FILTER name_rank (RANK window function), first_name, sum

SELECT RANK() OVER (ORDER BY SUM(num) DESC) AS name_rank, first_name, SUM(num)
FROM baby_names
WHERE sex = 'M'
GROUP BY first_name
LIMIT 20;

-- Paul ranked 17th in the list. Range of around 3.6 million between 1st and 20th in terms of the sum.
Spinner
DataFrameas
a_names
variable
-- Which female names appeared in both 1920 and 2020?
-- Must compare records from the same table. So self join.
-- FILTER first_name, total_occurrences
-- WHERE females, 1920, 2020

SELECT a.first_name, (a.num + b.num) AS total_occurrences
FROM baby_names AS a
INNER JOIN baby_names AS b
USING (first_name)
WHERE a.sex = 'F' AND a.year = 1920
AND b.sex = 'F' AND b.year = 2020;

-- So only 6 names appear in both 1920 and 2020 for females. Evelyn the most popular closely followed by Elizabeth.