Skip to content

How have American baby name tastes changed since 1920? Which names have remained popular for over 100 years, and how do those names compare to more recent top baby names? These are considerations for many new parents, but the skills you'll practice while answering these queries are broadly applicable. After all, understanding trends and popularity is important for many businesses, too!

You'll be working with data provided by the United States Social Security Administration, which lists first names along with the number and sex of babies they were given to in each year. For processing speed purposes, the dataset is limited to first names which were given to over 5,000 American babies in a given year. The data spans 101 years, from 1920 through 2020.

The Data

baby_names

columntypedescription
yearintyear
first_namevarcharfirst name
sexvarcharsex of babies given first_name
numintnumber of babies of sex given first_name in that year
Spinner
DataFrameas
usa_baby_names
variable
-- Run this code to view the data in baby_names
SELECT *
FROM baby_names
LIMIT 5;

Before you proceed, consider the assumption that a name is considered "Classic" if it appears in 50 or more years, and "Trendy" otherwise.

Spinner
DataFrameas
name_types
variable
-- Use this table for the answer to question 1:
-- List the overall top five names in alphabetical order and find out if each name is "Classic" or "Trendy."
SELECT first_name, COUNT(year) AS sum,
	CASE
		WHEN COUNT(year) > 50 THEN 'Classic'
		ELSE 'Trendy'
	END AS popularity_type
FROM baby_names
GROUP BY first_name
ORDER BY first_name
LIMIT 5;
Hidden output
indexfirst_namesumpopularity_type
0Aaliyah3Trendy
1Aaron51Classic
2Abigail28Trendy
3Adam46Trendy
4Addison13Trendy
Spinner
DataFrameas
top_20
variable
-- Use this table for the answer to question 2:
-- What were the top 20 male names overall, and how did the name Paul rank?
SELECT RANK() OVER(
	ORDER BY SUM(num) DESC
) AS name_rank, first_name, SUM(num) AS sum
FROM baby_names
WHERE sex = 'M'
GROUP BY first_name
ORDER BY sum DESC
LIMIT 20;
Hidden output
indexname_rankfirst_namesum
01James4748138
12John4510721
23Robert4495199
34Michael4278824
45William3614424
56David3571498
67Richard2414838
78Joseph2361382
89Thomas2166802
910Charles2112352
1011Christopher2012792
1112Daniel1824274
1213Matthew1567204
1314Anthony1344352
1415Donald1280236
1516Mark1265910
1617Paul1218996
1718Steven1216819
1819Joshua1204236
1920Andrew1157548
Spinner
DataFrameas
a_names
variable
-- Use this table for the answer to question 3:
-- Which female names appeared in both 1920 and 2020?
SELECT a.first_name, (a.num + b.num) AS total_occurrences
FROM baby_names AS a
JOIN baby_names AS b
USING (first_name)
WHERE a.year = 1920 AND a.sex = 'F' 
	AND b.year = 2020 AND b.sex = 'F'
Hidden output
indexfirst_nametotal_occurrences
0Emma20818
1Evelyn23283
2Elizabeth23125
3Eleanor14832
4Grace12741
5Hazel12765