Skip to content
Project: Exploring Trends in American Baby Names

How have American baby name tastes changed since 1920? Which names have remained popular for over 100 years, and how do those names compare to more recent top baby names? These are considerations for many new parents, but the skills you'll practice while answering these queries are broadly applicable. After all, understanding trends and popularity is important for many businesses, too!

You'll be working with data provided by the United States Social Security Administration, which lists first names along with the number and sex of babies they were given to in each year. For processing speed purposes, the dataset is limited to first names which were given to over 5,000 American babies in a given year. The data spans 101 years, from 1920 through 2020.

The Data

baby_names

columntypedescription
yearintyear
first_namevarcharfirst name
sexvarcharsex of babies given first_name
numintnumber of babies of sex given first_name in that year
Spinner
DataFrameavailable as
usa_baby_names
variable
-- Run this code to view the data in baby_names
SELECT *
FROM baby_names
WHERE year = 1920;
Spinner
DataFrameavailable as
name_types
variable
-- Use this table for the answer to question 1:
-- List the overall top five names in alphabetical order and find out if each name is "Classic" or "Trendy."
WITH cte AS (
	SELECT 
		*,
		COUNT(YEAR) OVER(PARTITION BY first_name) AS years_appeared
	FROM
		baby_names
)
SELECT 
	first_name,
	SUM(num) AS sum,
	CASE WHEN years_appeared >= 50 THEN 'Classic'
	ELSE 'Trendy' END AS popularity_type
FROM cte 
GROUP BY 
	first_name, years_appeared
ORDER BY 
	first_name ASC
LIMIT 5;
Spinner
DataFrameavailable as
top_20
variable
-- Use this table for the answer to question 2:
-- What were the top 20 male names overall, and how did the name Paul rank?
WITH cte AS (
	SELECT
		first_name,
		sex,
		SUM(num) AS sum
	FROM
		baby_names 
	GROUP BY 
		first_name, sex
)

SELECT 
	RANK() OVER(ORDER BY sum DESC) AS name_rank,
	first_name,
	sum
FROM
	cte 
WHERE sex = 'M'
ORDER BY 
	name_rank ASC
LIMIT 20;
Spinner
DataFrameavailable as
a_names
variable
-- Use this table for the answer to question 3:
-- Which female names appeared in both 1920 and 2020?
SELECT
	first_name,
	COUNT(*) AS total_occurrences 
FROM 
	baby_names 
WHERE 
	sex = 'F' AND
	first_name IN (SELECT first_name FROM baby_names WHERE year = 1920) AND
	first_name IN (SELECT first_name FROM baby_names WHERE year = 2020)
GROUP BY 
	first_name;