Skip to content

How have American baby name tastes changed since 1920? Which names have remained popular for over 100 years, and how do those names compare to more recent top baby names? These are considerations for many new parents, but the skills you'll practice while answering these queries are broadly applicable. After all, understanding trends and popularity is important for many businesses, too!

You'll be working with data provided by the United States Social Security Administration, which lists first names along with the number and sex of babies they were given to in each year. For processing speed purposes, the dataset is limited to first names which were given to over 5,000 American babies in a given year. The data spans 101 years, from 1920 through 2020.

The Data

baby_names

columntypedescription
yearintyear
first_namevarcharfirst name
sexvarcharsex of babies given first_name
numintnumber of babies of sex given first_name in that year
Spinner
DataFrameas
usa_baby_names
variable
-- Run this code to view the data in baby_names

SELECT *
FROM baby_names;
Spinner
DataFrameas
name_types
variable
-- List the first five names in alphabetical order and find out if each name is "Classic" or "Trendy."
-- Consider the assumption that a name is considered "Classic" if it appears in 50 or more years, and "Trendy" otherwise.

SELECT 
	first_name, 
	SUM(num) AS sum,
	CASE
		WHEN COUNT(first_name) >= 50 THEN 'Classic'
		ELSE 'Trendy'
	END AS popularity_type
FROM baby_names
GROUP BY first_name
ORDER BY first_name ASC 
LIMIT 5; 
Spinner
DataFrameas
top_20
variable
-- What were the top 20 male names overall, and how did the name Paul rank?

SELECT 
	RANK() OVER(ORDER BY SUM(num) DESC) AS name_rank, 
	first_name, 
	SUM(num) AS sum
FROM baby_names
WHERE sex = 'M'
GROUP BY first_name
ORDER BY sum DESC
LIMIT 20;
Spinner
DataFrameas
a_names
variable
-- Which female names appeared in both 1920 and 2020?

SELECT 
	first_name,
	COUNT(DISTINCT year) AS total_occurrences
FROM baby_names
WHERE sex = 'F'
	AND year IN (1920, 2020)
GROUP BY first_name
HAVING COUNT(DISTINCT year) > 1;
Spinner
DataFrameas
Ex_1
variable
-- How many times did each name appear in each year?

SELECT first_name,
       SUM(CASE WHEN year = 1920 THEN num ELSE 0 END) AS count_1920,
       SUM(CASE WHEN year = 2020 THEN num ELSE 0 END) AS count_2020
FROM baby_names
WHERE sex = 'F'
  AND year IN (1920, 2020)
GROUP BY first_name
HAVING COUNT(DISTINCT year) > 1;
Spinner
DataFrameas
Ex_2
variable
-- Use this table for the answer to question 1:
-- List the overall top five names in alphabetical order and find out if each name is "Classic" or "Trendy."

SELECT 
	first_name, 
	MIN(year) AS since_year, 
	MAX(year) AS until_year,
	SUM(num) AS total_name,
	
	-- Classic is MAX Year = 1970 / First 50 Years of analyzed time lapse
	-- Trendy is MAX Year = 2020 / Last 50 Years of analyzed time lapse
	CASE 
		WHEN MAX(year) > 1970 THEN 'Trendy'
		ELSE 'Classic' 
	END AS name_type

FROM baby_names
GROUP BY first_name
ORDER BY total_name DESC, first_name ASC
LIMIT 5;
Spinner
DataFrameas
Ex_3
variable
-- Use this table for the answer to question 2:
-- What were the top 20 male names overall, and how did the name Paul rank?

SELECT first_name, MIN(year) AS since_year, SUM(num) AS total_name, RANK() OVER(ORDER BY SUM(num) DESC)
FROM baby_names
WHERE sex = 'M'
GROUP BY first_name
ORDER BY total_name DESC, first_name ASC
LIMIT 20;
Spinner
DataFrameas
Ex_4
variable
-- Use this table for the answer to question 3:
-- Which female names appeared in both 1920 and 2020?
-- Using Subquery

SELECT first_name, COUNT(*) AS both_years
FROM(
	SELECT first_name, year
	FROM baby_names
	WHERE sex = 'F' 
		AND year IN (1920, 2020)
	ORDER BY first_name ASC
	) AS sub
GROUP BY first_name
HAVING COUNT (*) > 1;
Spinner
DataFrameas
Ex_5
variable
-- Use this table for the answer to question 3:
-- Which female names appeared in both 1920 and 2020?
-- Without using subquery

SELECT 
	first_name,
	COUNT(DISTINCT year) AS count_years
FROM baby_names
WHERE sex = 'F'
	AND year IN (1920, 2020)
GROUP BY first_name
HAVING COUNT(DISTINCT year) > 1;