Premium project

Name Game: Gender Prediction using Sound

Analyze the gender distribution of children's book writers and use sound to match names to gender.

8 Tasks · 1,500 XP


Project Description

The same name can be spelled in many ways (for example, Marc and Mark, or Elizabeth and Elisabeth). Sound can therefore be a better way to match names than spelling. In this project, you will use the Python package Fuzzy to find out the genders of authors who have appeared in the New York Times Best Seller list for Children's Picture Books. First, using fuzzy (sound-based) name matching, you will search for author names in a dataset provided by the US Social Security Administration that contains the names and genders of all individuals who have applied for Social Security cards. Next, you will add each author's gender to the author dataset. Finally, you will use the new dataset to plot the gender distribution of children's picture book authors over time. To complete this project, you should be familiar with `pandas` DataFrames, NumPy for basic statistics, and Matplotlib for plotting.
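To make the idea of matching by sound concrete, here is a minimal sketch in plain Python. It does not call the Fuzzy package; it uses a simplified American Soundex as a stand-in phonetic algorithm, which is an assumption rather than the project's actual method. The function names and the tiny `known_names` table are hypothetical and are not taken from the Social Security Administration data.

```python
# Simplified American Soundex: an illustrative stand-in, not the Fuzzy API.
CODES = {}
for letters, digit in (("bfpv", "1"), ("cgjkqsxz", "2"), ("dt", "3"),
                       ("l", "4"), ("mn", "5"), ("r", "6")):
    for ch in letters:
        CODES[ch] = digit


def soundex(name, length=4):
    """Return a phonetic key: first letter plus digits, zero-padded."""
    letters = [ch for ch in name.lower() if ch.isalpha()]
    if not letters:
        return ""
    key = letters[0].upper()
    prev = CODES.get(letters[0], "")
    for ch in letters[1:]:
        if ch in "hw":
            continue  # h and w do not separate letters with the same code
        code = CODES.get(ch, "")
        if code and code != prev:
            key += code
        prev = code  # vowels reset prev, so a repeated code is kept
    return (key + "0" * length)[:length]


# Hypothetical reference table of first names with known gender.
known_names = {"Mark": "M", "Elizabeth": "F"}

# Index the reference names by their phonetic key.
by_sound = {soundex(name): gender for name, gender in known_names.items()}


def guess_gender(first_name):
    """Look up gender by sound; return 'Unknown' when no key matches."""
    return by_sound.get(soundex(first_name), "Unknown")


print(soundex("Marc"), soundex("Mark"))   # both spellings share one key
print(guess_gender("Elisabeth"))          # matched through "Elizabeth"
```

Because the lookup is keyed on sound rather than spelling, "Marc" finds the entry for "Mark" even though the strings differ. A name with no phonetic match falls through to "Unknown", which is the case the project has to handle when tallying authors.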

Project Tasks

  1. Sound it out!
  2. Authoring the authors
  3. It's time to bring on the phonics... _again_!
  4. The inbetweeners
  5. Playing matchmaker
  6. Tally up
  7. Foreign-born authors?
  8. Raising the bar


Python


Case Studies

Tufool Alnuaimi

Academic entrepreneur with a focus on data science

Tufool’s data science journey began at MIT, where she obtained her earliest degrees in EECS. Later, after receiving her PhD from Imperial College, she joined the College as an Assistant Professor of Data Science and Innovation. Today, Tufool is working on an exciting project (“Chartyn”) that uses machine learning to present insightful data visuals. Find out more about Tufool and the Chartyn project.

What do other learners have to say?

I've used other sites—Coursera, Udacity, things like that—but DataCamp's been the one that I've stuck with.

Devon Edwards Joseph
Lloyds Banking Group

DataCamp is the top resource I recommend for learning data science.

Louis Maiden
Harvard Business School

DataCamp is by far my favorite website to learn from.

Ronald Bowers
Decision Science Analytics, USAA