Premium project

Comparing Cosmetics by Ingredients

Process ingredient lists for cosmetics on Sephora, then visualize similarity using t-SNE and Bokeh.

11 Tasks · 1,500 XP



Project Description

Buying new cosmetic products is difficult. It can even be scary for those who have sensitive skin and are prone to skin trouble. The information needed to alleviate this problem is on the back of each product, but it's tough to interpret those ingredient lists unless you have a background in chemistry. Instead of buying and hoping for the best, we can use data science to help us predict which products may be good fits for us. In this project, you are going to create a content-based recommendation system where the 'content' will be the chemical components of cosmetics. Specifically, you will process ingredient lists for 1,472 cosmetics on Sephora via [word embedding](https://en.wikipedia.org/wiki/Word_embedding), then visualize ingredient similarity using a machine learning method called t-SNE and an interactive visualization library called Bokeh.
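To turn ingredient lists into something a model can compare, each list has to become a numeric vector. The task list names a document-term matrix built with a counter function, so the minimal sketch below uses a simple binary "bag of ingredients" encoding (one column per distinct ingredient) rather than a learned word embedding. It assumes each product's ingredients arrive as one comma-separated string; the helper names `tokenize`, `build_vocabulary`, and `one_hot` and the three toy products are hypothetical, not the project's own solution or Sephora's data.

```python
# A minimal sketch, assuming comma-separated ingredient strings.
# Product names and ingredients below are hypothetical toy data.

def tokenize(ingredients: str) -> list[str]:
    """Split a comma-separated ingredient string into lowercase tokens."""
    return [tok.strip().lower() for tok in ingredients.split(",") if tok.strip()]


def build_vocabulary(corpus: list[list[str]]) -> dict[str, int]:
    """Map each distinct ingredient to a column index, in first-seen order."""
    vocab: dict[str, int] = {}
    for tokens in corpus:
        for tok in tokens:
            if tok not in vocab:
                vocab[tok] = len(vocab)
    return vocab


def one_hot(tokens: list[str], vocab: dict[str, int]) -> list[int]:
    """Return a row with 1 in each column whose ingredient appears in tokens."""
    row = [0] * len(vocab)
    for tok in tokens:
        row[vocab[tok]] = 1
    return row


products = {
    "Toy Moisturizer A": "Water, Glycerin, Niacinamide, Dimethicone",
    "Toy Moisturizer B": "Water, Glycerin, Squalane",
    "Toy Moisturizer C": "Aloe Barbadensis Leaf Juice, Squalane, Niacinamide",
}

corpus = [tokenize(text) for text in products.values()]
vocab = build_vocabulary(corpus)
dtm = [one_hot(tokens, vocab) for tokens in corpus]  # rows = products, cols = ingredients

for name, row in zip(products, dtm):
    print(name, row)
```

Each row is one product and each column one ingredient, so products sharing many ingredients get similar rows. That shared structure is what a dimension-reduction step such as t-SNE can then place close together on a map.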

Project Tasks

1. Cosmetics, chemicals... it's complicated
2. Focus on one product category and one skin type
3. Tokenizing the ingredients
4. Initializing a document-term matrix (DTM)
5. Creating a counter function
6. The Cosmetic-Ingredient matrix!
7. Dimension reduction with t-SNE
8. Let's map the items with Bokeh
9. Adding a hover tool
10. Mapping the cosmetic items
11. Comparing two products
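Tasks 7 through 10 reduce the cosmetic-ingredient matrix to two dimensions with t-SNE and plot the result with a Bokeh hover tool. The sketch below assumes scikit-learn (`sklearn.manifold.TSNE`), NumPy, and Bokeh are installed. The five-row matrix and product names are hypothetical toy data, and parameter values such as `perplexity=2.0` and `random_state=42` are illustrative choices, not the project's settings.

```python
# A sketch of t-SNE + Bokeh, assuming scikit-learn, NumPy and Bokeh are installed.
# The matrix is hypothetical toy data: rows = products, columns = ingredients.
import numpy as np
from bokeh.io import show
from bokeh.models import ColumnDataSource, HoverTool
from bokeh.plotting import figure
from sklearn.manifold import TSNE

names = ["Toy A", "Toy B", "Toy C", "Toy D", "Toy E"]
dtm = np.array([
    [1, 1, 1, 1, 0, 0],
    [1, 1, 0, 0, 1, 0],
    [0, 0, 1, 0, 1, 1],
    [1, 0, 1, 1, 0, 0],
    [0, 1, 0, 0, 1, 1],
], dtype=float)

# Perplexity must be smaller than the number of samples, hence the tiny value here.
tsne = TSNE(n_components=2, perplexity=2.0, init="random", random_state=42)
coords = tsne.fit_transform(dtm)  # shape: (n_products, 2)

# Put coordinates and labels in one data source so the hover tool can read them.
source = ColumnDataSource(data={"x": coords[:, 0], "y": coords[:, 1], "name": names})
plot = figure(title="Cosmetics mapped by ingredient similarity (toy data)",
              x_axis_label="t-SNE 1", y_axis_label="t-SNE 2")
plot.scatter(x="x", y="y", source=source, size=10)
plot.add_tools(HoverTool(tooltips=[("Product", "@name")]))

# show(plot)  # opens the interactive map in a browser
```

Products with overlapping ingredient rows tend to land near each other, and hovering over a point reveals which product it is. With the full set of 1,472 products, the small perplexity used here would no longer be needed; the library default is a reasonable starting point to tune from.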

Technologies

Python

Topics

Data Manipulation, Data Visualization, Machine Learning, Importing & Cleaning Data

Jiwon Jeong

Graduate Research Assistant at Yonsei University

Jiwon is a graduate student majoring in Industrial Engineering at Yonsei University. Her core research area is developing business strategies using a statistical approach. She has a passion for finding novel applications of machine learning. Outside of school, she loves travel and books and is a healthy-living advocate.
