Premium Project

Comparing Cosmetics by Ingredients

Process ingredient lists for cosmetics on Sephora then visualize similarity using t-SNE and Bokeh.

  • 11 tasks
  • 287 participants
  • 1,500 XP

Project Description

Buying new cosmetic products is difficult. It can even be scary for those who have sensitive skin and are prone to skin trouble. The information needed to alleviate this problem is on the back of each product, but it's tough to interpret those ingredient lists unless you have a background in chemistry.

Instead of buying and hoping for the best, we can use data science to help us predict which products may be good fits for us. In this Project, you are going to create a content-based recommendation system where the 'content' will be the chemical components of cosmetics. Specifically, you will process ingredient lists for 1472 cosmetics on Sephora via word embedding, then visualize ingredient similarity using a machine learning method called t-SNE and an interactive visualization library called Bokeh.
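As a rough illustration of the first processing steps, the sketch below tokenizes comma-separated ingredient lists and builds a binary cosmetic-ingredient matrix, where each row is a product and each column is an ingredient. The product names, ingredient strings, and the helper names `tokenize` and `build_matrix` are hypothetical and chosen only for this example; they are not taken from the Project itself, which may structure these steps differently.

```python
def tokenize(ingredient_list):
    """Split a comma-separated ingredient string into lowercase tokens."""
    return [tok.strip().lower() for tok in ingredient_list.split(",") if tok.strip()]


def build_matrix(products):
    """Return (vocab, matrix), where matrix[i][j] is 1 if product i lists ingredient j."""
    tokenized = [tokenize(text) for text in products.values()]

    # Assign each distinct ingredient a column index in order of first appearance.
    vocab = {}
    for tokens in tokenized:
        for tok in tokens:
            if tok not in vocab:
                vocab[tok] = len(vocab)

    # Start from an all-zero matrix, then mark the ingredients each product contains.
    matrix = [[0] * len(vocab) for _ in tokenized]
    for row, tokens in zip(matrix, tokenized):
        for tok in tokens:
            row[vocab[tok]] = 1
    return vocab, matrix


# Hypothetical toy data, not real Sephora products.
products = {
    "Moisturizer A": "Water, Glycerin, Dimethicone",
    "Moisturizer B": "Water, Glycerin, Squalane",
    "Cleanser C": "Water, Sodium Laureth Sulfate",
}

vocab, matrix = build_matrix(products)
for name, row in zip(products, matrix):
    print(name, row)
```

A matrix like this could then be passed to a dimensionality reduction method such as scikit-learn's `sklearn.manifold.TSNE` to obtain two coordinates per product for plotting. This version uses plain lists to stay dependency-free; with 1472 products, a NumPy array would be the more natural container.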

This Project lets you apply the skills from Manipulating DataFrames with pandas, Chapter 1 of Dimensionality Reduction in Python, and Interactive Data Visualization with Bokeh. This Project also includes the concepts of natural language processing and word embedding, which you can learn about in Natural Language Processing Fundamentals in Python.

Project Tasks

  • 1. Cosmetics, chemicals ... it's complicated
  • 2. Focus on one product category and one skin type
  • 3. Tokenizing the ingredients
  • 4. Initializing a document-term matrix (DTM)
  • 5. Creating a counter function
  • 6. The Cosmetic-Ingredient matrix!
  • 7. Dimension reduction with t-SNE
  • 8. Let's map the items with Bokeh
  • 9. Adding a hover tool
  • 10. Mapping the cosmetic items
  • 11. Comparing two products
Jiwon Jeong

Graduate Research Assistant at Yonsei University

Jiwon is a graduate student majoring in Industrial Engineering at Yonsei University. Her core research area is in developing business strategies with a statistical approach. She has a passion for finding novel applications of machine learning. Outside of school, she is a lover of travel and books and is a healthy living advocate.


Technology

  • Python
  • Topics

    Data Manipulation, Data Visualization, Machine Learning, Importing & Cleaning Data