Process ingredient lists for cosmetics on Sephora, then visualize similarity using t-SNE and Bokeh.
Buying new cosmetic products is difficult. It can even be scary for those who have sensitive skin and are prone to skin trouble. The information needed to alleviate this problem is on the back of each product, but it's tough to interpret those ingredient lists unless you have a background in chemistry.
Instead of buying and hoping for the best, we can use data science to help us predict which products may be good fits for us. In this Project, you are going to create a content-based recommendation system where the 'content' will be the chemical components of cosmetics. Specifically, you will process ingredient lists for 1472 cosmetics on Sephora via word embedding, then visualize ingredient similarity using a machine learning method called t-SNE and an interactive visualization library called Bokeh.
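To make the idea of treating ingredients as "content" concrete, here is a minimal sketch in plain Python. It assumes that each ingredient list is a comma-separated string and that a simple binary encoding (1 if a product contains an ingredient, 0 otherwise) stands in for the word embedding step; the Project's actual preprocessing may differ. The product names, ingredient lists, and helper functions below are hypothetical and exist only for illustration.

```python
import math

# Hypothetical sample data; the real Project works with 1472 Sephora products.
products = {
    "Cream A": "Water, Glycerin, Dimethicone",
    "Cream B": "Water, Glycerin, Squalane",
    "Oil C": "Squalane, Tocopherol",
}


def tokenize(ingredient_list):
    """Split a comma-separated ingredient string into lowercase tokens."""
    return [tok.strip().lower() for tok in ingredient_list.split(",") if tok.strip()]


# Map each distinct ingredient to a column index, in order of first appearance.
vocab = {}
for text in products.values():
    for tok in tokenize(text):
        vocab.setdefault(tok, len(vocab))


def encode(ingredient_list):
    """Return a binary vector with a 1 for every ingredient in the list."""
    row = [0] * len(vocab)
    for tok in tokenize(ingredient_list):
        row[vocab[tok]] = 1
    return row


# Product-by-ingredient matrix: one binary row per product.
matrix = {name: encode(text) for name, text in products.items()}


def cosine(u, v):
    """Cosine similarity between two vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


# Products that share more ingredients score closer to 1.
print(cosine(matrix["Cream A"], matrix["Cream B"]))
print(cosine(matrix["Cream A"], matrix["Oil C"]))
```

A matrix of this shape is the kind of input that a dimensionality reduction method such as t-SNE can project down to two dimensions, so that products with similar ingredients land near each other in the plot.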
This Project lets you apply the skills from Manipulating DataFrames with pandas, Chapter 1 of Dimensionality Reduction in Python, and Interactive Data Visualization with Bokeh. This Project also includes the concepts of natural language processing and word embedding, which you can learn about in Natural Language Processing Fundamentals in Python.
Graduate Research Assistant at Yonsei University
Jiwon is a graduate student majoring in Industrial Engineering at Yonsei University. Her core research area is in developing business strategies with a statistical approach. She has a passion for finding novel applications of machine learning. Outside of school, she is a lover of travel and books and is a healthy living advocate.