Exploring the Kaggle Data Science Survey

Discover the top tools Kaggle participants use for data science and machine learning.

Project Description

When beginning a career in data science, one often wonders what programming tools and languages are being used in the industry, and what skills one should learn first. By exploring the 2017 Kaggle Data Science Survey results, you can learn about the tools used by 10,000+ people in the professional data science community.

Before starting this project, you should be comfortable manipulating data frames and have some experience working with the tidyverse packages dplyr, tidyr, and ggplot2.

This project uses a subset of the 2017 Kaggle Machine Learning and Data Science Survey dataset. If you want to know more about the tools and techniques Kaggle participants use, check out the full report of the Kaggle 2017 survey results.

Project Tasks

  1. Welcome to the world of data science
  2. Using multiple tools
  3. Counting users of each tool
  4. Plotting the most popular tools
  5. The R vs Python debate
  6. Plotting R vs Python users
  7. Language recommendations
  8. The most recommended language by the language used
  9. The moral of the story

Amber Thomas

Journalist-Engineer at The Pudding

Amber Thomas is a journalist-engineer at The Pudding, an online collection of data-driven, visual essays. Before joining The Pudding, she was a marine biologist, collecting data on all things beneath the waves. Follow her on Twitter (@ProQuesAsker) or on her personal website.

