Premium project

What Makes a Pokémon Legendary?

Use tree-based machine learning methods to identify the characteristics of legendary Pokémon.

11 Tasks, 1,500 XP



Project Description

Not all Pokémon are created equal. Some are consigned to mediocrity, useless in battle until they reach their more evolved states. Others – like Zapdos, Articuno and Moltres – are so unique and powerful that they have officially been classified as legendary. But what exactly makes a Pokémon the stuff of legend?

In this project, we will answer that question with the help of a dataset that includes the base stats, height, weight and type of 801 Pokémon from all seven generations. Using the random forest algorithm, we will predict whether a Pokémon is legendary based on these characteristics, then rank the characteristics by how much each contributes to that classification.

Students should be familiar with the `tidyverse` suite of packages, particularly `ggplot2` for data visualization and `dplyr` for data manipulation. They should also have experience with classification problems and tree-based methods.

This project uses a subset of [The Complete Pokemon Dataset](https://www.kaggle.com/rounakbanik/pokemon/home) published on Kaggle.
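
As a rough illustration of the approach described above, the sketch below fits a random forest and ranks predictors by importance. It is not the project's own code: it assumes the `randomForest` package is installed, it runs on synthetic toy data rather than the Kaggle dataset, and the column names (`attack`, `defense`, `height_m`, `weight_kg`, `is_legendary`) and the labelling rule are invented for the example.

```r
# Minimal sketch on SYNTHETIC data (not the Kaggle Pokémon dataset).
# All column names and the labelling rule are illustrative assumptions.
library(randomForest)

set.seed(42)
n <- 400

# Toy stand-in for the Pokémon table: two base stats plus height and weight.
toy <- data.frame(
  attack    = round(runif(n, 20, 180)),
  defense   = round(runif(n, 20, 180)),
  height_m  = round(runif(n, 0.1, 6), 1),
  weight_kg = round(runif(n, 1, 400), 1)
)

# Made-up rule so the toy data has some signal for the forest to find.
toy$is_legendary <- factor(
  ifelse(toy$attack + toy$defense > 260, "legendary", "ordinary")
)

# Fit the forest; importance = TRUE also computes permutation importance.
rf <- randomForest(is_legendary ~ ., data = toy, ntree = 200, importance = TRUE)

# Rank predictors by mean decrease in accuracy (type = 1), largest first.
imp <- importance(rf, type = 1)
imp_ranked <- imp[order(imp[, 1], decreasing = TRUE), , drop = FALSE]
print(imp_ranked)
```

Because the toy label depends only on `attack` and `defense`, those two columns should rank well above height and weight here; on the real data the ranking is what the project sets out to discover.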

Project Tasks

  1. Introduction
  2. How many Pokémon are legendary?
  3. Legendary Pokémon by height and weight
  4. Legendary Pokémon by type
  5. Legendary Pokémon by fighter stats
  6. Create a training/test split
  7. Fit a decision tree
  8. Fit a random forest
  9. Assess model fit
  10. Analyze variable importance
  11. Conclusion
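
The split, decision tree and model assessment tasks listed above could look roughly like the following. This is a hedged sketch, not the project's solution: it assumes the `rpart` package, uses synthetic toy data with invented columns (`attack`, `speed`, `is_legendary`), and the 60/40 split ratio is an arbitrary choice for the example.

```r
# Minimal sketch on SYNTHETIC data; column names and split ratio are assumptions.
library(rpart)

set.seed(1)
n <- 300

# Toy stand-in for the Pokémon table with a made-up labelling rule.
toy <- data.frame(
  attack = runif(n, 20, 180),
  speed  = runif(n, 20, 180)
)
toy$is_legendary <- factor(
  ifelse(toy$attack + toy$speed > 260, "legendary", "ordinary")
)

# Training/test split: 60% of rows for training, the rest held out.
train_idx <- sample(n, size = round(0.6 * n))
train <- toy[train_idx, ]
test  <- toy[-train_idx, ]

# Fit a classification tree on the training rows only.
tree <- rpart(is_legendary ~ ., data = train, method = "class")

# Assess fit on the held-out rows with a confusion matrix and accuracy.
pred <- predict(tree, newdata = test, type = "class")
conf_mat <- table(predicted = pred, actual = test$is_legendary)
accuracy <- sum(diag(conf_mat)) / sum(conf_mat)
print(conf_mat)
print(accuracy)
```

Since legendary Pokémon are a small minority in the real data, accuracy alone can flatter a model there; the confusion matrix shows how many legendary cases are actually caught.
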
Technologies
R
Topics
Data Manipulation, Data Visualization, Machine Learning, Importing & Cleaning Data

Joshua Feldman

Decision Scientist at Facebook
Joshua Feldman is a Decision Scientist at Facebook, where he uses data insights to help drive brand sentiment, user growth and engagement, and marketing effectiveness. He mainly codes in R and SQL, taking a specialist interest in causal inference, natural language processing and data visualization. He holds an MSc in quantitative research methodology from the London School of Economics.
