Skip to content
Exploring Pokemon Dataset in SQL
# Using python to read csv file and convert to dataframe for easier querying in SQL.
import pandas as pd
pokemon = pd.read_csv('./Pokemon.csv')
pokemonDescription of Columns:
- #: ID for each pokemon
- Name: Name of each pokemon
- Type_1: Each pokemon has a type, this determines weakness/resistance to attacks
- Type_2: Some pokemon are dual type and have 2
- Total: sum of all stats that come after this, a general guide to how strong a pokemon is
- HP: hit points, or health, defines how much damage a pokemon can withstand before fainting
- Attack: the base modifier for normal attacks (eg. Scratch, Punch)
- Defense: the base damage resistance against normal attacks
- Sp_Atk: special attack, the base modifier for special attacks (e.g. fire blast, bubble beam)
- Sp_Def: the base damage resistance against special attacks
- Speed: determines which pokemon attacks first each round
- Generation: which generation the Pokemon was released
- Legendary: True or False column for if the Pokemon is legendary or not
This dataset 'Pokemon with stats' is available on Kaggle and is licensed CC0: Public Domain.
DataFrameas
df6
variable
/* Count of Legendary and Non_Legendary Pokemon */
SELECT
CASE WHEN Legendary = 'False' THEN 'Common'
WHEN Legendary = 'True' THEN 'Legendary'
END AS Rarity,
COUNT(*)
FROM pokemon
GROUP BY Rarity;DataFrameas
df3
variable
/* Count of Pokemon that are Type 1 or Dual Type */
SELECT
CASE WHEN Type_1 IS NOT NULL AND Type_2 IS NULL THEN 'Type 1 Only'
WHEN Type_1 IS NOT NULL AND Type_2 IS NOT NULL THEN 'Dual Type' END AS Type,
COUNT(*) AS n
FROM pokemon
GROUP BY Type
ORDER BY n DESC;DataFrameas
df
variable
/* Count of Pokemon grouped by Type_1 (Not Dual Type) */
SELECT Type_1, COUNT(*) AS n
FROM pokemon
WHERE Type_2 IS NULL
GROUP BY Type_1
ORDER BY n DESC;DataFrameas
df1
variable
/* Count of Pokemon grouped by Dual Type (Type_1 AND Type_2) */
SELECT Type_1, Type_2, COUNT(*) AS n
FROM pokemon
WHERE Type_1 IS NOT NULL AND Type_2 IS NOT NULL
GROUP BY Type_1, Type_2
ORDER BY n DESC;
DataFrameas
df4
variable
/* List of the top 10 'strongest' Pokemon by Total */
SELECT *
FROM pokemon
ORDER BY Total DESC
LIMIT 10;DataFrameas
df5
variable
/* List of top 10 best Defense Pokemon */
SELECT *
FROM pokemon
ORDER BY DEFENSE DESC
LIMIT 10;