Jurassic Scripts and Box Office Hits: Unveiling Data Stories Across Time

Everyone Can Learn Data Scholarship

📖 Background

The second "Everyone Can Learn Data" Scholarship from DataCamp is now open for entries.

The challenges below test the coding skills you gained from beginner courses in Python, R, or SQL. Pair them with the help of AI and your creative thinking skills to win $5,000 for your future data science studies!

The scholarship is open to secondary and undergraduate students, and to other students preparing for graduate-level studies (working toward their Bachelor's degree). Postgraduate students (PhD) and graduates (Master's degree) cannot apply.

The challenge consists of two parts; make sure to complete both before submitting. Good luck!

1️⃣ Part 1 (Python) - Dinosaur data 🦕

📖 Background

You're applying for a summer internship at a national museum for natural history. The museum recently created a database containing all dinosaur records of past field campaigns. Your job is to dive into the fossil records to find some interesting insights, and advise the museum on the quality of the data.

💾 The data

You have access to a real dataset containing dinosaur records from the Paleobiology Database (source):

Column name	Description
occurence_no	The original occurrence number from the Paleobiology Database.
name	The accepted name of the dinosaur (usually the genus name, or the name of the footprint/egg fossil).
diet	The main diet (omnivorous, carnivorous, herbivorous).
type	The dinosaur type (small theropod, large theropod, sauropod, ornithopod, ceratopsian, armored dinosaur).
length_m	The maximum length, from head to tail, in meters.
max_ma	The age in which the first fossil records of the dinosaur were found, in millions of years.
min_ma	The age in which the last fossil records of the dinosaur were found, in millions of years.
region	The current region where the fossil record was found.
lng	The longitude where the fossil record was found.
lat	The latitude where the fossil record was found.
class	The taxonomical class of the dinosaur (Saurischia or Ornithischia).
family	The taxonomical family of the dinosaur (if known).

The data was enriched with data from Wikipedia.

💪 Challenge I

Help your colleagues at the museum to gain insights on the fossil record data. Include:

How many different dinosaur names are present in the data?
Which was the largest dinosaur? What about missing data in the dataset?
What dinosaur type has the most occurrences in this dataset? Create a visualization (table, bar chart, or equivalent) to display the number of dinosaurs per type. Use the AI assistant to tweak your visualization (colors, labels, title...).
Did dinosaurs get bigger over time? Show the relation between dinosaur length and age to illustrate this.
Use the AI assistant to create an interactive map showing each record.
Any other insights you found during your analysis?

Preparation

# Importing required packages for data manipulation and visualization
import folium # For interactive maps
import pandas as pd # For data manipulation
import numpy as np # For numerical operations
import seaborn as sns # For statistical data visualization
import matplotlib.pyplot as plt # For plotting

# Load the dinosaur dataset from the CSV file
dinosaurs = pd.read_csv('data/dinosaurs.csv')

# Preview the first few rows of the dataframe to understand its structure and contents
dinosaurs.head()

# Removing duplicate values from the dataset
# Duplicate records can skew the analysis and lead to incorrect insights
dinosaurs = dinosaurs.drop_duplicates()
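
Before relying on the deduplicated table, it can help to check how many rows were actually dropped and how much data is missing per column, since the challenge also asks about data quality. The following is a minimal sketch on a hypothetical toy table (the rows are invented for illustration; only the column names come from the dataset description), not a result from the real file:

```python
import pandas as pd

# Hypothetical toy rows standing in for data/dinosaurs.csv (assumption:
# invented values, real column names from the data description).
sample = pd.DataFrame({
    "occurence_no": [1, 1, 2, 3],
    "name": ["Allosaurus", "Allosaurus", "Stegosaurus", "Allosaurus"],
    "length_m": [12.0, 12.0, 9.0, None],
})

# Count rows that are exact copies of an earlier row.
n_duplicates = int(sample.duplicated().sum())

# Drop the exact copies, keeping the first occurrence of each row.
deduplicated = sample.drop_duplicates()

# Count missing values in each column of the cleaned table.
missing_per_column = deduplicated.isna().sum()

print(f"Duplicate rows removed: {n_duplicates}")
print(missing_per_column)
```

Note that `drop_duplicates()` with no arguments only removes rows that match in every column; two records of the same dinosaur with different occurrence numbers are both kept.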

1. Counting Unique Dinosaur Names

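One of the key aspects of our dataset is the diversity of dinosaurs it represents. Counting the unique dinosaur names gives a sense of this diversity. Because the name column usually holds the genus name, the count is closer to a number of genera than a number of species.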

# Count the number of unique dinosaur names in the dataset
# This indicates how many distinct names (mostly genera) the dataset covers
unique_dino_names = dinosaurs['name'].nunique()
unique_dino_names

There are 1042 different dinosaur names present in the data.
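
A count like this depends on how consistently the names are spelled. As a hedged sketch, the snippet below uses a hypothetical toy series (invented values, not taken from the dataset) to show that `nunique()` ignores missing values by default and treats differences in case or surrounding whitespace as distinct names:

```python
import pandas as pd

# Hypothetical name column with a casing/whitespace variant and a missing value.
names = pd.Series(["Allosaurus", "allosaurus ", "Stegosaurus", None, "Stegosaurus"])

# nunique() excludes missing values unless dropna=False is passed.
raw_count = names.nunique()

# Normalise whitespace and case before counting, so variants collapse together.
cleaned = names.str.strip().str.lower()
cleaned_count = cleaned.nunique()

# Frequency of each cleaned name, most common first.
name_counts = cleaned.value_counts()

print(raw_count, cleaned_count)
print(name_counts)
```

If the raw and cleaned counts differ on the real data, the difference points to spelling inconsistencies worth reporting to the museum as a data quality issue.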

