
Part 1️⃣: Unearthing Dinosaur Insights 🦖

Source: paleontologyworld.com

🚀 The key insights uncovered during the course of the dinosaur dataset analysis are:

  • Diversity of Species: The dataset contains 1042 unique dinosaur names, indicating a rich diversity of species.
  • Largest Dinosaurs: Supersaurus and Argentinosaurus are identified as the largest dinosaurs in the dataset, each measuring 35 meters in length.
  • Most Common Type: Ornithopods are the most common dinosaur type in the dataset, with 811 occurrences, possibly reflecting their success and adaptability.
  • Size Evolution: There is a trend showing that dinosaurs generally increased in size over time, particularly during the Jurassic and Cretaceous periods.
  • Diet and Size Correlation: A correlation exists between dinosaur size and diet, with herbivorous dinosaurs tending to be larger than carnivorous ones.
  • Geographical Distribution: An interactive map was created to show the geographical distribution of dinosaur fossils.
  • Variations Among Herbivores: Not all herbivorous dinosaurs were large; small herbivorous theropods were identified, suggesting size variation within herbivores possibly due to differences in digestive anatomy.
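The headline figures above can be reproduced with a few pandas calls. The sketch below is an illustration, not the notebook's actual code: it assumes the dataset's `name`, `length_m`, `type` and `diet` columns, and it collapses the data to one row per name before ranking sizes, because a genus's maximum length is repeated on every one of its occurrence rows. The function names and the toy table are hypothetical.

```python
import pandas as pd


def headline_stats(df: pd.DataFrame) -> dict:
    """Count distinct names, find the longest dinosaur(s) and the most common type."""
    # nunique() ignores missing names
    n_names = df["name"].nunique()

    # length_m describes a genus, not a record, so rank sizes on one row per name
    # (if rows for the same name disagree, the first non-missing length wins)
    per_name = df.dropna(subset=["length_m"]).drop_duplicates(subset="name")
    max_len = per_name["length_m"].max()
    # Keep every name tied for the maximum instead of only the first idxmax() hit
    longest = sorted(per_name.loc[per_name["length_m"] == max_len, "name"])

    # value_counts() counts records (occurrences), not distinct names
    type_counts = df["type"].value_counts()

    return {
        "n_names": n_names,
        "max_length_m": max_len,
        "longest": longest,
        "most_common_type": type_counts.index[0],
        "most_common_type_count": int(type_counts.iloc[0]),
    }


def median_length_by_diet(df: pd.DataFrame) -> pd.Series:
    """Median length per diet, using one row per name."""
    per_name = df.dropna(subset=["length_m", "diet"]).drop_duplicates(subset="name")
    return per_name.groupby("diet")["length_m"].median().sort_values(ascending=False)


# Usage on a hypothetical toy table (invented values, not museum records)
sample = pd.DataFrame({
    "name": ["A", "A", "B", "C", "C", "D"],
    "length_m": [35.0, 35.0, 35.0, 10.0, None, 2.0],
    "type": ["sauropod", "sauropod", "sauropod", "ornithopod", "ornithopod", "small theropod"],
    "diet": ["herbivorous"] * 5 + ["carnivorous"],
})
print(headline_stats(sample))
print(median_length_by_diet(sample))
```

Returning all tied names matters here: with two genera sharing the 35 m maximum, a plain `idxmax()` would silently report only one of them.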

1.1 📖 Background

You’re applying for a summer internship at a national museum for natural history. The museum recently created a database containing all dinosaur records from past field campaigns. Your job is to dive into the fossil records to find some interesting insights and advise the museum on the quality of the data.

1.2 🎯 Objectives

The objective of this research is to help my colleagues at the museum gain insights into the fossil record data. The specific questions I aim to answer are:

  1. How many different dinosaur names are present in the data?
  2. Which was the largest dinosaur? What about missing data in the dataset?
  3. What dinosaur type has the most occurrences in this dataset?
  4. Did dinosaurs get bigger over time? Show the relation between the dinosaur length and their age to illustrate this.
  5. Use the AI assistant to create an interactive map showing each record.
  6. Any other insights found during this analysis?
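Question 4 can be approached by giving each record a single age and comparing it with length. The sketch below is one possible approach, not the author's method: it assumes the `name`, `length_m`, `max_ma` and `min_ma` columns, uses the midpoint of `max_ma` and `min_ma` as a record's age, reduces the data to one row per name, and reports Spearman's rank correlation, computed as the Pearson correlation of the ranks so that no SciPy dependency is needed. Because ages are in millions of years before present, a *negative* correlation between age and length is what "bigger over time" would look like. The period boundaries (Triassic 251.9 to 201.4 Ma, Jurassic 201.4 to 145 Ma, Cretaceous 145 to 66 Ma) are rounded values from the ICS chronostratigraphic chart; the function name and toy data are hypothetical.

```python
import pandas as pd

# Geological period boundaries in Ma, rounded from the ICS chart (ascending for pd.cut)
PERIOD_EDGES = [66.0, 145.0, 201.4, 251.9]
PERIOD_LABELS = ["Cretaceous", "Jurassic", "Triassic"]


def length_vs_age(df: pd.DataFrame):
    """Return a per-name table, Spearman's rho (length vs age) and median length per period."""
    data = df.dropna(subset=["length_m", "max_ma", "min_ma"]).copy()
    # Summarise each record's age interval by its midpoint (a modelling choice)
    data["mid_ma"] = (data["max_ma"] + data["min_ma"]) / 2
    # One row per name so that well-sampled genera do not dominate the correlation
    per_name = data.groupby("name").agg(
        length_m=("length_m", "max"), mid_ma=("mid_ma", "mean")
    )
    # Spearman's rho is the Pearson correlation of the ranks (ties get average ranks)
    rho = per_name["length_m"].rank().corr(per_name["mid_ma"].rank())
    # Bin each name into a period by its mean mid-age; ages outside 66-251.9 Ma become NaN
    per_name["period"] = pd.cut(
        per_name["mid_ma"], bins=PERIOD_EDGES, labels=PERIOD_LABELS, include_lowest=True
    )
    by_period = per_name.groupby("period", observed=False)["length_m"].median()
    return per_name, rho, by_period


# Usage on a hypothetical toy table (invented values, not museum records)
toy = pd.DataFrame({
    "name": ["W", "X", "X", "Y", "Z", "V"],
    "length_m": [20.0, 30.0, 30.0, 10.0, 2.0, None],
    "max_ma": [100.0, 150.0, 150.0, 200.0, 230.0, 100.0],
    "min_ma": [90.0, 130.0, 130.0, 190.0, 220.0, 90.0],
})
per_name, rho, by_period = length_vs_age(toy)
print(f"Spearman rho (length vs age in Ma): {rho:.2f}")
print(by_period)
```

On the toy table rho is negative: the longer genera sit in younger rocks, which is the pattern "bigger over time" predicts. A rank correlation says nothing about cause, and the midpoint age ignores how wide each record's age interval is.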

1.3 📚 Introduction

The National Museum of Natural History has tasked me with exploring the dinosaur fossil records, and my colleagues are eager to draw useful insights from the data. I will count the distinct dinosaur names in the data, identify the largest dinosaur, and assess how much data is missing. I will also determine which dinosaur types occur most often, explore whether dinosaurs grew larger over time, and report any other insights that emerge during the analysis. Although the task may be challenging, I am committed to delivering valuable insights to the museum.

1.4 💾 Data Description

This analysis utilizes data sourced from the Paleobiology Database (source). The dataset provides a comprehensive look into the fossil records of dinosaurs, including details such as names, diets, types, lengths, ages, regions, and geographical coordinates.

| Column name | Description |
|---|---|
| occurence_no | The original occurrence number from the Paleobiology Database. |
| name | The accepted name of the dinosaur (usually the genus name, or the name of the footprint/egg fossil). |
| diet | The main diet (omnivorous, carnivorous, herbivorous). |
| type | The dinosaur type (small theropod, large theropod, sauropod, ornithopod, ceratopsian, armored dinosaur). |
| length_m | The maximum length, from head to tail, in meters. |
| max_ma | The age at which the first fossil records of the dinosaur were found, in millions of years. |
| min_ma | The age at which the last fossil records of the dinosaur were found, in millions of years. |
| region | The current region where the fossil record was found. |
| lng | The longitude where the fossil record was found. |
| lat | The latitude where the fossil record was found. |
| class | The taxonomical class of the dinosaur (Saurischia or Ornithischia). |
| family | The taxonomical family of the dinosaur (if known). |
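Since part of the brief is to advise the museum on data quality, the data dictionary can double as a set of automated checks. The following is a sketch, not the author's code: the allowed categories are copied verbatim from the table, so any spelling variant in the CSV (for example a different spelling of "armored") would be flagged, while the range rules (latitude within ±90°, longitude within ±180°, `max_ma` not younger than `min_ma`, strictly positive lengths) are generic sanity checks assumed here rather than rules stated by the museum. Missing values are deliberately not counted as violations, since the missing-value count handles them. The function name and toy rows are hypothetical.

```python
import pandas as pd

# Allowed categories, copied from the data dictionary
ALLOWED = {
    "diet": {"omnivorous", "carnivorous", "herbivorous"},
    "type": {"small theropod", "large theropod", "sauropod",
             "ornithopod", "ceratopsian", "armored dinosaur"},
    "class": {"Saurischia", "Ornithischia"},
}


def count_violations(df: pd.DataFrame) -> pd.Series:
    """Count rows breaking each rule; missing values are not counted as violations."""
    checks = {}
    for col, allowed in ALLOWED.items():
        # notna() leaves missing values to the separate missing-value report
        checks[f"{col}_unexpected"] = (df[col].notna() & ~df[col].isin(allowed)).sum()
    checks["lat_out_of_range"] = (~df["lat"].between(-90, 90) & df["lat"].notna()).sum()
    checks["lng_out_of_range"] = (~df["lng"].between(-180, 180) & df["lng"].notna()).sum()
    # Ages are millions of years ago, so the first appearance should be at least as old as the last
    checks["max_ma_below_min_ma"] = (df["max_ma"] < df["min_ma"]).sum()
    # Comparisons with NaN are False, so missing lengths are not flagged here
    checks["non_positive_length"] = (df["length_m"] <= 0).sum()
    return pd.Series(checks, dtype="int64")


# Usage on hypothetical toy rows (invented values, not museum records)
toy = pd.DataFrame({
    "diet": ["herbivorous", "herbivore", None],
    "type": ["sauropod", "ornithopod", "raptor"],
    "class": ["Saurischia", "Ornithischia", "Saurischia"],
    "lat": [40.0, 95.0, None],
    "lng": [-100.0, 10.0, 200.0],
    "max_ma": [150.0, 100.0, 70.0],
    "min_ma": [140.0, 110.0, 66.0],
    "length_m": [20.0, 5.0, 0.0],
})
print(count_violations(toy))
```

Once the dataset is loaded, `count_violations(dinosaurs)` gives one count per rule; a non-zero count is a pointer for the museum to review, not proof of an error.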

1.5 🔍 Exploratory Data Analysis

First, we import the required packages for data exploration. We then assess the data by examining missing values, duplicates, and summary statistics for the key variables.

# Importing necessary libraries for data manipulation (Pandas, NumPy), and visualization (Matplotlib, Seaborn, WordCloud, Folium)
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import folium
from wordcloud import WordCloud
from folium.plugins import MarkerCluster
# Load dinosaur dataset into 'dinosaurs' DataFrame
dinosaurs = pd.read_csv('data/dinosaurs.csv')
# Preview the first few rows of the dinosaur dataset
dinosaurs.head()
# Summary of the dinosaurs DataFrame
dinosaurs.info()
# Checking for duplicates in the dinosaurs dataset
duplicates = dinosaurs.duplicated()

# Counting the number of duplicate rows
num_duplicates = duplicates.sum()

# Displaying whether any duplicate rows exist
if num_duplicates == 0:
    print("There are no duplicate rows in the dataset.")
else:
    print(f"There are {num_duplicates} duplicate rows in the dataset.")
# Count missing values per column
missing_values_count = dinosaurs.isnull().sum()
missing_values_count
dinosaurs.describe()
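The full-row duplicate check above only catches records copied verbatim. A complementary check, sketched below under the assumption that the occurrence number should uniquely identify a record, looks for repeated keys even when other fields differ; the key column name is taken from the data dictionary (`occurence_no`) and should be checked against the CSV header. The same sketch expresses missing values as a share of rows, which is easier to compare across columns than raw counts. Both helper names are hypothetical.

```python
import pandas as pd


def missing_report(df: pd.DataFrame) -> pd.DataFrame:
    """Missing-value count and percentage per column, most incomplete first."""
    counts = df.isna().sum()
    report = pd.DataFrame({"missing": counts, "pct": (counts / len(df) * 100).round(1)})
    return report.sort_values("missing", ascending=False)


def duplicated_keys(df: pd.DataFrame, key: str = "occurence_no") -> pd.DataFrame:
    """All rows whose key value appears more than once (the key name is an assumption)."""
    # keep=False marks every member of a duplicated group, not just the repeats
    return df[df.duplicated(subset=key, keep=False)].sort_values(key)


# Usage on a hypothetical toy table (invented values, not museum records)
toy = pd.DataFrame({
    "occurence_no": [1, 2, 2, 3],
    "name": ["A", "B", "B", "C"],
    "length_m": [10.0, None, 4.0, None],
    "region": ["X", "Y", "Y", None],
})
print(missing_report(toy))
print(duplicated_keys(toy))
```

In the toy table, the two rows with occurrence number 2 differ in `length_m`, so `toy.duplicated()` finds nothing while `duplicated_keys(toy)` flags both. On the real data the calls would be `missing_report(dinosaurs)` and `duplicated_keys(dinosaurs)`.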

1.6 🧠 Main Analysis
