Everyone Can Learn Data Scholarship

Image credit: Copilot AI (dinosaurs watching the movie Titanic).

1️⃣ Part 1 (Python) - Dinosaur data 🦕

from IPython.display import Image, display

# File names of the images
image_files = [
    "1. Top 5 dinosaurs.jpg",
    "2. Starting letters of dinosaurs.png",
    "3. Largest_Dinosaur.jpg",
    "4. Smallest_dinosaur.png",
    "5. Missing values.png",
]

# Display images
for file in image_files:
    display(Image(filename=file))

2️⃣ Part 2 (SQL) - Understanding movie data 🎥

from IPython.display import Image, display

# File names of the images
image_files = [
    "6. Missing_values_cinema.png",
    "7. Certification.png",
    "8. Missing.png",
    "9. cer.png",
    "10. cer.png",
    "11. country.png",
    "12. cou.png",
    "13. cou.png",
]

# Display images
for file in image_files:
    display(Image(filename=file))

1️⃣ Part 1 (Python) - Dinosaur data 🦕

📖 Background

You're applying for a summer internship at a national museum for natural history. The museum recently created a database containing all dinosaur records of past field campaigns. Your job is to dive into the fossil records to find some interesting insights, and advise the museum on the quality of the data.

💾 The data

You have access to a real dataset containing dinosaur records from the Paleobiology Database (source):

| Column name | Description |
|---|---|
| occurence_no | The original occurrence number from the Paleobiology Database. |
| name | The accepted name of the dinosaur (usually the genus name, or the name of the footprint/egg fossil). |
| diet | The main diet (omnivorous, carnivorous, herbivorous). |
| type | The dinosaur type (small theropod, large theropod, sauropod, ornithopod, ceratopsian, armored dinosaur). |
| length_m | The maximum length, from head to tail, in meters. |
| max_ma | The age at which the first fossil records of the dinosaur were found, in millions of years ago (Ma). |
| min_ma | The age at which the last fossil records of the dinosaur were found, in millions of years ago (Ma). |
| region | The current region where the fossil record was found. |
| lng | The longitude where the fossil record was found. |
| lat | The latitude where the fossil record was found. |
| class | The taxonomical class of the dinosaur (Saurischia or Ornithischia). |
| family | The taxonomical family of the dinosaur (if known). |

The data was enriched with data from Wikipedia.
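Because `max_ma` and `min_ma` describe an age range rather than a single date, relating size to age (question 4 of the challenge) needs one age value per record. Below is a minimal sketch, assuming the midpoint of that interval is an acceptable single-age estimate; `mid_ma`, `add_mid_age` and `length_age_spearman` are hypothetical names introduced here and are not part of the dataset.

```python
import pandas as pd


def add_mid_age(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy of df with a hypothetical 'mid_ma' column.

    ASSUMPTION: 'mid_ma' is the midpoint of the max_ma..min_ma interval,
    used as a single age estimate in millions of years ago (Ma).
    """
    out = df.copy()
    out["mid_ma"] = (out["max_ma"] + out["min_ma"]) / 2
    return out


def length_age_spearman(df: pd.DataFrame) -> float:
    """Spearman rank correlation between length_m and the assumed mid_ma.

    Computed as the Pearson correlation of average ranks, which avoids a
    SciPy dependency. Rows missing either value are dropped first.
    """
    pair = add_mid_age(df)[["length_m", "mid_ma"]].dropna()
    return pair["length_m"].rank().corr(pair["mid_ma"].rank())
```

Once `dinosaurs` is loaded, `length_age_spearman(dinosaurs)` returns one coefficient. Since ages are measured in millions of years ago, a negative value would suggest that longer records tend to be younger. Well-sampled genera contribute many rows, so it may be worth repeating the check on one row per `name` (for example via `groupby`).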

💪 Challenge I

Help your colleagues at the museum to gain insights on the fossil record data. Include:

  1. How many different dinosaur names are present in the data?
  2. Which was the largest dinosaur? What about missing data in the dataset?
  3. What dinosaur type has the most occurrences in this dataset? Create a visualization (table, bar chart, or equivalent) to display the number of dinosaurs per type. Use the AI assistant to tweak your visualization (colors, labels, title...).
  4. Did dinosaurs get bigger over time? Show the relation between the dinosaur length and their age to illustrate this.
  5. Use the AI assistant to create an interactive map showing each record.
  6. Any other insights you found during your analysis?
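Questions 1 to 3 map onto a handful of pandas calls. Below is a minimal sketch, assuming the column names from the data dictionary above; `summarize_records` is a hypothetical helper, and when several records share the longest length, `idxmax` keeps only the first one.

```python
import pandas as pd


def summarize_records(df: pd.DataFrame) -> dict:
    """Collect headline answers for Challenge I, questions 1 to 3.

    ASSUMPTION: df uses the column names listed in the data dictionary.
    """
    # Q1: number of distinct dinosaur names (nunique ignores NaN)
    n_names = df["name"].nunique()

    # Q2: the first record with the maximum length (idxmax skips NaN lengths)
    largest = df.loc[df["length_m"].idxmax(), ["name", "length_m"]]

    # Q2, data quality: missing values per column, largest count first
    missing = df.isna().sum().sort_values(ascending=False)

    # Q3: number of records per dinosaur type, most frequent first
    type_counts = df["type"].value_counts()

    return {
        "n_names": n_names,
        "largest_name": largest["name"],
        "largest_length_m": largest["length_m"],
        "missing": missing,
        "type_counts": type_counts,
    }
```

For example, after loading the data, `summary = summarize_records(dinosaurs)` followed by `summary["type_counts"].plot(kind="bar", title="Records per dinosaur type")` draws a basic bar chart (this requires matplotlib).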
# Import the pandas and numpy packages
import pandas as pd
import numpy as np
# Load the data
dinosaurs = pd.read_csv('data/dinosaurs.csv')
# Preview the dataframe
dinosaurs

Unveiling Dinosaur Secrets 🦖

Get ready to explore the fascinating world of dinosaurs!

This notebook explores a real dataset from the Paleobiology Database to reveal insights about these magnificent creatures. Before diving into the analysis, let's take a moment to appreciate the wonders of dinosaurs.

We'll uncover insights about their size, distribution, types, and more, all from the clues left behind in fossilized remains.

Let's begin our journey through the dinosaur era!

# For each column, print how many unique values it has and list them
# (unique() keeps NaN, so a column with gaps counts NaN as one value)
for column in dinosaurs.columns:
    unique_values = dinosaurs[column].unique()
    print(f"Column: {column}")
    print(f"Number of Unique Values: {len(unique_values)}")
    print(f"Unique Values: {unique_values}\n")
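The loop above prints every unique value, which gets long for high-cardinality columns such as the coordinates. A more compact data-quality view is one row per column. This is a sketch using a hypothetical helper, `quality_report`; note that `nunique()` ignores NaN while `len(unique())` counts it once, so the loop's counts can be one higher for columns with missing values.

```python
import pandas as pd


def quality_report(df: pd.DataFrame) -> pd.DataFrame:
    """One row per column: dtype, distinct values, missing count and share."""
    report = pd.DataFrame({
        "dtype": df.dtypes.astype(str),
        "n_unique": df.nunique(),      # NaN is excluded from this count
        "n_missing": df.isna().sum(),
    })
    # Percentage of rows with a missing value, rounded for readability
    report["pct_missing"] = (report["n_missing"] / len(df) * 100).round(1)
    # Columns with the most gaps first, since they matter most for data quality
    return report.sort_values("n_missing", ascending=False)
```

Calling `quality_report(dinosaurs)` gives a table that can be shared directly with the museum when advising on data quality.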