Dinosaurs: A Journey Through Time...
The story of dinosaurs is a captivating saga of survival, adaptation, and extinction that spans millions of years. By delving into various aspects of their existence, from size evolution to diet and extinction patterns, we can paint a vivid picture of these magnificent creatures' lives.
# Import the pandas and numpy packages
import pandas as pd
import numpy as np
# Load the data
dinosaurs = pd.read_csv('data/dinosaurs.csv')
# Preview the dataframe
dinosaurs
Total number of unique dinosaurs
- There are 1042 unique dinosaur names in the dataset.
Total number of records
- There are 4951 named records in the dataset, so many dinosaurs appear more than once.
# Unique dinosaurs
unique_name_count = dinosaurs['name'].nunique()
print(f"Total count of unique dinosaurs is: {unique_name_count}")
# All named records in the dataset (repeat finds of the same dinosaur included)
name_count = dinosaurs['name'].count()
print(f"Total count of named records in the dataset is: {name_count}")
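To make the difference between the two counts concrete, here is a minimal sketch on a made-up table. The names and rows are illustrative only and are not taken from dinosaurs.csv. `nunique()` counts distinct names, while `count()` counts every non-missing entry, so repeated finds of the same dinosaur raise the second figure but not the first.

```python
import pandas as pd

# Hypothetical toy records, one row per fossil find (not real data)
toy = pd.DataFrame({
    "name": ["Allosaurus", "Allosaurus", "Stegosaurus", None, "Triceratops"],
})

unique_names = toy["name"].nunique()   # distinct non-missing names
total_records = toy["name"].count()    # all non-missing entries

print(unique_names, total_records)
```

Both methods skip the missing name, which is why neither count includes the fourth row.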
The Age of Giants: Did Dinosaurs Get Bigger Over Time?
Dinosaurs roamed the Earth for over 160 million years, evolving into a myriad of forms and sizes. One intriguing question is whether these ancient reptiles grew larger as time progressed.
Our analysis finds a relationship between age and size, but a weak one. A scatter plot of dinosaur length against age shows sizes spread widely across every period, with the largest animals being colossal herbivores such as Argentinosaurus and other titanosaurs. Evolutionary pressures may have favored larger bodies in some lineages, possibly through improved defense mechanisms, enhanced foraging capabilities, and greater reproductive success.
It is worth noting that the largest dinosaur in our dataset is Supersaurus, a herbivore with a length of 35.0 meters.
# Largest dinosaur (idxmax returns an index label, so look it up with .loc)
largest = dinosaurs.loc[dinosaurs['length_m'].idxmax()]
print(f"Largest dinosaur in the dataset:\n{largest}\n\n")
# Missing values per column
missing_val = dinosaurs.isnull().sum()
print(f"Missing values per column:\n{missing_val}\n\n")
# Fill missing length_m with the column mean (kept in a separate Series)
missing_len = dinosaurs['length_m'].fillna(dinosaurs['length_m'].mean())
# Recheck the largest dinosaur after the fill; a mean can never exceed
# the maximum, so this is expected to return the same row as before
largest_after_fillup = dinosaurs.loc[missing_len.idxmax()]
print(f"Rechecking for the largest dinosaur after fill-up:\n{largest_after_fillup}")
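The lookup of the largest row depends on how `idxmax()` is paired with an indexer. `idxmax()` returns an index label, which matches a row position only while the frame still has its default 0, 1, 2, ... index. The sketch below uses a made-up table with a non-default index (as filtering or `dropna` would leave behind) to show the label-based lookup succeeding where a positional one fails.

```python
import pandas as pd

# Hypothetical toy table with a non-default index (not real data)
toy = pd.DataFrame(
    {"name": ["A", "B", "C"], "length_m": [2.0, 35.0, 9.0]},
    index=[10, 20, 30],
)

label = toy["length_m"].idxmax()   # index label of the longest row
longest = toy.loc[label]           # label-based lookup finds the row

# A positional lookup with the same value points past the end of the frame
try:
    toy.iloc[label]
    iloc_failed = False
except IndexError:
    iloc_failed = True

print(label, longest["name"], iloc_failed)
```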
Did the dinosaurs get bigger over time?
The scatter plot shows the relationship between the average age of dinosaurs and their size. From the plot, it appears that dinosaur sizes are spread across different ages, but it's difficult to draw a clear conclusion from the scatter plot alone without statistical analysis.
Linear Regression Plot
The linear regression plot adds a trend line for the relationship between dinosaur age and size. The slope is positive, so length increases with avg_ma. Because avg_ma is measured in millions of years before the present, larger values are older, and a positive slope therefore means that older dinosaurs in this dataset tend to be longer than later ones.
Based on the trend line alone, the data do not show dinosaurs getting bigger over time. If anything, they point weakly in the opposite direction.
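Because ages are given in millions of years before the present, the sign of a fitted slope is easy to misread. The sketch below uses made-up numbers in which later animals are longer, fits a straight line with `np.polyfit` (a stand-in here for the scikit-learn model), and then refits against a hypothetical `elapsed_myr` variable that runs forward in time. The two slopes have the same magnitude and opposite signs.

```python
import numpy as np

# Hypothetical ages (millions of years ago) and lengths in meters (not real data)
age_ma = np.array([200.0, 150.0, 100.0, 70.0])
length_m = np.array([6.0, 8.0, 11.0, 12.0])

# Slope of length against age, where larger age means older
slope_vs_age, _ = np.polyfit(age_ma, length_m, 1)

# Re-express age as time elapsed since the oldest record, so larger means later
elapsed_myr = age_ma.max() - age_ma
slope_vs_time, _ = np.polyfit(elapsed_myr, length_m, 1)

print(slope_vs_age, slope_vs_time)
```

In this toy case, "bigger over time" shows up as a negative slope against age and a positive slope against elapsed time, so the axis convention has to be settled before the sign is interpreted.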
Correlation Coefficient
The correlation coefficient gives a numerical value for the strength and direction of the relationship between dinosaur age (avg_ma) and size (length_m):
- A value close to 1 indicates a strong positive correlation (length increases with avg_ma, so older dinosaurs are longer).
- A value close to -1 indicates a strong negative correlation (length decreases with avg_ma, so later dinosaurs are longer).
- A value around 0 indicates no linear correlation.
Correlation coefficient between dinosaur age and size: 0.3086316987796539. This is positive but closer to 0 than to 1, which indicates a weak positive correlation.
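For readers who want to see what the coefficient measures, the following sketch computes Pearson's r by hand on made-up numbers and squares it to get the share of variance explained by a straight-line fit. The arrays `x` and `y` are illustrative assumptions, not values from the dataset.

```python
import numpy as np

# Hypothetical paired measurements (not real data)
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.0, 1.0, 4.0, 3.0, 5.0])

# Pearson r: sum of co-deviations divided by the product of the deviation norms
dx, dy = x - x.mean(), y - y.mean()
r = (dx * dy).sum() / np.sqrt((dx ** 2).sum() * (dy ** 2).sum())

# Share of the variance in y explained by a straight-line fit on x
r_squared = r ** 2

print(round(r, 3), round(r_squared, 3))
```

Applying the same squaring to the coefficient reported above gives roughly 0.095, so a straight line through age accounts for under 10% of the variation in length.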
import matplotlib.pyplot as plt
from sklearn.linear_model import LinearRegression
# Dinosaur type occurrences
occurs = dinosaurs["type"].value_counts()
print(occurs)
# Plot data as a bar chart (types are categories, so a connecting line would mislead)
plt.figure(figsize=(10,6))
plt.bar(occurs.index.tolist(), occurs.values.tolist())
# Plot title
plt.title("Dinosaur Types with their Occurrences", fontweight=600, fontsize=15)
# x-axis label
plt.xlabel("Dinosaur types", fontsize=14)
# y-axis label
plt.ylabel("Occurrences", fontsize=14)
# Display plot
plt.show()
plt.clf()
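# Handle missing values (copy so the new column is added to an independent frame)
dinosaurs = dinosaurs.dropna(subset=['length_m', 'max_ma', 'min_ma', 'name']).copy()
# Average age: midpoint between first (max_ma) and last (min_ma) appearance
dinosaurs["avg_ma"] = (dinosaurs["max_ma"] + dinosaurs["min_ma"]) / 2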
# sort dataframe by avg_ma
dinosaurs = dinosaurs.sort_values(by='avg_ma')
# Plot data, colouring each point by its position in the age-sorted frame
# (passing cmap to scatter avoids plt.cm.get_cmap, which recent Matplotlib releases deprecate)
plt.figure(figsize=(12,8))
plt.scatter(dinosaurs['avg_ma'], dinosaurs['length_m'], c=np.arange(len(dinosaurs)), cmap='viridis', s=100)
# Titles and labels
plt.title("Dinosaur size over time", fontweight=600, fontsize=15)
plt.xlabel("Average age (millions of years ago)", fontsize=14)
plt.ylabel("Dinosaur size (length in meters)", fontsize=14)
plt.grid(True)
plt.show()
# Linear regression
x = dinosaurs[['avg_ma']]
y = dinosaurs['length_m']
model = LinearRegression()
model.fit(x, y)
trendline = model.predict(x)
# Plot the data together with the fitted trend line
plt.figure(figsize=(12, 8))
plt.scatter(dinosaurs['avg_ma'], dinosaurs['length_m'], alpha=0.5)
plt.plot(dinosaurs['avg_ma'], trendline, color='red', linewidth=2)
# Plot labeling
plt.title("Linear regression of dinosaur size against age", fontweight=600)
plt.xlabel("Age (millions of years ago)")
plt.ylabel("Length (meters)")
plt.show()
# Calculate correlation coefficient
correlation = dinosaurs['avg_ma'].corr(dinosaurs['length_m'])
print(f"Correlation coefficient between dinosaur age and size: {correlation}")
Other insights
- Extinction pattern
- Diet and size correlation
The Dance of Extinction: Patterns Over Time
The extinction of dinosaurs is a complex phenomenon with multiple contributing factors. By examining when each dinosaur last appears in the fossil record, we can look for periods of significant species loss, shedding light on the dynamic and often tumultuous history of these creatures.
A notable spike in last appearances occurs around 75 million years ago, which may mark a period of significant environmental upheaval.
Another peak appears around 150 million years ago, hinting at a possible major extinction event at that time.
Other periods show smaller peaks, so last appearances were not spread uniformly over time.
These patterns are consistent with dinosaurs facing several catastrophic events, possibly including volcanic activity, climate change, and asteroid impacts, that periodically reshaped the Earth's biosphere. Because a last appearance in the fossil record can also reflect gaps in sampling, the peaks are suggestive rather than conclusive.
import seaborn as sns
# Plot the distribution of last appearances (min_ma) over time
plt.figure(figsize=(12, 8))
plt.hist(dinosaurs['min_ma'], bins=30)
# Customize the plot
plt.xlabel('Last Appearance Age (Million Years)')
plt.ylabel('Number of Records')
plt.title('Species Extinction Patterns Over Time')
plt.grid(True)
plt.show()
plt.clf()
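Since the dataset holds several records per dinosaur, a histogram of min_ma over all rows counts records, and heavily sampled dinosaurs weigh more than rarely found ones. One possible refinement, sketched here on made-up rows, is to take the youngest record per name as that dinosaur's last appearance and then bin those values. The 50-million-year bin edges are an arbitrary choice for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical records with several finds per dinosaur (not real data)
toy = pd.DataFrame({
    "name":   ["A", "A", "A", "B", "C", "C"],
    "min_ma": [72.0, 70.0, 66.0, 145.0, 150.0, 148.0],
})

# The youngest record per name approximates that dinosaur's last appearance
last_seen = toy.groupby("name")["min_ma"].min()

# Count last appearances in 50-million-year bins
counts, edges = np.histogram(last_seen, bins=[50, 100, 150, 200])

print(last_seen.to_dict(), counts.tolist())
```

With this grouping each dinosaur contributes once, so the six toy rows collapse to three last appearances.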