Dinosaurs: A Journey Through Time... / UPWARD SLOPE ON CINEMA


    The story of dinosaurs is a captivating saga of survival, adaptation, and extinction that spans millions of years. By delving into various aspects of their existence, from size evolution to diet and extinction patterns, we can paint a vivid picture of these magnificent creatures' lives.

    # Import the pandas and numpy packages
    import pandas as pd
    import numpy as np
    # Load the data
    dinosaurs = pd.read_csv('data/dinosaurs.csv')
    # Preview the dataframe
    dinosaurs

    Total number of unique dinosaurs

    • There are 1042 unique dinosaurs, counted by distinct values in the name column

    Dinosaurs in total

    • There are 4951 named dinosaur entries in the fossil record
    # Unique dinosaurs
    unique_name_count = dinosaurs['name'].nunique()
    print(f"Total count of unique dinosaurs is: {unique_name_count}")
    
    # All dinosaurs in the fossil record
    name_count = dinosaurs['name'].count()
    print(f"Total count of all dinosaurs in the fossil record is: {name_count}")
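    The two totals differ because nunique() counts distinct names while count() counts every non-missing entry, repeats included. A minimal sketch on made-up records (the names and rows below are illustrative, not taken from dinosaurs.csv):

```python
import pandas as pd

# Toy stand-in for the fossil records: names may repeat or be missing.
# These rows are illustrative only, not taken from dinosaurs.csv.
records = pd.DataFrame({"name": ["Supersaurus", "Supersaurus", "Allosaurus", None]})

unique_names = records["name"].nunique()  # distinct non-null names
named_records = records["name"].count()   # non-null entries, duplicates included
total_rows = len(records)                 # every row, including missing names

print(unique_names, named_records, total_rows)
```

    On the real data, comparing count() with len(dinosaurs) is a quick way to see whether any records lack a name.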

    The Age of Giants: Did Dinosaurs Get Bigger Over Time?

    Dinosaurs roamed the Earth for over 160 million years, evolving into a myriad of forms and sizes. One intriguing question is whether these ancient reptiles grew larger as time progressed.

    Our analysis reveals that, indeed, dinosaurs tended to increase in size over time. This trend is illustrated by a scatter plot of dinosaur lengths against their ages, showcasing a gradual growth in size, peaking with colossal herbivores like the Argentinosaurus and the Titanosaurs. This progression suggests that evolutionary pressures favored larger body sizes, possibly for reasons such as improved defense mechanisms, enhanced foraging capabilities, and greater reproductive success.

    It is worth noting that the largest dinosaur in our fossil record is Supersaurus, a herbivore with a length of 35.0 meters.

    # Largest Dinosaur
    # idxmax returns an index label, so select the row with .loc rather than .iloc
    largest = dinosaurs.loc[dinosaurs['length_m'].idxmax()]
    print(f"Largest dinosaur in the dataset:\n{largest}\n\n")
    
    # missing values in the data set
    missing_val = dinosaurs.isnull().sum()
    print(f"Missing Values in the Dataset Column:\n{missing_val}\n\n")
    
    # Fill missing length_m values with the column mean
    missing_len = dinosaurs['length_m'].fillna(dinosaurs['length_m'].mean())
    
    # Recheck the largest dinosaur (idxmax returns an index label, so use .loc)
    largest_after_fillup = dinosaurs.loc[missing_len.idxmax()]
    print(f"Rechecking for the largest dinosaur after fill-up:\n{largest_after_fillup}")
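    The recheck above returns the same dinosaur, and that is expected: a column mean can never exceed the column maximum, so mean-filled gaps cannot overtake the largest observed value. A small sketch with made-up lengths (illustrative values only):

```python
import numpy as np
import pandas as pd

# Illustrative lengths in meters with two missing entries (not real data)
lengths = pd.Series([2.0, np.nan, 35.0, np.nan, 8.0])

# Mean of the observed values 2, 35 and 8 is 15.0
filled = lengths.fillna(lengths.mean())

# The fill value sits at or below the observed maximum, so the
# label of the largest entry is the same before and after filling.
largest_before = lengths.idxmax()  # idxmax skips NaN by default
largest_after = filled.idxmax()

print(largest_before, largest_after, filled.max())
```

    The only edge case is a column where every observed value is identical, since the mean then ties with the maximum.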

    Did the dinosaurs get bigger over time?

    The scatter plot shows the relationship between the average age of dinosaurs and their size. From the plot, it appears that dinosaur sizes are spread across different ages, but it's difficult to draw a clear conclusion from the scatter plot alone without statistical analysis.


    Linear Regression Plot

    The linear regression plot shows a trend line that indicates the relationship between dinosaur age and size. The positive slope of the trend line suggests that, on average, dinosaurs tended to get bigger over time.

    Based on the linear regression plot, there is evidence to suggest that dinosaurs generally increased in size over time. The upward slope of the trend line indicates a positive correlation between the age of dinosaurs and their size, meaning that later dinosaurs tended to be larger.
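    One thing worth checking when reading the slope is the direction of the time axis. The sketch below uses made-up numbers and assumes, without confirmation from the dataset description, that ages are expressed in millions of years before the present, so larger values are older. Under that assumption, the same data plotted against elapsed time gives a slope of the opposite sign. The variable names here are hypothetical.

```python
import numpy as np

# Hypothetical ages (millions of years before present) and lengths (meters).
# Illustrative values only, not taken from dinosaurs.csv.
age_ma = np.array([200.0, 150.0, 100.0, 70.0])
length_m = np.array([12.0, 10.0, 7.0, 5.0])

# Degree-1 polyfit returns [slope, intercept]
slope_vs_age, _ = np.polyfit(age_ma, length_m, 1)

# Re-express the x-axis as time elapsed since the oldest record
elapsed = age_ma.max() - age_ma
slope_vs_elapsed, _ = np.polyfit(elapsed, length_m, 1)

print(f"slope against age:          {slope_vs_age:.4f}")
print(f"slope against elapsed time: {slope_vs_elapsed:.4f}")
```

    Whichever axis is used, it helps to state explicitly which end of it is "later" before interpreting an upward slope.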


    Correlation Coefficient

    The correlation coefficient will provide a numerical value indicating the strength and direction of the relationship between dinosaur age and size:

    • A value closer to 1 indicates a strong positive correlation (dinosaurs got bigger over time).
    • A value closer to -1 indicates a strong negative correlation (dinosaurs got smaller over time).
    • A value around 0 indicates no correlation.

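    Correlation coefficient between dinosaur age and size: 0.3086316987796539. This is a weak-to-moderate positive correlation: the value is closer to 0 than to 1, so the relationship exists but is far from strong.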
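    To put the coefficient in context, here is a minimal sketch of how Pearson's r is computed and what its square says about explained variance. The pearson_r helper is a hypothetical name written for illustration; the analysis itself uses the pandas corr method.

```python
import numpy as np

def pearson_r(x, y):
    """Pearson correlation: covariance scaled by both standard deviations."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    xd = x - x.mean()
    yd = y - y.mean()
    return (xd * yd).sum() / np.sqrt((xd ** 2).sum() * (yd ** 2).sum())

# Sanity check on a perfectly linear toy relationship
perfect = pearson_r([1, 2, 3], [2, 4, 6])

# Coefficient reported in the analysis above
r = 0.3086316987796539
r_squared = r ** 2  # share of variance a straight-line fit accounts for

print(f"r = {r:.3f}, r^2 = {r_squared:.3f}")
```

    An r of about 0.31 corresponds to an r squared of roughly 0.095, meaning a straight line in age accounts for under a tenth of the variation in length.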

    import matplotlib.pyplot as plt
    from sklearn.linear_model import LinearRegression
    
    # Dinosaur type occurrences
    occurs = dinosaurs["type"].value_counts()
    print(occurs)
    
    # Plot the counts as bars (a line would imply an ordering between types)
    plt.figure(figsize=(10,6))
    plt.bar(occurs.index.tolist(), occurs.values.tolist())
    
    # Plot title
    plt.title("Dinosaur Types with their Occurrences", fontweight=600, fontsize=15)
    # x-axis label
    plt.xlabel("Dinosaur types", fontsize=14)
    # y-axis label
    plt.ylabel("Occurrences", fontsize=14)
    
    # Display plot
    plt.show()
    plt.clf()
    # Handle missing values (copy so the new column is added to an independent frame)
    dinosaurs = dinosaurs.dropna(subset=['length_m', 'max_ma', 'min_ma', 'name']).copy()
    
    # Average age: midpoint of max_ma and min_ma
    dinosaurs["avg_ma"] = (dinosaurs["max_ma"] + dinosaurs["min_ma"]) / 2
    
    # sort dataframe by avg_ma
    dinosaurs = dinosaurs.sort_values(by='avg_ma')
    
    # Colour each point by its rank in the age-sorted frame using the viridis colormap
    # (this avoids plt.cm.get_cmap, which newer Matplotlib releases no longer provide)
    # plot data
    plt.figure(figsize=(12,8))
    plt.scatter(dinosaurs['avg_ma'], dinosaurs['length_m'], c=np.arange(len(dinosaurs)), cmap='viridis', s=100)
    
    # Titles and labels
    plt.title("Dinosaur size over time", fontweight=600, fontsize=15)
    plt.xlabel("Average age (in millions of years)", fontsize=14)
    plt.ylabel("Dinosaur size (length in meters)", fontsize=14)
    
    plt.grid(True)
    plt.show()
    # Linear regression
    x = dinosaurs[['avg_ma']]
    y = dinosaurs['length_m']
    model = LinearRegression()
    model.fit(x, y)
    trendline = model.predict(x)
    
    # Plot trend line
    plt.figure(figsize=(12, 8))
    plt.plot(dinosaurs['avg_ma'], trendline, color='red', linewidth=2)
    
    # Plot labels
    plt.title("Linear regression on dinosaur size over time", fontweight=600)
    plt.xlabel("Age (in millions of years)")
    plt.ylabel("Length in meters")
    plt.show()
    
    # Calculate correlation coefficient
    correlation = dinosaurs['avg_ma'].corr(dinosaurs['length_m'])
    print(f"Correlation coefficient between dinosaur age and size: {correlation}")

    Other insights


    • Extinction pattern
    • Diet and size correlation

    The Dance of Extinction: Patterns Over Time

    The extinction of dinosaurs is a complex phenomenon with multiple contributing factors. By examining when species last appear in the fossil record, we can uncover periods of significant species loss, shedding light on the dynamic and often tumultuous history of these creatures.

    A notable spike in last appearances occurs around 75 million years ago, which may mark a period of significant environmental upheaval.

    Another peak is observed around 150 million years ago, indicating a possible major extinction event during this time.

    Other periods also show smaller peaks, indicating extinction events were not uniform over time.

    These extinction patterns suggest that dinosaurs faced several catastrophic events, possibly including volcanic activity, climate change, and asteroid impacts, that periodically reshaped the Earth's biosphere.

    # Plot the distribution of last appearances (min_ma) over time
    plt.figure(figsize=(12, 8))
    plt.hist(dinosaurs['min_ma'], bins=30)
    
    # Customize the plot
    plt.xlabel('Last Appearance Age (Million Years)')
    plt.ylabel('Number of Species')
    plt.title('Species Extinction Patterns Over Time')
    plt.grid(True)
    plt.show()
    plt.clf()
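    The peaks described above are read off the histogram by eye. A rough sketch of how they could be located numerically is shown below, using np.histogram on made-up last-appearance ages (hypothetical values, not from dinosaurs.csv). Note that the bins it reports depend on the chosen number of bins.

```python
import numpy as np

# Hypothetical last-appearance ages in millions of years (illustrative only)
min_ma = np.array([66, 70, 72, 74, 75, 76, 78, 100, 145, 148, 150, 152, 200])

# Same binning idea as the histogram above, but returning the numbers
counts, edges = np.histogram(min_ma, bins=30)

# Indices of the two fullest bins, largest first
top_bins = np.argsort(counts)[::-1][:2]

for i in top_bins:
    print(f"{edges[i]:.1f} to {edges[i + 1]:.1f} million years: {counts[i]} records")
```

    Rerunning this with a different bins value is a simple check on whether a peak is a real feature of the data or an artifact of where the bin edges happen to fall.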