🌍 Earthquake Data Analysis: A Comprehensive Statistical Investigation of Global Seismic Events
📊 Project Overview
This analysis explores global seismic activity using data from the USGS (United States Geological Survey) Earthquake Catalog. Through comprehensive data wrangling, statistical analysis, and visualization techniques, my aim is to uncover patterns and relationships in earthquake occurrences worldwide.
🎯 Objectives
- Investigate the spatial and temporal distribution of earthquakes
- Analyze relationships between earthquake characteristics (magnitude, depth, location)
- Identify potential patterns in seismic activity
- Assess regional variations in earthquake occurrences
- Develop statistical insights into global seismic patterns
📑 Dataset Description
The dataset contains detailed information about earthquakes, including:
- Event location (longitude, latitude)
- Magnitude and depth
- Temporal information
- Additional seismic measurements
- Regional metadata
🔍 Methodological Approach
My analysis employs a four-stage methodology:
1️⃣ Data Preprocessing and Wrangling
- Cleaning and structuring raw seismic data
- Handling missing values
- Creating derived features for enhanced analysis
- Standardizing measurements across observations
2️⃣ Statistical Analysis
- Descriptive statistics of earthquake characteristics
- Hypothesis testing for regional differences
- Correlation analysis between seismic parameters
- Bootstrap sampling for robust statistical inference
3️⃣ Visual Analytics
- Geographic distribution mapping
- Temporal trend visualization
- Relationship plots between key variables
- Interactive dashboards for exploration
4️⃣ Insight Generation
- Pattern identification in seismic activity
- Regional risk assessment
- Temporal trend analysis
- Correlation discovery between variables
💻 Technical Implementation
This analysis utilizes the R programming language with key packages:
- dplyr for data manipulation
- ggplot2 for visualization
- tidyr for data cleaning
- stats for statistical analysis
🎁 Expected Outcomes
This analysis aims to provide:
- Clear understanding of global earthquake patterns
- Statistical validation of seismic relationships
- Visual representation of complex earthquake data
- Actionable insights for seismic risk assessment
Note: The following sections will detail my analysis, findings, and conclusions, supported by statistical evidence and visual representations.
# Load required libraries
library(jsonlite)
library(tidyr)
library(dplyr)
# Get earthquake data from the URL
url <- "https://earthquake.usgs.gov/fdsnws/event/1/query.geojson?starttime=2024-11-08%2000:00:00&endtime=2024-12-20%2023:59:59&minmagnitude=-10&maxmagnitude=10&orderby=time"
# Fetch and parse JSON data
data <- fromJSON(url)
# Extract and normalize the features data with all columns
earthquakes <- data$features %>%
as.data.frame() %>%
unnest(cols = c(properties, geometry), names_sep = "_")
# View the first few rows
head(earthquakes)
# View column names grouped by data type
split(names(earthquakes), sapply(earthquakes, class))
Initial Data Collection and Processing: USGS Earthquake Dataset
In the initial phase of my earthquake analysis project, I focused on collecting and processing data from the United States Geological Survey (USGS) Earthquake Catalog. The USGS provides a comprehensive API that allows access to detailed seismic event data worldwide.
My data collection process began with importing essential R libraries: jsonlite for JSON data handling, tidyr for data cleaning, and dplyr for data manipulation. These libraries form the backbone of my data processing pipeline, enabling efficient handling of complex nested data structures typical in API responses.
The data collection targeted earthquakes recorded between November 8, 2024, and December 20, 2024. I intentionally set a broad magnitude range (-10 to 10) to ensure I captured all seismic events during this period. The API request was structured to return results ordered chronologically, facilitating temporal analysis of seismic activities.
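For reproducibility, the query string can also be assembled from its parts rather than hard-coded. Below is a minimal sketch, where build_usgs_url is a hypothetical helper (not part of the original pipeline) wrapping the same FDSN parameters used above:

# Hypothetical helper: assemble the same USGS FDSN query from parameters,
# so the date window and magnitude bounds are easy to vary later
build_usgs_url <- function(start, end, min_mag = -10, max_mag = 10) {
  base <- "https://earthquake.usgs.gov/fdsnws/event/1/query.geojson"
  paste0(base,
         "?starttime=", URLencode(start, reserved = TRUE),
         "&endtime=", URLencode(end, reserved = TRUE),
         "&minmagnitude=", min_mag,
         "&maxmagnitude=", max_mag,
         "&orderby=time")
}
url <- build_usgs_url("2024-11-08 00:00:00", "2024-12-20 23:59:59")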
The transformation of raw JSON data into a usable format involved several critical steps. First, I parsed the JSON response using fromJSON(), converting the nested structure into an R-compatible format. I then used unnest() to flatten the hierarchical data, particularly the 'properties' and 'geometry' nested objects, into a single, comprehensive dataframe. This transformation maintained data integrity while making it more accessible for analysis.
The resulting dataset, stored in the 'earthquakes' dataframe, contains various data types:
- Character data including location descriptions and event identifiers
- Numeric data such as magnitude measurements and coordinates
- Integer data for discrete measurements
- Logical data for binary indicators
- List data for complex geometric information
This initial processing sets the foundation for my subsequent analysis, ensuring my data is properly structured and ready for statistical investigation and visualization. The clean, normalized format will facilitate efficient computation and accurate insights into global seismic patterns.
Looking ahead, this processed dataset will enable me to:
- Analyze spatial distributions of earthquakes
- Study magnitude patterns across regions
- Investigate temporal trends
- Examine relationships between various seismic parameters
By establishing this solid data foundation, I can proceed confidently with my comprehensive analysis of global seismic activities.
library(stringr)
library(dplyr)
library(tidyr)
# DATA CLEANING
# CREATE 'Event Longitude (Deg)', 'Event Latitude (Deg)', 'Event Depth (km)' FROM THE "geometry_coordinates" LIST COLUMN
earthquakes <- earthquakes %>%
mutate(
`Event Longitude (Deg)` = sapply(geometry_coordinates, `[`, 1),
`Event Latitude (Deg)` = sapply(geometry_coordinates, `[`, 2),
`Event Depth (km)` = sapply(geometry_coordinates, `[`, 3)
)
# CONVERT UNIX TIME (properties_time, properties_updated) TO READABLE UTC DATETIMES
earthquakes <- earthquakes %>%
mutate(
time_readable = as.POSIXct(properties_time / 1000, origin = "1970-01-01", tz = "UTC"),
update_readable = as.POSIXct(properties_updated / 1000, origin = "1970-01-01", tz = "UTC")
)
# EXTRACT CITY AND STATE FROM properties_place, THEN CREATE A COUNTRY COLUMN
earthquakes <- earthquakes %>%
mutate(
state = str_trim(str_extract(properties_place, "[^,]+$")),
city = str_extract(properties_place, "(?<=of ).*?(?=,)")
)
# Inspect the unique values in state to inform the country mapping (commented out)
#earthquakes %>%
#count(state, sort = TRUE)
# Create country column
earthquakes <- earthquakes %>%
mutate(
country = case_when(
# United States states and territories
state %in% c("CA", "Alaska", "Hawaii", "Nevada", "Texas", "Montana", "Oklahoma",
"Utah", "Washington", "Idaho", "California", "Wyoming", "Oregon",
"Tennessee", "Arizona", "Arkansas", "New Jersey", "Kansas", "Maine",
"South Carolina", "Connecticut", "NV", "Colorado", "Alabama", "Indiana",
"Massachusetts", "Mississippi","Missouri", "Nebraska", "New York","New Mexico", "Ohio",
"Virginia", "western Texas", "Louisiana", "Kentucky", "Nevada Earthquake") ~ "USA",
state %in% c("Puerto Rico", "U.S. Virgin Islands", "Guam",
"Northern Mariana Islands") ~ "USA Territory",
# Single state = country cases
state %in% c("Indonesia", "Japan", "Chile", "Papua New Guinea", "Philippines",
"Canada", "Russia", "China", "Mexico", "Vanuatu", "Argentina",
"Iran", "Afghanistan", "New Zealand", "Peru", "India", "Turkey",
"Greece", "Guatemala", "Ecuador", "Cuba", "Myanmar", "Taiwan",
"Panama", "Italy", "Iraq", "Mongolia", "Honduras", "Iceland",
"Nicaragua", "Pakistan", "Portugal", "Venezuela", "Azerbaijan",
"Bolivia", "Costa Rica", "Haiti", "Kazakhstan", "Kosovo",
"Kyrgyzstan", "Saudi Arabia", "Somalia", "Tanzania", "Uganda",
"Yemen") ~ state,
# Manual state to country
grepl("Japan region", state) ~ "Japan",
grepl("Fiji", state) ~ "Fiji",
grepl("Kermadec", state) ~ "New Zealand",
grepl("Kuril", state) ~ "Russia",
grepl("Easter Island", state) ~ "Chile",
grepl("Mariana", state) ~ "USA Territory",
grepl("Solomon Islands", state) ~ "Solomon Islands",
grepl("Dominican Republic", state) ~ "Dominican Republic",
grepl("MX", state) ~ "Mexico",
grepl("California Earthquake", state) ~ "USA",
grepl("Cyprus", state) ~ "Cyprus",
grepl("Tajikistan", state) ~ "Tajikistan",
grepl("Antigua and Barbuda", state) ~ "Antigua and Barbuda",
grepl("Australia", state) ~ "Australia",
grepl("Colombia", state) ~ "Colombia",
grepl("El Salvador", state) ~ "El Salvador",
grepl("Tonga", state) ~ "Tonga",
grepl("Timor Leste", state) ~ "Timor Leste",
grepl("Poland", state) ~ "Poland",
grepl("Zimbabwe", state) ~ "Zimbabwe",
grepl("Nepal", state) ~ "Nepal",
grepl("Albania", state) ~ "Albania",
grepl("Algeria", state) ~ "Algeria",
grepl("south of Africa", state) ~ "south of Africa",
grepl("Vietnam", state) ~ "Vietnam",
grepl("Tunisia", state) ~ "Tunisia",
grepl("Ecuador region", state) ~ "Ecuador region",
grepl("Democratic Republic of the Congo", state) ~ "D.R.Congo",
grepl("Russia region", state) ~ "Russia",
grepl("west of Macquarie Island", state) ~ "Macquarie Island region",
grepl("South Sandwich Islands region", state) ~ "South Sandwich Islands",
grepl("southeast of the Loyalty Islands", state) ~ "Loyalty Islands",
grepl("off the west coast of the South Island of New Zealand", state) ~ "New Zealand",
grepl("north of Svalbard", state) ~ "Svalbard",
grepl("Vanuatu region", state) ~ "Vanuatu",
grepl("Wallis and Futuna", state) ~ "Wallis and Futuna",
grepl("south of the Aleutian Islands", state) ~ "Aleutian Islands",
grepl("western Xizang", state) ~ "China",
grepl("Balleny Islands region", state) ~ "Balleny Islands",
grepl("north of Franz Josef Land", state) ~ "Franz Josef Land",
grepl("Owen Fracture Zone region", state) ~ "Owen Fracture Zone",
grepl("New Caledonia", state) ~ "New Caledonia",
grepl("southwest of Africa", state) ~ "southwest of Africa",
grepl("west of Vancouver Island", state) ~ "Vancouver Island",
grepl("Anguilla", state) ~ "Anguilla",
grepl("off the coast of Washington", state) ~ "Washington, USA",
grepl("off the west coast of northern Sumatra", state) ~ "Sumatra, Indonesia",
grepl("Revilla Gigedo Islands region", state) ~ "Revilla Gigedo Islands",
grepl("Bouvet Island region", state) ~ "Bouvet Island",
grepl("Macquarie Island region", state) ~ "Macquarie Island",
grepl("off the coast of Central America", state) ~ "Central America",
grepl("Azores Islands region", state) ~ "Azores Islands",
grepl("north of Ascension Island", state) ~ "Ascension Island",
grepl("east of Severnaya Zemlya", state) ~ "Severnaya Zemlya",
# More Complicating state with Sea, Ocean regions and ridges
grepl("Ridge|Sea|Ocean|Rise", state) ~ "International Waters",
# Default for unmatched cases
TRUE ~ "Other"
)
)
# Default for unmatched cases in the city column: fall back to state
earthquakes <- earthquakes %>%
mutate(
city = coalesce(city, state)
)
# CREATE HOURS OLD
earthquakes <- earthquakes %>%
mutate(
hours_old = round(as.numeric(difftime(Sys.time(), time_readable, units = "hours")), 1)
)
# REPLACE NA VALUES WITH EACH COLUMN'S MEAN FOR THE FOLLOWING COLUMNS
# 1. Properties_nst
earthquakes <- earthquakes %>%
mutate(
properties_nst = case_when(
is.na(properties_nst) ~ round(mean(properties_nst, na.rm = TRUE), 0),
TRUE ~ as.numeric(properties_nst)
)
)
# 2. Properties_dmin
earthquakes <- earthquakes %>%
mutate(
properties_dmin = case_when(
is.na(properties_dmin) ~ round(mean(properties_dmin, na.rm = TRUE), 4),
TRUE ~ as.numeric(properties_dmin)
)
)
# 3. Properties_gap
earthquakes <- earthquakes %>%
mutate(
properties_gap = case_when(
is.na(properties_gap) ~ round(mean(properties_gap, na.rm = TRUE), 2),
TRUE ~ as.numeric(properties_gap)
)
)
# CREATING CONTINENT COLUMN USING THE LONGITUDE AND LATITUDE
earthquakes <- earthquakes %>%
mutate(
continent = case_when(
`Event Longitude (Deg)` >= -168.75 & `Event Longitude (Deg)` <= -10.267 &
`Event Latitude (Deg)` >= 12.5748 & `Event Latitude (Deg)` <= 83.4 ~ "North America",
`Event Longitude (Deg)` >= -81.31 & `Event Longitude (Deg)` <= -34.79 &
`Event Latitude (Deg)` >= -55.98 & `Event Latitude (Deg)` <= 12.46 ~ "South America",
`Event Longitude (Deg)` >= -24.25 & `Event Longitude (Deg)` <= 41.89 &
`Event Latitude (Deg)` >= 34.59 & `Event Latitude (Deg)` <= 81.85 ~ "Europe",
`Event Longitude (Deg)` >= 26.04 & `Event Longitude (Deg)` <= 169.65 &
`Event Latitude (Deg)` >= 3.86 & `Event Latitude (Deg)` <= 77.71 ~ "Asia",
`Event Longitude (Deg)` >= -17.53 & `Event Longitude (Deg)` <= 51.42 &
`Event Latitude (Deg)` >= -34.83 & `Event Latitude (Deg)` <= 37.35 ~ "Africa",
`Event Longitude (Deg)` >= 112.91 & `Event Longitude (Deg)` <= 179.99 &
`Event Latitude (Deg)` >= -47.19 & `Event Latitude (Deg)` <= -3.68 ~ "Oceania",
`Event Latitude (Deg)` <= -60 ~ "Antarctica",
# Manual state to country
grepl("USA", country) ~ "North America",
grepl("Indonesia", country) ~ "Asia",
grepl("Tonga", country) ~ "Oceania",
grepl("International Waters", country) ~ "International Waters",
grepl("New Zealand", country) ~ "Oceania",
grepl("Macquarie Island region", country) ~ "Oceania",
grepl("South Sandwich Islands", country) ~ "Oceania",
grepl("Fiji", country) ~ "Oceania",
grepl("Chile", country) ~ "South America",
grepl("Panama", country) ~ "North America",
grepl("Nicaragua", country) ~ "North America",
grepl("Papua New Guinea", country) ~ "Oceania",
grepl("Svalbard", country) ~ "Europe",
grepl("south of Africa", country) ~ "Africa",
grepl("Wallis and Futuna", country) ~ "Oceania",
grepl("Aleutian Islands", country) ~ "North America",
grepl("Franz Josef Land", country) ~ "Europe",
grepl("southwest of Africa", country) ~ "Africa",
grepl("Costa Rica", country) ~ "North America",
grepl("Russia", country) ~ "Europe/Asia",
grepl("Philippines", country) ~ "Asia",
grepl("El Salvador", country) ~ "North America",
grepl("Bouvet Island", country) ~ "Antarctica",
grepl("Central America", country) ~ "North America",
grepl("Ecuador region", country) ~ "South America",
grepl("Severnaya Zemlya", country) ~ "Europe/Asia",
grepl("Macquarie Island", country) ~ "Oceania",
# Default for unmatched cases
TRUE ~ "Other"
)
)
# EXPLORE EVENT TYPES (commented out)
#earthquakes %>%
#count(properties_type, sort = TRUE)
earthquakes %>%
select(time_readable, city, state, country, continent) %>%
slice(1:8)
Data Cleaning and Transformation Process: USGS Earthquake Dataset
In this crucial phase of my data cleaning process, I performed several essential transformations to make the earthquake data more analyzable and meaningful. Here's a detailed breakdown of my data cleaning steps:
1. Coordinate Extraction
I began by extracting geographic coordinates from the complex 'geometry_coordinates' list column, creating three distinct columns:
"- Event Longitude (Deg)"
"- Event Latitude (Deg)"
"- Event Depth (km)"
This separation allows for easier geographical analysis and mapping.
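As an aside, the same extraction can be written with purrr's positional extractors; a minimal sketch, equivalent to the sapply() calls in the code above:

library(purrr)
# Equivalent extraction: take the 1st/2nd/3rd element of each coordinate triple
earthquakes <- earthquakes %>%
  mutate(
    `Event Longitude (Deg)` = map_dbl(geometry_coordinates, 1),
    `Event Latitude (Deg)`  = map_dbl(geometry_coordinates, 2),
    `Event Depth (km)`      = map_dbl(geometry_coordinates, 3)
  )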
2. Time Standardization
I converted Unix timestamps into readable datetime format for two key columns:
- properties_time to time_readable
- properties_updated to update_readable
This transformation makes temporal analysis more intuitive and accessible.
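lubridate (loaded later in the analysis) offers an equivalent one-liner; a quick sketch with an arbitrary example timestamp:

library(lubridate)
# Epoch milliseconds -> UTC datetime; as_datetime() assumes a 1970-01-01 UTC origin
as_datetime(1731974400000 / 1000)  # "2024-11-19 00:00:00 UTC"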
3. Location Data Processing
I extracted and standardized location information from the 'properties_place' field:
- Separated city and state information
- Created a new 'country' column using comprehensive case_when() logic
- Handled special cases and regions (e.g., "Japan region", "Fiji islands")
- Assigned "International Waters" to oceanic regions
4. Missing Data Treatment
I implemented mean imputation for several numeric columns to handle missing values:
"- properties_nst (number of seismic stations)"
"- properties_dmin (minimum distance to stations)"
"- properties_gap (azimuthal gap)"
Each imputation was rounded appropriately to maintain data precision.
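Since the three imputation blocks share the same pattern, they could also be collapsed into a single mutate(across(...)) call; a sketch, assuming per-column rounding is applied afterwards:

# One-pass mean imputation over the three station-quality columns
earthquakes <- earthquakes %>%
  mutate(across(c(properties_nst, properties_dmin, properties_gap),
                ~ ifelse(is.na(.x), mean(.x, na.rm = TRUE), .x)))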
5. Temporal Analysis Enhancement
I created an 'hours_old' column that calculates the age of each earthquake event relative to the current time, facilitating recency analysis.
6. Geographic Classification
Added a 'continent' column based on longitude and latitude coordinates, using approximate bounding boxes (with country-based fallbacks) to categorize each event into one of seven continents:
"- North America"
"- South America"
"- Europe"
"- Asia"
"- Africa"
"- Oceania"
"- Antarctica"
This comprehensive cleaning process transformed the raw USGS data into a structured, analysis-ready dataset, setting the stage for meaningful statistical analysis and visualization of global seismic patterns.
library(dplyr)
library(tidyr)
library(lubridate)
# Clean and transform data
earthquakes_clean <- earthquakes %>%
# Categorize earthquakes
mutate(
magnitude_category = case_when(
properties_mag < 2 ~ "Minor",
properties_mag < 4 ~ "Light",
properties_mag < 6 ~ "Moderate",
TRUE ~ "Major"
),
depth_category = case_when(
`Event Depth (km)` < 70 ~ "Shallow",
`Event Depth (km)` < 300 ~ "Intermediate",
TRUE ~ "Deep"
)
) %>%
# Handle missing values
mutate(across(where(is.numeric), ~ifelse(is.na(.), median(., na.rm = TRUE), .)))
earthquakes_clean %>%
select(magnitude_category, depth_category, update_readable) %>%
slice(1:10)
Data Categorization and Classification: Earthquake Analysis
In this phase of my analysis, I focused on categorizing earthquake events based on their characteristics and ensuring data completeness. Here's a detailed explanation of my approach:
Magnitude Classification
I created a 'magnitude_category' column that categorizes earthquakes into four distinct levels:
- Minor: less than 2.0 on the Richter scale
- Light: 2.0 to 3.9
- Moderate: 4.0 to 5.9
- Major: 6.0 and above
This classification aligns with standard seismological categories and helps in understanding the distribution of earthquake intensities.
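An equivalent way to express these bins is base R's cut(), which keeps all the breakpoints in one place; a sketch that reproduces the case_when() thresholds above:

# Same four bins; right = FALSE closes each interval on the left, so a
# magnitude of exactly 2.0 falls in "Light" and 6.0 in "Major", as above
earthquakes_clean <- earthquakes_clean %>%
  mutate(magnitude_category = cut(properties_mag,
                                  breaks = c(-Inf, 2, 4, 6, Inf),
                                  labels = c("Minor", "Light", "Moderate", "Major"),
                                  right  = FALSE))

Note that cut() returns a factor rather than a character vector, which downstream grouping and plotting handle the same way.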
Depth Classification
I established a 'depth_category' column based on standard seismological depth classifications:
- Shallow: less than 70 km deep
- Intermediate: between 70 and 300 km
- Deep: greater than 300 km
This categorization is crucial for understanding the nature of seismic events, as depth often correlates with earthquake behavior and potential impact.
Data Completeness
To ensure comprehensive analysis capabilities, I handled missing values in numeric columns using median imputation. This approach:
- Maintains data distribution characteristics
- Provides a more robust alternative to mean imputation
- Ensures all numeric fields are complete for analysis
# Load necessary libraries
library(dplyr)
# Summary statistics
summary_stats <- earthquakes_clean %>%
group_by(magnitude_category) %>%
summarize(
count = n(),
avg_depth = mean(`Event Depth (km)`, na.rm = TRUE),
median_depth = median(`Event Depth (km)`, na.rm = TRUE),
sd_depth = sd(`Event Depth (km)`, na.rm = TRUE)
)
# Correlation analysis
correlation_matrix <- cor(select(earthquakes_clean,
properties_mag,
`Event Depth (km)`,
properties_sig),
use = "complete.obs")
# Hypothesis testing
# Test if depth differs by magnitude category
depth_test <- aov(`Event Depth (km)` ~ magnitude_category, data = earthquakes_clean)
summary(depth_test)
Statistical Analysis Results: Understanding Earthquake Patterns
Analysis of Variance (ANOVA) Interpretation
The ANOVA test results examining the relationship between earthquake depth and magnitude categories revealed significant findings:
Statistical Values:
- Degrees of Freedom (Df):
  - Magnitude category: 3 (representing the four categories: Minor, Light, Moderate, Major)
  - Residuals: 15,641 (number of observations minus the number of categories)
- Sum of Squares:
  - Between groups (magnitude_category): 7,963,588
  - Within groups (residuals): 37,284,463
- Mean Square:
  - Between groups: 2,654,529
  - Within groups: 2,384
- F-value: 1114
  - This large F-value indicates substantial differences between groups
Statistical Significance:
The extremely small p-value (< 2e-16, indicated by '***') provides strong evidence that:
- There are highly significant differences in earthquake depths across magnitude categories
- The relationship between magnitude and depth is not random
- The variation between categories is substantially larger than would be expected by chance
Practical Interpretation:
- The high F-value (1114) indicates that:
  - The magnitude categories explain a significant portion of the variation in earthquake depths
  - There are clear, distinct patterns in how depth relates to magnitude
- The extremely low p-value suggests:
  - We can be very confident that these differences are real
  - The relationship between magnitude and depth is systematic
  - Different magnitude earthquakes tend to occur at different depths
Implications:
- This analysis strongly supports the existence of a relationship between earthquake magnitude and depth
- The findings can be valuable for:
  - Understanding earthquake patterns
  - Improving prediction models
  - Assessing seismic risk
  - Developing targeted monitoring strategies
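Since ANOVA only establishes that at least one group mean differs, a natural follow-up (not run in the original analysis) is Tukey's HSD, which compares every pair of magnitude categories with family-wise 95% confidence intervals:

# Post-hoc pairwise comparisons on the fitted aov model
TukeyHSD(depth_test)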
# Auto-install the viridis package if it is not already available
if (!requireNamespace("viridis", quietly = TRUE)) {
install.packages("viridis")
}
library(ggplot2)
library(viridis)
# Recreate earthquakes_clean from the raw table with depth categories
# (note: this overwrites the earlier version, dropping magnitude_category and the median imputation)
earthquakes_clean <- earthquakes %>%
mutate(
depth_category = case_when(
`Event Depth (km)` < 70 ~ "Shallow",
`Event Depth (km)` < 300 ~ "Intermediate",
TRUE ~ "Deep"
)
)
# Plot 1: Earthquake distribution map
ggplot(earthquakes_clean,
aes(x = `Event Longitude (Deg)`,
y = `Event Latitude (Deg)`)) +
geom_point(aes(color = properties_mag,
size = properties_mag),
alpha = 0.6) +  # alpha set as a constant, not mapped inside aes()
scale_color_viridis() +
theme_minimal() +
labs(title = "Global Earthquake Distribution",
subtitle = "Size and color indicate magnitude",
x = "Longitude",
y = "Latitude") +
theme(legend.position = "bottom")
# Plot 2: Magnitude distribution by depth
ggplot(earthquakes_clean,
aes(x = properties_mag,
fill = depth_category)) +
geom_density(alpha = 0.5) +
facet_wrap(~depth_category) +
theme_minimal() +
labs(title = "Earthquake Magnitude Distribution by Depth",
x = "Magnitude",
y = "Density")
Visual Analysis: Global Earthquake Patterns and Distribution
I've created two key visualizations that reveal important patterns in the global seismic activity data:
Global Geographic Distribution (Image 1)
This scatter plot visualizes the spatial distribution of earthquakes:
Geographic Patterns
- Clear concentration along tectonic plate boundaries
- Notable clusters in specific regions:
  - Pacific Ring of Fire (high density around longitudes 100 to 180)
  - Mid-Atlantic Ridge (around longitude -30)
  - Mediterranean region (around longitudes 0 to 30)
Magnitude Representation
- Point size and color correspond to earthquake magnitude
- Larger, lighter (yellow-green) points indicate higher-magnitude events
- Smaller, darker (purple-blue) points show lower-magnitude events
Spatial Coverage
- Most activity between latitudes -60 and 80 degrees
- Clear patterns following major fault lines
- Varying density of events across different regions
Magnitude Distribution by Depth Categories (Image 2)
This density plot reveals distinct patterns across three depth categories:
Deep Earthquakes
- Shows a sharp, concentrated peak around magnitude 4.5-5.0
- Very narrow distribution indicating consistent magnitude patterns
- Few events at lower or higher magnitudes
Intermediate Earthquakes
- Displays a bimodal distribution with peaks around:
  - First peak at magnitude 2.5
  - Second peak at magnitude 4.0
- Broader spread suggesting more variable magnitude range
Shallow Earthquakes
- Shows a more normal distribution centered around magnitude 1.5-2.0
- Has a longer right tail indicating occasional higher magnitude events
- Highest variety in magnitude range
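These visual readings can be sanity-checked numerically; a quick sketch summarizing magnitude by depth category:

# Numeric check of the density plots: typical magnitude per depth class
earthquakes_clean %>%
  group_by(depth_category) %>%
  summarize(n = n(),
            median_mag = median(properties_mag, na.rm = TRUE),
            iqr_mag = IQR(properties_mag, na.rm = TRUE))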
# Bootstrap analysis of mean magnitude
set.seed(123)
bootstrap_samples <- replicate(1000, {
sample_data <- earthquakes_clean %>%
sample_n(size = nrow(earthquakes_clean), replace = TRUE)
mean(sample_data$properties_mag)
})
# Calculate confidence interval
ci <- quantile(bootstrap_samples, c(0.025, 0.975))
# Visualize bootstrap distribution
ggplot(data.frame(magnitude = bootstrap_samples), aes(x = magnitude)) +
geom_histogram(bins = 30, fill = "steelblue", alpha = 0.7) +
geom_vline(xintercept = ci, linetype = "dashed", color = "red") +
theme_minimal() +
labs(title = "Bootstrap Distribution of Mean Magnitude",
subtitle = "Red lines indicate 95% confidence interval",
x = "Mean Magnitude",
y = "Count")
Bootstrap Analysis: Estimating Mean Earthquake Magnitude
Visualization Analysis: Bootstrap Distribution
I've analyzed the bootstrap histogram above, which used 1,000 resamples to estimate the true mean magnitude of earthquakes in the dataset. The visualization reveals several key insights about the magnitude measurements.
Distribution Characteristics
- Center: The distribution is centered around magnitude 1.64, where the highest frequency of bootstrap samples occurs
- Shape: The distribution shows a clear bell-shaped (normal) pattern
- Spread: The data shows a relatively narrow spread, indicating good precision in the estimate
- Symmetry: While generally symmetric, there is a slight right skew visible in the tail
Confidence Interval Analysis
The red dashed lines mark the 95% confidence interval:
- Lower bound: approximately 1.62
- Upper bound: approximately 1.66
- This interval suggests we can be 95% confident that the true population mean magnitude falls within this range
Statistical Implications
- Precision: The narrow width of the confidence interval (approximately 0.04 units) indicates high precision in our estimate of the mean magnitude
- Reliability: The normal shape suggests the sampling distribution follows expected theoretical properties
- Consistency: The smooth, well-formed distribution suggests a large underlying sample size
- Statistical Power: The narrow confidence interval and large sample size provide strong statistical inference capabilities
Practical Significance
- This analysis provides a reliable framework for:
  - Understanding the typical magnitude of earthquakes in the dataset
  - Making statistical inferences about seismic activity
  - Providing a baseline for detecting unusual seismic events
  - Supporting evidence-based disaster preparedness planning
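As a cross-check on the bootstrap interval, the classical t-based interval should nearly coincide given the sample size; a one-line sketch:

# Classical 95% CI for the mean magnitude; with thousands of events,
# this should closely match the bootstrap percentile interval above
t.test(earthquakes_clean$properties_mag)$conf.int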
# Temporal patterns
earthquakes_clean <- earthquakes_clean %>%
mutate(hour_of_day = lubridate::hour(time_readable))
time_analysis <- earthquakes_clean %>%
group_by(hour_of_day) %>%
summarize(
avg_magnitude = mean(properties_mag, na.rm = TRUE),
count = n()
)
# Visualize temporal patterns
ggplot(time_analysis, aes(x = hour_of_day, y = count)) +
geom_line() +
geom_point() +
theme_minimal() +
labs(title = "Earthquake Frequency by Hour of Day",
x = "Hour",
y = "Number of Earthquakes")
Temporal Analysis: Earthquake Frequency Distribution Across Daily Hours
Analysis of Hourly Earthquake Patterns
I've analyzed the temporal distribution of earthquakes throughout the day, revealing several interesting patterns in seismic activity:
Key Features of the Time Series
- Peak Activity
  - Highest frequency observed around hour 9 (09:00 UTC)
  - Notable spike of approximately 615 earthquakes
  - Significantly higher than the average hourly frequency
- Secondary Peaks
  - Hour 0 (midnight UTC): ~540 earthquakes
  - Hour 5: ~540 earthquakes
  - Hour 20: ~545 earthquakes
- Low Activity Periods
  - Hour 3: ~460 earthquakes
  - Hour 17: ~460 earthquakes
  - Clear troughs in seismic activity during these times
Pattern Analysis
- Cyclical Patterns
  - Evidence of regular fluctuations throughout the day
  - Approximately 4-6 hour cycles between peaks and troughs
  - May hint at a relationship with Earth's rotational effects
- Distribution Characteristics
  - Range of approximately 150 earthquakes between the highest and lowest hourly frequencies
  - Generally maintained baseline of 480-520 earthquakes per hour
  - Irregular but noticeable rhythm in activity levels (tested formally below)
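Whether the hourly variation exceeds what chance alone would produce can be probed with a goodness-of-fit test against a uniform-by-hour null; a sketch:

# Chi-square goodness-of-fit: are events spread uniformly over the 24 hours?
# With one table argument, the default null assigns equal probability to each hour
chisq.test(table(earthquakes_clean$hour_of_day))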
Implications
- Monitoring Considerations
  - Heightened monitoring may be beneficial during peak activity hours
  - Resource allocation could be optimized based on these patterns
- Research Applications
  - Patterns could inform studies of Earth's crustal dynamics
  - Potential correlation with global human activity patterns
  - Basis for further investigation of temporal triggers