🌍 Earthquake Data Analysis: A Comprehensive Statistical Investigation of Global Seismic Events
📊 Project Overview
This analysis explores global seismic activity using data from the USGS (United States Geological Survey) Earthquake Catalog. Through comprehensive data wrangling, statistical analysis, and visualization techniques, my aim is to uncover patterns and relationships in earthquake occurrences worldwide.
🎯 Objectives
- Investigate the spatial and temporal distribution of earthquakes
- Analyze relationships between earthquake characteristics (magnitude, depth, location)
- Identify potential patterns in seismic activity
- Assess regional variations in earthquake occurrences
- Develop statistical insights into global seismic patterns
📑 Dataset Description
The dataset contains detailed information about earthquakes, including:
- Event location (longitude, latitude)
- Magnitude and depth
- Temporal information
- Additional seismic measurements
- Regional metadata
🔍 Methodological Approach
My analysis employs a four-stage methodology:
1️⃣ Data Preprocessing and Wrangling
- Cleaning and structuring raw seismic data
- Handling missing values
- Creating derived features for enhanced analysis
- Standardizing measurements across observations
2️⃣ Statistical Analysis
- Descriptive statistics of earthquake characteristics
- Hypothesis testing for regional differences
- Correlation analysis between seismic parameters
- Bootstrap sampling for robust statistical inference
3️⃣ Visual Analytics
- Geographic distribution mapping
- Temporal trend visualization
- Relationship plots between key variables
- Interactive dashboards for exploration
4️⃣ Insight Generation
- Pattern identification in seismic activity
- Regional risk assessment
- Temporal trend analysis
- Correlation discovery between variables
💻 Technical Implementation
This analysis utilizes the R programming language with key packages:
- dplyr for data manipulation
- ggplot2 for visualization
- tidyr for data cleaning
- stats for statistical analysis
🎁 Expected Outcomes
This analysis aims to provide:
- Clear understanding of global earthquake patterns
- Statistical validation of seismic relationships
- Visual representation of complex earthquake data
- Actionable insights for seismic risk assessment
Note: The following sections will detail my analysis, findings, and conclusions, supported by statistical evidence and visual representations.
# Load required libraries
library(jsonlite)
library(tidyr)
library(dplyr)
# Get earthquake data from the URL
url <- "https://earthquake.usgs.gov/fdsnws/event/1/query.geojson?starttime=2024-11-08%2000:00:00&endtime=2024-12-20%2023:59:59&minmagnitude=-10&maxmagnitude=10&orderby=time"
# Fetch and parse JSON data
data <- fromJSON(url)
# Extract and normalize the features data with all columns
earthquakes <- data$features %>%
as.data.frame() %>%
unnest(cols = c(properties, geometry), names_sep = "_")
# View the first few rows
head(earthquakes)
# View column names grouped by data type
split(names(earthquakes), sapply(earthquakes, class))
Initial Data Collection and Processing: USGS Earthquake Dataset
In the initial phase of my earthquake analysis project, I focused on collecting and processing data from the United States Geological Survey (USGS) Earthquake Catalog. The USGS provides a comprehensive API that allows access to detailed seismic event data worldwide.
My data collection process began with importing essential R libraries: jsonlite for JSON data handling, tidyr for data cleaning, and dplyr for data manipulation. These libraries form the backbone of my data processing pipeline, enabling efficient handling of complex nested data structures typical in API responses.
The data collection targeted earthquakes recorded between November 8, 2024, and December 20, 2024. I intentionally set a broad magnitude range (-10 to 10) to ensure I captured all seismic events during this period. The API request was structured to return results ordered chronologically, facilitating temporal analysis of seismic activities.
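For reproducibility, the query string can also be assembled from its parts rather than hard-coded. Below is a minimal sketch, where build_usgs_url is a hypothetical helper (not part of the original pipeline) wrapping the same FDSN parameters used above:

# Hypothetical helper: assemble the same USGS FDSN query from parameters,
# so the date window and magnitude bounds are easy to vary later
build_usgs_url <- function(start, end, min_mag = -10, max_mag = 10) {
  base <- "https://earthquake.usgs.gov/fdsnws/event/1/query.geojson"
  paste0(base,
         "?starttime=", URLencode(start, reserved = TRUE),
         "&endtime=", URLencode(end, reserved = TRUE),
         "&minmagnitude=", min_mag,
         "&maxmagnitude=", max_mag,
         "&orderby=time")
}
url <- build_usgs_url("2024-11-08 00:00:00", "2024-12-20 23:59:59")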
The transformation of raw JSON data into a usable format involved several critical steps. First, I parsed the JSON response using fromJSON(), converting the nested structure into an R-compatible format. I then used unnest() to flatten the hierarchical data, particularly the 'properties' and 'geometry' nested objects, into a single, comprehensive dataframe. This transformation maintained data integrity while making it more accessible for analysis.
The resulting dataset, stored in the 'earthquakes' dataframe, contains various data types:
- Character data including location descriptions and event identifiers
- Numeric data such as magnitude measurements and coordinates
- Integer data for discrete measurements
- Logical data for binary indicators
- List data for complex geometric information
This initial processing sets the foundation for my subsequent analysis, ensuring my data is properly structured and ready for statistical investigation and visualization. The clean, normalized format will facilitate efficient computation and accurate insights into global seismic patterns.
Looking ahead, this processed dataset will enable me to:
- Analyze spatial distributions of earthquakes
- Study magnitude patterns across regions
- Investigate temporal trends
- Examine relationships between various seismic parameters
By establishing this solid data foundation, I can proceed confidently with my comprehensive analysis of global seismic activities.
library(stringr)
library(dplyr)
library(tidyr)
# DATA CLEANING
# CREATE 'Event Longitude (Deg)', 'Event Latitude (Deg)', 'Event Depth (km)' FROM THE "geometry_coordinates" LIST COLUMN
earthquakes <- earthquakes %>%
mutate(
`Event Longitude (Deg)` = sapply(geometry_coordinates, `[`, 1),
`Event Latitude (Deg)` = sapply(geometry_coordinates, `[`, 2),
`Event Depth (km)` = sapply(geometry_coordinates, `[`, 3)
)
# CONVERT UNIX TIME (properties_time, properties_updated) TO READABLE UTC DATETIMES
earthquakes <- earthquakes %>%
mutate(
time_readable = as.POSIXct(properties_time / 1000, origin = "1970-01-01", tz = "UTC"),
update_readable = as.POSIXct(properties_updated / 1000, origin = "1970-01-01", tz = "UTC")
)
# EXTRACT CITY AND STATE FROM properties_place, THEN CREATE A COUNTRY COLUMN
earthquakes <- earthquakes %>%
mutate(
state = str_trim(str_extract(properties_place, "[^,]+$")),
city = str_extract(properties_place, "(?<=of ).*?(?=,)")
)
# Inspect the unique values in state to inform the country mapping (commented out)
#earthquakes %>%
#count(state, sort = TRUE)
# Create country column
earthquakes <- earthquakes %>%
mutate(
country = case_when(
# United States states and territories
state %in% c("CA", "Alaska", "Hawaii", "Nevada", "Texas", "Montana", "Oklahoma",
"Utah", "Washington", "Idaho", "California", "Wyoming", "Oregon",
"Tennessee", "Arizona", "Arkansas", "New Jersey", "Kansas", "Maine",
"South Carolina", "Connecticut", "NV", "Colorado", "Alabama", "Indiana",
"Massachusetts", "Mississippi","Missouri", "Nebraska", "New York","New Mexico", "Ohio",
"Virginia", "western Texas", "Louisiana", "Kentucky", "Nevada Earthquake") ~ "USA",
state %in% c("Puerto Rico", "U.S. Virgin Islands", "Guam",
"Northern Mariana Islands") ~ "USA Territory",
# Single state = country cases
state %in% c("Indonesia", "Japan", "Chile", "Papua New Guinea", "Philippines",
"Canada", "Russia", "China", "Mexico", "Vanuatu", "Argentina",
"Iran", "Afghanistan", "New Zealand", "Peru", "India", "Turkey",
"Greece", "Guatemala", "Ecuador", "Cuba", "Myanmar", "Taiwan",
"Panama", "Italy", "Iraq", "Mongolia", "Honduras", "Iceland",
"Nicaragua", "Pakistan", "Portugal", "Venezuela", "Azerbaijan",
"Bolivia", "Costa Rica", "Haiti", "Kazakhstan", "Kosovo",
"Kyrgyzstan", "Saudi Arabia", "Somalia", "Tanzania", "Uganda",
"Yemen") ~ state,
# Manual state to country
grepl("Japan region", state) ~ "Japan",
grepl("Fiji", state) ~ "Fiji",
grepl("Kermadec", state) ~ "New Zealand",
grepl("Kuril", state) ~ "Russia",
grepl("Easter Island", state) ~ "Chile",
grepl("Mariana", state) ~ "USA Territory",
grepl("Solomon Islands", state) ~ "Solomon Islands",
grepl("Dominican Republic", state) ~ "Dominican Republic",
grepl("MX", state) ~ "Mexico",
grepl("California Earthquake", state) ~ "USA",
grepl("Cyprus", state) ~ "Cyprus",
grepl("Tajikistan", state) ~ "Tajikistan",
grepl("Antigua and Barbuda", state) ~ "Antigua and Barbuda",
grepl("Australia", state) ~ "Australia",
grepl("Colombia", state) ~ "Colombia",
grepl("El Salvador", state) ~ "El Salvador",
grepl("Tonga", state) ~ "Tonga",
grepl("Timor Leste", state) ~ "Timor Leste",
grepl("Poland", state) ~ "Poland",
grepl("Zimbabwe", state) ~ "Zimbabwe",
grepl("Nepal", state) ~ "Nepal",
grepl("Albania", state) ~ "Albania",
grepl("Algeria", state) ~ "Algeria",
grepl("south of Africa", state) ~ "south of Africa",
grepl("Vietnam", state) ~ "Vietnam",
grepl("Tunisia", state) ~ "Tunisia",
grepl("Ecuador region", state) ~ "Ecuador region",
grepl("Democratic Republic of the Congo", state) ~ "D.R.Congo",
grepl("Russia region", state) ~ "Russia",
grepl("west of Macquarie Island", state) ~ "Macquarie Island region",
grepl("South Sandwich Islands region", state) ~ "South Sandwich Islands",
grepl("southeast of the Loyalty Islands", state) ~ "Loyalty Islands",
grepl("off the west coast of the South Island of New Zealand", state) ~ "New Zealand",
grepl("north of Svalbard", state) ~ "Svalbard",
grepl("Vanuatu region", state) ~ "Vanuatu",
grepl("Wallis and Futuna", state) ~ "Wallis and Futuna",
grepl("south of the Aleutian Islands", state) ~ "Aleutian Islands",
grepl("western Xizang", state) ~ "China",
grepl("Balleny Islands region", state) ~ "Balleny Islands",
grepl("north of Franz Josef Land", state) ~ "Franz Josef Land",
grepl("Owen Fracture Zone region", state) ~ "Owen Fracture Zone",
grepl("New Caledonia", state) ~ "New Caledonia",
grepl("southwest of Africa", state) ~ "southwest of Africa",
grepl("west of Vancouver Island", state) ~ "Vancouver Island",
grepl("Anguilla", state) ~ "Anguilla",
grepl("off the coast of Washington", state) ~ "Washington, USA",
grepl("off the west coast of northern Sumatra", state) ~ "Sumatra, Indonesia",
grepl("Revilla Gigedo Islands region", state) ~ "Revilla Gigedo Islands",
grepl("Bouvet Island region", state) ~ "Bouvet Island",
grepl("Macquarie Island region", state) ~ "Macquarie Island",
grepl("off the coast of Central America", state) ~ "Central America",
grepl("Azores Islands region", state) ~ "Azores Islands",
grepl("north of Ascension Island", state) ~ "Ascension Island",
grepl("east of Severnaya Zemlya", state) ~ "Severnaya Zemlya",
# More Complicating state with Sea, Ocean regions and ridges
grepl("Ridge|Sea|Ocean|Rise", state) ~ "International Waters",
# Default for unmatched cases
TRUE ~ "Other"
)
)
# Default for unmatched cases in the city column: fall back to state
earthquakes <- earthquakes %>%
mutate(
city = coalesce(city, state)
)
# CREATE HOURS OLD
earthquakes <- earthquakes %>%
mutate(
hours_old = round(as.numeric(difftime(Sys.time(), time_readable, units = "hours")), 1)
)
# REPLACE NA VALUES WITH EACH COLUMN'S MEAN FOR THE FOLLOWING COLUMNS
# 1. Properties_nst
earthquakes <- earthquakes %>%
mutate(
properties_nst = case_when(
is.na(properties_nst) ~ round(mean(properties_nst, na.rm = TRUE), 0),
TRUE ~ as.numeric(properties_nst)
)
)
# 2. Properties_dmin
earthquakes <- earthquakes %>%
mutate(
properties_dmin = case_when(
is.na(properties_dmin) ~ round(mean(properties_dmin, na.rm = TRUE), 4),
TRUE ~ as.numeric(properties_dmin)
)
)
# 3. Properties_gap
earthquakes <- earthquakes %>%
mutate(
properties_gap = case_when(
is.na(properties_gap) ~ round(mean(properties_gap, na.rm = TRUE), 2),
TRUE ~ as.numeric(properties_gap)
)
)
# CREATING CONTINENT COLUMN USING THE LONGITUDE AND LATITUDE
earthquakes <- earthquakes %>%
mutate(
continent = case_when(
`Event Longitude (Deg)` >= -168.75 & `Event Longitude (Deg)` <= -10.267 &
`Event Latitude (Deg)` >= 12.5748 & `Event Latitude (Deg)` <= 83.4 ~ "North America",
`Event Longitude (Deg)` >= -81.31 & `Event Longitude (Deg)` <= -34.79 &
`Event Latitude (Deg)` >= -55.98 & `Event Latitude (Deg)` <= 12.46 ~ "South America",
`Event Longitude (Deg)` >= -24.25 & `Event Longitude (Deg)` <= 41.89 &
`Event Latitude (Deg)` >= 34.59 & `Event Latitude (Deg)` <= 81.85 ~ "Europe",
`Event Longitude (Deg)` >= 26.04 & `Event Longitude (Deg)` <= 169.65 &
`Event Latitude (Deg)` >= 3.86 & `Event Latitude (Deg)` <= 77.71 ~ "Asia",
`Event Longitude (Deg)` >= -17.53 & `Event Longitude (Deg)` <= 51.42 &
`Event Latitude (Deg)` >= -34.83 & `Event Latitude (Deg)` <= 37.35 ~ "Africa",
`Event Longitude (Deg)` >= 112.91 & `Event Longitude (Deg)` <= 179.99 &
`Event Latitude (Deg)` >= -47.19 & `Event Latitude (Deg)` <= -3.68 ~ "Oceania",
`Event Latitude (Deg)` <= -60 ~ "Antarctica",
# Manual state to country
grepl("USA", country) ~ "North America",
grepl("Indonesia", country) ~ "Asia",
grepl("Tonga", country) ~ "Oceania",
grepl("International Waters", country) ~ "International Waters",
grepl("New Zealand", country) ~ "Oceania",
grepl("Macquarie Island region", country) ~ "Oceania",
grepl("South Sandwich Islands", country) ~ "Oceania",
grepl("Fiji", country) ~ "Oceania",
grepl("Chile", country) ~ "South America",
grepl("Panama", country) ~ "North America",
grepl("Nicaragua", country) ~ "North America",
grepl("Papua New Guinea", country) ~ "Oceania",
grepl("Svalbard", country) ~ "Europe",
grepl("south of Africa", country) ~ "Africa",
grepl("Wallis and Futuna", country) ~ "Oceania",
grepl("Aleutian Islands", country) ~ "North America",
grepl("Franz Josef Land", country) ~ "Europe",
grepl("southwest of Africa", country) ~ "Africa",
grepl("Costa Rica", country) ~ "North America",
grepl("Russia", country) ~ "Europe/Asia",
grepl("Philippines", country) ~ "Asia",
grepl("El Salvador", country) ~ "North America",
grepl("Bouvet Island", country) ~ "Antarctica",
grepl("Central America", country) ~ "North America",
grepl("Ecuador region", country) ~ "South America",
grepl("Severnaya Zemlya", country) ~ "Europe/Asia",
grepl("Macquarie Island", country) ~ "Oceania",
# Default for unmatched cases
TRUE ~ "Other"
)
)
# EXPLORE EVENT TYPES (commented out)
#earthquakes %>%
#count(properties_type, sort = TRUE)
earthquakes %>%
select(time_readable, city, state, country, continent) %>%
slice(1:8)
Data Cleaning and Transformation Process: USGS Earthquake Dataset
In this crucial phase of my data cleaning process, I performed several essential transformations to make the earthquake data more analyzable and meaningful. Here's a detailed breakdown of my data cleaning steps:
1. Coordinate Extraction
I began by extracting geographic coordinates from the complex 'geometry_coordinates' list column, creating three distinct columns:
"- Event Longitude (Deg)"
"- Event Latitude (Deg)"
"- Event Depth (km)"
This separation allows for easier geographical analysis and mapping.
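As an aside, the same extraction can be written with purrr's positional extractors; a minimal sketch, equivalent to the sapply() calls in the code above:

library(purrr)
# Equivalent extraction: take the 1st/2nd/3rd element of each coordinate triple
earthquakes <- earthquakes %>%
  mutate(
    `Event Longitude (Deg)` = map_dbl(geometry_coordinates, 1),
    `Event Latitude (Deg)`  = map_dbl(geometry_coordinates, 2),
    `Event Depth (km)`      = map_dbl(geometry_coordinates, 3)
  )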
2. Time Standardization
I converted Unix timestamps into readable datetime format for two key columns:
- properties_time to time_readable
- properties_updated to update_readable
This transformation makes temporal analysis more intuitive and accessible.
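lubridate (loaded later in the analysis) offers an equivalent one-liner; a quick sketch with an arbitrary example timestamp:

library(lubridate)
# Epoch milliseconds -> UTC datetime; as_datetime() assumes a 1970-01-01 UTC origin
as_datetime(1731974400000 / 1000)  # "2024-11-19 00:00:00 UTC"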
3. Location Data Processing
I extracted and standardized location information from the 'properties_place' field:
- Separated city and state information
- Created a new 'country' column using comprehensive case_when() logic
- Handled special cases and regions (e.g., "Japan region", "Fiji islands")
- Assigned "International Waters" to oceanic regions
4. Missing Data Treatment
I implemented mean imputation for several numeric columns to handle missing values:
"- properties_nst (number of seismic stations)"
"- properties_dmin (minimum distance to stations)"
"- properties_gap (azimuthal gap)"
Each imputation was rounded appropriately to maintain data precision.
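Since the three imputation blocks share the same pattern, they could also be collapsed into a single mutate(across(...)) call; a sketch, assuming per-column rounding is applied afterwards:

# One-pass mean imputation over the three station-quality columns
earthquakes <- earthquakes %>%
  mutate(across(c(properties_nst, properties_dmin, properties_gap),
                ~ ifelse(is.na(.x), mean(.x, na.rm = TRUE), .x)))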
5. Temporal Analysis Enhancement
I created an 'hours_old' column that calculates the age of each earthquake event relative to the current time, facilitating recency analysis.
6. Geographic Classification
Added a 'continent' column based on longitude and latitude coordinates, using approximate bounding boxes (with country-based fallbacks) to categorize each event into one of seven continents:
"- North America"
"- South America"
"- Europe"
"- Asia"
"- Africa"
"- Oceania"
"- Antarctica"
This comprehensive cleaning process transformed the raw USGS data into a structured, analysis-ready dataset, setting the stage for meaningful statistical analysis and visualization of global seismic patterns.
library(dplyr)
library(tidyr)
library(lubridate)
# Clean and transform data
earthquakes_clean <- earthquakes %>%
# Categorize earthquakes
mutate(
magnitude_category = case_when(
properties_mag < 2 ~ "Minor",
properties_mag < 4 ~ "Light",
properties_mag < 6 ~ "Moderate",
TRUE ~ "Major"
),
depth_category = case_when(
`Event Depth (km)` < 70 ~ "Shallow",
`Event Depth (km)` < 300 ~ "Intermediate",
TRUE ~ "Deep"
)
) %>%
# Handle missing values
mutate(across(where(is.numeric), ~ifelse(is.na(.), median(., na.rm = TRUE), .)))
earthquakes_clean %>%
select(magnitude_category, depth_category, update_readable) %>%
slice(1:10)
Data Categorization and Classification: Earthquake Analysis
In this phase of my analysis, I focused on categorizing earthquake events based on their characteristics and ensuring data completeness. Here's a detailed explanation of my approach:
Magnitude Classification
I created a 'magnitude_category' column that categorizes earthquakes into four distinct levels:
- Minor: less than 2.0 on the Richter scale
- Light: 2.0 to 3.9
- Moderate: 4.0 to 5.9
- Major: 6.0 and above
This classification aligns with standard seismological categories and helps in understanding the distribution of earthquake intensities.
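An equivalent way to express these bins is base R's cut(), which keeps all the breakpoints in one place; a sketch that reproduces the case_when() thresholds above:

# Same four bins; right = FALSE closes each interval on the left, so a
# magnitude of exactly 2.0 falls in "Light" and 6.0 in "Major", as above
earthquakes_clean <- earthquakes_clean %>%
  mutate(magnitude_category = cut(properties_mag,
                                  breaks = c(-Inf, 2, 4, 6, Inf),
                                  labels = c("Minor", "Light", "Moderate", "Major"),
                                  right  = FALSE))

Note that cut() returns a factor rather than a character vector, which downstream grouping and plotting handle the same way.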
Depth Classification
I established a 'depth_category' column based on standard seismological depth classifications:
- Shallow: less than 70 km deep
- Intermediate: between 70 and 300 km
- Deep: greater than 300 km
This categorization is crucial for understanding the nature of seismic events, as depth often correlates with earthquake behavior and potential impact.
Data Completeness
To ensure comprehensive analysis capabilities, I handled missing values in numeric columns using median imputation. This approach:
- Maintains data distribution characteristics
- Provides a more robust alternative to mean imputation
- Ensures all numeric fields are complete for analysis
# Load necessary libraries
library(dplyr)
# Summary statistics
summary_stats <- earthquakes_clean %>%
group_by(magnitude_category) %>%
summarize(
count = n(),
avg_depth = mean(`Event Depth (km)`, na.rm = TRUE),
median_depth = median(`Event Depth (km)`, na.rm = TRUE),
sd_depth = sd(`Event Depth (km)`, na.rm = TRUE)
)
# Correlation analysis
correlation_matrix <- cor(select(earthquakes_clean,
properties_mag,
`Event Depth (km)`,
properties_sig),
use = "complete.obs")
# Hypothesis testing
# Test if depth differs by magnitude category
depth_test <- aov(`Event Depth (km)` ~ magnitude_category, data = earthquakes_clean)
summary(depth_test)
Statistical Analysis Results: Understanding Earthquake Patterns
Analysis of Variance (ANOVA) Interpretation
The ANOVA test results examining the relationship between earthquake depth and magnitude categories revealed significant findings:
Statistical Values:
- Degrees of Freedom (Df):
  - Magnitude category: 3 (representing the four categories: Minor, Light, Moderate, Major)
  - Residuals: 15,641 (number of observations minus the number of categories)
- Sum of Squares:
  - Between groups (magnitude_category): 7,963,588
  - Within groups (residuals): 37,284,463
- Mean Square:
  - Between groups: 2,654,529
  - Within groups: 2,384
- F-value: 1114
  - This large F-value indicates substantial differences between groups
Statistical Significance:
The extremely small p-value (< 2e-16, indicated by '***') provides strong evidence that:
- There are highly significant differences in earthquake depths across magnitude categories
- The relationship between magnitude and depth is not random
- The variation between categories is substantially larger than would be expected by chance
Practical Interpretation:
- The high F-value (1114) indicates that:
  - The magnitude categories explain a significant portion of the variation in earthquake depths
  - There are clear, distinct patterns in how depth relates to magnitude
- The extremely low p-value suggests:
  - We can be very confident that these differences are real
  - The relationship between magnitude and depth is systematic
  - Different magnitude earthquakes tend to occur at different depths
Implications:
- This analysis strongly supports the existence of a relationship between earthquake magnitude and depth
- The findings can be valuable for:
  - Understanding earthquake patterns
  - Improving prediction models
  - Assessing seismic risk
  - Developing targeted monitoring strategies
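Since ANOVA only establishes that at least one group mean differs, a natural follow-up (not run in the original analysis) is Tukey's HSD, which compares every pair of magnitude categories with family-wise 95% confidence intervals:

# Post-hoc pairwise comparisons on the fitted aov model
TukeyHSD(depth_test)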
# Auto-install the viridis package if it is not already available
if (!requireNamespace("viridis", quietly = TRUE)) {
install.packages("viridis")
}
library(ggplot2)
library(viridis)
# Recreate earthquakes_clean from the raw table with depth categories
# (note: this overwrites the earlier version, dropping magnitude_category and the median imputation)
earthquakes_clean <- earthquakes %>%
mutate(
depth_category = case_when(
`Event Depth (km)` < 70 ~ "Shallow",
`Event Depth (km)` < 300 ~ "Intermediate",
TRUE ~ "Deep"
)
)
# Plot 1: Earthquake distribution map
ggplot(earthquakes_clean,
aes(x = `Event Longitude (Deg)`,
y = `Event Latitude (Deg)`)) +
geom_point(aes(color = properties_mag,
size = properties_mag),
alpha = 0.6) +  # alpha set as a constant, not mapped inside aes()
scale_color_viridis() +
theme_minimal() +
labs(title = "Global Earthquake Distribution",
subtitle = "Size and color indicate magnitude",
x = "Longitude",
y = "Latitude") +
theme(legend.position = "bottom")
# Plot 2: Magnitude distribution by depth
ggplot(earthquakes_clean,
aes(x = properties_mag,
fill = depth_category)) +
geom_density(alpha = 0.5) +
facet_wrap(~depth_category) +
theme_minimal() +
labs(title = "Earthquake Magnitude Distribution by Depth",
x = "Magnitude",
y = "Density")
Visual Analysis: Global Earthquake Patterns and Distribution
I've created two key visualizations that reveal important patterns in the global seismic activity data:
Global Geographic Distribution (Image 1)
This scatter plot visualizes the spatial distribution of earthquakes:
Geographic Patterns
- Clear concentration along tectonic plate boundaries
- Notable clusters in specific regions:
  - Pacific Ring of Fire (high density around longitudes 100 to 180)
  - Mid-Atlantic Ridge (around longitude -30)
  - Mediterranean region (around longitudes 0 to 30)
Magnitude Representation
- Point size and color correspond to earthquake magnitude
- Larger, lighter (yellow-green) points indicate higher-magnitude events
- Smaller, darker (purple-blue) points show lower-magnitude events
Spatial Coverage
- Most activity between latitudes -60 and 80 degrees
- Clear patterns following major fault lines
- Varying density of events across different regions
Magnitude Distribution by Depth Categories (Image 2)
This density plot reveals distinct patterns across three depth categories:
Deep Earthquakes
- Shows a sharp, concentrated peak around magnitude 4.5-5.0
- Very narrow distribution indicating consistent magnitude patterns
- Few events at lower or higher magnitudes
Intermediate Earthquakes
- Displays a bimodal distribution with peaks around:
  - First peak at magnitude 2.5
  - Second peak at magnitude 4.0
- Broader spread suggesting more variable magnitude range
Shallow Earthquakes
- Shows a more normal distribution centered around magnitude 1.5-2.0
- Has a longer right tail indicating occasional higher magnitude events
- Highest variety in magnitude range
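These visual readings can be sanity-checked numerically; a quick sketch summarizing magnitude by depth category:

# Numeric check of the density plots: typical magnitude per depth class
earthquakes_clean %>%
  group_by(depth_category) %>%
  summarize(n = n(),
            median_mag = median(properties_mag, na.rm = TRUE),
            iqr_mag = IQR(properties_mag, na.rm = TRUE))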
# Bootstrap analysis of mean magnitude
set.seed(123)
bootstrap_samples <- replicate(1000, {
sample_data <- earthquakes_clean %>%
sample_n(size = nrow(earthquakes_clean), replace = TRUE)
mean(sample_data$properties_mag)
})
# Calculate confidence interval
ci <- quantile(bootstrap_samples, c(0.025, 0.975))
# Visualize bootstrap distribution
ggplot(data.frame(magnitude = bootstrap_samples), aes(x = magnitude)) +
geom_histogram(bins = 30, fill = "steelblue", alpha = 0.7) +
geom_vline(xintercept = ci, linetype = "dashed", color = "red") +
theme_minimal() +
labs(title = "Bootstrap Distribution of Mean Magnitude",
subtitle = "Red lines indicate 95% confidence interval",
x = "Mean Magnitude",
y = "Count")
Bootstrap Analysis: Estimating Mean Earthquake Magnitude
Visualization Analysis: Bootstrap Distribution
I've analyzed the bootstrap histogram above, which used 1,000 resamples to estimate the true mean magnitude of earthquakes in the dataset. The visualization reveals several key insights about the magnitude measurements.
Distribution Characteristics
- Center: The distribution is centered around magnitude 1.64, where the highest frequency of bootstrap samples occurs
- Shape: The distribution shows a clear bell-shaped (normal) pattern
- Spread: The data shows a relatively narrow spread, indicating good precision in the estimate
- Symmetry: While generally symmetric, there is a slight right skew visible in the tail
Confidence Interval Analysis
The red dashed lines mark the 95% confidence interval:
- Lower bound: approximately 1.62
- Upper bound: approximately 1.66
- This interval suggests we can be 95% confident that the true population mean magnitude falls within this range
Statistical Implications
- Precision: The narrow width of the confidence interval (approximately 0.04 units) indicates high precision in our estimate of the mean magnitude
- Reliability: The normal shape suggests the sampling distribution follows expected theoretical properties
- Consistency: The smooth, well-formed distribution suggests a large underlying sample size
- Statistical Power: The narrow confidence interval and large sample size provide strong statistical inference capabilities
Practical Significance
- This analysis provides a reliable framework for:
  - Understanding the typical magnitude of earthquakes in the dataset
  - Making statistical inferences about seismic activity
  - Providing a baseline for detecting unusual seismic events
  - Supporting evidence-based disaster preparedness planning
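As a cross-check on the bootstrap interval, the classical t-based interval should nearly coincide given the sample size; a one-line sketch:

# Classical 95% CI for the mean magnitude; with thousands of events,
# this should closely match the bootstrap percentile interval above
t.test(earthquakes_clean$properties_mag)$conf.int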
# Temporal patterns
earthquakes_clean <- earthquakes_clean %>%
mutate(hour_of_day = lubridate::hour(time_readable))
time_analysis <- earthquakes_clean %>%
group_by(hour_of_day) %>%
summarize(
avg_magnitude = mean(properties_mag, na.rm = TRUE),
count = n()
)
# Visualize temporal patterns
ggplot(time_analysis, aes(x = hour_of_day, y = count)) +
geom_line() +
geom_point() +
theme_minimal() +
labs(title = "Earthquake Frequency by Hour of Day",
x = "Hour",
y = "Number of Earthquakes")
Temporal Analysis: Earthquake Frequency Distribution Across Daily Hours
Analysis of Hourly Earthquake Patterns
I've analyzed the temporal distribution of earthquakes throughout the day, revealing several interesting patterns in seismic activity:
Key Features of the Time Series
- Peak Activity
  - Highest frequency observed around hour 9 (09:00 UTC)
  - Notable spike of approximately 615 earthquakes
  - Significantly higher than the average hourly frequency
- Secondary Peaks
  - Hour 0 (midnight UTC): ~540 earthquakes
  - Hour 5: ~540 earthquakes
  - Hour 20: ~545 earthquakes
- Low Activity Periods
  - Hour 3: ~460 earthquakes
  - Hour 17: ~460 earthquakes
  - Clear troughs in seismic activity during these times
Pattern Analysis
- Cyclical Patterns
  - Evidence of regular fluctuations throughout the day
  - Approximately 4-6 hour cycles between peaks and troughs
  - May hint at a relationship with Earth's rotational effects
- Distribution Characteristics
  - Range of approximately 150 earthquakes between the highest and lowest hourly frequencies
  - Generally maintained baseline of 480-520 earthquakes per hour
  - Irregular but noticeable rhythm in activity levels (tested formally below)
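Whether the hourly variation exceeds what chance alone would produce can be probed with a goodness-of-fit test against a uniform-by-hour null; a sketch:

# Chi-square goodness-of-fit: are events spread uniformly over the 24 hours?
# With one table argument, the default null assigns equal probability to each hour
chisq.test(table(earthquakes_clean$hour_of_day))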
Implications
- Monitoring Considerations
  - Heightened monitoring may be beneficial during peak activity hours
  - Resource allocation could be optimized based on these patterns
- Research Applications
  - Patterns could inform studies of Earth's crustal dynamics
  - Potential correlation with global human activity patterns
  - Basis for further investigation of temporal triggers