
California Air Quality: From Data Challenges to Insights

A Data-Driven Look at California Air Quality Variations

California’s diverse geography and industrial landscape create complex air quality challenges. Through this analysis of 2024 ozone data from the U.S. Environmental Protection Agency (EPA), sourced through both the AirNow and AQS systems, we dive into where, when, and why air pollution worsens.

Image by freepik

Summary

As part of our mission to assess air quality and support environmental decision-making, our company was tasked with evaluating ozone pollution across different regions in California. Using daily ozone measurement data, we conducted a thorough data quality assessment, validated measurement consistency across monitoring methods and sites, and analyzed spatial and temporal pollution patterns to identify high-risk areas.

Our analysis was structured around three key phases:

Data Cleaning

  • We addressed key data quality issues to ensure a reliable analysis by handling missing values, resolving inconsistencies, and filtering outliers.

Data Validation

  • AQS data covered all California counties, while AirNow was available for all except Humboldt and Lake.

  • We validated AQI variation across method codes. Notably, method code 053 consistently reported higher ozone concentrations, which indicates potential measurement bias in any county where only one method is used.

Insight Extraction

  1. Summer months showed significantly higher AQI values:
  • Increased sunlight and higher temperatures accelerate ozone formation.
  • Wildfires driven by extreme heat contributed substantially to poor air quality. California led the U.S. in 2024 with over 8,300 wildfires burning 1.08 million acres.
  2. Geography plays a critical role:
  • San Joaquin Valley counties had the worst summer AQI, driven by topography that traps pollutants and by emissions from agriculture, traffic, and industry.

  • Coastal counties like Humboldt and San Francisco maintained better AQI year-round, aided by ocean breezes that help disperse pollutants.

  3. Human activity impacts are evident in weekday vs. weekend AQI trends:
  • Elevated NOx and VOC emissions from vehicles, industrial operations, and household sources lead to higher ground-level ozone on weekdays.

🔍 Data Quality: Cleaning Before Meaning

Before diving into insights, we tackled several significant data quality issues, a vital step toward accurate conclusions:

1. Missing Data

  • Method Code: 6,490 missing entries—all from AirNow. We flagged this as a known limitation.
  • CBSA Info: 2,408 records from non-metro areas lacked CBSA codes; we used a standard placeholder (99999).
  • AQI & Ozone: 2,783 missing values. Because AQI is calculated from ozone concentration, this strong one-to-one relationship allowed us to confidently impute missing values from known pairs. We dropped rows where both the Daily AQI Value and the maximum 8-hour ozone concentration were null.
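A minimal sketch of this missing-value step, written in plain Python rather than taken from the notebook's hidden code. It assumes each row is a dict with the hypothetical keys `ozone_ppm`, `aqi`, and `cbsa_code`, where `None` marks a missing value. It builds lookup tables from rows where both AQI and ozone are present, fills whichever value is missing, assigns the 99999 CBSA placeholder, and drops rows where both measurements are absent.

```python
CBSA_PLACEHOLDER = 99999  # code assigned to non-metro sites that have no CBSA


def clean_missing(records: list[dict]) -> list[dict]:
    """Impute missing AQI/ozone values from known pairs and fill missing CBSA codes.

    Rows where both AQI and ozone are missing are dropped.
    Keys 'ozone_ppm', 'aqi', 'cbsa_code' are hypothetical column names.
    """
    # Build lookup tables from rows where both values are present.
    ozone_to_aqi: dict[float, int] = {}
    aqi_to_ozone: dict[int, float] = {}
    for r in records:
        oz, aqi = r.get("ozone_ppm"), r.get("aqi")
        if oz is not None and aqi is not None:
            ozone_to_aqi.setdefault(oz, aqi)
            aqi_to_ozone.setdefault(aqi, oz)  # keeps the first ozone value seen for each AQI

    cleaned = []
    for r in records:
        oz, aqi = r.get("ozone_ppm"), r.get("aqi")
        if oz is None and aqi is None:
            continue  # nothing to impute from, so drop the row
        row = dict(r)  # copy so the caller's data is not mutated
        if aqi is None:
            row["aqi"] = ozone_to_aqi.get(oz)  # stays None if this ozone value was never paired
        elif oz is None:
            row["ozone_ppm"] = aqi_to_ozone.get(aqi)
        if row.get("cbsa_code") is None:
            row["cbsa_code"] = CBSA_PLACEHOLDER
        cleaned.append(row)
    return cleaned
```

Several concentrations can share one AQI value. In the Good band, for example, 55 whole-ppb steps (0 to 54 ppb) map onto 51 index values. The reverse lookup therefore keeps the first pair it sees, so ozone values imputed from AQI are approximate.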

2. Inconsistent Formats

  • County Names: Variations like “LA, SF” vs. “Los Angeles, San Francisco” inflated the number of unique counties—standardization resolved this.
  • Partial Dates: 9,202 records like “/2024” defaulted to January 1st, which inflated January’s average AQI. These rows were excluded to avoid seasonal bias.
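The two formatting fixes can be sketched as below. The alias table is hypothetical: only the LA and SF examples come from the data described above. The sketch also assumes that full dates are stored as `MM/DD/YYYY` strings. Any value that fails to parse, such as `/2024`, is returned as `None`, so the row can be excluded instead of silently landing on January 1st.

```python
from datetime import date, datetime
from typing import Optional

# Hypothetical alias table: abbreviations seen in the data mapped to canonical county names.
COUNTY_ALIASES = {"LA": "Los Angeles", "SF": "San Francisco"}


def standardize_county(name: str) -> str:
    """Map an abbreviated or padded county name to its canonical form."""
    cleaned = name.strip()
    return COUNTY_ALIASES.get(cleaned.upper(), cleaned)


def parse_full_date(raw: str) -> Optional[date]:
    """Return a date for full MM/DD/YYYY strings, or None for partial values like '/2024'."""
    try:
        return datetime.strptime(raw.strip(), "%m/%d/%Y").date()
    except ValueError:
        return None  # partial or malformed: exclude rather than default to January 1st
```

With rows held as dicts, the filter becomes `[r for r in rows if parse_full_date(r["date"]) is not None]`, where `date` is again an assumed key name.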

3. Outliers

  • Daily Observation Count values of 1000 (vs. the typical 1–24) introduced skew. These anomalies were removed.
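Here is a sketch of the outlier filter. It assumes a hypothetical `daily_obs_count` key and treats the typical 1 to 24 range as the valid window. That window is an assumption drawn from the values noted above, not an official EPA limit.

```python
# Assumed plausible range for the daily observation count (the typical 1-24 noted above).
MIN_OBS_COUNT, MAX_OBS_COUNT = 1, 24


def drop_obs_count_outliers(records: list[dict]) -> list[dict]:
    """Keep only rows whose daily observation count falls inside the assumed range."""
    return [
        r for r in records
        if MIN_OBS_COUNT <= r["daily_obs_count"] <= MAX_OBS_COUNT
    ]
```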

4. Duplicates

  • 267 rows removed to prevent overcounting.
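Exact-duplicate removal can be sketched in plain Python. A row counts as a duplicate only if every field matches an earlier row, and the first occurrence is kept. The sketch assumes rows are dicts with hashable values such as numbers, strings, or `None`.

```python
def drop_exact_duplicates(records: list[dict]) -> list[dict]:
    """Remove rows that repeat every field of an earlier row, preserving order."""
    seen: set[tuple] = set()
    unique = []
    for r in records:
        # Sorting by key gives an order-independent, hashable fingerprint of the row.
        key = tuple(sorted(r.items()))
        if key not in seen:
            seen.add(key)
            unique.append(r)
    return unique
```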


🔍 How Ozone Shapes the Air We Breathe

Unveiling the Link Between Daily AQI and Ozone Levels

Ever wondered how clean — or polluted — the air really is? The Air Quality Index (AQI) offers a simple answer.

Table-1: The AQI is divided into six categories, each associated with a specific level of health concern.

| AQI Value | AQI Status | Description |
| --- | --- | --- |
| 0 to 50 | Good | Air quality is satisfactory, and air pollution poses little or no risk. |
| 51 to 100 | Moderate | Air quality is acceptable; however, there may be a risk for some people, particularly those who are unusually sensitive to air pollution. |
| 101 to 150 | Unhealthy for Sensitive Groups | Members of sensitive groups may experience health effects. The general public is less likely to be affected. |
| 151 to 200 | Unhealthy | Some members of the general public may experience health effects. |
| 201 to 300 | Very Unhealthy | Health alert: The risk of health effects is increased for everyone. |
| 301 and higher | Hazardous | Health warning of emergency conditions: everyone is more likely to be affected. |
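The bands in Table-1 are contiguous integer ranges, so one sorted list of upper bounds is enough to map any AQI value to its status. The sketch below uses Python's `bisect` module for that lookup. The function name is illustrative and not taken from the notebook.

```python
from bisect import bisect_left

# Inclusive upper bound of each AQI band from Table-1; anything above the last bound is Hazardous.
UPPER_BOUNDS = [50, 100, 150, 200, 300]
LABELS = [
    "Good",
    "Moderate",
    "Unhealthy for Sensitive Groups",
    "Unhealthy",
    "Very Unhealthy",
    "Hazardous",  # 301 and higher
]


def aqi_category(aqi: int) -> str:
    """Return the Table-1 status label for an integer AQI value."""
    if aqi < 0:
        raise ValueError("AQI cannot be negative")
    # bisect_left returns the index of the first upper bound >= aqi.
    return LABELS[bisect_left(UPPER_BOUNDS, aqi)]
```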

But behind that number lies a key driver: Ozone Concentration.

The following scatter plot reveals a clear, strong correlation between the daily AQI value and the maximum 8-hour ozone concentration (parts per million, ppm), highlighting the pivotal role this pollutant plays in determining air quality levels across EPA's defined health bands.

  • The relationship appears mostly linear, with subtle curve patterns at higher maximum 8-hour ozone concentrations.

  • The data transitions smoothly across AQI bands, validating the reliability of ozone concentration as a basis for AQI categorization.
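One plausible reason for the near-linear but kinked shape is that EPA computes the ozone AQI by piecewise-linear interpolation within fixed concentration breakpoints: I = (I_hi − I_lo) / (BP_hi − BP_lo) × (C − BP_lo) + I_lo. Each band has its own slope, so the scatter bends at every breakpoint. The sketch below applies that formula to daily maximum 8-hour values. It works in whole ppb, because EPA truncates 8-hour ozone to three decimal places in ppm. It rounds half up to the nearest integer and deliberately stops at 0.200 ppm, where EPA switches to 1-hour ozone values. It is an illustration, not the code used to produce the plot.

```python
import math
from decimal import Decimal
from fractions import Fraction

# EPA 8-hour ozone breakpoints in ppb (ppm x 1000), paired with their AQI ranges.
# Above 200 ppb EPA uses 1-hour ozone instead, which this sketch does not cover.
OZONE_8HR_BREAKPOINTS = [
    (0, 54, 0, 50),
    (55, 70, 51, 100),
    (71, 85, 101, 150),
    (86, 105, 151, 200),
    (106, 200, 201, 300),
]


def ozone_8hr_to_aqi(conc_ppm: float) -> int:
    """Convert a daily maximum 8-hour ozone concentration (ppm) to an AQI value."""
    # Truncate to 3 decimals (whole ppb); going through str avoids float artifacts.
    c = int(Decimal(str(conc_ppm)) * 1000)
    for bp_lo, bp_hi, i_lo, i_hi in OZONE_8HR_BREAKPOINTS:
        if bp_lo <= c <= bp_hi:
            # Linear interpolation inside the band, computed exactly with Fraction.
            index = Fraction(i_hi - i_lo, bp_hi - bp_lo) * (c - bp_lo) + i_lo
            return math.floor(index + Fraction(1, 2))  # round half up
    raise ValueError(f"{conc_ppm} ppm is outside the 8-hour ozone AQI range")
```

For example, 0.045 ppm falls in the Good band and maps to 50/54 × 45 ≈ 41.7, which rounds to an AQI of 42.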

We will use the AQI value for the rest of the report, as it is easier to interpret.


🔍 Data Validation: Potential Bias in Methodology and Sampling

Source Validation: Potential Bias from Single-Method Sensors

Is your air quality data fair, or just flawed?

When analyzing data from different sources, the first step is to compare the measurement methods used to collect ozone concentrations. This ensures the readings are consistent, reliable, and not biased by variations in instrumentation, calibration, or data reporting standards. Validating these differences is critical before drawing any conclusions or merging the datasets for further analysis.

Table-2: Ozone Concentration Measurement Methods

| Code | Method Description | Type | Notes |
| --- | --- | --- | --- |
| 047 | UV Photometric | FEM | Widely used, accurate |
| 087 | Non-regulatory / Unknown O₃ Method | Non-FEM | Often from AirNow; public estimates |
| 199 | Other / Undefined Method | N/A | Placeholder, unclassified |
| 053 | UV Absorption (Gas Phase Chemiluminescence) | FRM | EPA-approved for compliance use |
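One way to check for the method-code bias mentioned earlier is to average ozone per county and method, then measure how far the method means diverge within each county. The sketch below assumes rows are dicts with the hypothetical keys `county`, `method_code`, and `ozone_ppm`. Method codes are kept as strings so leading zeros such as "053" survive. Rows without a method code, like the AirNow records noted in the cleaning step, are skipped.

```python
from collections import defaultdict
from statistics import mean


def mean_ozone_by_method(records: list[dict]) -> dict[str, dict[str, float]]:
    """Average daily max 8-hour ozone per county and method code."""
    groups: dict[tuple[str, str], list[float]] = defaultdict(list)
    for r in records:
        if r.get("method_code") is None:
            continue  # no method recorded, so it cannot be attributed
        groups[(r["county"], r["method_code"])].append(r["ozone_ppm"])

    result: dict[str, dict[str, float]] = defaultdict(dict)
    for (county, method), values in groups.items():
        result[county][method] = mean(values)
    return dict(result)


def method_spread(by_method: dict[str, dict[str, float]]) -> dict[str, float]:
    """Gap between the highest and lowest method mean, for counties using 2+ methods."""
    return {
        county: max(means.values()) - min(means.values())
        for county, means in by_method.items()
        if len(means) > 1
    }
```

A county with a large spread is one where comparisons could shift depending on which method happens to dominate its monitors. Counties that rely on a single method cannot be checked this way at all.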