AI-Powered Content Moderation System for Phone Calls
Project Goal
Analyze thousands of recorded customer service phone calls stored in Google Cloud Storage and automatically flag calls containing abusive tone or language for human review.
Technology Stack
- Storage: Google Cloud Storage (GCS)
- Speech-to-Text: OpenAI Whisper
- Content Moderation: Unitary toxic-bert (HuggingFace)
- Environment: DataLab Notebook
Dataset Information
- Bucket: dialpad-call-recordings_2023-2025
- Location: call-recordings/call_center/Customer Service Reps/
- Format: MP3 audio files
- Organization: Organized by date folders (YYYY-MM-DD)
- Volume: Thousands of recordings
Processing Pipeline Overview
Data Flow Architecture
MP3 Audio File (from GCS)
→ [Cell 5: Download from GCS] → Local Temporary File
→ [Cell 6: Whisper Transcription] → Text Transcript
→ [Cell 7: Toxicity Analysis] → Toxicity Scores (0-1 for each category)
→ [Cell 8: Main Pipeline / Flagging Logic] → Results Dictionary → Flagged Calls Report
Cell Reference Guide
| Cell | Purpose | Key Components |
|---|---|---|
| Cell 1 | Install Dependencies | Whisper, GCS, Transformers, PyTorch |
| Cell 2 | Import Libraries | Core Python imports |
| Cell 3 | Authentication | GCS client setup with service account |
| Cell 4 | Load AI Models | Whisper (base) + toxic-bert classifier |
| Cell 5 | Download Function | Retrieve MP3 from GCS to temp file |
| Cell 6 | Transcription Function | Audio → Text conversion |
| Cell 7 | Toxicity Analysis | Text → Toxicity scores |
| Cell 8 | Main Pipeline | Orchestrates full processing flow |
| Cell 9 | File Listing | Browse GCS bucket contents |
| Cell 10 | Single File Test | Process first sample call |
| Cell 11+ | Batch Processing | Scale to multiple files |
Processing Steps (Per Call)
- Download: Fetch MP3 file from GCS bucket to temporary storage
- Transcribe: Convert audio to text using Whisper model
- Analyze: Run toxicity detection on transcript
- Score: Generate probability scores for each toxicity category
- Flag: Mark calls exceeding threshold for human review
- Store: Save results with metadata
- Cleanup: Remove temporary files
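The seven steps above can be sketched as one function. This is a minimal sketch, not the project's actual Cell 8: `download_fn`, `transcribe_fn`, and `analyze_fn` are hypothetical stand-ins for the helpers defined later in Cells 6-8, injected as parameters so the flow can be exercised without GCS or the models:

```python
# Sketch of the per-call pipeline; helper functions are injected (hypothetical
# names) so the orchestration logic is testable in isolation.
import os
from datetime import datetime, timezone

def process_call(blob_path, download_fn, transcribe_fn, analyze_fn, threshold=0.7):
    """Download, transcribe, analyze, score, flag, and clean up one recording."""
    local_path = download_fn(blob_path)          # Step 1: fetch MP3 to temp storage
    try:
        transcript = transcribe_fn(local_path)   # Step 2: audio -> text
        scores = analyze_fn(transcript)          # Steps 3-4: per-category scores
        max_score = max(scores.values())
        return {                                 # Step 6: results with metadata
            "file_path": blob_path,
            "transcript": transcript,
            "toxicity_scores": scores,
            "max_toxicity_score": max_score,
            "flagged": max_score > threshold,    # Step 5: flag if any score exceeds threshold
            "processed_at": datetime.now(timezone.utc).isoformat(),
        }
    finally:
        if os.path.exists(local_path):           # Step 7: always remove the temp file
            os.remove(local_path)
```

Because flagging compares the maximum score against the threshold, it is equivalent to "flag if ANY category exceeds the threshold".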
AI Models Explained
1. Whisper Speech-to-Text Model
Overview
- Developer: OpenAI
- Purpose: Convert audio recordings to text transcripts
- Model Size: Base (~150MB, 74M parameters)
- Training Data: 680,000 hours of multilingual audio
How It Works
- Analyzes audio waveforms in small chunks
- Uses deep learning to map sound patterns to words
- Handles background noise, accents, and multiple speakers
- Outputs complete text transcript with timestamps
Model Size Options
| Size | Parameters | Speed | Accuracy | Use Case |
|---|---|---|---|---|
| tiny | 39M | Fastest | Lower | Quick testing |
| base | 74M | Fast | Good | POC (Current) |
| small | 244M | Medium | Better | Production |
| medium | 769M | Slow | High | High accuracy needs |
| large | 1550M | Slowest | Best | Maximum accuracy |
Why Whisper?
- ✅ Industry-leading accuracy
- ✅ Robust to phone call audio quality
- ✅ Handles multiple accents
- ✅ Open-source and free
- ✅ Works offline (no API costs)
2. Toxic-BERT Content Moderation Model
Overview
- Developer: Unitary AI (HuggingFace)
- Purpose: Detect toxic, abusive, or harmful language
- Model Size: ~400MB, 110M parameters
- Training Data: Millions of labeled online comments
Toxicity Categories Detected
| Category | Description | Example |
|---|---|---|
| toxic | General harmful/offensive language | "You're an idiot" |
| severe_toxic | Extremely offensive content | Highly aggressive profanity |
| obscene | Profanity and explicit language | Contains curse words |
| threat | Threatening language | "I'll make you pay" |
| insult | Direct personal attacks | "You're worthless" |
| identity_hate | Attacks based on identity | Discriminatory language |
How It Works
- Takes transcript text as input
- Converts text to numerical embeddings (vectors)
- Processes through 12 transformer layers
- Outputs probability score (0.0 to 1.0) for each category
- Example output:
```
toxic: 0.85        → 85% confidence
severe_toxic: 0.12 → 12% confidence
insult: 0.72       → 72% confidence
threat: 0.08       → 8% confidence
```
Flagging Threshold
- Default: 0.7 (70% confidence)
- Adjustable: Can be increased/decreased based on results
- Logic: If ANY category > threshold → Flag for review
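The flagging rule is a one-liner; a minimal sketch (`should_flag` is a hypothetical helper name, not from the notebook):

```python
THRESHOLD = 0.7  # default: 70% confidence; adjustable based on review results

def should_flag(toxicity_scores, threshold=THRESHOLD):
    """Flag the call if ANY category's score exceeds the threshold."""
    return any(score > threshold for score in toxicity_scores.values())
```

For example, `should_flag({"toxic": 0.85, "insult": 0.72, "threat": 0.08})` returns `True` because two categories exceed 0.7.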
Why toxic-BERT?
- ✅ Multi-label classification (detects multiple types)
- ✅ High accuracy on abusive content
- ✅ Widely tested and validated
- ✅ Open-source and free
- ✅ Fast inference (real-time capable)
Technical Details
Model Architecture
Both models use Transformer architecture:
- Self-attention mechanisms
- Parallel processing of sequences
- Context-aware predictions
- Pre-trained on massive datasets
Memory Requirements
- Whisper (base): ~300MB RAM
- toxic-bert: ~800MB RAM
- Total: ~1.1GB RAM minimum
Processing Speed (Estimates)
- Transcription: ~30 seconds for 5-minute call
- Toxicity Analysis: ~1 second per transcript
- Total: ~30-45 seconds per call
Why These Sizes?
Neural networks store millions of learned "weights":
- Each weight = 4 bytes (32-bit float)
- toxic-bert: 110M parameters Γ 4 bytes = 440MB
- Plus model metadata, vocabularies, configs
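The arithmetic above in code form. One note: the Whisper base download (~150MB for 74M parameters) implies roughly 2 bytes per weight, i.e. 16-bit storage, rather than 32-bit:

```python
def model_size_mb(num_parameters, bytes_per_weight=4):
    """Approximate raw weight storage; excludes metadata, vocab, and configs."""
    return num_parameters * bytes_per_weight / 1_000_000

model_size_mb(110_000_000)                     # toxic-bert as fp32: 440.0 MB
model_size_mb(74_000_000, bytes_per_weight=2)  # Whisper base as fp16: 148.0 MB
```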
Expected Outputs & Results
Single Call Processing Output
When you run Cell 10 (process single call), you'll see:

```
============================================================
Processing: call-recordings/.../Allan_Mwenda-460572832305971.mp3
============================================================
Downloaded: call-recordings/.../Allan_Mwenda-460572832305971.mp3
Transcribing audio...

Transcript: Hello, this is customer service. How can I help you today? I'm calling about my recent order...

Analyzing toxicity...

Toxicity Scores:
  toxic: 0.152
  severe_toxic: 0.031
  obscene: 0.087
  threat: 0.012
  insult: 0.098
  identity_hate: 0.008

✅ PASSED
```
Flagged Call Example
```
============================================================
Processing: call-recordings/.../problem_call.mp3
============================================================
Downloaded: call-recordings/.../problem_call.mp3
Transcribing audio...

Transcript: You people are completely useless! This is ridiculous...

Analyzing toxicity...

Toxicity Scores:
  toxic: 0.892  ⚠️ FLAGGED
  severe_toxic: 0.234
  obscene: 0.567
  threat: 0.089
  insult: 0.823  ⚠️ FLAGGED
  identity_hate: 0.045

🚩 FLAGGED FOR REVIEW
```
Result Data Structure
Each processed call returns a dictionary:
```python
{
    'file_path': 'call-recordings/.../file.mp3',
    'transcript': 'Full text of the conversation...',
    'toxicity_scores': {
        'toxic': 0.85,
        'severe_toxic': 0.12,
        'obscene': 0.67,
        'threat': 0.08,
        'insult': 0.72,
        'identity_hate': 0.05
    },
    'max_toxicity_score': 0.85,
    'flagged': True,
    'processed_at': '2026-02-04T03:45:12.123456'
}
```
Batch Processing Output (Future)
After Phase 2 implementation, you'll get:
Summary Statistics
```
Total Calls Processed: 1,247
Flagged for Review: 89 (7.1%)
Processing Time: 18 hours 32 minutes
Average per Call: 53 seconds

Top Toxicity Categories:
1. insult: 42 calls
2. toxic: 38 calls
3. obscene: 21 calls
```
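One way Phase 2 could compute that summary from a list of per-call result dicts. A sketch under stated assumptions: `summarize` is a hypothetical helper, and it assumes each result carries the `flagged` and `toxicity_scores` fields from the result data structure above:

```python
from collections import Counter

def summarize(results, threshold=0.7):
    """Totals, flag rate, and per-category counts across processed calls."""
    flagged = [r for r in results if r["flagged"]]
    category_counts = Counter()
    for r in flagged:
        for category, score in r["toxicity_scores"].items():
            if score > threshold:
                category_counts[category] += 1  # count calls exceeding threshold per category
    return {
        "total": len(results),
        "flagged": len(flagged),
        "flag_rate": len(flagged) / len(results) if results else 0.0,
        "top_categories": category_counts.most_common(3),
    }
```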
CSV Export
```
file_path,max_score,flagged,toxic,insult,threat,processed_at
call1.mp3,0.85,True,0.85,0.72,0.08,2026-02-04T03:45:12
call2.mp3,0.23,False,0.23,0.12,0.01,2026-02-04T03:46:05
```
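A sketch of producing that export with pandas (already imported in Cell 2); `results_to_csv` is a hypothetical helper, and the flatten step pulls per-category scores out of the nested `toxicity_scores` dict:

```python
import pandas as pd

def results_to_csv(results, path="processed_calls.csv"):
    """Flatten result dicts into one row per call and write a CSV export."""
    rows = []
    for r in results:
        row = {
            "file_path": r["file_path"],
            "max_score": r["max_toxicity_score"],
            "flagged": r["flagged"],
            "processed_at": r["processed_at"],
        }
        row.update(r["toxicity_scores"])  # one column per toxicity category
        rows.append(row)
    df = pd.DataFrame(rows)
    df.to_csv(path, index=False)
    return df
```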
Next Steps
✅ Phase 1 Complete: Single file processing working
⏳ Phase 2: Batch process all files
⏳ Phase 3: Generate review dashboard
Call Volume Calculation
- Total storage: 15.51 GB = 15,510 MB
- Average call size: 200 KB = 0.2 MB
- Estimated total calls: 15,510 MB ÷ 0.2 MB = ~77,550 calls

That's a substantial dataset! At 30-60 seconds processing time per call:
- Best case: 77,550 × 30 sec = 646 hours = 27 days of continuous processing
- Realistic: 77,550 × 45 sec = 969 hours = 40 days of continuous processing
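The runtime estimate above as a small reusable calculation (`runtime_hours` is a hypothetical helper name):

```python
def runtime_hours(total_mb, avg_call_mb, sec_per_call):
    """Total processing hours for the whole bucket at a given per-call speed."""
    calls = total_mb / avg_call_mb        # estimated number of recordings
    return calls * sec_per_call / 3600    # seconds -> hours

runtime_hours(15_510, 0.2, 30)  # best case: ~646 hours (~27 days)
runtime_hours(15_510, 0.2, 45)  # realistic: ~969 hours (~40 days)
```

These numbers are the main argument for parallelizing the Phase 2 batch run rather than processing serially.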
# Recommended Data Architecture
## Option A: CSV Files (simplest for a POC)
```
gs://dialpad-call-recordings_2023-2025/analysis-results/
├── processed_calls.csv     # Main results manifest
├── flagged_calls.csv       # Filtered flagged calls only
└── transcripts/            # Full transcripts as text files
    ├── 2024-12-30_call_001.txt
    ├── 2024-12-30_call_002.txt
    └── ...
```
### CSV Schema (processed_calls.csv)
```
call_id,file_path,call_date,agent_name,duration_sec,processed_at,model_version,whisper_model,toxicity_model,threshold,max_toxicity_score,flagged,toxic_score,severe_toxic_score,obscene_score,insult_score,threat_score,identity_hate_score,transcript_path,processing_time_sec
```
## Option B: BigQuery (best for scale)

```sql
CREATE TABLE `dca-apps.call_analysis.processed_calls` (
call_id STRING,
file_path STRING,
call_date DATE,
agent_name STRING,
duration_seconds FLOAT64,
-- Processing metadata
processed_at TIMESTAMP,
processing_time_seconds FLOAT64,
model_version STRING,
whisper_model STRING,
toxicity_model STRING,
threshold_used FLOAT64,
-- Results
transcript STRING,
max_toxicity_score FLOAT64,
flagged BOOLEAN,
-- Individual toxicity scores
toxic_score FLOAT64,
severe_toxic_score FLOAT64,
obscene_score FLOAT64,
insult_score FLOAT64,
threat_score FLOAT64,
identity_hate_score FLOAT64,
-- Audit trail
reprocessed BOOLEAN,
notes STRING
);
```
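Loading results into that table might look like this. `row_from_result` is a hypothetical mapper from the pipeline's result dict onto the column names above; the streaming insert uses the official google-cloud-bigquery client and is shown commented out because it needs live credentials and an existing dataset:

```python
def row_from_result(result):
    """Map a pipeline result dict onto the processed_calls column names."""
    row = {
        "file_path": result["file_path"],
        "transcript": result["transcript"],
        "max_toxicity_score": result["max_toxicity_score"],
        "flagged": result["flagged"],
        "processed_at": result["processed_at"],
    }
    # Fan the nested scores out into the per-category *_score columns
    for category, score in result["toxicity_scores"].items():
        row[f"{category}_score"] = score
    return row

# Streaming insert (assumes the dataset and table already exist):
# from google.cloud import bigquery
# bq = bigquery.Client(project="dca-apps")
# errors = bq.insert_rows_json("dca-apps.call_analysis.processed_calls",
#                              [row_from_result(result)])
# assert not errors, errors
```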
**Advantages**:
- ✅ SQL queries for analysis
- ✅ Easy filtering and aggregation
- ✅ Can handle 77K+ records easily
- ✅ Built-in visualization with Looker Studio
### **Option C: Hybrid (Recommended)**
1. **BigQuery**: Structured data (scores, metadata, flags)
2. **GCS**: Full transcripts as separate text files
3. **CSV**: Backup/export for sharing
---
## 🖥️ Where Computing Happens
### **Current Setup**:
```
Your Local Machine: 0% compute (just displaying results)
        ↓
DataLab Notebook (Google Cloud VM): 100% compute
├── CPU: Running Whisper + toxic-bert
├── RAM: Loading models + processing audio
└── Disk: Temporary file storage
        ↓
Google Cloud Storage: Storage only (no compute)
```

## Complete Data Model Schema

### Data Model for Each Processed Call
```python
{
    # Identification
    "call_id": "unique_identifier",      # Generated from filename + timestamp
    "file_path": "gs://bucket/path/to/file.mp3",
    "call_date": "2024-12-30",
    "agent_name": "Allan Mwenda",
    "customer_id": "4605728323059712",   # Extracted from filename
    "session_id": "6634603563499520",    # Extracted from filename

    # Audio metadata
    "file_size_kb": 152.1,
    "duration_seconds": 180.5,
    "audio_format": "mp3",

    # Processing metadata
    "processed_at": "2026-02-04T04:15:32.123456Z",
    "processing_time_seconds": 45.2,
    "model_version": "v1.0",
    "whisper_model": "base",
    "toxicity_model": "unitary/toxic-bert",
    "threshold_used": 0.7,

    # Results
    "transcript": "Full text of conversation...",
    "transcript_length": 450,            # Character count
    "max_toxicity_score": 0.001,
    "flagged": False,

    # Detailed toxicity scores
    "toxicity_scores": {
        "toxic": 0.001,
        "severe_toxic": 0.000,
        "obscene": 0.000,
        "insult": 0.000,
        "threat": 0.000,
        "identity_hate": 0.000
    },

    # Quality assurance
    "reprocessed": False,
    "human_reviewed": False,
    "human_reviewer": None,
    "review_date": None,
    "review_notes": None,
    "ground_truth_label": None,          # For testing threshold

    # Error handling
    "processing_error": None,
    "error_message": None
}
```
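One possible way to derive the identification fields, assuming the `<Agent_Name>-<id>.mp3` pattern seen in the sample filenames. This is a hypothetical sketch: the exact semantics of the trailing number (customer vs. session ID) would need confirming against real filenames, and it hashes the full GCS path for a stable `call_id` rather than the filename-plus-timestamp scheme noted above:

```python
import hashlib
import os

def identify_call(blob_path):
    """Derive identification fields from a recording path (hypothetical pattern)."""
    # "Allan_Mwenda-460572832305971.mp3" -> "Allan_Mwenda-460572832305971"
    stem = os.path.splitext(os.path.basename(blob_path))[0]
    agent, _, trailing_id = stem.rpartition("-")
    return {
        "call_id": hashlib.sha1(blob_path.encode()).hexdigest()[:16],  # stable per file
        "agent_name": agent.replace("_", " "),
        "customer_id": trailing_id,
    }
```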
# Cell 1: Install Required Libraries
!pip install openai-whisper
!pip install google-cloud-storage
!pip install transformers
!pip install torch
!pip install pydub

# Cell 2: Import Libraries
from google.cloud import storage
import whisper
from transformers import pipeline
import pandas as pd
from datetime import datetime
import os
import tempfile

# Cell 3: Initialize with Service Account Key
from google.cloud import storage
import os
# Your JSON key filename (with spaces)
key_filename = "DCA Apps Storage Access.json"
# Full path to the key
key_path = f"/work/files/workspace/{key_filename}"
print(f"Looking for key file at: {key_path}")
# Check if file exists
if os.path.exists(key_path):
    print("✅ Key file found!")
else:
    print("❌ Key file not found at expected path")
# Set credentials
os.environ['GOOGLE_APPLICATION_CREDENTIALS'] = key_path

# Initialize client
client = storage.Client(project='dca-apps')
bucket_name = "dialpad-call-recordings_2023-2025"
bucket = client.bucket(bucket_name)

print(f"\n✅ Attempting to connect to bucket: {bucket_name}")

try:
    bucket.reload()
    print("✅ Bucket access confirmed!")

    # List a sample file to verify full access
    blobs = list(bucket.list_blobs(max_results=1))
    if blobs:
        print(f"✅ Sample file accessible: {blobs[0].name}")
        print("\n🎉 Authentication successful! Ready to proceed to Cell 4.")
    else:
        print("⚠️ Bucket appears empty or no files found")
except Exception as e:
    print(f"❌ Error accessing bucket: {e}")

# Cell 4: Initialize Google Cloud Storage Client (DataLab version)
from google.cloud import storage
import google.auth

# Get default credentials (should work automatically in DataLab)
credentials, project_id = google.auth.default()
print(f"Using project: {project_id}")
print(f"Credentials found: {credentials is not None}")

# Initialize client
client = storage.Client(project="dca-apps", credentials=credentials)
bucket_name = "dialpad-call-recordings_2023-2025"
bucket = client.bucket(bucket_name)
print(f"✅ Connected to bucket: {bucket_name}")

# Quick test - try to access the bucket
try:
    bucket.reload()
    print("✅ Bucket access confirmed!")
except Exception as e:
    print(f"❌ Error accessing bucket: {e}")

# Cell 5: Load Models (This will take a few minutes the first time)
print("Loading Whisper model for transcription...")
whisper_model = whisper.load_model("base") # Options: tiny, base, small, medium, large
print("Loading toxicity detection model...")
# Using a HuggingFace toxicity classifier
toxicity_classifier = pipeline(
"text-classification",
model="unitary/toxic-bert",
top_k=None
)
print("Models loaded successfully!")

# Cell 6: Function to Download Single File from GCS
def download_audio_from_gcs(blob_path):
    """
    Download an audio file from GCS to a temporary location
    """
    blob = bucket.blob(blob_path)

    # Create temporary file
    temp_file = tempfile.NamedTemporaryFile(delete=False, suffix='.mp3')
    temp_file_path = temp_file.name
    temp_file.close()

    # Download
    blob.download_to_filename(temp_file_path)
    print(f"Downloaded: {blob_path}")
    return temp_file_path

# Cell 7: Function to Transcribe Audio
def transcribe_audio(audio_path):
    """
    Transcribe audio file using Whisper
    """
    print("Transcribing audio...")
    result = whisper_model.transcribe(audio_path)
    return result['text']

# Cell 8: Function to Analyze Toxicity
def analyze_toxicity(text):
    """
    Analyze text for toxic content
    Returns toxicity scores
    """
    print("Analyzing toxicity...")
    # Truncate to 512 characters: toxic-bert's input limit is 512 tokens,
    # so 512 characters stays safely under it
    results = toxicity_classifier(text[:512])

    # Parse results into a {label: score} dict
    toxicity_scores = {}
    for result_set in results[0]:
        toxicity_scores[result_set['label']] = result_set['score']

    return toxicity_scores