As AI agents become more autonomous, testing and debugging their behavior becomes both more important and more challenging. Traditional metrics often fall short when agents reason, plan, and act across multiple steps. To build reliable agentic systems, teams need strong observability and evaluation practices baked in from day one.
In this code-along webinar, Laurie Voss, Head of Developer Relations at Arize, will show you how to automatically test and debug AI agents using the Arize AI engineering platform. You’ll learn core principles of agent evaluation, then build and instrument a simple agent to track performance, behavior, and failure modes. By the end of the session, you’ll have a practical framework for monitoring agents in development and beyond.
Key Takeaways:
- Learn key principles for observing and evaluating AI agents.
- Get hands-on with the Arize platform for agent testing and monitoring.
- Build, evaluate, and analyze a simple AI agent end to end.
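For a rough preview of what instrumenting an agent can look like, the sketch below uses the open-source Arize Phoenix library with OpenInference auto-instrumentation. It is an illustrative assumption, not the webinar's actual code, and the session itself works in the hosted Arize platform.

```python
# Minimal sketch (illustrative, assuming Arize Phoenix + OpenInference packages):
# trace an agent's LLM calls so its behavior and failure modes can be inspected.
import phoenix as px
from phoenix.otel import register
from openinference.instrumentation.openai import OpenAIInstrumentor

# Launch a local Phoenix instance to collect and visualize traces.
px.launch_app()

# Register an OpenTelemetry tracer provider pointed at Phoenix.
tracer_provider = register(project_name="simple-agent")

# Auto-instrument OpenAI calls: every LLM request the agent makes
# emits spans (prompts, completions, latency, token counts).
OpenAIInstrumentor().instrument(tracer_provider=tracer_provider)

# From here, run the agent loop as usual and open the Phoenix UI
# to step through traces while debugging.
```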



