
Evals for Agents with Arize

Key Takeaways:
  • Learn key principles for observing and evaluating AI agents.
  • Get hands-on with the Arize platform for agent testing and monitoring.
  • Build, evaluate, and analyze a simple AI agent end to end.
Tuesday, March 10, 11 AM ET

Description

As AI agents become more autonomous, testing and debugging their behavior becomes both more important and more challenging. Traditional metrics often fall short when agents reason, plan, and act across multiple steps. To build reliable agentic systems, teams need strong observability and evaluation practices baked in from day one.

In this code-along webinar, Laurie Voss, Head of Developer Relations at Arize, will show you how to automatically test and debug AI agents using the Arize AI engineering platform. You’ll learn core principles of agent evaluation, then build and instrument a simple agent to track performance, behavior, and failure modes. By the end of the session, you’ll have a practical framework for monitoring agents in development and beyond.

Set-Up Instructions

The notebook requires three secrets, which you can enter as Colab secrets or paste directly into cell 2:

  • os.environ["ANTHROPIC_API_KEY"]
  • os.environ["PHOENIX_API_KEY"]
  • os.environ["PHOENIX_COLLECTOR_ENDPOINT"]
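
A minimal sketch of how a cell might populate these three variables, assuming a Colab runtime where `google.colab.userdata` is available and falling back to pasted values anywhere else. The helper name `load_secret` and the empty placeholder strings are illustrative assumptions, not taken from the webinar notebook.

```python
import os

SECRET_NAMES = [
    "ANTHROPIC_API_KEY",
    "PHOENIX_API_KEY",
    "PHOENIX_COLLECTOR_ENDPOINT",
]


def load_secret(name, pasted_value=""):
    """Return a secret from Colab secrets if possible, else the pasted value."""
    try:
        # Only importable inside a Colab runtime.
        from google.colab import userdata

        value = userdata.get(name)
        if value:
            return value
    except Exception:
        # Not running in Colab, or the secret is missing or access was not granted.
        pass
    return pasted_value


# Hypothetical placeholders: fill these in only if you are not using Colab secrets.
pasted = {name: "" for name in SECRET_NAMES}

for name in SECRET_NAMES:
    value = load_secret(name, pasted[name])
    if value:
        os.environ[name] = value
```

Keeping the values in Colab secrets rather than pasting them means the keys are not saved into the notebook itself if you share it later.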

For Anthropic, you can get a free API key from https://platform.claude.com/settings/keys

Next you'll need access to Phoenix Cloud, which is free to use. (You could also run Phoenix locally on your laptop, but this is easier.)

  • Go to https://app.phoenix.arize.com and sign in. If you see a screen with a "launch space" button, launch your space.
  • You should then choose "Settings" in the bottom left. You'll see a screen as shown.
  • For the API key, click the "+ System Key" button where the big red arrow is pointing.
  • For the collector endpoint, you want your hostname, shown by the smaller red arrow.
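
Once the values are in place, a quick check before running the rest of the notebook can catch a missing or mis-pasted secret early. The sketch below is an assumption about what is worth checking (presence, and stray whitespace from copy-paste); `check_setup` is a hypothetical helper and is not part of Phoenix or the notebook.

```python
import os

REQUIRED = [
    "ANTHROPIC_API_KEY",
    "PHOENIX_API_KEY",
    "PHOENIX_COLLECTOR_ENDPOINT",
]


def check_setup(env):
    """Return a list of readable problems; an empty list means nothing was flagged."""
    problems = []
    for name in REQUIRED:
        value = env.get(name, "")
        if not value:
            problems.append(f"{name} is not set")
        elif value != value.strip():
            # Copy-pasting from a web page often picks up spaces or newlines.
            problems.append(f"{name} has leading or trailing whitespace")
    return problems


for problem in check_setup(os.environ):
    print("Setup issue:", problem)
```

If nothing is printed, all three variables are set and free of stray whitespace; this does not confirm that the keys themselves are valid.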

Presenter Bio

Laurie Voss, Head of Developer Relations at Arize

Laurie is a web developer turned startup executive turned data and AI evangelist. With over 30 years of experience in tech, he runs the developer relations team at Arize, teaching people how to evaluate AI applications. Previously, Laurie was VP of Developer Relations at LlamaIndex, and was the founding CTO of npm, taking it from a hobby project to 5M active users. He also served as COO and CDO at npm.
