
Building Trust in AI Agents with Shane Murray, Senior Vice President of Digital Platform Analytics at Versant Media

Richie and Shane explore AI disasters and success stories, the concept of being AI-ready, essential roles and skills for AI projects, data quality's impact on AI, and much more.
Nov 3, 2025

Guest
Shane Murray

Shane Murray is a seasoned data and analytics executive with extensive experience leading digital transformation and data strategy across global media and technology organizations. He currently serves as Senior Vice President of Digital Platform Analytics at Versant Media, where he oversees the development and optimization of analytics capabilities that drive audience engagement and business growth. In addition to his corporate leadership role, he is a founding member of InvestInData, an angel investor collective of data leaders supporting early-stage startups advancing innovation in data and AI. Prior to joining Versant Media, Shane spent over three years at Monte Carlo, where he helped shape AI product strategy and customer success initiatives as Field CTO.


Host
Richie Cotton

Richie helps individuals and organizations get better at using data and AI. He's been a data scientist since before it was called data science, and has written two books and created many DataCamp courses on the subject. He is a host of the DataFramed podcast, and runs DataCamp's webinar program.

Key Quotes

I have an underlying philosophy that you can't truly establish readiness until you start building. And that is the best way to be ready. Working with teams who are building AI, the ones that are actually going through the prototype to production to scale up phase are the ones that are finding out what readiness really means for their organization.

Human in the loop is essential. I wouldn't be building a solution without having factored in some human in the loop. Both as a quality gate, but also to build that trust with users, who are the ultimate consumers of your solution, who have a say in whether it is reliable.

Key Takeaways

1. Adopt a phased rollout approach for AI applications, starting with small user groups and gradually scaling, to manage quality and build trust incrementally.

2. Implement human-in-the-loop processes to maintain quality and trust in AI applications, while planning for scalable automated evaluation methods.

3. Focus on context engineering to provide reliable and trusted data inputs for AI agents, ensuring semantic consistency and relevance across data sources.

Links From The Show

Versant Media

Transcript

Shane Murray

I kind of have an underlying philosophy that you can't truly establish readiness until you start building, and that is the best way to be ready. Actually, from working with teams who are building AI, the ones that are actually going through the prototype to production to scale-up phase are the ones that are finding out what readiness really means for their organization.

Shane Murray

Human in the loop, I think, is essential. I actually probably wouldn't be building a solution without having factored some human in the loop into it. And I think both as a quality gate, but also to build that trust with users, the ultimate consumers of it, who have a say in it being reliable.

Richie Cotton

Welcome to DataFramed. This is Richie. For all the promise of AI agents, a major barrier to them being useful is that they don't always work. That is, they are untrustworthy. Today we're going to look at ways to improve trust in your AI agents, with a big focus on data quality. I also want to discuss AI readiness. Every CEO claims they want their company to be AI-ready, but I'm not really sure what that means. In this episode, we have a repeat guest, Shane Murray. At the time of recording, he was Field Chief Technology Officer at the data and AI observability platform Monte Carlo. He's just switched jobs to be Senior Vice President of Digital Platform Analytics at Versant Media. In fact, he's returning to his media roots, since he was previously Senior Vice President of Digital Analytics at The New York Times. So let's find out how to build trust in AI agents. Hi, Shane. Welcome to the show.

Shane Murray

Hi, Richie. Nice to see you.

Richie Cotton

Wonderful. Yeah. Great to have you here. Now, I love a good disaster story. So, just to begin with, can you tell me what's the biggest AI disaster you've seen?

Shane Murray

From my previous days as a data leader at The New York Times, we talked about disasters as those that might appear on the front page of The New York Times, as many teams do. I would say, from the people I work with, people are generally fairly conservative, so I haven't been firsthand to disasters. But the ones that come to mind: I don't know if you saw, there was a McDonald's chatbot that leaked, I think, millions of job applicant records, and the security researchers found that the default password was "123456". Comically so. I mean, I think these reputational ones are the ones you really worry about. And similarly, the one that comes to mind just from having been in the field of media: I think it was the Chicago Sun-Times who had a summer reading list that was essentially all fictional books. Right? Not fiction books, but actually fictional books. And so these ones that get to the heart of reputation and trust, I think, are the other ones you watch out for.

Richie Cotton

Yeah, absolutely. It's all very well having a forecast that's a little bit off, but when you're causing problems for millions of your customers, that's a bad situation, something you definitely want to avoid. Maybe that's too negative, so we need the flip side to this. Talk me through some success stories you've seen.

Shane Murray

Yeah, maybe I'll reference one I've read about, and then one I've seen firsthand just working with customers here at Monte Carlo. The one that struck me, just reading what data teams are doing, is the Stripe team. I haven't actually seen a deeply published post on this, but one of their team members was talking about how they've essentially built a foundational model for payments using billions of transactions, which is essentially replacing feature engineering, what we now call traditional ML models, with embeddings capturing the relationships between payments. I just thought that was super impressive. Most people tend to talk about traditional AI and ML, and then generative AI, and the fact that they're using these LLM approaches to rethink these foundational business problems around fraud detection and things like that, I just found super interesting. Have you seen that one?

Richie Cotton

No, I hadn't heard about this. It is actually kind of wild, because you think about machine learning and generative AI as being very separate things. But an embedding model is normally for detecting how close different words are in meaning to each other, so I guess detecting how close payments are to each other seems like a good way to detect fraud. Very innovative there.

Shane Murray

Yeah, definitely that one. And then I'd just say, from customers I work with, where I've seen the most value, even though it's maybe not the autonomous agent use case, is these teams that are basically taking what were robotic process automation steps and turning unstructured data into structured data. An example is a customer I work with, Pilot Travel Centers, who have travel centers across the US; they're the biggest diesel fuel provider and the biggest Subway franchisee in the US. They took these bills of lading that the truck drivers essentially take photographs of and send in, which then went into what was previously a human-driven processing loop, and used LLMs to extract the data from those images and dramatically speed up their financial data pipelines. I think that's become a prototypical use case, but it's a super interesting one, and really valuable to organizations.
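
As a concrete illustration of that unstructured-to-structured pattern, here is a minimal sketch of a document-extraction step, assuming a vision-capable model behind the OpenAI Python SDK; the prompt, model name, and field schema are illustrative assumptions, not Pilot's actual pipeline.

```python
# Sketch: extract structured fields from a photographed bill of lading
# using a vision-capable LLM. The schema and prompt are illustrative.
import base64
import json
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

PROMPT = (
    "Extract the following fields from this bill of lading image and "
    "return JSON only: shipper, consignee, ship_date (YYYY-MM-DD), "
    "items (list of {description, quantity}), total_weight_lbs."
)

def extract_bill_of_lading(image_path: str) -> dict:
    with open(image_path, "rb") as f:
        b64 = base64.b64encode(f.read()).decode()
    resp = client.chat.completions.create(
        model="gpt-4o",
        response_format={"type": "json_object"},
        messages=[{
            "role": "user",
            "content": [
                {"type": "text", "text": PROMPT},
                {"type": "image_url",
                 "image_url": {"url": f"data:image/jpeg;base64,{b64}"}},
            ],
        }],
    )
    return json.loads(resp.choices[0].message.content)
```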

Richie Cotton

Yeah, certainly that hits a lot of the current themes: the idea that, okay, images are now data, and you can extract very useful structured information from them; and then one of the big goals being, okay, let's speed up some processes, automate stuff, and then we can go faster. Nice, hitting all the key themes in one story. I like that. I want to talk about how we get to these success stories. For a lot of organizations, one of the big management buzzwords has been becoming AI-ready. I'm still not quite sure what that means, so talk me through: what does being AI-ready really mean to you?

Shane Murray

Yeah, I think it can be a very ambiguous term that often goes without definition, probably similar to terms of the past in data and business. But I kind of have an underlying philosophy that you can't truly establish readiness until you start building, and that is the best way to be ready. I know that's kind of a cheat. But actually, from working with teams who are building AI, the ones that are actually going through the prototype to production to scale-up phase are the ones finding out what readiness really means for their organization. That can be organizational; that can be from a systems perspective: are they cloud-first, do they have the right management around unstructured data, that sort of thing. And then, more practically, here is where I see teams focused in this space. One is, do they have reliable, AI-ready data? I think everyone's accepted the idea that for your AI to be good and ready, your data has to be ready, so that's one dimension. The other is this thing we talk about with customers, which is, once you are building AI, do you have the means to evaluate and monitor its quality as you're scaling it out: the quality of the outputs, the quality of the process to build it. And then thirdly, when the agent maybe goes askew, do you have the ability to understand it and to intervene, right, and to prevent bad outcomes?

Richie Cotton

That's interesting, because those ideas are kind of at different ends of the development cycle. I love the idea that, in order to figure out what you don't know, you just try building stuff, and wherever you fail, that's going to be useful information: this is what we need to learn about, this is what we need to do better. But then also, the big problems are, okay, how do we actually scale up? How do we understand what's going on with these agents once we put them out into the world? Because if you can't track them, disasters are going to happen. Let's talk about what skills you need in order to start building with this stuff. Can you talk me through, basically, what roles are going to be involved and what skills they need to build stuff?

Shane Murray

Yeah, this is something I've been thinking about quite a bit, and talking to data leaders about in the space, because there are a lot of data roles in teams these days. I can probably list off quite a few different roles that you might have in a large data organization. But AI is obviously changing how we need to think about this, and about the role of machine learning teams and the role of data engineering teams. If I break it down into what I see as necessary: one is this idea of context engineering. At the moment, the people doing that often have the title of AI engineer, but in many cases it could be a data engineer or a software engineer. It's really the idea that you have to engineer a good context. That could be a RAG pipeline, a set of prompts, or hitting different APIs that bring in the right information. How do you make sure that you have trusted and reliable context serving this agent? The second, and I've heard some debate around whether this is a product role or more of a data science role, is how do you define and evaluate quality? Often this is a mix of human feedback, right, how do you take in human signals about what's relevant and what's irrelevant, and then how do you use LLM-as-a-judge techniques to also measure and monitor quality? I kind of think at the moment this is a data science job, and where I see it being done best, it's being done by data scientists, because it takes a lot of thought about how you measure the quality of an experiment, or how you measure these non-deterministic outputs in a way that's scientific. And then the third role I see, often more as you start to scale up, is the platform engineer. How do you instrument your stack to be able to support many agents to build upon? We're a bit early for that, but I'm certainly seeing teams think about extensibility and scale: how do you have standard frameworks for experimentation and for observability?
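
For the evaluation role Shane describes, here is a minimal sketch of an LLM-as-a-judge check, assuming the OpenAI SDK; the rubric, model name, and 1-5 scale are illustrative assumptions rather than any team's production setup.

```python
# Sketch: grading an AI answer with an LLM-as-a-judge rubric.
import json
from openai import OpenAI

client = OpenAI()

JUDGE_PROMPT = """You are grading an AI assistant's answer.
Question: {question}
Answer: {answer}
Context the assistant was given: {context}

Score 1-5 for each criterion and return JSON only:
{{"relevance": int, "groundedness": int, "clarity": int, "rationale": str}}"""

def judge(question: str, answer: str, context: str) -> dict:
    resp = client.chat.completions.create(
        model="gpt-4o-mini",
        response_format={"type": "json_object"},
        messages=[{"role": "user", "content": JUDGE_PROMPT.format(
            question=question, answer=answer, context=context)}],
    )
    return json.loads(resp.choices[0].message.content)

# In practice, scores like these would be aggregated over sampled
# production traces and tracked over time, alongside human feedback.
```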

Richie Cotton

That's interesting stuff. That second point, about it being data science teams that are needed to monitor the quality of the AI work, I think is fascinating, because I get the sense that over the last few years there's been a drift from data teams being the people responsible for doing AI to engineering teams being the ones responsible for doing AI. So it's nice to hear the data teams still have a purpose here.

Shane Murray

Yeah, I'm often saying that it's the data teams that are potentially staffing that talent within the product or software engineering team. So it's a data scientist that might be doing some of the context engineering, but might also be responsible for reporting back how well this thing is doing. I still think there's a huge role for data people to play in this.

Richie Cotton

Oh, since you mentioned context engineering: this is Andrej Karpathy making up phrases again. Do you just want to explain what context engineering is?

Shane Murray

Yeah, I'm not as sure of the derivation of the term, but I feel like it is the right term after people started with prompt engineering and then realized it's more than prompts. In some cases you're building a pipeline: basically all the ways that you're bringing in unstructured and structured data, as well as prompting the AI, to actually make it work. It's not a very scientific definition, but that's how I think of this larger space of context engineering.
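
To ground the definition, here is a minimal sketch of context assembly, combining retrieved passages, structured facts, and the user's question into one prompt; the section labels and sample inputs are illustrative assumptions.

```python
# Sketch: building an agent's context from unstructured and structured
# inputs. The layout and labels are illustrative, not a standard format.
def build_context(question: str, passages: list[str], account: dict) -> str:
    parts = ["## Retrieved documents"]
    parts += [f"[{i + 1}] {p}" for i, p in enumerate(passages)]
    parts += ["## Structured account data", str(account),
              "## Question", question]
    return "\n\n".join(parts)

prompt = build_context(
    "Why did my invoice total change?",
    ["Invoices are re-rated when late usage data arrives.",
     "Credits apply before tax is calculated."],
    {"plan": "enterprise", "open_invoices": 2},
)
print(prompt)
```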

Richie Cotton

Yeah, I have to say the term has been quite divisive at DataCamp. All the people who are involved in engineering, building AI features, are like: trying to figure out how much context to give to an AI is an incredibly important task; context engineering is a real thing of growing importance. But then the curriculum teams, the people building courses, and marketing are like: well, giving context to people when you're explaining stuff, that's just what we do all day; we don't need a new term for it. So yeah, definitely divisive; it depends on your point of view as to whether it's a useful term or not. So, you also mentioned the idea of platform engineering. Talk me through: what's that about?

Shane Murray

My experience with data teams over the past decade is that we really gravitated to this world where we need people focused on building the platform that other data engineering, data analyst, or data science teams can make use of, right, building up both the infrastructure and the sort of golden pathways or tooling that you expect people to use. As I've been talking to data teams over the past six months about how they're approaching this, very often they're seeing fragmented agents or AI being built out on the edge, by software engineering teams who are close to the problem statement of the customer, but they're all going and building with different tools and different frameworks. So the idea of the platform engineering team is really about standardization. And often, when it's done best, you're waiting until there are enough use cases to require a platform; you shouldn't build ahead of the need too much. And then you're also providing things that are naturally adopted. If you're a good platform, you don't have to force adoption, because people see the benefits of not having to own that foundational layer.

Richie Cotton

Okay, that seems incredibly useful, having some sort of standardized infrastructure. I guess at a lot of companies, particularly the larger ones, you end up with silos, with different teams using different pieces of tech that do the same thing. So maybe talk me through, since I'm excited we started this conversation with becoming AI-ready: what's the infrastructure you need to be AI-ready? What does a sensible tech stack look like?

Shane Murray

I mean, at the moment it feels fairly minimal, right? I think people are looking at orchestration, so something like LangGraph is very common, and establishing a standard for a gateway to whatever set of foundational models you want to make available to people within the organization to build upon. I do think platforms then extend into: how do I reliably run experiments? How do I evaluate that this thing is better than the last version? And how do I have observability? That observability can be the latency or the cost, right, how many tokens you're using. I think some people have certainly run into cases, with all this AI being built, where somehow they'll have a cost spike because a team hasn't realized what they've done. But it can also be the traceability of these agents, ensuring you know how they're behaving. So I think we're at the early stages of the platform discussion. There's also discussion about whether you have a dedicated vector database, or what the underlying management of structured and unstructured data that supports the agents looks like. All these things can be components, but I would say it feels like we're pretty early in the platform discussion of AI.
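
Since token cost and latency come up as core observability signals here, below is a minimal sketch of a metering wrapper around an LLM call, assuming the OpenAI SDK; the per-token prices are placeholder assumptions.

```python
# Sketch: minimal cost/latency observability around LLM calls.
# Prices are illustrative, not real rates.
import time
from openai import OpenAI

client = OpenAI()
PRICE_PER_1K = {"prompt": 0.005, "completion": 0.015}  # assumed $/1K tokens

def observed_call(**kwargs):
    t0 = time.perf_counter()
    resp = client.chat.completions.create(**kwargs)
    latency = time.perf_counter() - t0
    u = resp.usage
    cost = (u.prompt_tokens * PRICE_PER_1K["prompt"]
            + u.completion_tokens * PRICE_PER_1K["completion"]) / 1000
    # A real platform would emit this to a metrics backend, not stdout.
    print(f"model={kwargs.get('model')} latency={latency:.2f}s "
          f"tokens={u.total_tokens} est_cost=${cost:.4f}")
    return resp
```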

Richie Cotton

Okay, yeah. So I guess you should probably plan on swapping out some tools at regular intervals in the near future, then. Is there a way to make your infrastructure a bit more fluid? How do you plan for that?

Shane Murray

How do you mean?

Richie Cotton

If you've got some sort of fixed tech stack, that's probably not going to work; you're going to want to change it up at some point.

Shane Murray

I don't think I have a great answer for that, but I feel like it's about investing minimally at the moment and supporting the top use cases, while still allowing teams out on the edge to go and use what they want. It's: how do you make sure you've supported the top three use cases? This is how I've approached platforms in the past. You don't try to solve for everything, you don't try to enforce too much, but you have the majority of the value covered by the platform. I think modularity is probably a key word for this now, and just making sure what you build is extensible to different use cases.

Richie Cotton

Okay, yeah, I like the idea of modularity: maybe you don't want some monolithic stack where you've got to swap everything out all at once. Okay, so you talked before about observability and how you test for quality, and I guess this all starts with the data, right? So what sort of data quality controls are you going to want before you start saying, okay, we're going to use this data with our AI products?

Shane Murray

So this facet of AI readiness also has this category of AI-ready data. For some teams, as I've mentioned, it's about actually building the application and seeing what needs to be ready in that process. We also have this huge branch of work, with many teams now approaching conversational BI, or whatever name you want to give it: basically replacing dashboards with natural language and allowing that to access your data. For that, teams really need to be, and have been, thinking about data certification, right? Which data sets have the right coverage in terms of monitoring, have the right incident response processes, have the right metadata around them to be ready? With AI, that tends to include synonyms, you know, revenue equals sales, and all the ways that someone using the agent is going to talk about your data. So that's one side: have you prepared your data estate to have agents put on top of it? And then I've also seen data teams who are tackling this idea where they have downstream teams building agents, and they need to think about how they approach readiness of structured and unstructured data. The unstructured data is a new component, where you previously had very little visibility on that data. So teams are now grappling with the idea of: how do I monitor the unstructured data that may be used by tens of teams downstream?

Richie Cotton

I see, that's a very interesting point. So how do you even measure the quality of unstructured data, like whether an image is any good or not?

Shane Murray

I feel like it's early on, but there are some things that parallel structured data, obviously: is it fresh, is it in a valid file format? Some of those are fairly standard. An image is maybe a harder one, but for text I think of semantic consistency. I've seen a lot of teams try to protect for this by having one person contribute to the governance of the documents to ensure semantic consistency. But if you've got really high-volume, highly variable, high-velocity data, then you need to be able to check that someone's definition of risk is the same definition of risk that's understood across the corpus, right? So I think there's a need to understand the semantic consistency of the corpus, and then there's also a need to understand its relevance. I've talked to a lot of teams dealing with RAG-based pipelines who need to be able to know when there's a drift in the relevance of the underlying corpus to the model and the output that they've built.

Richie Cotton

Okay, so just to make sure everyone's got this idea correctly: semantic consistency is just that business definitions are consistent. So it's like, okay, this is how we define a sales-qualified lead, or this is how we define customer lifetime value, and it's the same everywhere, and you want that across all your documents. Is that correct?

Shane Murray

Exactly.

Richie Cotton

Yeah. Okay. So is there a technological solution to ensuring semantic consistency, or is it a case of just reading everything and hoping that it's right?

Shane Murray

Yeah. At Monte Carlo we've done some work on building these unstructured data monitors, so we're starting to work with teams to see what they need to monitor and what they need to extract. But it's a different problem: you basically need to structure some information from that unstructured data in order to monitor for that consistency. And much of this relies on the fact that we now have underlying LLM functions in the warehouse that we can tap into.
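
One way to read "structure some information from the unstructured data" is sketched below: extract each document's working definition of a term with an LLM, then compare the definitions' embeddings for agreement. It assumes the OpenAI SDK; the models and scoring are illustrative, not Monte Carlo's monitors.

```python
# Sketch: scoring semantic consistency of a term across a corpus.
import numpy as np
from openai import OpenAI

client = OpenAI()

def definition_of(term: str, doc: str) -> str:
    resp = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content":
                   f"In one sentence, how does this document define "
                   f"'{term}'? If it doesn't, reply 'absent'.\n\n{doc}"}],
    )
    return resp.choices[0].message.content

def embed(texts: list[str]) -> np.ndarray:
    resp = client.embeddings.create(model="text-embedding-3-small",
                                    input=texts)
    return np.array([d.embedding for d in resp.data])

def consistency_score(term: str, docs: list[str]) -> float:
    """Mean pairwise cosine similarity of definitions (needs >= 2 docs);
    low values flag divergent definitions across the corpus."""
    defs = [d for d in (definition_of(term, doc) for doc in docs)
            if d.lower() != "absent"]
    vecs = embed(defs)
    vecs /= np.linalg.norm(vecs, axis=1, keepdims=True)
    sims = vecs @ vecs.T
    n = len(defs)
    return float((sims.sum() - n) / (n * (n - 1)))
```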

Richie Cotton

Okay, yeah. So it seems like maybe it's an ongoing research problem: how do you actually ensure that document quality control everywhere? Could you give me a concrete example of how data quality feeds into AI quality? How does that relationship work?

Shane Murray

You know, I tend to think of the engineering maxim: as much reliability as you need, no more. Maybe this stems back to the idea that you kind of have to build the thing, put it in production, and start learning, so understanding how data quality affects performance often comes through monitoring and iteration. I think oftentimes people really attach to this idea of hallucination, right? It's the most commonly talked-about problem with generative AI. What I've found from talking to many data teams is that very often the underlying problem causing hallucination is not the model itself; it's actually the context or the data inputs. If you have outdated data, then you're going to have a model that's giving potentially old information. Or if you've just tapped into your underlying Confluence data, which has had maybe hundreds of contributors over the last ten years, you're going to have a mix of high- and low-quality information. So while the models can hallucinate, and that's certainly something you need to build monitors and guardrails on, the practical problem many teams are finding is: how do I ensure the data is reliable, and how do I ensure the model outputs are reliable?

Richie Cotton

Oh, right. So we're back to document quality control, where you've got a five-year-old page. Well, customer-facing support pages ought to be kept up to date, but certainly intranet pages, where it's like, okay, I wrote this process seven years ago, the page still exists somewhere, and suddenly it's being pulled into an AI help system.

Shane Murray

Well, yeah. I mean, I think some of the early use cases have been: can we put a conversational agent on top of our internal documentation? And I think people are very quickly finding that that isn't a very up-to-date document source. Obviously there are cases outside of that where you can tap into much fresher and more timely documents.

Richie Cotton

And are there any other sorts of data quality issues you see at the moment?

Shane Murray

No, I think the data quality issues that contribute, many of them are things we've seen across traditional ML and across analytical use cases. I still find silent schema changes, from an upstream, say, software engineering team, one of the most common causes of pain, or even externally managed data that you're ingesting that actually changes. Schema change is still the sort of event that, if not controlled, can be disastrous for downstream systems. Then you have the idea of pipeline delays causing stale or incomplete data. I was talking to a customer who said, we need to know the delay between when the underlying document set is updated and when our model actually starts to use it, because that latency in between can cause it to deliver the wrong answers to consumers. So there's still a freshness problem in AI, as with any data product you're building. Thirdly, you have this idea of data drift, which can occur from code changes or instrumentation changes: the idea that the underlying data you're using in a pipeline, or in context, can drift into a place where the model is no longer giving relevant responses. And then the new one is actually measuring the actual output of the AI, right? That could be measuring it for clarity; a lot of teams I talk to measure whether it carries the brand image we represent, like, is it speaking as we would speak to customers; or is it grounded in the context we've provided it; or is it accurate, if you have some ground truth. So the quality issues affecting AI in production run all the way from the underlying data inputs, which can be schema changes and freshness, through to the actual model outputs.
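
As a small example of the freshness problem just described, here is a sketch of a check that flags when an index's copy of a document lags its source beyond an SLA; the six-hour SLA and document names are illustrative assumptions.

```python
# Sketch: flag documents whose indexed copy lags the source system.
from datetime import datetime, timedelta, timezone

MAX_LAG = timedelta(hours=6)  # illustrative freshness SLA

def stale_documents(source_updates: dict[str, datetime],
                    index_updates: dict[str, datetime]) -> list[str]:
    """Return doc ids whose indexed copy lags the source beyond the SLA."""
    stale = []
    for doc_id, src_ts in source_updates.items():
        idx_ts = index_updates.get(doc_id)
        if idx_ts is None or src_ts - idx_ts > MAX_LAG:
            stale.append(doc_id)
    return stale

now = datetime.now(timezone.utc)
print(stale_documents(
    {"pricing.md": now, "faq.md": now - timedelta(days=2)},
    {"pricing.md": now - timedelta(days=1), "faq.md": now - timedelta(days=2)},
))  # -> ['pricing.md']
```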

Richie Cotton

So many different things can go wrong there. Certainly with freshness I can see that if you're pulling in weather data or news data, even a small delay is going to render the data completely useless. And certainly changing formats, like the schema changes you mentioned: if your data suddenly arrives in a different form, that's going to break a lot of things downstream. Okay, so a lot of things can go wrong. Suppose you decide, okay, we need to improve data quality. Do you have a single recommendation for where to start improving data quality?

Shane Murray

This is something we tackle at Monte Carlo a lot, and I'd say the most logical place to start, and the place of highest impact, typically at least in large organizations, and many small ones, is the foundational data products, right, the core data layer that should be easily joinable, that everyone's tapping into and building derived products on top of. You might have domain teams that are actually tapping into that data. So I tend to recommend starting with that foundational data, because it's where you might have hundreds of uses you can affect downstream; if you improve the quality there, you improve it downstream. Typically I'd start with the base schema, freshness, and volume checks that you can scale up very easily. You don't need to be writing manual assertions; you can really just turn these types of checks on with any observability tool. Then go deeper on your critical data elements and ensure you have distribution monitoring around those, so that by the time it gets to a metric the CEO is reading, or a piece of data that's really required in an end-use system, it has actually been measured along the way and hasn't had any transformation problems or underlying data problems. So that's the detection side. Then we typically encourage teams to start with basic operational practices, if they don't have those, which might include: on-call rotations; having a clear and understood severity process for incidents, so that you can separate the signal from the noise and really amplify the signal of a Sev 1 or a Sev 2; doing retros when you have these incidents; and reporting on things like time to detect and time to respond. These are the basic practices that every software engineering team has implemented, and every data team needs to have implemented these days.
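
Below is a minimal sketch of the "turn them on" style of checks mentioned above, using a z-score on daily row counts and a null-rate threshold; the thresholds and sample numbers are illustrative assumptions.

```python
# Sketch: basic volume and null-rate checks for a foundational table.
import statistics

def volume_anomaly(daily_counts: list[int], today: int, z: float = 3.0) -> bool:
    """Flag today's row count if it sits more than z stddevs from history."""
    mu = statistics.mean(daily_counts)
    sd = statistics.stdev(daily_counts)  # needs at least two history points
    return sd > 0 and abs(today - mu) > z * sd

def null_rate_breach(nulls: int, rows: int, max_rate: float = 0.01) -> bool:
    """Flag a critical column whose null rate exceeds the allowed rate."""
    return rows > 0 and nulls / rows > max_rate

history = [10_120, 9_980, 10_340, 10_050, 10_210]
print(volume_anomaly(history, today=4_500))      # True: volume dropped
print(null_rate_breach(nulls=320, rows=10_000))  # True: 3.2% nulls
```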

Richie Cotton

That seems simple enough to get started on. I like the idea of having people on call for when things go wrong, and doing a proactive postmortem when there's a disaster so you can fix it: very common in software engineering, much less so in data teams. So that definitely seems like a good process to implement. I guess if you're a data team leader wondering how to do this, you basically copy whatever the software engineering team is doing?

Shane Murray

I mean, we don't need to reinvent the wheel here. I think data teams have been behind software engineering teams in most organizations in terms of adopting these strong reliability engineering practices. But data just keeps becoming a more central part of the products we build, and so these are the steps to take.

Richie Cotton

Absolutely. Okay, so you mentioned earlier going beyond just observing problems to trying to find the root cause of those problems. Walk me through how you go about that, because once you've got an AI agent, and lots of layers of tech between it and, oh, there's a data problem, it can go quite deep, I guess. So talk me through: what's the process for finding the root cause of a problem?

Shane Murray

Maybe I'll separate it into two pieces, which is kind of how we observe data and how we observe agents. We've been doing work at Monte Carlo to extend our agent observability so that we're capturing the traces of agent behavior through all the building blocks. Take a RAG pipeline: that would be chunking and embedding and retrieving, and then you've got the decisions an agent might be making. As these agent architectures get more complexity built into them, we've seen the need to extend that telemetry instrumentation to be able to dig into individual responses, right? Because you've got non-determinism built into these, you need to be able to break it down, look at individual cases, and understand essentially the lineage of those. So part of it is that instrumentation. The other part is that over the years we've really invested in troubleshooting and root cause analysis, and what we've built up in Monte Carlo now is an agent for troubleshooting. What that agent does is take all the context of an incident, which includes the anomaly itself, but also an understanding of the data lineage and of the different logs coming from contributing tools. It then spawns a series of sub-agents which might explore, say, GitHub changes or Airflow issues, or go upstream and look at all the potential data failures that could be happening. Each of these specialist agents goes and explores hypotheses, and you might have many LLMs running in parallel to explore a hypothesis, come back with a finding, and share it with the main agent, which then summarizes it within the span of about two minutes and suggests next steps. That's really the latest and greatest we're doing on troubleshooting: taking what we've found can take data teams hours per incident to troubleshoot and, for any alert, giving a two-minute review of what might have caused it.
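
The fan-out-and-summarize shape of that troubleshooting flow can be sketched generically as below. This is not Monte Carlo's implementation; the hypothesis checks are hypothetical placeholders, and a real system would hand the findings to an LLM to rank and summarize.

```python
# Sketch: run hypothesis checks for an incident in parallel, then
# collect findings for a summarizer. Checks are stand-in stubs.
from concurrent.futures import ThreadPoolExecutor

def check_schema_changes(incident: dict) -> str:
    return "no recent schema changes found"          # placeholder

def check_recent_deploys(incident: dict) -> str:
    return "deploy 2h before anomaly touched ingest"  # placeholder

def check_upstream_freshness(incident: dict) -> str:
    return "source table is 9h stale"                 # placeholder

HYPOTHESES = [check_schema_changes, check_recent_deploys,
              check_upstream_freshness]

def triage(incident: dict) -> str:
    with ThreadPoolExecutor() as pool:
        findings = list(pool.map(lambda check: check(incident), HYPOTHESES))
    return "\n".join(f"- {finding}" for finding in findings)

print(triage({"table": "orders", "anomaly": "row count drop"}))
```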

Richie Cotton

I love that: using hundreds of agents to find out what the problem was with the original agents. So you've got more agents to debug after you've debugged those agents.

Shane Murray

I will say, we've found it to be such a good use case, because there's such a clear process that engineers take to this hypothesis testing. And over the six years of Monte Carlo, we've also built up an understanding of all the root causes, which we can give as hints to this agent. But yeah, it's agents upon agents, as you say.

Richie Cotton

It does seem incredibly useful. I imagine spending hours digging through your entire stack to figure out where the problem is would be an incredibly tedious and frustrating job for any data scientist or engineer, so I like the idea of automating that. So, you mentioned you've got six years' worth of data on things that can go wrong. Tell me, what are the most common root causes of problems that you've found?

Shane Murray

Maybe I'll focus on, as we've talked to teams building AI, what some of the new problems there are. I've obviously talked through the schema and freshness and distribution problems that can occur in data, but one of the interesting ones I've found talking to teams building AI is the actual underlying model upgrades behind the scenes, right, or prompt changes. Typically you'd think these are occurring within teams that should know the changes are happening, but a lot of these changes are currently invisible, and I think part of it is because we're early. A change from one GPT version to the next can really have dramatic impacts on an agent that you have in production. I've heard teams say they'll start getting feedback from the user base that somehow the thing feels less useful, and we see this in some of the UIs provided by the agents as well. But data teams need to be aware of those model changes or prompt changes in the same way that they are of a schema change.

Richie Cotton

Yeah. All these foundation model companies keep coming out with new models, and they're always like, oh, this is the latest, greatest thing, you need to adopt it. But actually, it seems you must be very, very careful in an enterprise setting when you're switching up your models.

Shane Murray

Yeah. Even if the new model is better, I think we've seen that as data teams, you sometimes care about consistency even above accuracy; consistency is such a critical factor for data teams. So a model changing versions behind the scenes, or someone switching out a prompt, has huge impacts on the consistency of behavior of these systems. That's been one. Another thing that has come up with a lot of data teams is this idea of embedding drift, which you could frame as knowledge drift: are my embeddings still relevant to the use case that I'm supporting? And then I'd just say, going back, the thing that's being reinforced through my conversations is that if the underlying knowledge base or document set you're feeding in is low quality or delayed, that's going to make or break your AI application.

Richie Cotton

So embedding drift, this is a whole new problem to me. Talk me through what this is about. Is it the meanings of words changing? When does this happen?

Shane Murray

Yep. Maybe I'll give an example that a customer shared. Basically, they were doing some work in the US and had some embeddings that were specific to the US, and then started to expand the business into Canada, right? And the standards of the language of the images, in this case in Canada, were very different from the US. And so their techniques, which might have been few-shot prompting or other techniques to actually make this work, suddenly became less reliable for the scope of the problem they were solving. Historically, these sorts of solutions were sometimes being built by one data scientist, right? But as you start to fragment ownership of that solution, I think you have more requirements to have observability, or monitoring, on each piece of it.
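
A rough sketch of one way to detect embedding drift like this: compare the centroid of recent input embeddings against a baseline window and alert on cosine distance. The dimensions, synthetic data, and threshold are illustrative assumptions.

```python
# Sketch: centroid-based embedding drift check on synthetic data.
import numpy as np

def centroid(vectors: np.ndarray) -> np.ndarray:
    c = vectors.mean(axis=0)
    return c / np.linalg.norm(c)

def drift_score(baseline: np.ndarray, recent: np.ndarray) -> float:
    """Cosine distance between window centroids; near 0 means no drift."""
    return 1.0 - float(centroid(baseline) @ centroid(recent))

rng = np.random.default_rng(0)
baseline = rng.normal(size=(500, 384))         # e.g. last quarter's inputs
recent = rng.normal(loc=0.3, size=(200, 384))  # a shifted distribution
if drift_score(baseline, recent) > 0.05:       # assumed alert threshold
    print("embedding drift detected: review corpus and few-shot examples")
```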

Richie Cotton

Yeah, that's absolutely fascinating. It's certainly something I've fallen over in my own life, moving from the UK to the US: oh yeah, a lot of these words are wrong.

Shane Murray

Me too. Moving from Sydney to New York, I experienced exactly the same thing.

Richie Cotton

Absolutely. So it seems like, as you're moving into new markets, you really need to be careful about how AI performance can change. Okay, so I guess in general all this speaks to the problem of how you go about changing your processes once you start adopting agents. Do you have any advice on process management, like how to go about changing things?

Shane Murray

I think maybe the first one I'd call out, and maybe it's obvious, is that human in the loop, I think, is essential. I actually probably wouldn't be building a solution without having factored some human in the loop into it, both as a quality gate, but also to build that trust with users, the ultimate consumers of it, who have a say in it being reliable. When teams set the expectations with their end users, they're in a much better place, and by also bringing in experts to ensure the quality of the application, they're actually more successful in deploying these applications and driving that change management with their user base. I saw the same thing with machine learning: the closer you get to that end user, and the more you build for them and don't build to replace them, the more buy-in you're going to get. I think the second one is the thing I've seen teams do, which is to start very narrowly: prototype narrowly, get the buy-in, get the wins, then figure out the next scale. So they're going from ten users up to a thousand, in a way that I don't think we considered as much before. We certainly did smaller A/B tests, but most products get launched to production without going through such rigorous step-ups in the user base. Sometimes that means you go up and then come back to your ten users and test some new changes, but it really feels like that sort of phased rollout is part of the AI adoption process. Culturally, what we've found at Monte Carlo, because of course, like any other company, we are adopting AI, is that you need to give people space and time to experiment. You're not going to be productive on day one of using AI, and you're going to do a lot wrong. So I've found it really useful that at Monte Carlo we have a culture of sharing the successes and failures we've had with using AI, and that's actually converted a lot of people into users who were previously maybe a bit shy about using the technologies.
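
The phased rollout Shane describes can be implemented with a deterministic cohort gate, sketched below; the stage sizes and population are illustrative assumptions. Stable hashing means earlier cohorts stay enabled as the rollout widens, so feedback accumulates rather than churning.

```python
# Sketch: staged rollout gate using stable hash bucketing.
import hashlib

ROLLOUT_STAGES = [10, 100, 1_000, 10_000]  # assumed cohort sizes

def in_cohort(user_id: str, stage: int, population: int = 100_000) -> bool:
    """Same users stay enabled as stages grow, since buckets are stable."""
    bucket = int(hashlib.sha256(user_id.encode()).hexdigest(), 16) % population
    return bucket < ROLLOUT_STAGES[stage]

# Stage 0 exposes ~10 users per 100k; each promotion keeps the prior
# cohort and widens it, matching the 10 -> 1,000-user pattern above.
print(in_cohort("user-42", stage=0), in_cohort("user-42", stage=3))
```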

Richie Cotton

The idea of the phased rollout is actually surprisingly revolutionary, or radical, I suppose, compared to SaaS software, where, okay, we do an A/B test, we give a new feature to a small percentage of our users, and if it seems to work okay, we roll it out to everyone. But doing it with ten users, then a thousand, and gradually building up: that's a very different process, and a lot more rigorous.

Shane Murray

And I think in part it ties to maybe the weakness of human in the loop right now, and I've seen some surveys that back up the anecdotes I have: human in the loop is the primary way people are ensuring quality, and that's very hard to scale up. So you have to start thinking about how to keep that at levels that are manageable as you scale. You can't keep scaling up your humans in the loop, so you have to scale up more automated evaluation and monitoring approaches.

Richie Cotton

Okay, yeah, certainly humans seem to be the bottleneck in everything. All right, so I guess all this has been building towards: how do we get users of AI agents to trust the product? So, any final advice on how you get that trust?

Shane Murray

Yeah. I had a data scientist say to me the other day that trust is gained in droplets and lost in buckets. I'm not sure who gets credit for that, but I thought it was a really good phrase. I think what we've found over the span of Monte Carlo is that there's both what we can do to use observability to make more reliable data products, and then a whole human element to rolling out any software that ensures the trust. I've seen three things really drive trust in data, and I think it extends to AI. One is that you have clear accountability, right, clear accountability over the data; you probably want to avoid too much fragmentation across the value chain of the product, so you know who's responsible for the lifespan of this thing. Then you have an expectation of what quality and reliability mean, so actually sharing that expectation, whether it's four nines of reliability or something else. As I've spoken to people launching AI products, they've found that if they don't set that expectation upfront, there are missteps with users who maybe expect something else. You need to know how trustworthy and high-quality this product is that you're adopting. And then the third one: when customers do go through data incidents, their ability to transparently communicate that to their end users, and to communicate uptime and downtime, is something that also builds trust. People understand downtime of products, but if you don't communicate it well, it can be a failure in that trust building. So those are probably the three things, and I think it just reinforces that trust is built in production.

Richie Cotton

I really like the idea that, as in real life, a lot of trust is about communication: setting realistic expectations of performance, I guess. So if you don't have 100% reliability, then make it clear to users that maybe some things are going to go wrong some of the time; they shouldn't trust it to be right on every single occasion. All right, super. And just finally, I always want more people to follow, so whose research are you most interested in at the moment?

Shane Murray

Yeah, I've been reading a lot of, and I'm just thinking, I'm not sure I know his full name, but Hamel, and his website is hamel.dev. He's an engineer who talks and writes a lot about building reliable AI, so I've been following along with him. I think he's writing some really interesting stuff around tackling error analysis of agents and building evaluations, so I've been enjoying him. And then I'd say the other one, the book I'm reading at the moment, is Empire of AI. Have you heard of that?

Richie Cotton

I have read the book, and it's a very, very good book. Lots of gossip about what's been going on over the last decade. It's a good read.

Shane Murray

Yeah, so that's Karen Hao, who's a freelance journalist. I think she started by being behind the scenes with OpenAI, and it kind of builds from there, the good and the bad, I'd say.

Richie Cotton

Yeah, lots of very, very juicy gossip in that book. All right, super. Thank you so much for your time, Shane. It's been a pleasure.

Shane Murray

Thanks so much, Richie.

Related

podcast

Building Trustworthy AI with Alexandra Ebert, Chief Trust Officer at MOSTLY AI

Richie and Alexandra explore the importance of trust in AI, what causes us to lose trust in AI systems and the impacts of a lack of trust, AI regulation and adoption, AI decision accuracy and fairness, privacy concerns in AI and much more.

podcast

Developing AI Products That Impact Your Business with Venky Veeraraghavan, Chief Product Officer at DataRobot

Richie and Venky explore AI readiness, aligning AI with business processes, roles and skills needed for AI integration, the balance between building and buying AI solutions, the challenges of implementing AI-driven changes, and much more.

podcast

The Challenges of Enterprise Agentic AI with Manasi Vartak, Chief AI Architect at Cloudera

Richie and Manasi explore Al's role in financial services, the challenges of Al adoption in enterprises, the importance of data governance, the evolving skills needed for Al development, the future of Al agents, and much more.

podcast

Enterprise AI Agents with Jun Qian, VP of Generative AI Services at Oracle

Richie and Jun explore the evolution of AI agents, the unique features of ChatGPT, advancements in chatbot technology, the importance of data management and security in AI, the future of AI in computing and robotics, and much more.

podcast

Creating an AI-First Culture with Sanjay Srivastava, Chief Digital Strategist at Genpact

Sanjay and Richie cover the shift from experimentation to production seen in the AI space over the past 12 months, how AI automation is revolutionizing business processes at GENPACT, how change management contributes to how we leverage AI tools at work, and much more.

code-along

Building Trustworthy AI with Agents

Shingai Manjengwa, the Head of AI Education at Theoriq (ChainML Labs), will discuss the principles of responsible AI and demonstrate how they may be implemented in a world with multiple collaborating agents.

Shingai Manjengwa
