Shane Murray is a seasoned data and analytics executive with extensive experience leading digital transformation and data strategy across global media and technology organizations. He currently serves as Senior Vice President of Digital Platform Analytics at Versant Media, where he oversees the development and optimization of analytics capabilities that drive audience engagement and business growth. In addition to his corporate leadership role, he is a founding member of InvestInData, an angel investor collective of data leaders supporting early-stage startups advancing innovation in data and AI. Prior to joining Versant Media, Shane spent over three years at Monte Carlo, where he helped shape AI product strategy and customer success initiatives as Field CTO.

Richie helps individuals and organizations get better at using data and AI. He's been a data scientist since before it was called data science, and has written two books and created many DataCamp courses on the subject. He is a host of the DataFramed podcast, and runs DataCamp's webinar program.
Key Quotes
I have an underlying philosophy that you can't truly establish readiness until you start building. And that is the best way to be ready. Working with teams who are building AI, the ones that are actually going through the prototype to production to scale up phase are the ones that are finding out what readiness really means for their organization.
Human in the loop is essential. I wouldn't be building a solution without having factored in some human in the loop. Both as a quality gate, but also to build that trust with users, who are the ultimate consumers of your solution, who have a say in whether it is reliable.
Key Takeaways
Adopt a phased rollout approach for AI applications, starting with small user groups and gradually scaling, to manage quality and build trust incrementally.
Implement human-in-the-loop processes to maintain quality and trust in AI applications, while planning for scalable automated evaluation methods.
Focus on context engineering to provide reliable and trusted data inputs for AI agents, ensuring semantic consistency and relevance across data sources.
Transcript
-
Shane Murray
I kind of have an underlying philosophy that you can't truly establish readiness until you start building, and that is the best way to be ready. Actually, from working with teams who are building AI, the ones that are going through the prototype to production to scale-up phase are the ones that are finding out what readiness really means for their organization.

Human in the loop, I think, is essential. I actually probably wouldn't be building a solution without having factored some human in the loop into it. And I think both as a quality gate, but also to build that trust with users, the ultimate consumers of it, who have a say in it being reliable.
-
Richie Cotton
And welcome to DataFramed. This is Richie. For all the promise of AI agents, a major barrier to them being useful is that they don't always work. That is, they are untrustworthy. Today we're going to look at ways to improve trust in your AI agents, with a big focus on data quality. I also want to discuss AI readiness. Every CEO claims they want their company to be AI ready, but I'm not really sure what that means. In this episode, we have a repeat guest, Shane Murray. At the time of recording, he was Field Chief Technology Officer at the data and AI observability platform Monte Carlo. He's just switched jobs to be Senior Vice President of Digital Platform Analytics at Versant Media. In fact, he's returning to his media roots, since he was previously Senior Vice President of Digital Analytics at The New York Times. So let's find out how to build trust in AI agents. Hi, Shane. Welcome to the show.
-
Shane Murray
Hi, Richie. Nice to see you.
-
Richie Cotton
Wonderful. Yeah. Great to have you here. Now, I love a good disaster story. So, just to begin with, can you tell me what's the biggest AI disaster you've seen?
-
Shane Murray
From my previous days as a data leader at The New York Times, we talked about disasters as those that might appear on the front page of The New York Times, as many teams do. I would say, from the people I work with, people are generally fairly conservative, so I haven't been firsthand at disasters. But the ones that come to mind: I don't know if you saw, there was a McDonald's chatbot that leaked, I think, millions of job applicant records, and the security researchers found that the default password was 123456. Comically so. I mean, I think these reputational ones are the ones you really worry about. And similarly, the one that comes to mind just from having been in the field of media: I think it was the Chicago Sun-Times who had a summer reading list that was essentially all fictional books. Not fiction books, but actually fictional books. So these ones that get to the heart of reputation and trust, I think, are the ones you watch out for.
-
Richie Cotton
Yeah, absolutely. It's all very well having a forecast that's a little bit off, but when you're causing problems for millions of your customers, that's a really bad situation, something you definitely want to avoid. But maybe that's too negative, so we need the flip side to this. Talk me through some success stories you've seen.
-
Shane Murray
Yeah, maybe I'll reference one I've read about, and then one I've seen firsthand just working with customers here at Monte Carlo. The one that struck me, just reading what data teams are doing, is the Stripe team. I haven't actually seen a deeply published post on this, but one of their team members was talking about how they've essentially built a foundational model for payments using billions of transactions, which essentially replaces the feature engineering of what we now call traditional ML models with embeddings capturing the relationships between payments. And I just thought that was super impressive. Most people tend to talk about traditional ML and then generative AI, and the fact that they're using these LLM approaches to rethink these foundational business problems, around fraud detection and things like that, I just found super interesting. Have you seen that one?
-
Richie Cotton
No, I hadn't heard about this. It is actually kind of wild, because you think about machine learning and generative AI as being very separate things. An embedding model is normally for detecting how close different words are in meaning to each other, so I guess here it's detecting how close payments are to each other, and that seems like a good way to detect fraud. So yeah, very innovative there.
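To make that concrete, here's a minimal sketch of the nearest-neighbor idea; the vectors are random stand-ins, not Stripe's actual model, and the threshold is invented:

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in for learned payment embeddings: one 64-dim vector per transaction.
# In a real system these would come from a model trained on many payments,
# so that similar payments land near each other in the space.
history = rng.normal(size=(1000, 64))

def top_k_similarity(payment_vec: np.ndarray, bank: np.ndarray, k: int = 5) -> float:
    """Mean cosine similarity to the k nearest known transactions."""
    bank_norm = bank / np.linalg.norm(bank, axis=1, keepdims=True)
    v = payment_vec / np.linalg.norm(payment_vec)
    sims = bank_norm @ v
    return float(np.sort(sims)[-k:].mean())

new_payment = rng.normal(size=64)
score = top_k_similarity(new_payment, history)

# A payment that resembles nothing seen before is a candidate fraud signal.
THRESHOLD = 0.3  # would be tuned on labeled history in practice
print("score:", round(score, 3), "flag:", score < THRESHOLD)
```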
-
Shane Murray
Yeah, definitely that one. And then I'd say, from the customers I work with, where I've seen the most value, even though it's maybe not the autonomous agent use case, is these teams that are basically taking what were robotic process automation steps and turning unstructured data into structured data. An example is a customer I work with, Pilot Travel Centers, who have travel centers across the US; they're the biggest diesel fuel provider and the biggest Subway franchisee in the US. They took these bills of lading that the truck drivers essentially take photographs of and send in, which previously went into a human-driven processing loop, and used LLMs to extract the data from those images and dramatically speed up their financial data pipelines. I think that's become a prototypical use case, but it's a super interesting one, and really valuable to organizations.
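A rough sketch of what that extraction step can look like; `call_vision_llm` and the `BillOfLading` fields are hypothetical placeholders, not the actual Pilot pipeline:

```python
import json
from dataclasses import dataclass

@dataclass
class BillOfLading:
    carrier: str
    origin: str
    destination: str
    gallons: float

PROMPT = (
    "Extract carrier, origin, destination, and gallons from this bill of "
    "lading photo. Respond with JSON only, using exactly those keys."
)

def call_vision_llm(prompt: str, image_bytes: bytes) -> str:
    """Hypothetical multimodal LLM call; swap in your provider's SDK."""
    raise NotImplementedError

def extract(image_bytes: bytes) -> BillOfLading:
    raw = call_vision_llm(PROMPT, image_bytes)
    fields = json.loads(raw)       # fails loudly on malformed output
    return BillOfLading(**fields)  # fails loudly on missing/extra keys

# Downstream, rows of BillOfLading feed the financial pipeline instead of
# a human retyping each photographed document.
```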
-
Richie Cotton
Yeah, certainly it hits on a lot of the current themes: the idea that from images you can now extract very useful structured data, and that one of the big goals is to speed processes up, automate stuff, and go faster. Nice, hitting all the key themes in one story. I like that. I want to talk about how we get to these success stories. For a lot of organizations, one of the big management buzzwords has been becoming AI ready, and I'm still not quite sure what that means. So talk me through: what does being AI ready mean to you?
-
Shane Murray
Yeah, I think it can be a very ambiguous term that often goes without definition, probably similar to terms of the past in data and in business. But I kind of have an underlying philosophy that you can't truly establish readiness until you start building, and that is the best way to be ready. I know that's kind of a cheat. But actually, from working with teams who are building AI, the ones that are going through the prototype to production to scale-up phase are the ones that are finding out what readiness really means for their organization. And that can be organizational, or it can be from a systems perspective: are they cloud first, do they have the right management around unstructured data, that sort of thing. And then, more practically, where I see teams focused in this space: one is, do they have reliable, AI-ready data? I think everyone's accepted the idea that for your AI to be good and ready, your data has to be ready. So that's one dimension. The other is this thing we talk about with customers: once you're building AI, do you have the means to evaluate and monitor its quality as you're scaling it out, the quality of the outputs and the quality of the process to build it? And then thirdly, when the agent goes askew, do you have the ability to understand it and to intervene, and to prevent bad outcomes?
-
Richie Cotton
That's interesting, because those ideas are at different ends of the adoption cycle. I love the idea that in order to figure out what you don't know, you just try building stuff, and where you fail, that's useful information: this is what we need to learn about, this is what we need to do better. But then also the big problems are: how do we actually scale up, and how do we understand what's going on with these agents once we put them out into the world? Because if you can't track them, disasters are going to happen. Let's talk about what skills you need in order to start building with this stuff. Can you talk me through what roles are going to be involved, and what skills they need, to build this?
-
Shane Murray
Yeah, this is something I've been thinking about quite a bit and talking to data leaders about, because there are a lot of data roles in teams these days; I can probably list off a whole set of different roles that you might have in a large data organization. But AI is obviously changing how we need to think about this, and about the role of machine learning teams and the role of data engineering teams. If I break it down into what I see as necessary, one is this idea of context engineering. At the moment, the people that do that often have the title of AI engineer, but in many cases it could be a data engineer or a software engineer. It's really the idea that you have to engineer good context. That could be a RAG pipeline, a set of prompts, or hitting different APIs that bring in the right information. How do you make sure that you have trusted and reliable context serving this agent?

The second, and I've heard some debate around whether this is a product role or more of a data science role, is how do you define and evaluate quality? Often this is a mix of human feedback (how do you take in human signals about what's relevant and what's irrelevant?) and LLM-as-a-judge techniques to measure and monitor quality. I think at the moment this is a data science job, and where I see it being done best, it's being done by data scientists, because it takes a lot of thought to measure the quality of an experiment, or to measure these non-deterministic outputs in a way that's scientific.

And then the third role I see, often more as you start to scale up, is the platform engineer. How do you instrument your stack to be able to support many agents built upon it? We're a bit early for that, but I'm certainly seeing teams think about how to get extensibility and scale, and how to have standard frameworks for experimentation and for observability.
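To illustrate the LLM-as-a-judge technique Shane mentions, here's a minimal sketch; `call_llm` is a hypothetical stand-in for your model client, and the rubric and threshold are invented:

```python
import json

JUDGE_PROMPT = """You are grading an AI assistant's answer.
Question: {question}
Answer: {answer}
Score relevance and groundedness from 1-5 and return JSON:
{{"relevance": <int>, "groundedness": <int>, "reason": "<short>"}}"""

def call_llm(prompt: str) -> str:
    """Hypothetical LLM call; replace with your provider's SDK."""
    raise NotImplementedError

def judge(question: str, answer: str) -> dict:
    raw = call_llm(JUDGE_PROMPT.format(question=question, answer=answer))
    scores = json.loads(raw)
    assert set(scores) >= {"relevance", "groundedness", "reason"}
    return scores

def eval_run(pairs: list[tuple[str, str]], threshold: float = 4.0) -> float:
    """Fraction of (question, answer) pairs whose mean score clears the bar."""
    passed = 0
    for q, a in pairs:
        s = judge(q, a)
        if (s["relevance"] + s["groundedness"]) / 2 >= threshold:
            passed += 1
    return passed / len(pairs)
```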
-
Richie Cotton
That's interesting stuff. On that second point, that it's data science teams that need to be monitoring the quality of the AI work: I think that's fascinating, because I get the sense that over the last few years there's been a drift from data teams being the people responsible for doing AI to engineering teams being the ones responsible for doing AI. So it's nice that data teams still have a purpose here.
-
Shane Murray
Yeah, I'm often saying that it's the data teams that are potentially staffing that talent within the product or software engineering team. So it's a data scientist that might be doing some of the context engineering, but might also be responsible for reporting back how well this thing is doing. And I still think there's a huge role for data people to play in this.
-
Richie Cotton
Oh, since you mentioned context engineering: this is Andrej Karpathy making up phrases again. Do you just want to explain what context engineering is?
-
Shane Murray
Yeah, I'm not as sure of the derivation of the term, but I feel like it is the right term after people started with prompt engineering and then realized it's more than prompts. In some cases you're building a pipeline: all the ways that you're bringing in unstructured and structured data, as well as prompting the AI, to actually make it work. It's not a very scientific definition, but that's how I think of this larger space of context engineering.
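In code, engineering the context often just means deliberately assembling everything the model sees. A minimal sketch, with hypothetical retrieval and API helpers:

```python
def retrieve_docs(query: str, k: int = 3) -> list[str]:
    """Hypothetical vector-store lookup returning the k most relevant chunks."""
    raise NotImplementedError

def fetch_account_summary(user_id: str) -> str:
    """Hypothetical internal API call for structured, per-user facts."""
    raise NotImplementedError

SYSTEM = "Answer using ONLY the provided context. Say 'unknown' otherwise."

def build_context(user_id: str, question: str) -> list[dict]:
    """The 'context engineering' step: prompts + retrieved docs + API data."""
    docs = "\n---\n".join(retrieve_docs(question))
    account = fetch_account_summary(user_id)
    return [
        {"role": "system", "content": SYSTEM},
        {"role": "user",
         "content": f"Context:\n{docs}\n\nAccount:\n{account}\n\nQuestion: {question}"},
    ]
```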
-
Richie Cotton
Yeah, I have to say the term has been quite divisive at DataCamp. All the people who are involved in engineering, building AI features, are trying to figure out how much context to give to an AI; it's an incredibly important task, and context engineering is a real and growing thing. But then the curriculum teams, the people building courses, and marketing say: giving context to people when you're explaining stuff is just what we do all day, we don't need a new term for it. So yeah, definitely divisive; whether it's a useful term depends on your point of view. You also mentioned the idea of platform engineering. Just talk me through, what's that about?
-
Shane Murray
My experience with data teams over the past decade is that we really gravitated to this world where we need people focused on building the platform that other data engineering, data analyst, or data science teams can make use of. So building up both the infrastructure and the sort of golden pathways, the tooling that you expect people to use. As I've been talking to data teams over the past six months about how they're approaching this, very often they're seeing fragmented agents or AI being built out on the edge, in software engineering teams who are close to the problem statement of the customer, but all going and building with different tools and different frameworks. So the idea of the platform engineering team is really about standardization. And I think when it's done best, you're waiting until there are enough use cases to require a platform; you shouldn't build too far ahead of the need. And then you're also providing things that are naturally adopted. A good platform doesn't have to force adoption, because people see the benefits of not having to own that foundational layer.
-
Richie Cotton
Okay, that seems incredibly sensible, having some sort of standardized infrastructure. In a lot of companies, particularly the larger ones, you end up with silos, with different teams using different pieces of tech that do the same thing. So I guess, to circle back to where we started this conversation on becoming AI ready: what's the infrastructure you need to be AI ready? What does a sensible tech stack look like?
-
Shane Murray
I mean, at the moment it feels fairly minimal, right? I think people are looking at orchestration, so something like LangGraph is very common, and at establishing a standard for a gateway to whatever set of foundational models you want to make available to people within the organization to build upon. I do think platforms then extend into: how do I reliably run experiments? How do I evaluate that this thing is better than the last version? And how do I have observability? That observability can be the latency or the cost, how many tokens you're using. I think some people have certainly run into cases, with all this AI being built, where they'll have a cost spike because a team hasn't realized what they've done. But it can also be the traceability of these agents, ensuring you know how they're behaving. So I think we're at the early stages of the platform discussion. There's also discussion about whether you have a dedicated vector database, and what the underlying management of structured and unstructured data that supports the agents looks like. All these things can be components. But yeah, I'd say it feels like we're pretty early in the platform discussion of AI.
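A sketch of the gateway idea: one choke point in front of the models you expose, logging latency and token usage so cost spikes are visible. The model names and the provider call are hypothetical:

```python
import time, logging

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("llm_gateway")

ALLOWED_MODELS = {"gpt-large", "small-fast"}  # whatever you choose to expose

def provider_call(model: str, prompt: str) -> tuple[str, int]:
    """Hypothetical provider SDK call returning (text, tokens_used)."""
    raise NotImplementedError

def gateway(model: str, prompt: str, team: str) -> str:
    if model not in ALLOWED_MODELS:
        raise ValueError(f"{model} is not an approved model")
    start = time.perf_counter()
    text, tokens = provider_call(model, prompt)
    latency_ms = (time.perf_counter() - start) * 1000
    # One log line per call makes per-team cost and latency queryable later.
    log.info("team=%s model=%s tokens=%d latency_ms=%.0f",
             team, model, tokens, latency_ms)
    return text
```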
-
Richie Cotton
Okay, yeah. So I guess you should probably plan on swapping out some tools at regular intervals in the near future, then. Is there a way to make your infrastructure a bit more fluid? How do you plan for that?
-
Shane Murray
How do you mean?
-
Richie Cotton
If you've got some sort of fixed tech stack, that's probably not going to work; you're going to want to change it up at some point.
-
Shane Murray
I don't think I have a great answer for that, but I feel like it's about investing minimally at the moment and supporting the top use cases, while most teams are still allowing teams out on the edge to go and use what they want. It's about making sure you've supported the top three use cases. This is how I've approached platforms in the past: you don't try to solve for everything, and you don't try to enforce too much, but you have the bulk of the value covered by the platform. I think modularity is probably a key word for this now, and just making sure what you build is extensible to different use cases.
-
Richie Cotton
Okay, yeah, I like the idea of modularity: maybe you don't want some monolithic stack where you've got to swap everything out all at once. Okay. So, you talked before about observability and how you test for quality, and I guess this all starts with the data, right? So what sort of data quality controls are you going to want before you say, okay, we're going to use this data with our AI products?
-
Shane Murray
So AI readiness also has this category of AI-ready data. For some teams, as I've mentioned, it's about actually building the application and seeing what needs to be ready in that process. We also have this huge branch of work, with many teams now approaching conversational BI, or whatever name you want to give it: basically replacing dashboards with natural language and allowing that to access your data. For that, teams really need to be, and have been, thinking about data certification. Which datasets have the right coverage in terms of monitoring, have the right incident response processes, and have the right metadata around them to be ready? With AI, that tends to include synonyms (revenue equals sales, say) and all the ways that someone using the agent is going to talk about your data. So that's one side: have you prepared your data estate to have agents put on top of it? And then I've also seen data teams tackling this idea where they have downstream teams that are building agents, and they need to think about how they approach readiness of structured and unstructured data. The unstructured data is the new component, where you previously had very little visibility. So teams are now grappling with the question of: how do I monitor the unstructured data that may be used by tens of teams downstream?
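A sketch of what a lightweight certification check plus a synonym map might look like; the fields, thresholds, and synonyms are invented for illustration:

```python
dataset = {
    "name": "fct_orders",
    "freshness_sla_hours": 6,
    "hours_since_update": 2,
    "monitored_columns": {"order_id", "revenue", "order_ts"},
    "critical_columns": {"order_id", "revenue"},
    "owner": "data-platform@example.com",
}

# Synonyms let a conversational-BI agent map user language onto real columns.
SYNONYMS = {"sales": "revenue", "turnover": "revenue", "placed_at": "order_ts"}

def certified(ds: dict) -> bool:
    checks = [
        ds["hours_since_update"] <= ds["freshness_sla_hours"],  # fresh enough
        ds["critical_columns"] <= ds["monitored_columns"],      # monitored
        bool(ds["owner"]),                                      # accountable
    ]
    return all(checks)

def resolve_column(user_term: str) -> str:
    return SYNONYMS.get(user_term.lower(), user_term.lower())

print(certified(dataset), resolve_column("Sales"))  # True revenue
```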
-
Richie Cotton
I see, that's a very interesting point. So how do you even measure the quality of unstructured data? Like, is this image any good or not?
-
Shane Murray
I feel like it's early on, but some things carry over from structured data: is it fresh, is it in a valid file format? Those are fairly standard. An image is maybe a harder one, but for text I think of semantic consistency. I've seen a lot of teams try to protect for this by having one person contribute to the governance of the documents to ensure that semantic consistency. But if you've got really high volume, high variability, high velocity data, then you need to be able to check that someone's definition of risk is the same definition of risk that's understood across the corpus. So I think there's a need to understand the semantic consistency of the corpus, and there's also a need to understand its relevance. I've talked to a lot of teams dealing with RAG-based pipelines who need to be able to know when there's a drift in the relevance of the underlying corpus relative to the model and the output they've built.
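Those structured-data-style checks translate fairly directly to a document corpus. A minimal sketch over a folder of files, with invented paths and thresholds:

```python
import time
from pathlib import Path

VALID_SUFFIXES = {".md", ".txt", ".pdf"}
MAX_AGE_DAYS = 365  # staler than this and the doc is flagged

def audit_corpus(root: str) -> list[tuple[str, str]]:
    """Return (path, problem) pairs for documents failing basic checks."""
    problems = []
    now = time.time()
    for p in Path(root).rglob("*"):
        if not p.is_file():
            continue
        if p.suffix.lower() not in VALID_SUFFIXES:
            problems.append((str(p), "unexpected file format"))
        age_days = (now - p.stat().st_mtime) / 86400
        if age_days > MAX_AGE_DAYS:
            problems.append((str(p), f"stale: {age_days:.0f} days old"))
        if p.suffix.lower() in {".md", ".txt"} and p.stat().st_size == 0:
            problems.append((str(p), "empty document"))
    return problems

for path, problem in audit_corpus("./knowledge_base"):
    print(path, "->", problem)
```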
-
Richie Cotton
Okay, just to make sure everyone gets this idea correctly: semantic consistency is just that business definitions are consistent? So, this is how we define a sales qualified lead, or this is how we define customer lifetime value, and it's the same everywhere, and you want that across all your documents. Is that it?
-
Shane Murray
Exactly.
-
Richie Cotton
Yeah, okay. So is there a technological solution to ensuring semantic consistency, or is it a case of just reading everything and hoping that it's right?
-
Shane Murray
Yeah. At Monte Carlo we've done some work on building these unstructured data monitors, so we're starting to work with teams to see what they need to monitor and what they need to extract. But it's a different problem: you basically need to structure some information from that unstructured data in order to monitor for that consistency. And much of this relies on the fact that we now have underlying LLM functions in the warehouse that we can tap into.
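One way to implement that structure-then-monitor idea: extract each term's definitions from the corpus, embed them, and flag terms whose definitions disagree. A sketch with a hypothetical embed() function:

```python
import numpy as np

def embed(text: str) -> np.ndarray:
    """Hypothetical embedding call (e.g. a warehouse LLM function or an API)."""
    raise NotImplementedError

def definition_consistency(term: str, definitions: list[str]) -> float:
    """Mean pairwise cosine similarity of a term's definitions across docs.

    Near 1.0 means the corpus agrees on what the term means; a low score
    means, say, two teams define 'risk' differently."""
    vecs = [embed(d) for d in definitions]
    vecs = [v / np.linalg.norm(v) for v in vecs]
    sims = [vecs[i] @ vecs[j]
            for i in range(len(vecs)) for j in range(i + 1, len(vecs))]
    return float(np.mean(sims))

# definitions_of("risk") would come from an extraction pass over the corpus;
# alert if definition_consistency("risk", defs) drops below a tuned threshold.
```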
-
Richie Cotton
Okay, yeah. So it seems like it's maybe an ongoing research problem, how you ensure that document quality control everywhere. Can you give me a concrete example of how data quality then feeds into AI quality? How does that relationship work?
-
Shane Murray
You know, I tend to think of this engineering maxim: as much reliability as you need, no more. Maybe this stems back to the idea that you kind of have to build the thing, put it in production, and start learning, so a lot of this is worked out through monitoring and iteration. How does it affect performance? I think oftentimes people really attach to this idea of hallucination; it's the most commonly talked about problem with generative AI. What I've found from talking to many data teams is that very often the underlying problem that's causing the hallucination is not the model itself, it's actually the context, the data inputs. If you have outdated data, then you're going to have a model that's giving potentially old information. Or if you've just tapped into your underlying Confluence data that's had maybe hundreds of contributors over the last ten years, you're going to have a mix of high and low quality information. So while the models can hallucinate, and it's certainly something you need to build monitors and guardrails for, the practical problem many teams are finding is: how do I ensure the data is reliable, and how do I ensure the model outputs are reliable?
-
Richie Cotton
Oh right, so we're back to document quality control, where you've got a five-year-old page. I mean, maybe customer-facing support pages ought to be kept up to date, but certainly intranet pages where, okay, I wrote this process seven years ago, the page still exists somewhere, and suddenly it's being pulled into an AI help system.
-
Shane Murray
Well, yeah. I think some of the early use cases have been: can we put a conversational agent on top of our internal documentation? And I think people are very quickly finding that that isn't a very up-to-date document source. Obviously there are cases outside of that where you can tap into much fresher and more timely documents.
-
Richie Cotton
And are there any other sorts of data quality issues you're seeing at the moment?
-
Shane Murray
The data quality issues that contribute, many of them are things we've seen across traditional ML and across analytical use cases. I still find silent schema changes from an upstream, say, software engineering team one of the most common causes of pain, or even externally managed data that you're ingesting that actually changes. Schema change is still the sort of event that, if not controlled, can be disastrous for downstream systems. Then you have pipeline delays causing stale or incomplete data. I was talking to a customer who said: we need to know the delay between when the underlying document set is updated and when our model actually starts to use it, because that latency can cause it to deliver the wrong answers to consumers. So there's still a freshness problem in AI, as with any data product you're building. Thirdly, you have this idea of data drift, which can occur from code changes or instrumentation changes: the idea that the underlying data you're using in a pipeline, or in context, can drift to a place where the model is no longer giving relevant responses. And then I think the new one is actually measuring the output of the AI. That could be measuring it for clarity; a lot of teams I talk to measure whether it carries the brand image they represent, whether it's speaking as we would speak to customers, whether it's grounded in the context we've provided it, or whether it's accurate if you have some ground truth. So the quality issues affecting AI in production run from the underlying data inputs, which can be schema changes and freshness, through to the actual model outputs.
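Silent schema changes are among the easier failures to catch mechanically: snapshot the schema on every run and diff it. A minimal sketch, with a hypothetical metadata helper:

```python
import json
from pathlib import Path

SNAPSHOT = Path("schema_snapshot.json")

def current_schema() -> dict[str, str]:
    """Hypothetical: fetch {column: type} from your warehouse's metadata."""
    raise NotImplementedError

def check_schema(table: str) -> list[str]:
    new = current_schema()
    if not SNAPSHOT.exists():
        SNAPSHOT.write_text(json.dumps(new))
        return []  # first run just records the baseline
    old = json.loads(SNAPSHOT.read_text())
    alerts = [f"{table}: column {c} dropped" for c in old.keys() - new.keys()]
    alerts += [f"{table}: column {c} added" for c in new.keys() - old.keys()]
    alerts += [f"{table}: {c} type {old[c]} -> {new[c]}"
               for c in old.keys() & new.keys() if old[c] != new[c]]
    SNAPSHOT.write_text(json.dumps(new))
    return alerts
```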
-
Richie Cotton
So many different things can go wrong there. Certainly with freshness I can see that if you're pulling in weather data or news data, even a small delay is going to render the data completely useless. And with changing formats, like the schema changes you mentioned: if your data arrives in a different form, that's going to break a lot of things downstream. Okay, so a lot of things can go wrong. Suppose you decide you need to improve data quality. Do you have a single recommendation for where to start?
-
Shane Murray
This is something we tackle at Monte Carlo a lot. I'd say the most logical place to start, and the place of highest impact, at least in large organizations and in many small ones, is the foundational data products: the core data layer that should be easily joinable, that everyone's tapping into, with domain teams building derived products on top of it. I tend to recommend starting with that foundational data because it's where you might have hundreds of downstream uses that you can affect; if you improve the quality there, you improve it downstream. Typically I'd start with the basic schema, freshness, and volume checks that you can scale up very easily. You don't need to be writing manual assertions; you can really just turn these types of checks on with any observability tool. Then go deeper on your critical data elements and ensure you have distribution monitoring around those, so that by the time data gets to a metric the CEO is reading, or to a piece of data that's really required in an end system, you know it's actually been measured along the way and hasn't had any transformation problems or underlying data problems.

So that's the detection side. Then we typically encourage teams to start with basic operational practices, if they don't have those, which might include on-call rotations; a clear and understood severity process for incidents, so you can separate the signal from the noise and really amplify the signal of a Sev 1 or a Sev 2; doing retros when you have these incidents; and reporting on things like time to detect and time to respond. These are the basic practices that I think every software engineering team has implemented and every data team needs to have implemented these days.
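The turn-it-on-everywhere checks Shane describes reduce to a couple of queries per table. A sketch of the logic; the metadata helpers are hypothetical:

```python
from datetime import datetime, timedelta

def last_loaded_at(table: str) -> datetime:
    """Hypothetical helper: max(load_timestamp) from warehouse metadata."""
    raise NotImplementedError

def daily_row_counts(table: str, days: int = 7) -> list[int]:
    """Hypothetical helper: row counts per day for the trailing window."""
    raise NotImplementedError

def freshness_alert(table: str, max_lag: timedelta) -> bool:
    """True if the table hasn't loaded within its expected window."""
    return datetime.utcnow() - last_loaded_at(table) > max_lag

def volume_alert(table: str, tolerance: float = 0.5) -> bool:
    """True if today's row count deviates badly from the trailing average."""
    counts = daily_row_counts(table)
    baseline = sum(counts[:-1]) / len(counts[:-1])
    return abs(counts[-1] - baseline) > tolerance * baseline

# for t in ["fct_orders", "dim_customers"]:
#     if freshness_alert(t, timedelta(hours=6)) or volume_alert(t):
#         open_incident(t)  # route into the Sev process described above
```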
-
Richie Cotton
That seems simple to get started with. The idea of having people on call for when things go wrong, and doing a proactive postmortem when there's a disaster so you can fix it, is very common in software engineering, much less so in data teams. So that definitely seems like a good process to implement. I guess if you're a data team leader wondering how to do this, you basically copy whatever the software engineering team is doing?
-
Shane Murray
I mean, we don't need to reinvent the wheel here. I think data teams have been behind software engineering teams in most organizations in terms of adopting these strong reliability engineering practices. But data just keeps becoming a more central part of the products we build, and so these are the steps to take.
-
Richie Cotton
Absolutely. Okay, so you mentioned earlier going beyond just observing problems to trying to find what the root cause of those problems was. Walk me through how you go about that. Once you've got an AI agent, and lots of layers of tech in between that and a data problem, it can go quite deep, I guess. So talk me through: what's the process for finding the root cause of a problem?
-
Shane Murray
So maybe I'll separate it into two pieces: how we observe data and how we observe agents. We've been doing work at Monte Carlo to extend our agent observability so that we're capturing the traces of agent behavior through all the building blocks. Take a RAG pipeline: that would be chunking, embedding, and retrieving, plus the decisions an agent might be making. As these agent architectures get more complexity built into them, we've seen the need to extend that telemetry instrumentation to be able to dig into individual responses. Because you've got nondeterminism built into these, you need to be able to break it down, look at individual cases, and understand essentially the lineage of those. So part of it is that instrumentation.

The other part is that over the years we've really invested in troubleshooting and root cause analysis, and what we've built up in Monte Carlo now is an agent for troubleshooting. What that agent does is take all of the context of an incident, which includes the anomaly itself, but also an understanding of the data lineage and of the different logs coming from contributing tools. It then spawns a series of sub-agents which might explore, say, GitHub changes or Airflow issues, or go upstream and look at all the potential data failures that could be happening. Each of these specialist agents goes and explores a hypothesis, and you might have many LLMs running in parallel to explore a hypothesis, come back with a finding, and share it with the main agent, which summarizes it all within the span of about two minutes and suggests next steps. That's really the latest and greatest we're doing on troubleshooting: taking something that can take a data team hours per incident and, for any alert, giving back a review of what might have caused it in two minutes.
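The fan-out and fan-in pattern described here is easy to sketch; each sub-agent below is a hypothetical async function exploring one hypothesis, and in the real pattern an LLM would write the summary:

```python
import asyncio

async def check_github_changes(incident: dict) -> str:
    """Hypothetical sub-agent: did a recent PR touch the failing pipeline?"""
    raise NotImplementedError

async def check_airflow_runs(incident: dict) -> str:
    """Hypothetical sub-agent: late or failed upstream DAG runs?"""
    raise NotImplementedError

async def check_upstream_data(incident: dict) -> str:
    """Hypothetical sub-agent: schema/freshness anomalies upstream in lineage?"""
    raise NotImplementedError

async def troubleshoot(incident: dict) -> str:
    hypotheses = [
        check_github_changes(incident),
        check_airflow_runs(incident),
        check_upstream_data(incident),
    ]
    # Explore all hypotheses in parallel; don't let one failure kill the rest.
    findings = await asyncio.gather(*hypotheses, return_exceptions=True)
    usable = [f for f in findings if isinstance(f, str)]
    # In the real pattern an LLM summarizes; here we just concatenate.
    return "Findings:\n" + "\n".join(usable)

# asyncio.run(troubleshoot({"table": "fct_orders", "anomaly": "freshness"}))
```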
-
Richie Cotton
I love that: using hundreds of agents to then find out what the problem was with the original agents. So you've got more agents to debug after you debug those agents.
-
Shane Murray
I will say, we've found it to be such a good use case because there's such a clear process that engineers take to this hypothesis testing. And over the six years of Monte Carlo we've built up an understanding of all the root causes, so we can give hints to this agent about potential ones. But yeah, it's agents upon agents, as you say.
-
Richie Cotton
No, it does seem incredibly useful. I imagine spending hours digging through your entire stack to figure out where the problem is would be an incredibly tedious and frustrating job for any data scientist or engineer, so I like the idea of automating that. You mentioned you've got six years' worth of data on things that can go wrong. So tell me, what are the most common root causes of problems that you've found?
-
Shane Murray
Maybe I'll focus on the new problems we've seen as we've talked to teams building AI, since I've already talked through the schema, freshness, and distribution problems that can occur in data. One of the interesting ones I've found talking to teams building AI is underlying model upgrades behind the scenes, or prompt changes. Typically you'd think these occur within teams that should know the changes are happening, but a lot of these changes are currently invisible, and I think part of that is because we're early. A change from one GPT version to the next can have really dramatic impacts on an agent that you have in production. I've heard teams say they'll start getting feedback from the user base that somehow the thing feels less useful, and we see this in some of the UIs provided by the agents as well. But data teams need to be aware of those model changes or prompt changes in the same way that they are of a schema change.
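One cheap defense is to treat the model ID and prompt as versioned configuration, pin them, and alert when the served values differ from what was last evaluated. A minimal sketch with invented names:

```python
import hashlib, json

def fingerprint(model_id: str, prompt_template: str) -> str:
    """Stable hash of everything that changes agent behavior."""
    blob = json.dumps({"model": model_id, "prompt": prompt_template},
                      sort_keys=True)
    return hashlib.sha256(blob.encode()).hexdigest()[:16]

# Reviewed and signed off during the last evaluation run.
APPROVED = fingerprint("gpt-large-2025-01", "You are a support agent...")

def preflight(model_id: str, prompt_template: str) -> None:
    live = fingerprint(model_id, prompt_template)
    if live != APPROVED:
        # Route into the same alerting path you'd use for a schema change.
        raise RuntimeError(f"agent config drifted: {live} != {APPROVED}")

preflight("gpt-large-2025-01", "You are a support agent...")  # passes
```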
-
Richie Cotton
Yeah, all these foundation model companies keep coming up with new models, and they're always saying this is the latest, greatest thing, you need to adopt it. But actually you must be very, very careful in an enterprise setting when you're switching up your models, it seems.
-
Shane Murray
Yeah, even if it's better. I think we've seen as data teams that you sometimes care about consistency above accuracy; consistency is such a critical factor for data teams. So a model changing versions behind the scenes, or someone switching out a prompt, has huge impacts on the consistency of behavior for these applications. So that's been one. Another thing that has come up with a lot of data teams is this idea of embedding drift, which you could frame as a kind of knowledge drift: are my embeddings still relevant to the use case that I'm supporting? And then, going back, the thing that keeps being reinforced through my conversations is that if the underlying knowledge base or document set you're feeding in is low quality or delayed, that's going to make or break your AI application.
-
Richie Cotton
So embedding drift is a whole new problem to me. Talk me through what this is about. Is it the meanings of words changing? When does this happen?
-
Shane Murray
Yep. Maybe I'll give an example that a customer shared. Basically, they were doing some work in the US and had some embeddings that were specific to the US, and then they started to expand the business into Canada. The standards of the language, of the images in this case, in Canada were very different from the US. So their techniques, which might have been few-shot prompting or other techniques to make this work, suddenly became less reliable for the scope of the problem they were solving. Historically, these sorts of solutions were sometimes being built by one data scientist. But as you start to fragment ownership of that solution, I think you have more of a requirement to have observability, or monitoring, on each piece of it.
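A common way to make embedding drift measurable is to compare the centroid of recent embeddings against a reference window. A sketch using random vectors as stand-ins, with an invented threshold:

```python
import numpy as np

rng = np.random.default_rng(1)

# Stand-ins: embeddings from the reference period (US launch) and this week.
reference = rng.normal(size=(500, 64))
recent = rng.normal(loc=0.5, size=(500, 64))  # shifted: e.g. Canadian traffic

def centroid_drift(ref: np.ndarray, cur: np.ndarray) -> float:
    """Cosine distance between window centroids; 0 means no drift."""
    a, b = ref.mean(axis=0), cur.mean(axis=0)
    cos = a @ b / (np.linalg.norm(a) * np.linalg.norm(b))
    return 1.0 - float(cos)

drift = centroid_drift(reference, recent)
if drift > 0.2:  # threshold would be tuned on historical windows
    print(f"embedding drift detected: {drift:.2f}")
```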
-
Richie Cotton
Yeah, that's absolutely fascinating. It's certainly something I've fallen over in my own life, moving from the UK to the US: a lot of these words are wrong for me now.
-
Shane Murray
Me too. Moving from Sydney to New York, I experienced exactly the same thing.
-
Richie Cotton
Absolutely. So it seems like, as you're moving into new markets, you really need to be careful about how AI performance can change. Okay, so in general, all this speaks to the problem of how you go about changing your processes once you start adopting agents. Do you have any advice on process management, on how to go about changing things?
-
Shane Murray
I think the first one I'd call out, and maybe it's obvious, is that human in the loop is essential. I actually probably wouldn't be building a solution without having factored some human in the loop into it, both as a quality gate and to build that trust with users, the ultimate consumers of it, who have a say in it being reliable. I think when teams set expectations with their end users, and also bring in experts to ensure the quality of the application, they're in a much better place, and they're more successful in deploying these applications and driving that change management with their user base. I saw the same thing with machine learning: the closer you get to that end user, and the more you build for them and not to replace them, the more buy-in you're going to get.

The second one is that the teams I've seen do this well start very narrowly: prototype narrowly, get the buy-in, get the wins, then figure out the next scale-up, going step by step up to a thousand users in a way that I don't think we considered as much before. We certainly did smaller A/B tests, but most products get launched to production without going through such rigorous step-ups in the user base. Sometimes that means you go up and then come back to your ten users to test some new changes. But it really feels like that sort of phased rollout is part of the AI adoption process.

Culturally, what we've found at Monte Carlo, which is adopting AI like any other company, is that you need to give people space and time to experiment. You're not going to be productive on day one of using AI, and you're going to do a lot wrong. I've found it really useful that at Monte Carlo we have a culture of sharing the successes and failures we've had with using AI, and that's actually converted a lot of people into users who were previously maybe a bit shy to use the technologies.
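A human-in-the-loop gate can be as simple as routing low-confidence outputs to a review queue instead of auto-sending. A sketch; the confidence score and threshold are hypothetical:

```python
from dataclasses import dataclass

@dataclass
class Draft:
    user_id: str
    answer: str
    confidence: float  # e.g. from an LLM-as-judge score, scaled 0-1

review_queue: list[Draft] = []

def deliver(draft: Draft, threshold: float = 0.8) -> str:
    """Auto-send only when confident; otherwise a human approves first."""
    if draft.confidence >= threshold:
        return f"sent to {draft.user_id}"
    review_queue.append(draft)  # the quality gate
    return "queued for human review"

print(deliver(Draft("u1", "Your refund is approved.", 0.95)))
print(deliver(Draft("u2", "Contract clause 7 permits...", 0.41)))
print(len(review_queue), "drafts awaiting review")
```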
-
Richie Cotton
The idea of the phased rollout is actually surprisingly revolutionary, or radical, I suppose, compared to SaaS software, where you do an A/B test, give a new feature to some fraction of your users, and if it seems to work okay, you roll it out to everyone. But doing it with ten users, then a thousand, gradually building up trust: that does seem like a very different process, and a lot more rigorous.
-
Shane Murray
And I think in part it ties to maybe the weakness of human in the loop, which is that right now (and I've seen surveys that back up my anecdotes) human in the loop is the primary way people are ensuring quality, and that's very hard to scale up. So you have to start thinking about how to keep it at levels that are manageable as you scale. You can't keep scaling up your humans in the loop, so you have to scale up more automated evaluation and monitoring approaches.
-
Richie Cotton
Okay, yeah, certainly humans seem to be the bottleneck in everything. All right, so I guess all this has been building towards: how do we get users of AI agents to trust the product? So, any final advice on how you get that trust?
-
Shane Murray
Yeah. I had a data scientist say to me the other day that trust is gained in droplets and lost in buckets. I'm not sure who gets credit for that, but I thought it was a really good phrase. What we've found over the span of Monte Carlo is that there's what we can do with observability to make more reliable data products, but then there's a whole human element to rolling out any software that ensures trust. I've seen three things really drive trust in data, and I think it extends to AI. One is that you have clear accountability over the data, and you probably want to avoid too much fragmentation across the value chain of the product, so you know who's responsible for the lifespan of this thing. The second is that you have a shared expectation of what quality and reliability mean, whether that's four nines of reliability or something else. As I've spoken to people launching AI products, they've found that if they don't set that expectation upfront, there are missteps with users who maybe expect something else. You need to know how trustworthy and high quality this product you're adopting is. And the third one: when customers do go through data incidents, their ability to transparently communicate that to their end users, to communicate uptime and downtime, is something that also builds trust. People understand downtime of products, but if you don't communicate it well, it can be a failure in that trust building. So those are probably the three things, and I think it just reinforces that trust is built in production.
-
Richie Cotton
I really like the idea that, as in real life, a lot of trust is about communication and setting realistic expectations of performance. If you don't have 100% reliability, then make it clear to users that some things are going to go wrong some of the time; they shouldn't trust it to be right on every single occasion. All right, super. And finally, I always want more people to follow. Whose research are you most interested in at the moment?
-
Shane Murray
Yeah, I've been reading a lot. I'm just thinking, I'm not sure I know his full name, but Hamel, whose website is hamel.dev. He's an engineer who talks and writes a lot about building reliable AI, so I've been following along with him. He's writing some really interesting stuff around tackling error analysis of agents and building evaluations, so I've been enjoying him. And then the book I'm reading at the moment is Empire of AI. Have you heard of that?
-
Richie Cotton
I have read the book; it's a very, very good book. Lots of gossip about what's been going on over the last decade. Yeah, it's a good read.
-
Shane Murray
Yeah. So that's Karen Hao, who's a freelance journalist. I think she started by being behind the scenes with OpenAI, and it kind of builds from there, the good and the bad, I'd say.
-
Richie Cotton
Yeah, lots of very juicy gossip in that book. All right, super. Thank you so much for your time, Shane. It's been a pleasure.
-
Shane Murray
Thanks so much, Richie.

