
Creating High Quality AI Applications with Theresa Parker & Sudhi Balan, Rocket Software

Richie, Theresa, and Sudhi explore RAG, its applications in customer support and loan processing, the role of testing and guardrails in AI, cost management strategies, the potential of AI to transform customer experiences, and much more.
Jan 2, 2025

Theresa Parker's photo
Guest
Theresa Parker
LinkedIn

Theresa Parker is the Director of Product Management at Rocket Software. She has 25 years of experience as a technology executive with a focus on software development processes, consultancy, and business development. Her recent work in content management focuses on the use of AI and RAG to improve content discoverability.


Sudhi Balan's photo
Guest
Sudhi Balan

Sudhi Balan is the Chief Technology Officer for AI & Cloud. He leads the AI and data teams for data modernization, driving AI adoption of Rocket's structured and unstructured data products. He also shapes AI strategy for Rocket’s infrastructure and app portfolio. He has earned patents for safe and scalable applications of transformational technology. Previously, he led digital transformation and hybrid cloud strategy for Rocket’s unstructured data business and was Senior Director of Product Development at ASG.

Key Quotes

The future is here, but it's not distributed evenly. And that possibility, which is you take the RAG models of the world, you take the next level past that. You talk about LLM based agents, you talk about graph-based LLMs, there's a whole bunch of cleverness going around, but you know what? It's a very tiny sliver of the actual corporate universe that's actually adopted it.

When you combine the strengths of retrieval-based and generation-based models, retrieval-augmented generation enhances the quality of generated text. The real objective of it is to leverage the vast amounts of information that are available in large-scale databases, or for that matter, knowledge bases, which really makes it effective for tasks that require accurate and contextually relevant information.

Key Takeaways

1. To manage costs in AI applications, focus on minimizing the data sent to AI models by using RAG to filter and retrieve only the most relevant information, optimizing both performance and expenditure.

2. For organizations dealing with large volumes of incident reports, RAG can help prioritize which incidents require immediate attention, improving response times and resource allocation.

3. Testing AI applications requires a shift in approach, utilizing LLMs to generate and evaluate tests, while human oversight ensures the intent and quality of outputs align with business goals.

Links From The Show

Transcript

Richie Cotton: Hi, Sudhi. Hi, Theresa. Welcome to the show.

Theresa Parker: Good afternoon.

Sudhi Balan: Good afternoon, Rich. Glad to be here.

Richie Cotton: Yeah, I'm excited for this. So, to begin with, can you just tell me a bit about what is retrieval augmented generation?

Theresa Parker: When you combine the strengths of retrieval-based and generation-based models, retrieval augmented generation enhances the quality of generated text. The real objective of it is to leverage the vast amounts of information that are available in large-scale databases, or, for that matter, knowledge bases, which really makes it effective for tasks that require accurate and, I'll use the phrase, contextually relevant information.
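To make that retrieve-then-generate loop concrete, here is a minimal sketch. The `embed()` and `call_llm()` functions are hypothetical stand-ins (a toy hashing embedding and an echo), not any particular vendor's API; in a real system you would swap in your embedding model and LLM of choice.

```python
# Minimal RAG sketch: retrieve relevant passages, then generate an answer from them.
# embed() and call_llm() are placeholders for whatever models you actually use.
import math

def embed(text: str) -> list[float]:
    """Placeholder embedding: hash words into a tiny normalized vector."""
    vec = [0.0] * 8
    for word in text.lower().split():
        vec[hash(word) % 8] += 1.0
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]

def cosine(a: list[float], b: list[float]) -> float:
    return sum(x * y for x, y in zip(a, b))

def retrieve(question: str, documents: list[str], k: int = 2) -> list[str]:
    """Rank documents by similarity to the question and keep the top k."""
    q = embed(question)
    ranked = sorted(documents, key=lambda d: cosine(q, embed(d)), reverse=True)
    return ranked[:k]

def call_llm(prompt: str) -> str:
    """Placeholder for a real LLM call; here it just echoes the prompt."""
    return f"[LLM answer grounded in]\n{prompt}"

def answer(question: str, documents: list[str]) -> str:
    context = "\n".join(retrieve(question, documents))
    prompt = f"Answer using only this context:\n{context}\n\nQuestion: {question}"
    return call_llm(prompt)

docs = [
    "Loan applicants must provide proof of income and residency.",
    "Amplifier modules are rated for 120V operation.",
    "Business loans require roughly 30 document types.",
]
print(answer("What documents does a personal loan need?", docs))
```

The point of the sketch is the shape of the pipeline: retrieval narrows the corpus to what is contextually relevant, and only that slice is handed to the generator.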

Richie Cotton: And so I guess it's most commonly associated with use in generative AI applications. So maybe we'll talk about those first. Sudhi, can you give us some examples of what kinds of AI applications might want to make use of retrieval augmented generation?

Sudhi Balan: You know, that's the beauty of the technique. It's got such wide applicability. You can go anywhere from customer support use cases, where instead of having your customer support people go look up thousands of pages of manuals, the AI is reading them for you and giving you one of two things: it's giving you the answer, or it's telling you very, very precisely where to look for the answer. Or you have customer logs. You're looking at incidents, and instead of trying to figure it out from the incident log, you can, again, have the AI do the reading for you. All this grunt work, which humans don't really love, the AI will do for you and let the humans do the high-value work.

So anywhere where you really want humans to be the ones contributing the high-value work, there's a lot of grunt work that you can hand off to an AI. That's a beautiful use case, right? Customer support, log analysis, debugging applications, all of those things apply.

Richie Cotton: Yeah. So I think the support case is fairly well known now, that it's going to help you give good support answers, but the idea of finding incidents when you're wading through logs of things, that also seems like a very cool use case.

Theresa Parker: I have a great one that comes from my family. My son's in banking, and we were chatting about his job and some things that he is helping his organization to do. And so we were talking about the whole loan origination process. And I said, what does this really mean to you?

And so he was talking about how, you know, when he was doing this as a loan officer, which is why he got out of it, they got 10 new loans a week. Everybody had a quota to close, so they had on average 10 to 12 to close. If they closed more, they got a bonus. And most folks closed 15 a month.

But he was working himself, you know, into the ground, because the average time for processing was 30 to 60 days. So it was a lot of, to the point about grunt work, everything from your simple unsecured personal loan, where you really didn't need a whole lot of collateral, to business loans, so it ran the gamut there.

For the simple ones, even the simple ones, you needed eight different types of documents, five to six pages apiece, different kinds of stuff that you had to look at: proof of residency, bank statements, proof of income, all of these things that you and I, when we apply for a loan, have to get from our HR department or what have you. And we have to put all that together, and that for us is a pain, but at least we're familiar with our stuff.

Loan officers are looking at this, and you as a person, you as Theresa, you as Sudhi, you as Richie, are completely unknown to that person. So they have to go through things with a fine-tooth comb. Business loans are more complex. So 30 different kinds of documents, 100 pages apiece. So 30 times maybe 100, if you're going to do the math, that's a lot of information to go through.

So it's tedious, it's mind-numbing, but you have to pay attention because you're on the hook for approving a loan. And so everything is a little bit different, though it's the same, and there are different definitions, and it's just a real grind. So anything that can help with a situation like that is of enormous benefit.

And so that's where the real value of this type of technology can come into play, because if you can just have it look for the exceptions, or have it flag something that seems slightly off or something that you need to investigate a bit further, rather than deciding whether Theresa lives in the state of Colorado or whether there's some confusion there, that adds enormous value to you as an individual and to the organization as a whole.

Richie Cotton: Okay. So, I have to say, the idea of reading 30 different documents, hundreds and hundreds of pages of stuff about, like, who is this person? Should you give them a loan or not? And then doing that 15 times a month, that does sound like a lot of hard work. So I can certainly see how having AI assist you with that and just pull out the key bits of information.

That's going to be a real productivity boost.

Sudhi Balan: And you know, that's a great example. The other one that I've loved hearing about is people who work for three-letter agencies, right? They get incident reports all the time, from all these, you know, hundreds of field locations. And there's some analyst sitting there trying to figure out which of these 30 things that came in in the last hour I should look at. Think of the volume of data they have to process, and think of the actual value added in a situation like this. And it can tell you, you know what? These funny things are not interesting, but this one is. You should probably start here. Or something happened, post-incident.

And you're saying, you know what? Tell me anything suspicious that happened here with these parameters. It's vague, but it's actually looking for specific things, which you can't really articulate.

And the machine learning and the RAG techniques do a great job of figuring out intent. That is the big advantage. That's the big leap, I think, from, you know, classic keyword-based search, where you could do the same thing, but you'd get too many hits, right? You'd get, like, hundreds and hundreds of hits, and it was useless.

This will give you the 10 things you should look at. That's what's great about it.

Richie Cotton: So it seems like, as well as the sort of pure, well, maybe not pure, but these AI use cases, you've also really just got search use cases here as well. Do you want to tell me a bit about how retrieval augmented generation is used in this search context?

Sudhi Balan: Typically, right, what happens is, before AI, before, you know, RAG came into the picture, you had a couple of kinds of search. You could build painstaking, you know, what we would call ontologies or indexes. And you say, this thing's an amplifier, or this thing's actually a capacitor, or this thing's something else. And you can build these amazing indexes, right? And then you can search and say, find me all the amplifiers, and I'll go look at all the amplifiers and I'll find something, right? This is the customer who's actually working with us. You can do that, or you can say, you know what, I have a thousand manuals, just find me all the amplifiers. And both of those have interesting problems. One is you get too many hits when you just search for amplifiers. The other one is, if you actually painstakingly put a bunch of, you know, indexes on content, that takes about half a second before it's out of date, because you get new content, or you decide, you know what, I will not classify them as amplifiers and capacitors, I want to classify them as stuff that can be sold in the U.S. and stuff that can be sold outside the U.S. So all of these things change, and it's very expensive to maintain ontologies. What RAG does for you is it lets you decide post hoc, you know what, I just want to find stuff that I can sell in the U.S. Without having indexing, without having any kind of pre-processing, or knowing your intentions in advance, right?

You can do these kinds of post hoc things. That's amazing, right? And that's some of the biggest value customers are finding. They can decide on the fly what they want to do with data that's been sitting around for decades, where they haven't put any thought into ever classifying it or making it useful post hoc.
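As an illustration of that post-hoc idea, here is a small sketch where the classification criterion is stated at query time rather than baked into an ontology. The `judge_with_llm()` function is a hypothetical placeholder (a simple substring check) standing in for a yes/no prompt to a real LLM.

```python
# Post-hoc filtering sketch: no pre-built ontology; the criterion is decided at query time.
# judge_with_llm() is a placeholder; in practice you would prompt your LLM of choice.

def judge_with_llm(criterion: str, document: str) -> bool:
    """Placeholder yes/no judgment. A real implementation would prompt an LLM, e.g.
    'Does this document match the criterion: <criterion>? Answer yes or no.'"""
    return criterion.lower() in document.lower()

def post_hoc_filter(criterion: str, documents: list[str]) -> list[str]:
    """Classify documents against an ad hoc criterion, long after ingestion."""
    return [doc for doc in documents if judge_with_llm(criterion, doc)]

catalog = [
    "Amplifier, export-restricted, not for US distribution",
    "Capacitor, general purpose, approved for US sale",
    "Transistor, approved for US sale",
]
print(post_hoc_filter("approved for US sale", catalog))
```

The criterion ("approved for US sale") never existed as an index or tag; it is expressed only when the question is asked, which is the property being described here.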

Richie Cotton: And that's very cool, because quite often you will come up with ideas on the fly, and having to do planning in advance and, as you mentioned, structuring things, that's a lot of effort, particularly if it's a simple question. You don't want to do much grunt work up front; you just want to get your answer quickly and move on.

So I'd love to talk about how we actually put these ideas into production, actually making things within your business so you've got them available. So I guess the big question is, where do you begin? What's the first step? Okay, I've got this AI use case or this RAG use case.

How do I actually make an application that's going to be usable by either other employees or by customers?

Theresa Parker: So I think the thing that you need to do to start is to have a concept of what you're trying to get out of this. First and foremost, it's great to just throw AI at stuff, and AI, and by association RAG, is great, but you as an organization, you as an individual, have to put some logic into it.

You can't just go out there and say, I want to do X. Because, I think, in some cases it's the classic garbage in, garbage out type of scenario; you're going to get that with everything. This is not a magic button or a magic wand. It may seem like it if you do it correctly, and when you do it correctly, the results can truly be magical.

I mean, you can find out that Theresa is a really bad risk for a business loan because of X, Y, or Z that happened when she was in grad school, and you really don't want to do this with her. And I was reading an interesting use case of a guy who goes through and demolishes the apartments that he lives in.

Now, this type of technology would be fabulous for finding out about him in particular. But if you've not thought through your potential use cases and don't have at least that framework to go after, it's not really going to benefit your organization. So before you pick a technology, before you pick any kind of model that you use or anything of that nature, you've got to figure out, how is this going to benefit me as a business?

Where am I going to get the biggest bang for my buck? Because it's not free. It's not free from a software perspective, it's not free from an infrastructure perspective, and it's certainly not free from a people perspective. It will make everything that you do on the back end better, provided that you've thoughtfully looked at those items.

Sudhi Balan: Look, you captured it beautifully, right? Which is, you have to start with the end in mind, like in all things, right? And look, just because it's AI, all the risks of enterprise projects don't go away. In fact, now you have two problems, right? You have an AI problem and you have an enterprise project problem, right?

So in some sense, you kind of have to solve both of those. And what I'll add to, you know, Theresa's great exposition is, the thing that you have to understand, especially when you do RAG-based use cases or just generally AI use cases, is that the quality of the data, like you said, garbage in, garbage out, is important, but also your data governance becomes super important now. Because you know what? If you do the science project version of a RAG application, you take some data, put it into a chatbot, and voila, you can answer questions. Which is great. And now I want to put this out to, you know, 100,000 people in my company, and they all have different permissions on this content, right?

Not everybody can see everything. Not everybody should, you know, actually get the same answers, because based on their level of access, based on where they live, based on their context, you have to give them different answers, right? And you don't want to move that much data, because sometimes it's 20 years of data.

And sometimes you want to only move, like, two years of that 20 years of data. So all these enterprise project considerations come into play, and you have to start thinking, you know, methodically about, okay, what is it that I'm trying to achieve? And what are the tools I have to actually achieve that part of the problem before I even get to the AI part of the problem?

Can I leave the data in place and work on it in place and still use it for AI? Can I actually maybe federate to that content without necessarily trying to pull it all off into a single place? Can I leave the policies that I've built over 20-plus years in place and not actually disturb the apple cart?

So these are all important considerations, and a lot of AI projects fail, a lot of enterprise projects fail, for the same reason, which is, you know, you kind of have to worry about where your roadblocks are and account for that before you jump into the AI piece of it, which is the cool piece. And everybody wants to do the cool piece and not necessarily the hard parts.
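One way to picture the governance point is to filter content by your existing access control lists before anything reaches retrieval or generation. This is a minimal sketch, assuming an illustrative data model (the `Chunk` fields and group names are made up, not any product's schema).

```python
# Sketch of permission-aware retrieval: reuse existing access control lists so that
# only chunks the current user may see are ever passed to the LLM.
from dataclasses import dataclass

@dataclass
class Chunk:
    text: str
    allowed_groups: set[str]   # taken from your existing ACLs, not reinvented

def visible_chunks(chunks: list[Chunk], user_groups: set[str]) -> list[Chunk]:
    """Drop anything the user is not entitled to, before retrieval or generation."""
    return [c for c in chunks if c.allowed_groups & user_groups]

def build_prompt(question: str, chunks: list[Chunk]) -> str:
    context = "\n".join(c.text for c in chunks)
    return f"Answer from this context only:\n{context}\n\nQuestion: {question}"

corpus = [
    Chunk("Q3 salary bands by level", {"hr"}),
    Chunk("How to reset your VPN token", {"all-staff"}),
    Chunk("Ireland division incident log", {"ops-ireland"}),
]
user_groups = {"all-staff", "ops-ireland"}
print(build_prompt("How do I reset my VPN token?", visible_chunks(corpus, user_groups)))
```

The filtering happens upstream of the model, so two users asking the same question can legitimately get different answers, which is the behaviour Sudhi describes.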

Theresa Parker: That's the hard work, yeah. Well, and to pile on to what Sudhi said, you also want to future-proof yourself as well, to a certain extent. As Sudhi mentioned, you can open up the floodgates to productivity and to information and everything else. And if you open them up too far, people are suddenly getting access to stuff that they shouldn't, because of where you live or who you are within an organization or what have you. You also want to make sure that, you know, we're looking at the next phase of GDPR, or whatever that next thing is that's going to govern how we process financial information, or how we find, you know, who's doing what in the banking system, or what new privacy regulations come to pass, or we're invaded by aliens and all of a sudden we need to lock certain things down.

I mean, that's outrageous, of course, but there's always going to be a next thing. And you want to do something that has the hope of growing with your organization, so that you at least have the foundation for being prepared for the next thing that comes, because there will always be a next thing coming down the pipeline.

Richie Cotton: A few moments ago, Sudhi, you were saying, okay, one of the benefits of RAG is that you don't need to plan quite as much. You don't need as much structure around your data. But then I feel like this bonus has been sort of swiftly revoked, because actually you do need to plan your AI applications as well.

So, maybe drilling a bit more into the data side of things: what sort of data quality checks might you want? What sort of data processing might you want to do in order to make sure that it's going to be high-quality information going into your application?

Sudhi Balan: That's a great observation, right? Because you know what it does for you is, you don't have to have planned up front, you know, 20 years ago. But when you actually do the project, you do have to plan. That's the big difference. You can't go back and pre-process 20 years of data. But when you decide, okay, I want to do an AI project, and I want to basically, you know, answer questions on my insurance policies, you have to decide, okay, how many years back am I going to go? How many people am I going to give access to this? Can they see everything about it? Maybe they can see people's names, but maybe you don't want to give them access to protected information. What kind of retention policies am I going to put on these things?

When will the data go away? Just all of those kinds of things. Another thing that typically happens, and this is very common in enterprise environments: the data is across, like, 20 different platforms, right? It's not in one place. Because, you know, historically, some of that has lived in mainframe environments.

Sometimes it lives in cloud environments. What are you going to do? Are you going to just try to bring it all into one single place? Or are you going to try to find some other way that, you know, maybe leaves it where it is and you can kind of use it in place? Another consideration: some of this data, you say documents, and documents could mean many, many things. There could be very bad OCR scans that somebody took with a little, you know, camera, and they're completely illegible, right? What are you going to do with those? So all of these things become important considerations, because when people have a lot of data, and they've typically not paid a lot of attention to it, they want to bring it all in.

Those are thoughts you kind of have to spend some time on. That's the planning you have to do. You don't have to go back and reprocess 20 years of data. But you have to have an inventory of your data and say, okay, what do I want to put in this thing? And then, will that give me the output that I'm looking for?

That's what you spend time on. And that, luckily, tends to be a much, much smaller investment of time than, you know, going back in time and reprocessing 20 years of data, right? Which is almost impossible for everybody except, you know, somebody who's got Google-like capacity or something lying around.
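That inventory-and-scope step can be as simple as per-source retention windows that decide what feeds the RAG index at all. A minimal sketch, assuming illustrative source names and windows:

```python
# Sketch of scoping the ingest: per-source retention windows decide what goes into the
# RAG application, instead of reprocessing 20 years of everything. Values are illustrative.
from datetime import date, timedelta

# How far back each source should feed the AI application.
RETENTION = {
    "support_tickets": timedelta(days=183),    # last ~6 months
    "insurance_policies": timedelta(days=730), # last 2 years
}

def in_scope(source: str, doc_date: date, today: date) -> bool:
    window = RETENTION.get(source)
    return window is not None and today - doc_date <= window

docs = [
    ("support_tickets", date(2024, 11, 2)),
    ("support_tickets", date(2019, 5, 14)),      # too old: excluded
    ("insurance_policies", date(2023, 8, 1)),
    ("mainframe_archive", date(2001, 1, 1)),     # source not yet inventoried: excluded
]
today = date(2024, 12, 15)
print([d for d in docs if in_scope(*d, today)])
```

Nothing here reprocesses the old data; it simply records the decision about which slice of it the application is allowed to use.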

Richie Cotton: No, that's fascinating. Your story about the OCR scans: I got chatting to someone in insurance, and they were saying that they'd found this warehouse with all these 19th-century health records that they were trying to scan in and make use of. And so, obviously, scanning it in is going to make it more accessible than a dusty piece of paper in a warehouse.

But I can imagine that was a bit of a technical challenge, turning that into something useful. So you mentioned the idea of making sure that the right people have access to this data. It sounds like you're going to have some sort of access controls there. How do you deal with data privacy in RAG applications?

Theresa Parker: Really, what you want to be able to do is integrate RAG technology, whatever you decide to implement, with things that you already have in place. So, let's say that you have a retention schedule. And that's not necessarily data privacy, but it speaks to some of the things Sudhi's talked about, where we talk about how much data is available.

So your retention policy, and, you know, they're different based off of what country you're in, what industry you're in, and what you as an organization have decided to do. Make sure that the RAG model that you're using, the AI technology that you implement, honors that, and it's not something where you have to reinvent the wheel to be able to do so. It's an integration rather than a redo.

The same thing goes with things like geotagged information. If I can see information here, but I can't see information that's from our division in Ireland, well, we need to make sure that that's not circumvented. The same thing goes with time-based implementations. Let's say that you've implemented something within your organization that says Theresa works 9 to 5, five days a week.

And those are the guardrails that she is permitted to work under. If she tries to log on on the weekend, and let's say that this is altruistic, she's just trying to get caught up with stuff, she didn't get everything she wanted to do done by Friday and she wants to do it on Saturday, but you've put guardrails around it that say, between 5 p.m. Friday and 9 a.m. Monday, she can't get there from here, you don't want to circumvent that. And there are things that you redact; there are different levels of access people have based off of whether you're a line worker, a manager, or the CIO. You're going to have those types of policies in place for your organization, and you need to make sure that the technology does not in any way circumvent that.

So one last comment before I throw this over to Sudhi: you know, we've talked about the implementation of this because it's not free. It's not magic. You still have some of the constraints of an enterprise application. You have to test the thing. So whatever you end up doing, you need to run it through its paces to make sure that the use cases we talked about a short time ago are what you think they are, or what they end up being, when you're ready to flip the switch and turn this on.
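Those time-based and geographic policies can sit in front of the RAG layer as a pre-query gate, so the chatbot can't circumvent them. A minimal sketch, assuming the example policy values above (working-hours window, matching regions); nothing here is a specific product's rule engine.

```python
# Sketch of reusing existing policies as a pre-query gate: working hours and geography
# are checked before any retrieval happens. Policy values are illustrative.
from datetime import datetime

def within_working_hours(now: datetime) -> bool:
    """No access between 5 p.m. Friday and 9 a.m. Monday, per the example policy."""
    if now.weekday() == 4 and now.hour >= 17:   # Friday evening
        return False
    if now.weekday() in (5, 6):                 # Saturday, Sunday
        return False
    if now.weekday() == 0 and now.hour < 9:     # Monday before 9 a.m.
        return False
    return True

def may_query(user_region: str, doc_region: str, now: datetime) -> bool:
    if not within_working_hours(now):
        return False
    return user_region == doc_region            # e.g. Ireland data stays with Ireland users

print(may_query("US", "US", datetime(2024, 12, 14, 10, 0)))   # Saturday -> False
print(may_query("US", "IE", datetime(2024, 12, 11, 10, 0)))   # region mismatch -> False
print(may_query("US", "US", datetime(2024, 12, 11, 10, 0)))   # Wednesday morning -> True
```

The point is that the check runs before the AI is ever involved, so the new interface inherits the old rules rather than replacing them.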

Sudhi Balan: So, I will say this: you obviously touched on the most important ones, which is, don't redo things. If I were going to leave people with one thing, it's that, which is, you know what, you have 20, 30-plus years of investment in enterprise security.

Don't redo that, because it's kind of crazy that anybody would come and tell you to do that. You should pick technologies that will let you reuse the existing access control, the existing policies you have, and let you actually use them in the RAG application without having to touch any of it. Because if you're not doing that, you're essentially setting yourself up for a giant project, and it's going to have interesting problems. So access control is one, redaction is one, all of the things we talked about. The other thing that I think becomes almost equally important is, as you kind of go through this process, you'll discover two things.

You'll discover that people are very willing to trust the results of that AI, because it feels like a human on the other end talking to them, and people have been naturally trained to assume that humans actually know what they're talking about, right? Or at least some of us. And what happens is, today, right, everybody knows how to use Google.

But 20 years ago, right, I had to teach my grandma how to use Google. I think we need to start teaching people how to use LLMs. And, you know, that's the other thing that you kind of need when you build a RAG application: coaching people along the way. So when you give them an answer, you want to give them a sense of the confidence you have in the answer.

You want to give them, you know, the sources for this answer. Or, you know what, somebody asks a question, and LLMs today, and, you know, this is changing, and maybe by the time you publish this it will have changed, but LLMs don't do great at math today, right?

If you ask it a math question, you know, it'll kind of maybe try to give you an answer, but it may not be right. So you want to tell the people using this, look, you're asking a question which I'm going to try to answer for you, but I'm not, you know, completely accurate in this, so maybe check this answer. So that's a layer of access control, a layer of security, that you really want to put on these RAG applications to help people use them effectively. All of these become the guardrails you put around your AI application so that people can get value without being led astray, without a name showing up in the paper because you told people to put glue on pizza or something like that, right?
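One concrete way to do that coaching is to make every answer carry its sources and a caveat when the question lands in a weak spot like arithmetic. A minimal sketch, where the response shape and the math heuristic are assumptions made for illustration:

```python
# Sketch of "coaching" the user: each answer carries its sources, plus a caveat when the
# question looks like arithmetic (something LLMs can get wrong). Shapes are illustrative.
import re
from dataclasses import dataclass, field

@dataclass
class Answer:
    text: str
    sources: list[str] = field(default_factory=list)
    caveat: str | None = None

def answer_with_sources(question: str, retrieved: list[tuple[str, str]]) -> Answer:
    """retrieved is a list of (source_id, passage) pairs coming out of the RAG step."""
    caveat = None
    if re.search(r"\d+\s*[-+*/]\s*\d+", question):
        caveat = "This looks like a calculation; the model can get math wrong, please verify."
    text = "Based on the cited documents: " + " ".join(p for _, p in retrieved)
    return Answer(text=text, sources=[s for s, _ in retrieved], caveat=caveat)

ans = answer_with_sources(
    "What is 1200 * 0.07 and which form covers it?",
    [("loan-manual-p12", "Interest is documented on form B."), ("faq-3", "Form B is online.")],
)
print(ans.sources, ans.caveat)
```

Surfacing the source IDs and the caveat in the UI is what turns a bare answer into something the user can calibrate their trust against.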

Richie Cotton: Absolutely. So, great points there. And the sense I'm getting is that basically you want to start with all your existing data governance rules and policies, just have them propagate into your AI application, and then you sort of get an extra layer of guardrails on top of the output, just to make sure that it's actually conforming to what you want it to do.

Theresa Parker: Mm hmm.

Richie Cotton: Yeah. Oh, wonderful. I'm glad I understood that. So I want to circle back to what you were saying, Theresa, which is that sometimes you can have information redacted; you can have some information that's just not there. So I guess for generative AI, this causes a problem, because if things are missing, it's going to be prone to just making stuff up and confidently saying things where it doesn't have the information.

Do you have a sense of how you might deal with that? Like, what do you do if you're not providing data to the AI application because it's been disallowed by some policy?

Theresa Parker: So are we talking more like about you know, hallucinations, things of that nature?

Richie Cotton: Yeah. So I'm wondering what happens if you say, okay, we're not allowed to have Theresa's latitude and longitude, no geographic information, and the model would normally expect it. So, what happens then? And what do you do about it when you've not provided that information?

Theresa Parker: There are a couple of things here, and then I'm going to throw this over to Sudhi, because I think he's got a good example for this. I mean, you're going to want to train whatever you put together; you're going to want to train it with a certain amount of information.

And it comes back to the testing there as well. Sometimes the information is going to be there, but the fact that you don't have access to it is going to have an impact on the answer that you're provided, and that, I think, circles back to the comment that Sudhi made about the confidence level of the data.

So maybe you have a marker that says this, or maybe you have a comment that comes back through the answer that says, you... sorry, the thing that came to mind was HAL from 2001: A Space Odyssey. I can't tell you that, Dave. You know, that type of thing, in a professional business sense, of course.

But maybe you have some sort of ranked level that you're able to share with the user that says, yeah, there is other information in the system you do not have access to; maybe go talk to your manager. I think that there are ways that you can inform the type of responses that you receive through, I'm thinking more of, like, a chat type of interface.

That's what I've got my brain around today. But I think that there are ways around that, to be sure.

Sudhi Balan: The other thing I'll tell you, which I think is also in that same vein: LLMs are very sensitive to how you present things. So the same question asked in slightly different ways, you know, with different punctuation, or when you redact a thing, for example, whether you redact it by drawing a little black box, or you do something else, like, you know, you completely elide that information, actually makes a difference in how the answers are produced by the LLM. In fact, some of the things we've found in our internal work is that you can be very precise and very clever about how you present that information to LLMs and get significantly better responses, so it doesn't try to make up stuff. So there are techniques like that we would apply. And I think, again, this becomes a question of really knowing your data and really knowing your application, and saying, you know what, this is the kind of data. What am I trying to elide from this? And, you know, you have to be clever about those things.

And that's where the planning comes in, right? That's exactly where, you know, you spend some time thinking about what you're trying to achieve.
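One simple version of "being clever about how you present redactions" is to replace removed fields with an explicit marker and tell the model what the marker means, rather than silently deleting them and inviting a guess. A minimal sketch; the marker format and field names are assumptions for illustration:

```python
# Sketch of presenting redactions explicitly: a labelled marker plus an instruction telling
# the model not to guess when a field is marked. Marker text and fields are illustrative.

REDACTED = "[REDACTED: not available to you]"

def redact(record: dict, blocked_fields: set[str]) -> dict:
    return {k: (REDACTED if k in blocked_fields else v) for k, v in record.items()}

def to_prompt(record: dict, question: str) -> str:
    lines = "\n".join(f"{k}: {v}" for k, v in record.items())
    return (
        "Some fields are marked " + REDACTED + ". If the answer depends on them, "
        "say the information is unavailable instead of guessing.\n\n"
        f"{lines}\n\nQuestion: {question}"
    )

applicant = {"name": "T. Parker", "state": "Colorado", "latitude": "39.7", "longitude": "-104.9"}
safe = redact(applicant, {"latitude", "longitude"})
print(to_prompt(safe, "Where exactly does the applicant live?"))
```

The model still sees that something was withheld, which tends to produce "that information isn't available" rather than a fabricated value.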

Richie Cotton: Okay. Yeah. We're back to thinking about what you want to do; planning is hard work again.

Theresa Parker: And use cases and testing and all of that stuff.

Richie Cotton: Yeah. All right. So, since testing has been mentioned a few times, let's talk about that. One thing I've heard from people who are building these things is that testing AI applications is just really difficult. I don't know whether it's just that developers hate testing in general because it's hard work, or whether they're just not used to testing software in the same way that data scientists are, or whether it's that you generally have to write a lot more tests than you would for other applications.

So, do you want to talk me through how do you approach testing for AI applications?

Sudhi Balan: That's such a great point, right? And I think, well, certainly, you know, developers aren't necessarily the biggest fans of tests. What do they say? It's like castor oil: it's good for them because they hate it. But, you know, the point is, like you pointed out, these LLMs are probabilistic applications. That's not a thing developers are typically used to encountering. But I think it comes down to: let's use the tools. So you can actually use LLMs in many ways. You can use LLMs to produce tests. You can use LLMs to judge the outputs of tests. Because the thing that typically happens is, a software developer is used to a test running and giving exactly the same answer every time.

Which doesn't happen with LLMs, right? You ask a question, the answer might be slightly different, but it's still accurate. How do you judge something like that? So the state of the art is, you get an LLM to judge the answer for you. And you say, you know what, is this answer correct relative to the canonical answer?

Or are they actually close enough that I can accept this? And these are the things that you have to teach when you're bringing on a team of developers. They have to learn all these techniques. And that's where the state of the art is today. Just like, you know, LLMs are good at many things, right?

But what they're really good at is reading things and telling you things about what they've just read. So you kind of use that ability. So it's, you know, turtles all the way down, right? You basically have LLMs built on LLMs built on LLMs. And that's what you do; you just use smaller LLMs instead of, you know, big ones.

So that's a set of techniques people are using. People are also using, you know, probabilistic techniques that just look at traditional bags of words and, you know, similarity matches. All of these are, I think, evolving techniques, and I think we'll find that increasingly, right, things like guardrails, things like similarity matches, things like using LLMs to judge

will actually become part of the toolkit for developers in their daily work.
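Here is a minimal sketch of the LLM-as-judge idea: because the application's answer varies run to run, the test asks a judge whether the candidate answer is close enough to a canonical one, instead of doing string equality. The `judge_llm()` function is a placeholder (word overlap) so the sketch runs on its own; in practice it would be a prompt to a small judge model.

```python
# Sketch of LLM-as-judge testing: compare meaning rather than exact wording.
# judge_llm() is a stand-in for a real judge-model call.

def judge_llm(question: str, canonical: str, candidate: str) -> bool:
    """Placeholder: a real judge would be prompted to compare meaning, not wording.
    Here we approximate with word overlap so the sketch is self-contained."""
    canon, cand = set(canonical.lower().split()), set(candidate.lower().split())
    return len(canon & cand) / max(len(canon), 1) >= 0.5

def test_refund_policy(app_answer: str) -> None:
    question = "How long do customers have to request a refund?"
    canonical = "Customers can request a refund within 30 days of purchase."
    assert judge_llm(question, canonical, app_answer), f"Answer drifted: {app_answer!r}"

# Two differently worded but equivalent answers both pass:
test_refund_policy("A refund can be requested up to 30 days after the purchase.")
test_refund_policy("Within 30 days of purchase, customers may ask for a refund.")
print("judge-based tests passed")
```

The human still decides what gets tested and what "correct" means; the judge only handles the fact that two correct answers rarely match character for character.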

Richie Cotton: I like the idea of using AI to check the work of other AI. You know, it feels like less work there. Is there still scope for humans in testing AI applications? Are they still needed there?

Sudhi Balan: Of course they are, right? And I think the nature of the work will change. Humans may not write the tests themselves. They might say, you know what, this is the kind of thing I want tested, and then the LLM actually generates the test for you. Or you decide, and this is where your understanding of testing techniques becomes important: is it a white box test? Is it a black box test? So all of this higher-level creativity, right, only humans really understand the intent of what we're trying to do. Doing the work is somewhat easier now, but the intent of the work is only decipherable to, you know, you and me, right?

Because we are the ones who actually sit with the humans in the room and say, what are you trying to do? And how can I get you what you want?

Richie Cotton: It's always reassuring that humans are still needed. And yeah, obviously you want to be really, really sure that your output is correct if it's going to be customer-facing. So, yeah, having a human check it seems like a good idea. So we've also mentioned guardrails a few times. This is the idea that you can restrict the output and keep it on topic.

Can you tell me a bit more about how they work and how you go about setting guardrails for your application?

Sudhi Balan: Yeah, this is such an important topic. And again, the state of the art is constantly moving. And again, it turns out, because it's a probabilistic problem, the answer isn't precise, and the kinds of things you're looking for aren't precise. You say, I don't want unprofessional speech, or you're looking for mal-intent in the questions, or you're looking for people trying to jailbreak the thing, which is a sport these days.

So, what do you do? You use tiny LLMs. And this is back to my theme, right? It's turtles all the way down. Because these things are so powerful, you can use an LLM to judge the output, the thing that comes back. And then you decide. And, you know, 10 years ago, this would have been called classic NER techniques or NLP techniques.

So we were looking for keywords. Now the LLM can actually assess intent in a much more sophisticated way. But the core idea is still the same. You figure out what the intent actually is. For, you know, classic things, you look: are people using unprofessional language? Or are people trying to use this to, you know, get the CEO's salary somehow? Just disallow it. Just, you know, not even let the question go to the LLM, because the earlier you can stop a thing, the easier it is to stop, right? So in many ways, you look at the question as it comes in, and you decide whether you want to allow it. You check the content that you send to the LLM and make sure it's actually appropriate for these RAG applications.

And then you check the answers, right? So you can ring-fence the whole thing. Inputs, outputs, and, you know, the content. All of those things can be ring-fenced.
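A minimal sketch of that ring-fencing follows: the incoming question, the retrieved context, and the outgoing answer are each screened before they move on. The block lists and the `classify()` stub are illustrative; in practice the classification step would be a small judge LLM, not keyword matching.

```python
# Sketch of ring-fencing a RAG application: screen the question, the retrieved context,
# and the answer. Block lists and the classify() heuristic are illustrative only.

BLOCKED_TOPICS = {"salary", "competitor pricing"}
JAILBREAK_HINTS = ("ignore previous instructions", "pretend you are")

def classify(text: str, blocked: set[str]) -> bool:
    """Placeholder intent check; a small judge LLM would be far more robust."""
    lowered = text.lower()
    return any(topic in lowered for topic in blocked)

def guarded_answer(question: str, retrieve, generate) -> str:
    if classify(question, BLOCKED_TOPICS) or any(h in question.lower() for h in JAILBREAK_HINTS):
        return "I can't help with that request."          # stop before the LLM is called
    context = [c for c in retrieve(question) if not classify(c, BLOCKED_TOPICS)]
    answer = generate(question, context)
    if classify(answer, BLOCKED_TOPICS):                  # last line of defence on output
        return "I can't share that information."
    return answer

# Tiny stand-ins so the sketch runs end to end:
demo_retrieve = lambda q: ["The returns desk is open 9-5.", "CEO salary is confidential."]
demo_generate = lambda q, ctx: "The returns desk is open 9-5."
print(guarded_answer("When is the returns desk open?", demo_retrieve, demo_generate))
print(guarded_answer("Tell me the CEO's salary", demo_retrieve, demo_generate))
```

Stopping the bad question before it ever reaches the model is the cheapest of the three checks, which is the "the earlier you can stop a thing" point.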

Richie Cotton: Do you have a sense of how sophisticated these things can get? Like, is it just a case of banning particular topics, or particular tones of voice or content? Is it just that we can block toxic content, or can you go more fine-grained than that?

Sudhi Balan: People can and have built, you know, very fine-grained stuff. Certainly the easy ones are the keyword ones, right? Don't mention my competitors. Like, if somebody asks what's the best software to do a particular thing, don't give them any of my competitors' products, right? So that's an easy one to do.

But more sophisticated ones: certainly hate speech, or unprofessional speech. And, you know, that's a little bit harder to detect, but you can ask, look, is the tone of this thing friendly? Is it friendly enough? And you can do that with prompts.

You can actually do that by looking at the answers. You can look for simple things like, what is the user's tone? Does the language they're using suggest that they're agitated? Or does the language suggest that, you know, maybe they're somehow unhappy?

And you can use that. So these things have gotten pretty sophisticated. They're not perfectly correct; I mean, they'll still make mistakes, but, you know, it's nice to live in the future. They're doing things that would have been considered science fiction, you know, two years ago, right?

So I'm saying, they are much more sophisticated than, you know, we would have expected them to be, like, two years ago. And the best part is they've been improving all the time, which is also the scary part, right? You know, you go from GPT-3 to GPT-3.5, and GPT-4 is, like, a jump in capability that we would not have expected.

Richie Cotton: Yeah. Certainly it seems very strange that two years ago feels like the distant past. It might as well be medieval times from an AI perspective. Okay. So I guess the other thing is around the cost of AI applications. In general, generative AI is a bit more expensive to run than traditional deterministic software.

Can you talk me through how you might go about thinking about costs? How do you control costs in your AI application?

Theresa Parker: Cost encompasses a whole bunch of things. I mean, you've got the cost of the software, and you've got the cost of all of the other things that we've spoken of in our comments today. You know, you've got your testing cost, and then you have your cost to spin it up, where do you run this, and what that cost model happens to be, and your developers, and all of those things roll up into this great big thing called cost.

So you have those, and I know I've wordsmithed this to a very high level here, and I'll defer to Sudhi for specifics in a moment. But you also have to look at the benefit side of this. So you have the cost and you have the benefit, because they're two sides of the same coin. And I think it's really important as you're going through this not just to look at the cost but, again, we're going back to some of the same things.

What's your use case? Why are you even looking at this to begin with? How is this going to benefit your organization? How is it going to let you evolve? Is it going to let you break into new markets, because you're freeing up people who are currently doing grunt work to do things that are much more strategic, or to do them much more efficiently?

And so, because I as a loan officer can process 45 applications a month instead of maybe 10 to 15, what does that do for my business? And does that allow me to open up to new markets, or to new individuals, or to new types of activities within my organization as a whole?

What can I do with this technology? And then that comes back again to rationalizing the cost of that.

Sudhi Balan: And all the things you mentioned are absolutely important. Cost is only important in relation to the return, right? If it costs me a dollar, but it only makes me, you know, 20 cents, maybe it's not useful. But if it costs me a million dollars and it saves me five, I don't care, right?

I think it's really that equation. And the cost of generative AI applications comes from three places. One is the cost of basically getting all the data; that's some of what we just talked about. Typically, if your data lives in 20 places, getting it to the AI is usually an expensive exercise.

And that's why we say, as far as possible, find applications or solutions that will let the data live where it is. Because in many, many ways, that's a huge cost you've eliminated. You've basically eliminated a two-year project of, you know, changing your data pipelines into something you don't have today.

The other chunk of it is, people do have large warehouses of data. Sometimes not all of it is useful, and maybe not all of it actually needs to be in the AI. So again, look for solutions that will let you partition it and let you say, you know what, of this piece of content, I only want the last six months every time, and of this other piece of content, I only want the last two years, and it will do that automatically so that you don't have to build these pipelines yourself. But keep only the most relevant and the most important slice of the data in the application, because this is the other part of it. When you build a RAG application, the biggest components of cost, outside of people cost, which is really, in many ways, the biggest cost, are the cost of building the embedding models and the cost of how much data you throw over the wall to the LLM for an answer. And the more sophisticated you are about putting in the least amount of data that you can still get a good answer with, in both the embedding models and to the LLM, the better your cost profile is going to be. Now, the cost of running these things at scale has come down like 100x, right? If you compare the price about two and a half years ago to what it is today, you know, something like GPT-4o mini is trivially cheap and gives you much, much better answers. But it still matters, right? How much data you throw over the wire matters.

Because if I decide I want to put, like, a terabyte of data in there, it's going to have zeros at the end of that number, even if the unit price is only a fraction of a cent. So the other thing to be clever about is, you know, maybe you only pick the three pages that make sense.

You only pick the five pages that make sense. So good engineering still matters. Good engineering practices, good data hygiene, still matter. Those are the things that'll help you run this at scale.
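"Only pick the pages that make sense" often boils down to packing retrieved chunks into a token budget. A minimal sketch, where the 4-characters-per-token estimate and the budget number are rough assumptions, not any provider's actual tokenizer or pricing:

```python
# Sketch of keeping the context small: take chunks in relevance order and stop once a
# token budget is hit, so the least data that still answers the question is sent.

def estimate_tokens(text: str) -> int:
    return max(1, len(text) // 4)  # crude heuristic, good enough for budgeting

def pack_context(ranked_chunks: list[str], budget_tokens: int = 1500) -> list[str]:
    """Take chunks in relevance order until the budget is exhausted."""
    packed, used = [], 0
    for chunk in ranked_chunks:
        cost = estimate_tokens(chunk)
        if used + cost > budget_tokens:
            break
        packed.append(chunk)
        used += cost
    return packed

chunks_by_relevance = [
    "Section 4.2: refund windows and exceptions. " * 20,
    "Appendix B: full 2003-2023 policy history. " * 400,   # relevant but enormous
    "Section 1: definitions. " * 10,
]
selected = pack_context(chunks_by_relevance)
print(len(selected), "chunks,", sum(estimate_tokens(c) for c in selected), "tokens (approx)")
```

The same budgeting idea applies on the embedding side: the less you embed and the less you send per question, the flatter the cost curve stays as usage scales.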

Richie Cotton: Okay. So reducing the amount of data that's being fed to the AI, that sounds like it circles back to what we were talking about at the start, which is using retrieval augmented generation to pick out the correct bits of data.

Sudhi Balan: Yeah, exactly right.

Richie Cotton: But I'm wondering, do you also want to start by making the AI limited in some sense?

So for example, you mentioned wanting less data. Is it a case where, if you build the first version of your support application, you're only answering, like, the top 10 most common user queries rather than trying to answer everything? Is that a good way of going about it, or is it easy to just scale to everything?

Sudhi Balan: So if you're only doing the top 10 queries, I wouldn't necessarily use retrieval augmented generation, because I can just can those answers. It's not a retrieval-augmented problem anymore. It's basically, you know what, find the top 10 queries in my database, find the standard answers to them, and just match them. The reason you would use retrieval augmented generation is that it can actually solve the long tail. Because the long tail is, you know, if you don't have an 80-20 problem, and this is typical for people who have a lot of data, so you have a lot of anything, and if the 80-20 doesn't really apply, you really want to solve that long tail.

And that's where you will actually apply it. But your observation is still correct. You still don't want to put everything in the AI, right? You don't want to say, you know what, I will solve every math problem for you. For that one, you intercept the question, you know, going back to the guardrails.

You see this is a math question, and you say, you know what, this is a math question, so maybe let me just get a customer support agent on the line. And that's the way you solve that. But yes, absolutely, problem partitioning is very important. And it absolutely comes back to our original point of, you know what, define the box, figure out what you're actually going to put in the box and what's outside the box, because it's an enterprise project. It may have AI in it, but it's an enterprise project. Scope definition absolutely matters, no matter how you slice it.
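That partitioning can be made explicit as a small router in front of the pipeline: canned answers for the top queries, a handoff for things the model shouldn't attempt (like arithmetic), and RAG only for the long tail. A minimal sketch; the FAQ entries and the math heuristic are illustrative assumptions:

```python
# Sketch of problem partitioning: canned answers, human handoff, and RAG for the long tail.
import re

FAQ = {
    "reset password": "Use the 'Forgot password' link on the sign-in page.",
    "opening hours": "Support is available 9am-5pm, Monday to Friday.",
}

def rag_answer(question: str) -> str:
    """Placeholder for the full retrieve-then-generate pipeline."""
    return f"[RAG pipeline would answer: {question!r}]"

def route(question: str) -> str:
    q = question.lower()
    for key, canned in FAQ.items():            # top-10 style queries: no RAG needed
        if key in q:
            return canned
    if re.search(r"\d+\s*[-+*/]\s*\d+", q):    # math: hand to a person (or a calculator)
        return "Connecting you to a support agent for this one."
    return rag_answer(question)                # long tail: retrieval augmented generation

for q in ["How do I reset password?", "What's 1024 * 3?", "Does the warranty cover water damage?"]:
    print(route(q))
```

Defining those routes up front is the scope-definition exercise in miniature: you decide what is inside the box before the AI ever sees a question.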

Richie Cotton: All right, yeah, plan it properly again.

Sudhi Balan: You're exactly right.

Richie Cotton: probably can't

Theresa Parker: you can't get around from that.

Richie Cotton: Okay, so beyond planning, the other side of this is getting to some sort of success. I'm not entirely sure how you measure the success of your AI application. Because, like, I guess quality of responses is one thing, but how do you know if your AI application is a success?

Sudhi Balan: That's a great question. And I think we're still determining the metrics for this, because these things have, you know, been around for, like, 15 minutes, right? In the lifespan of these things, what do you define as success? So I'll tell you a couple of metrics people have used. One that we've seen customers use, and it applies to somebody we've done work with, is how many of my routine queries can I redirect to the AI? It's the, you know what, most of my routine queries are being answered by the AI; the email never actually comes to one of my customer service agents. That's a very typical metric, and, you know, that's something we have a lot of experience with from other mechanisms of doing customer support.

But another one that I thought was kind of interesting was, you know, how do people feel at the end of that interaction? Because you know how people feel when they deal with one of these automatic chatbots that say press one, or press two, or press three, and it feels very inhumane; it feels like you're a cog in a machine.

When people use these chatbots, they feel like they're being treated better. So that's an outcome that people are happy with. Another one, and this I think is something we've found a lot: you're looking for brand visibility. The first people to adopt these things in any industry get a huge leap in their visibility and their recognition. So sometimes you might do it just for that alone. That might count as success, because the early adopters become the people who get recognized, the people customers go to, especially if you're in an industry where everyone is selling the same products. There are a bunch of retailers selling the same products, but your customer service interface and your brand are much better.

That's value-add right there. Those things are harder to measure, but again, there are tools people have for those things.
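The first two metrics can be computed directly from interaction logs. A minimal sketch of a deflection rate (routine queries the AI resolves without reaching an agent) and a simple satisfaction average; the field names and sample data are illustrative:

```python
# Sketch of two success metrics: deflection rate and average post-interaction satisfaction.

def deflection_rate(interactions: list[dict]) -> float:
    routine = [i for i in interactions if i["routine"]]
    resolved_by_ai = [i for i in routine if not i["escalated_to_agent"]]
    return len(resolved_by_ai) / len(routine) if routine else 0.0

def avg_satisfaction(interactions: list[dict]) -> float:
    scores = [i["csat"] for i in interactions if i.get("csat") is not None]
    return sum(scores) / len(scores) if scores else 0.0

log = [
    {"routine": True, "escalated_to_agent": False, "csat": 5},
    {"routine": True, "escalated_to_agent": True, "csat": 3},
    {"routine": False, "escalated_to_agent": True, "csat": None},
    {"routine": True, "escalated_to_agent": False, "csat": 4},
]
print(f"deflection rate: {deflection_rate(log):.0%}, avg CSAT: {avg_satisfaction(log):.1f}")
```

The brand-visibility benefit Sudhi mentions doesn't reduce to a log query this neatly, which is exactly why he calls it harder to measure.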

Richie Cotton: Okay, those first two seem like fairly common business success metrics. It's like, well, okay, how many hours of productivity have we saved here? How much did I delight my customers? Did they have a good user experience? But that hype one, the bragging rights, I hadn't really considered before.

I was wondering, do either of you have any examples of companies where they've done something like this and they've got this recognition?

Theresa Parker: I have a personal experience that I can share with you. I won't say the vendor or the brand. But I will say that within the last month, I've had two experiences with organizations basically in the same industry, and there was a real difference between one and the other. And I very much suspect that this company has gone to a RAG-based AI technology, just based off of what the interaction was like.

The experience was marvelous. It was so good, and it reflected the quality of the company. I was already impressed with the company to begin with, and I already knew that their personal customer support was exceptional. What I expected was a very impersonal situation; I really expected a bad experience, because this technology is still evolving and it is kind of hit or miss. I fully expected to be disappointed, and I was okay with that because of the state of the technology. But it was marvelous. It was so good. And what that did was further cement my impression of this organization as high quality, innovative, really exceptional in everything that they do. And I have told people about that.

Conversely, I had another experience within the next two days, and it was just a massive level of frustration. And my circumstance with that organization is not yet resolved. So on the one hand, one was resolved within, like, 20 minutes, which was really fast; and with this other one, several weeks later, I've kind of given up at this point.

So I will tell people about both. And most of the time, when you're talking about customer experience bragging rights, you brag about the wrong thing. You hear about the horror stories. You don't always hear about the good things. But I've told three or four people, some just in passing, about this fabulous experience that I had with a customer service situation.

I mean, nobody talks about good customer service types of circumstances.

Richie Cotton: Yeah, maybe that's why the world seems slightly worse than it actually is: people just love to complain about horror stories. But I do like that idea of just saying, well, yeah, I had really good customer service at this place, and, you know, you can tell other people, and hopefully that encourages people or organizations to do the right thing.

Sudhi Balan: There's this unique point where the future is not evenly distributed, in that sense. So if you did the RAG thing, and the four other people did not, because remember, that's where the state of things is. Even though, you know, you and I might think RAG is old hat and we know all about it, if you go and look in the world, it's not there. Not everybody has this. So it doesn't take a lot. You can be the one of the five people who are selling the same, you know, plumbing supplies, or transistors, or, you know, washing machines, it doesn't matter what it is. If you're the one who has the actual RAG-based application on your website and your customer gets to ask questions, that's a differentiator.

It won't be a differentiator forever. Maybe two years down the line, everybody will have one of those. But today, here and now, the faster you can do that thing and give people access to it, it makes you a differentiator, makes you a leader in that space. And to answer your original question, we actually do have people in that exact scenario.

I won't name names, but, you know, they're selling parts which you can buy from everybody else. But their value-add is their customer service. Their value-add is, you know what, they're going to give you what seems like an exceptional, high-touch experience without it actually costing that much.

And that's an advantage that fast movers can capture today.

Theresa Parker: Yeah, it's the quality of what comes through, really. So many things are commoditized these days, because things are just moving so much more quickly than they did five years ago. Every incremental thing that you can do to improve the impression, the user experience, the quality impression that people have of a brand, that's huge in this day and age.

And maybe not everybody goes and talks about this wonderful customer experience that they had. But the thing is, the next time I have a circumstance where I need that particular type of service or product or whatever, I'm going to choose them in a heartbeat over anybody else.

Richie Cotton: Yeah, it does seem like caring about your customers, cutting the time to resolve problems, those thousand incremental changes just to improve the customer experience and provide something that feels a bit more luxurious than it actually costs, those all seem like great ways to improve things.

And yeah, AI is obviously a great way to do that. All right, so, just to wrap up, what are you most excited about in the world of AI?

Theresa Parker: I'm excited about the potential overall, the potential of where this is going to go. We've talked repeatedly about how things are so much farther along than two years ago. Two years ago, we weren't doing all of these things, or we weren't doing them to the same level of precision and quality and whatnot. I don't know where we're going to be at the end of 2025. I can't believe it's going to be 2025, but where will we be in another two years? Where will we be in four years? The potential out there is absolutely amazing, and I'm delighted to be working in a company, in a position, that lets me participate in that type of innovation and that type of evolution.

So I have that from a professional perspective of looking at where everything is going to go, and then how I personally am going to benefit, both professionally and just in my life. What's my next customer service experience going to be like in, you know, a year or two years? That's going to be really interesting to find out.

Richie Cotton: I love that idea. No more terrible customer service experiences. That's, that's a dream there.

Theresa Parker: You know,

Sudhi Balan: That certainly

Theresa Parker: a company, there's a company that I have to deal with on a regular basis that if they did this, I would jump for joy. I would probably go out with a banner on the street because I hate dealing with the company. I hate it. I hate it. I hate it.

Richie Cotton: This is a point where I wish you could name names, but yeah, there are definitely a lot of companies with really, really awful customer service. I'm not sure how they're still in business. Hopefully they'll get the message. AI can probably help them out with it, or at least cause them to rethink what they're doing.

All right. Super. Sudhi, what's your most exciting thing in AI?

Sudhi Balan: So I'm a technologist at heart. I'm a nerd at heart. I build these things for a living, so I get to see the underpinnings of it. I'll tell you two things that are really exciting. If you look at the progress we've made over the last two years and the trajectory, it's unbelievable. You don't have to believe OpenAI's vision, but certainly the scale of the ambition of people, the things that people are trying to achieve, delights me. The fact that we've gone to, you know, talking reasonably about it, in serious fashion, right?

You know, science fiction was one thing, but today there are people spending substantial amounts of effort talking about how they're going to build a superintelligence, and how do we control it, right? And these are serious conversations. Now the U.S. government has policy about it. The government of California has policy about it. It's great to live in the future. I mean, obviously, these are important problems. But we're at the point in space and time where we know this is a real policy conversation because it's a real possibility. And that is exciting to me as a technologist, right? The fact that this is actually possible.

So that's super, super interesting to me, I think. The other thing, which is kind of the flip side of that, is, one of my favorite science fiction authors says this: the future is here, but it's not distributed evenly. And, you know, that possibility, which is, you take the RAG models of the world, you take the next level past that.

You know, you talk about LLM-based agents, or you talk about clever graph-based LLMs or whatever, right? There's a whole bunch of cleverness going around. But you know what? It's a very tiny sliver of the actual corporate universe that's actually adopted it.

When we get these things in the hands of people, I love the ads they put out, right? You know, my 10-year-old can use Siri and, you know, actually use LLMs. That's the part I'm really excited about. What happens when we get these tools in the hands of people? Because the future, when it gets evenly distributed, gives amazing results, and I'm really excited about that piece of it.

Richie Cotton: Absolutely. So certainly, being embedded in this space, talking about AI a lot, you sort of think, oh, it's been around for years now, everyone knows about it. And then you speak to other people who aren't in the space and they're like, oh, wait, what was that? I think I heard something on the news about ChatGPT. And yeah, getting it into the hands of everyone, including children.

Yeah, that's a wonderful future. So yeah, love that idea. All right. Brilliant. Thank you so much for your time, Theresa. Thank you so much for your time, Sudhi. It's been a pleasure chatting to you both.

Sudhi Balan: Thank you. The pleasure was all mine.

Theresa Parker: This has been great. Thank you.

Sudhi Balan: Thank you.
