
A Framework for GenAI App and Agent Development with Jerry Liu, CEO at LlamaIndex

Richie and Jerry explore the readiness of AI agents for enterprise use, the challenges developers face building agents, document processing and data structuring, the evolving landscape of AI agent frameworks like LlamaIndex, and much more.
Jun 29, 2025

Guest
Jerry Liu

Jerry Liu is the CEO and Co-founder at LlamaIndex, the AI agents platform for automating document workflows. Previously, he led the ML monitoring team at Robust Intelligence, did self-driving AI research at Uber ATG, and worked on recommendation systems at Quora.


Host
Richie Cotton

Richie helps individuals and organizations get better at using data and AI. He's been a data scientist since before it was called data science, and has written two books and created many DataCamp courses on the subject. He is a host of the DataFramed podcast, and runs DataCamp's webinar program.

Key Quotes

At the moment, the best form factors for these AI agents are very much assistive in nature: they solve certain types of tasks and do them relatively well, instead of trying to bite off a ton of things end to end. Over time, you'll start to see a bit more trust in a multi-step setting.

Quantifying uncertainty means AI agents at least know the areas where they are unsure and require a little bit of human review. You absolutely need citations, particularly if you're trying to process a large number of documents and each document is super long. This gives the human the ability to verify things pretty quickly.

Key Takeaways

1

Leverage standardized protocols like MCP to facilitate seamless integration of AI agents with existing enterprise tools and systems. This will reduce development time and enhance the scalability of your AI solutions.

2

Implement mechanisms for uncertainty quantification and source citation in your AI systems. This will help in maintaining data integrity and provide a means for human review, especially in automation-based use cases.

3

AI agents are currently most effective in assistive roles, automating specific tasks rather than handling end-to-end processes. Focus on integrating them into workflows where they can provide immediate value, such as coding assistance or document workflow automation.

Links From The Show

LlamaIndex

Transcript

The best form factors for these AI agents are very much assistive in nature and solve certain types of tasks and do them relatively well, instead of trying to bite off a ton of things end to end. And I think over time, as the models get better and as people warm up to these types of interaction patterns, you'll start to see a bit more trust in a multi-step setting.

So quantifying uncertainty means the AI agent at least knows where it doesn't know, and can specifically detect spots where it requires a little bit of human review. And then you absolutely need citations, particularly if you're trying to process a large number of documents and each document is super long.

Every piece of extracted data should have some reference back to the source document. This at least gives the human the ability to verify that pretty quickly.

Welcome to DataFramed. This is Richie. AI agents have so much promise for automating all the tasks you don't want to do. Automate what you hate, as the saying goes.

Of course, as soon as you decide you want to build some AI agents at work, you're faced with all sorts of tricky questions. What sort of agents do you want to build? And how are you going to make these agents actually work? These questions mean you need to understand a little bit about agent architecture, guardrails, and evaluation.

So today we're going to get into some of the practicalities of creating great AI agents. We're also going to discuss some of the hot topics in generative AI, such as: do you actually need to be good at prompt engineering? And what exactly can you do with these deep research tools? Our guest is Jerry Liu, the CEO and co-founder at LlamaIndex.

LlamaIndex, the software, is a very popular generative AI development framework. The company of the same name provides a commercial platform to support it, with a focus on creating generative AI agents for document processing, so a great enterprise use case. Jerry founded LlamaIndex coming from a background in technical AI.

Previously he led a machine learning monitoring team at Robust Intelligence, did self-driving AI research at Uber, and worked on recommendation systems at Quora. Let's hear how to create great AI agents.

Hi Jerry, welcome to the show. 

Thanks Richie, thanks for having me. 

Wonderful. Now, AI agents are of course massively hyped at the moment. I'm curious: are they really ready for the enterprise?

I think you're starting to see agents start to automate certain types of business workflows. I do think a lot of the fully autonomous AI engineers, or AI agents that can just do a complete job end to end, are still probably a little bit away. Basically anything that will just do something end to end still hasn't fully reached its potential. And I think this might just be due to some new model developments coming up, but a lot of AI agents are already making an impact in a variety of different settings.

A classic example is just a coding assistant. Literally everybody is using coding assistants these days. If you ask any engineer at a tech company, they're all using Cursor, Windsurf, ChatGPT, et cetera. And then of course you're seeing AI agents deliver real ROI in certain enterprise verticals. This includes customer service, it includes IT help desk stuff, and it also includes an area that we're specifically excited about, which is document workflow automation: whether you're a finance team, procurement, or legal, being able to sort through massive volumes of unstructured data and get insights from it.

I think one thing that people are realizing, especially in the beginning, is that the best form factors for these AI agents are very much assistive in nature and solve certain types of tasks and do them relatively well, instead of trying to bite off a ton of things end to end. I think over time, as the models get better and as people warm up to these types of interaction patterns, you'll start to see a bit more trust in a multi-step setting.

Alright, lots of great use cases there. I'd love to hear more about what the challenges are. Where do you see the limits of agents in the enterprise?

I mean, there are a lot of challenges if you think about it from a developer perspective. I can speak to the things that we're solving, and of course there are challenges that other players are solving too.

If you're a developer building these types of agents for an enterprise, you have to figure out some table-stakes things. One is basically: what exactly is the agent architecture that you're trying to go for? In general, if you're trying to build and deploy different types of agents, there are very general reasoning architectures, like chain-of-thought, ReAct, function-calling loops, CodeAct, et cetera, where you basically have an agent reasoning loop connected to a set of tools, and the agent will dynamically figure out which tool to call to actually solve the task at hand.

Oftentimes this is a decent approach to get started, and actually a lot of consumer-grade agents, if you look at something like ChatGPT, basically operate in this setting. I think the thing that people are skeptical about, especially for certain types of business use cases, is really reliability: the ability to actually solve certain types of tasks and do them well. For those, oftentimes you need somewhat more constrained architectures, where certain parts of the flow are deterministic and you're constraining where LLMs can actually do decision making, whether it's an if/else statement, whether it's doing some sort of reflection or validation, and so on and so forth. So one of the design decisions an engineer has to make when building and deploying agents is how constrained that architecture is and what the scope of tasks is that you're looking to solve. Solving more narrow tasks is obviously easier than solving something bigger.

Another big thing is obviously evals. I think everybody kind of knows this: making sure that you have proper mechanisms for benchmarking and evaluating, making sure the agents actually work well. And one piece that we're specifically excited about, which I think a lot of developers and also companies tend not to think about as much, is really the data problem.

A lot of these agents need access to massive amounts of unstructured enterprise data that they're not necessarily trained on at the beginning. And for them to actually access this data and interact with it in the right way, there has to be some sort of data layer to actually process, connect, and ingest this data, structure it in the right ways, and then also serve it up as tools to an AI agent. A lot of use cases that we work with, whether it's invoice processing, technical documentation search, or structuring your healthcare records, basically deal with massive volumes of this data. And being able to figure out how to actually structure it is a top priority for these developer teams.
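
To make that reasoning-loop-plus-tools idea concrete, here is a minimal sketch using LlamaIndex's FunctionTool and ReActAgent abstractions. The invoice-lookup tool and the model choice are illustrative assumptions, not anything Jerry prescribes, and exact import paths can shift between LlamaIndex versions.

```python
# A minimal "reasoning loop + tools" agent, sketched with LlamaIndex.
# Assumes `pip install llama-index` and an OPENAI_API_KEY in the environment;
# the tool below is a toy stand-in for a real enterprise data tool.
from llama_index.core.agent import ReActAgent
from llama_index.core.tools import FunctionTool
from llama_index.llms.openai import OpenAI

def lookup_invoice_total(invoice_id: str) -> str:
    """Return the total for an invoice (hypothetical stand-in for a real lookup)."""
    fake_db = {"INV-001": "$1,200.00", "INV-002": "$350.50"}
    return fake_db.get(invoice_id, "unknown invoice")

tool = FunctionTool.from_defaults(fn=lookup_invoice_total)

# The agent decides dynamically which tool to call at each step of its loop.
agent = ReActAgent.from_tools([tool], llm=OpenAI(model="gpt-4o-mini"), verbose=True)
print(agent.chat("What is the total of invoice INV-001?"))
```

A more constrained architecture, as Jerry describes, would replace parts of this free-running loop with deterministic steps and only let the LLM decide at specific points.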


Okay. I have to say, I do like that idea of starting with solving a simple problem, then maybe getting more complex, rather than trying to solve everything at once. And certainly lots of interesting points around making sure you've got the right data, making sure it's structured in the right way, making sure you've got the architecture right. Lots of things to delve into.

First, one thing you mentioned at the start was the idea of having an assistant to help you write code. These are sometimes called copilots, and I know there's the term AI assistant as well as AI agent. Do you want to talk about the different categories of agent?

I basically see it as a spectrum from less agentic to agentic, just from a pure architecture standpoint. There are things that will be relatively general and can do a lot of things, and then there are things that are basically computer programs, a little bit more deterministic, that have some AI agents in the loop to help you solve specific types of tasks.

In terms of the specific UI/UX for the user, I basically see two main categories. One, there's an assistant-based use case where you have a chat UI, and it depends on a lot of human interaction to basically make sure you get back the result that you want. And then the second is something that's slightly more automation-based, where maybe this process runs in the background. It's actually plugged in as part of some long-running process. It'll inspect your data, process this data, and then make decisions on it without the human in the loop, except potentially in a batch review type setting.

Those are probably the two main use cases. The first category is basically ChatGPT, slash any sort of embedded copilot. The second category is something that's a little bit more RPA-flavored.

Okay. So you can go from really simple, where it's basically a chatbot that's using some tools, through to something that's doing more thinking and working completely autonomously, without that sort of human intervention.

Yeah, and to be clear, the assistant use case can also be relatively complex, as in it's not necessarily just a chatbot. It can actually generate entire files for you, and allow you to do data analysis, visualization, and review. It's just that the nature of it is meant to be a little bit more like a copilot, an assistant for the human. And then the second category is basically more like a program that runs in the background.

Alright, that's cool. So I'm curious as to what the most advanced agent is that you've seen that actually works. How far can you go with this at the moment?

It's a good question. I'm trying to think which use cases I've seen that actually work super well. I think, just in the general sense, agents are getting really, really good at coding. It's very clear, especially with the reasoning capabilities that they have and the ability to just generate large artifacts from scratch, that they can actually generate entire programs. This is where ChatGPT itself can one-shot a computer program. With stuff like Cursor Composer, the agent features, Lovable, et cetera, you basically get entire applications; they can just build you full-stack apps. We're starting to see that take shape, and even though Devin can't really solve software engineering end to end yet, it at least can build a relatively simple application for you.

That's probably the most well-known type of cool example. What we've seen in the enterprise setting is some really interesting stuff around end-to-end process automation and also financial analysis. Some of the customers that we work with have evolved beyond a RAG chatbot into things that can just do automated financial modeling on top of your documents, and the output is also becoming a lot more sophisticated.

Instead of just giving back chat responses, it can actually start generating entire PDFs, memos, reports, and also Excel spreadsheets. So really having agents deeply involved in that knowledge-work creation aspect is something that we're starting to see with some companies a little bit more at the bleeding edge, and then figuring out the right patterns to generalize for other customers.

That's very cool. And certainly, it may not be super exciting, but generating a PDF from text is not something a human necessarily wants to have to do. Having that automated is just a nice little time saver.

And you're starting to see this right now, right? I've been using OpenAI's deep research quite a bit. From my own personal AI habits, I abuse Claude, ChatGPT, and also all the coding tools, but deep research is pretty good for horizontal web-based research in a general setting. What you'll probably see in an enterprise setting is some version of that, but grounded within enterprise data and also specific workflows, to basically create something that's a little bit more structured for a specific role.

And that's actually where we've been doing a lot of work: creating the right architectures for this type of report generation setting and making it work well within the enterprise.

Can you talk me through how you go about using deep research? What are the main use cases? What do you do with it?

Yeah, I've been using it for general market understanding of the AI space. I actually use it to get information; it just does web research on its own. I go off, wait for ten minutes, come back, and then it comes back with an analysis of different types of companies in the landscape.

For me personally, I also use it to strategically learn about different functions, like sales, marketing, product, et cetera, and then incorporate those principles with my own context. And again, this is just me as a consumer. I think the inspiration for me, after using this type of stuff, is that there is some layer of tools, like ChatGPT, Claude, maybe the coding tools too, that are relatively horizontal in nature.

You can use them to get a base layer of capability to just help solve your own tasks. And then the way that we see ourselves differentiating, especially in terms of building stuff a little bit more bespoke to the enterprise, is really, one, ensuring it's actually grounded within your enterprise knowledge base, and two, that we're actually able to solve more domain-specific workflows.

Okay, yeah. Solving domain-specific workflows seems to be the big enterprise use case, because you've got all these general-purpose AI tools that work kind of well in a consumer sense, but once you get deep into one area, that seems to be where they fall over.

And the reason I think that matters is because, technically, using a general tool, if you really figure out how to, or you're a power user of ChatGPT, for instance, you can probably get it to do what you want. The main thing is there's a significant amount of work involved, and the best use of AI is when you actually inject the right sets of context. Oftentimes for us, the idea of building a product experience, or the architecture and product experience, on top of AI is to simplify that for different types of users.

Okay, that's interesting. Certainly prompt engineering is one of the more popular things that people want to learn about AI, but I guess there are more people who either can't be bothered or don't have the time to learn about prompt engineering. So product experiences take that barrier away, I presume.

I think so. Prompt engineering in the general sense is something where you really have to learn some skills, and it takes time to really use general AI capabilities. And you see this with vertical AI agents too: a lot of it is really just, how do I incorporate this into specific workflows and make that experience as easy as possible for different users?

Okay. Actually, while we're talking prompt engineering, do you have a top prompt engineering tip? How do you make your prompts better?

I have no idea. It's actually a little bit primitive: I've been concatenating everything into a giant text file and then just injecting about a hundred thousand tokens worth of context into ChatGPT, Claude, and Gemini.
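
As a rough illustration of that "concatenate everything" habit, here's a small Python sketch. The directory path and the four-characters-per-token heuristic are assumptions for illustration, not anything Jerry specified.

```python
# Sketch of the "concatenate everything into one giant context" trick.
# The folder name and the ~4-chars-per-token estimate are illustrative assumptions.
from pathlib import Path

docs_dir = Path("notes")  # hypothetical folder of text files
blob = "\n\n".join(p.read_text() for p in sorted(docs_dir.glob("*.txt")))

approx_tokens = len(blob) // 4  # crude heuristic: roughly 4 characters per token
print(f"~{approx_tokens:,} tokens of context")

prompt = f"Here is my context:\n{blob}\n\nQuestion: ..."
# `prompt` would then be pasted or sent to a long-context model
# (ChatGPT, Claude, Gemini) via whatever client you use.
```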

Okay. So a hundred thousand tokens of context is a lot of context for a simple set of questions, so I presume that's for the more advanced use cases. But I like the idea of just providing more information; that's probably going to get you better responses.

Yeah, exactly. I think a lot of it's really figuring out how to ask the right questions and then providing the right grounding.

Alright, wonderful.

Now I'd love to talk a bit about the LlamaIndex framework. That was the open source software that's at the core of your business now. It seems like LlamaIndex had some very modest goals to begin with, just around being able to use your own personal or corporate data in an AI application. It's grown a lot since then. So can you talk me through how LlamaIndex has evolved?

Yeah, for sure. It's actually been super interesting, because the mission of the company was very simple to begin with. Even though it's been around two years since we started the company, our product offerings have evolved a lot, and in the meantime we're able to solve a lot more downstream use cases, it's kind of interesting how we're still staying consistent with that core, pretty simple mission of connecting LLMs with your enterprise data. Really, the intuition at the time was motivated by the limited context windows back then; it was 4,000 tokens. How do I figure out how to connect the model to my entire enterprise knowledge base, really make use of this cool technology that had decent reasoning capabilities, and use it on top of this massive volume of data?

That kind of led into everyone being interested in RAG in the early days of 2023. And then, beyond RAG, which was kind of a primitive concept, everyone started becoming interested in actually building different types of more multi-step agent workflows that can interact with your external services. So starting from the beginning, it was really meant to be a framework to make it easier for developers to connect LLMs with their external sources of data.

We built some initial patterns around RAG, and as people started building things that were a little bit more sophisticated, we started building some way deeper abstractions to make it easier to build more multi-step agentic systems. So in terms of how the project has changed over time, I think a lot of people knew us as a RAG framework back in 2023. Nowadays, I think we've actually fully made the pivot to a multi-agent framework that appeals to both beginner users as well as advanced users.

Okay, it is interesting how 2023 is kind of the dark ages of these concepts; it's interesting how things have changed. So what would you say is the most advanced thing you can do with LlamaIndex now?

You can do stuff that's pretty advanced. You can build your own deep research assistant. You can build your own coding assistant too, if you want. I will say, I think over time the framework has also gotten a tad more opinionated, and this goes hand in hand with our commercial offering, because our commercial offering, which we'll talk about in a bit, is basically around document processing and transformation.

If you think about how that complements the framework, it's really helping to enable developers to solve end-to-end document workflows. So a lot of the most advanced use cases that we help users solve are these multi-step workflows that solve entire business processes within a company.

So whether it's back-office automation, being able to read and transform unstructured invoices, legal contracts, et cetera; being able to do financial modeling on top of unstructured financial data; or being able to process and ingest heavy volumes of product documentation, technical specifications, and construction diagrams, and then reason and make decisions on that. Those are a few categories of use cases that we help enable for our customers.

Okay. These are obviously very important business use cases, and I'm curious as to when you'd want to build something like that. You mentioned the idea of something for processing or working with legal documents, and I know there are a lot of legal AI companies around, with Harvey maybe the most famous one, so they have complete solutions there. When would you want to buy in the whole solution versus build something yourself?

It's a great question. What we're seeing with a lot of these large organizations that we work with, both open source users as well as production customers of the commercial service LlamaCloud, is that they oftentimes have a little bit of both.

There are point solutions that solve specific needs within certain teams, and some of them are pretty nicely designed; there's a decent number of vertical AI agents out there. But they're also building up internal developer capabilities, because oftentimes what we see is that a lot of these agents are, one, a core part of what they need to build for competitive differentiation. Two, they tap into cross-domain data sources and are something that an out-of-the-box solution can't really solve. Three, there are common patterns where they want to build a multitude of use cases instead of just solving a single point solution. So the reason developer teams exist at a lot of these companies, building with AI and setting up the right architecture for these AI agents, is to really stay ahead of the curve and enable them to generalize these AI technologies to more and more use cases more rapidly.

Basically, the value accumulates the more use cases you're able to build, and you can build them on top of the core architecture. So for us, our goal is to provide the core architecture plus tooling to help users have that core data layer I talked about, which is LlamaCloud, our commercial service, plus the right agent orchestration to help developers build agents in a way that's reliable but also easy to set up.

Okay. So if you wanted to buy in your legal AI, then the marketing team also wants to buy the marketing AI, and the sales team wants to buy in the sales AI. But if you've got a core set of components, then you can have similar agents that behave in similar ways and customize them for different departments.

Yeah, and to be clear, a lot of these companies will probably do a little bit of both. Also, some of these use cases start bleeding between different teams too. If you think about some of the cases we're seeing: you might want to look at your finance data, but then cross-reference it with legal or compliance regulations for different types of checks.

You might want to look at your product documentation, but then take actions in a downstream system. And it's interesting: for a lot of these, you actually do need developers to help build and stitch that together, because waiting on a specific vendor to build that for you oftentimes doesn't work out.

Okay, alright. That seems very sensible. If you've got in-house development skills, then you're going to have the capability to customize things as you want, reuse components, and do things exactly as necessary.

And then on the architecture piece, there are also some interesting things where, even across different teams, the way you set up the architecture seems quite similar.

For instance, there's a class of use cases that is basically: one, parse and extract from documents; two, do some matching against a knowledge base of unstructured plus structured data; and three, generate some report or memo. A lot of the things I mentioned, whether it's processing insurance claims, invoice processing, contract review, or compliance checking, across a bunch of different domains, basically fall into this.

And the nice thing is, if you recognize those patterns, you can basically build it once, instead of having to purchase, say, five different solutions.
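
That three-step pattern (parse and extract, match against a knowledge base, generate a report) maps naturally onto LlamaIndex's event-driven Workflow abstraction. The sketch below is a minimal outline, assuming a recent llama-index version; all three step bodies are stubs for what would really be a document parser, a retriever, and an LLM writer.

```python
# Sketch of the recurring parse -> match -> report pattern as a LlamaIndex
# Workflow. The step bodies are stubs; a real version would call a document
# parser (e.g. LlamaParse), a retriever over your knowledge base, and an LLM.
from llama_index.core.workflow import (
    Event, StartEvent, StopEvent, Workflow, step,
)

class Parsed(Event):
    text: str

class Matched(Event):
    text: str
    matches: list

class ClaimReview(Workflow):
    @step
    async def parse(self, ev: StartEvent) -> Parsed:
        # Stub: run the document at ev.path through a parser.
        return Parsed(text=f"parsed contents of {ev.path}")

    @step
    async def match(self, ev: Parsed) -> Matched:
        # Stub: retrieve related policies/records from an index.
        return Matched(text=ev.text, matches=["policy-A", "record-7"])

    @step
    async def report(self, ev: Matched) -> StopEvent:
        # Stub: have an LLM draft the memo from the matches.
        return StopEvent(result=f"Reviewed against {len(ev.matches)} sources.")

# To run: asyncio.run(ClaimReview(timeout=60).run(path="claim_123.pdf"))
```

The point of the pattern is that swapping invoice processing for contract review mostly means swapping the stubs, not the architecture.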

That's actually kind of cool, that a lot of this stuff is just: pull in some information from existing reports, maybe pull some numbers out of a database, throw it into an AI mixer, and then you've got a new report.

Yeah, and of course there are pros and cons to building it versus buying. A vendor solution would probably be a little bit more tailored for that specific use case; you'd probably have a few more toggles to work with. What I'm seeing, though, is that if you have a general assistant architecture where you can ship a bunch of things that each solve a certain type of task in a repeatable way, then having that architecture allows you to ship a bunch of these different use cases that are a little bit more simple, maybe a little bit less sophisticated than a point solution, but that do that specific thing really well.

Alright. I'd love to get more into the document parsing angle, because a lot of people have been saying, okay, any kind of unstructured data, whether reports or images or video, all of this is data now. Can you talk me through what kind of data you might want to extract from a report, and what kind of value that brings?

For sure. Maybe just a really high-level statement first. For those of you who knew us as an LLM framework for developers, we built a relatively general agent framework to help people build pretty complicated, sophisticated multi-agent systems. So the question is, okay, well then why do we also build document processing?

I think part of it was because we looked at a bunch of use cases that were being built with LlamaIndex, the open source framework, and we saw that a lot of these use cases had to deal with figuring out what to do with their data, and specifically PDFs and documents. In fact, one of the magical things about ChatGPT, and the potential of it, is that you can just dump in an entire document. It can just ingest it, inhale it, and figure out what to do with that information, without you really having to do a lot of manual data entry. But then, as teams actually tried to do that in production, they realized a lot of things were failing in terms of actually understanding things from these types of documents, and making sure it's able to generalize to a variety of different document formats. So I'll talk about both why the components of these documents are complex to understand, and then some specific use cases for the stuff you might want to read.

One is that a lot of these documents, just in a general setting, have a lot of tables and charts. You have these giant 2D tables, sometimes nested. A lot of these tables were created by humans for other humans to read, and weren't necessarily meant to be machine-readable by a computer program.

Number two is that if you look at any sort of PowerPoint presentation, there's a ton of charts and diagrams: pie charts, bar charts, line graphs. And sometimes they're just super complicated. If you basically just used out-of-the-box components to process that, it would render the information poorly, and then you'd feed it to an LLM and it wouldn't really understand what was going on, because you didn't actually read the information from the page.

That's mostly just to highlight why the problem is relatively complex. Now, assuming you have good technology for document understanding, what we've seen people want to do with these documents falls into two main categories. One is just to represent them in a well-formatted textual format so that you can use them for retrieval. So if you want to index a ton of documents, you just want to make sure they're parsed and processed in the right way, so that you can put them into a knowledge base for search later on.

Number two is some sort of automated structuring and extraction. So, for instance, for invoices, getting the line items; for a financial statement like a 10-K, getting the balance sheet. The concept here is that it's basically automating unstructured data ETL. If you think about taking a document and then structuring it, that's a process that usually requires a ton of human input. Instead, you're doing this in an automated fashion and eliminating a lot of data entry, and therefore you're also saving a bunch of steps that you would typically require in an ETL process.

So those are probably the two main use case patterns that we see with document understanding, and both of those are necessary so that you have either an unstructured text chunk representation or some sort of structured representation for AI agents.
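
To illustrate the automated structuring and extraction category, here is a hedged sketch of schema-driven extraction using a Pydantic schema with the OpenAI Python SDK's structured-output helper. LlamaCloud's own extraction service is a different API, so treat this purely as a generic stand-in for the idea.

```python
# Generic sketch of schema-driven extraction (the "unstructured ETL" idea):
# define the structure you want, then have a model fill it in from raw text.
# Uses the OpenAI SDK's structured-output parse helper; LlamaCloud's
# extraction service is a separate API, so this is only a stand-in.
from pydantic import BaseModel
from openai import OpenAI

class LineItem(BaseModel):
    description: str
    quantity: int
    unit_price: float

class Invoice(BaseModel):
    vendor: str
    total: float
    line_items: list[LineItem]

client = OpenAI()
completion = client.beta.chat.completions.parse(
    model="gpt-4o-mini",
    messages=[
        {"role": "system", "content": "Extract the invoice fields."},
        {"role": "user", "content": "ACME Corp: 3x widgets @ $4.00, total $12.00"},
    ],
    response_format=Invoice,
)
invoice = completion.choices[0].message.parsed  # an Invoice instance
print(invoice.vendor, invoice.total)
```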

Yeah, certainly. I think everyone's seen some horrendous PowerPoint presentation visuals, and trying to turn that into some kind of structured data that you can then do meaningful further analysis on is going to be a tricky challenge. Do you need to change how you write documents or manage documents, then, in order to make them more amenable, if you want to go and use them in some kind of AI application or agent?

Yeah, basically. Going back to the output representations, either you need to have some sort of text-chunk-based representation for retrieval, or you need to have some sort of structured data representation.

And in a very general sense, you just need to provide the right tools to an agent so that the agent can use these tools to understand the data. The goal of this whole knowledge management, document processing step is: how do you transform your source data into some format that AI agents can access via tools? So whether it's RAG, or whether it's some sort of structuring of the data from unstructured data, these are all part of those steps.

Okay. So the idea is basically that you need to put your documents somewhere where they can be ingested into a vector database or whatever it is. Or do you need to be doing anything beyond that?

Yeah, the vector database, that's the RAG piece. That's where, let's say you have a lot of data and you want to do some sort of retrieval from that data. So: taking unstructured documents, chunking them, embedding them, and putting them into a vector database for search and retrieval.

But there are actually other patterns too. Let's say you do some sort of structuring of the unstructured data, and you get back some structured JSON from the file based on some normalized schema. Then you could put it into MongoDB or a SQL database, which is actually a different storage system than a vector database, and you can run analytics on top of it.

And then there are also other ways of basically treating these files as tools. I think one thing that's interesting, and this is the jump from RAG to agents, is that RAG really was a very fixed process; all you could do on top of the data was retrieval. If you think about how agents might interact with files, they might want to load in a set of files at once, dynamically. They might want to call functions on the file to search for specific pages. For an Excel spreadsheet, they might look at a specific sheet, or look at the values of different types of cells. The general sense is: how do we process this data and create the right tools, so an agent can use these tools to interact with the data?

It could be retrieval, that would be the RAG piece. It could be text-to-SQL, that would be searching structured data. It could be just some general file functions to interact with the files.
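
The chunk-embed-retrieve path Jerry describes is the classic LlamaIndex quickstart. A minimal sketch, assuming a ./data folder of documents and an OpenAI key configured in the environment:

```python
# The RAG piece in miniature: load documents, chunk and embed them into a
# vector index, then retrieve at query time. Assumes `pip install llama-index`,
# an OPENAI_API_KEY, and a ./data directory of files to ingest.
from llama_index.core import SimpleDirectoryReader, VectorStoreIndex

documents = SimpleDirectoryReader("data").load_data()
index = VectorStoreIndex.from_documents(documents)  # chunks + embeds under the hood

query_engine = index.as_query_engine()
response = query_engine.query("What does our travel policy say about airfare?")
print(response)                                   # synthesized answer
print(response.source_nodes[0].node.metadata)     # provenance of a retrieved chunk
```

The same index can instead be exposed as a retrieval tool for an agent, which is the files-as-tools framing from the conversation.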

Okay, so you want a lot more, I guess, fine-grained control over what the agent can do with the data that's in a file, rather than just, oh, I just want to read whatever the text is. You might want to manipulate it or act on it.

Exactly right.

Okay. And so I guess the tricky part then, in an enterprise context, is around what happens if different documents say different things. So maybe you're like, well, I need to find whatever policy we have on something. Someone wrote a document three years ago that's now out of date; someone sent a Slack message that gives you new information.

What do you do when you've got different data sources that conflict with each other? 

That's actually an interesting ask; I think that specifically has been brought up as a use case. It's a little bit case-by-case dependent, but typically, in terms of how you would design the architecture, you probably want some way to do extraction from the documents, using some of our core parsing and extraction capabilities.

And then you have some agent workflow that can reason over these files and compare the extracted text, including the timestamps, and see whether certain things are out of date. The way you actually define this workflow is a function of your use case, but I think the power of our solution, both the document parsing and the framework, is that you can use our core document processing modules to do the document understanding. And then with the agent framework, you can compose workflows that do these cross-reference checks between documents, comparisons, and then generate some sort of output at the end.
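
A plain-Python sketch of that cross-document comparison step. The extraction records below are hypothetical; real ones would come out of your parsing pipeline with timestamps attached.

```python
# Sketch of conflict handling across sources: prefer the newest extracted
# value, and flag disagreements for human review. The record shape here is
# an illustrative assumption, not a real pipeline's schema.
from datetime import date

extractions = [
    {"source": "policy_2022.pdf", "value": "21 days notice", "updated": date(2022, 3, 1)},
    {"source": "slack_msg_8841",  "value": "30 days notice", "updated": date(2025, 1, 9)},
]

latest = max(extractions, key=lambda e: e["updated"])
conflict = len({e["value"] for e in extractions}) > 1

print(f"Current answer: {latest['value']} (from {latest['source']})")
if conflict:
    print("Conflict detected across sources; route to human review.")
```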

I like that there are tools to help you out here. I'm not sure whether the tools are the complete solution to this, though. Do you need to change your processes, or at least have some kind of governance around how you manage documents?

Yeah, so for us, we're starting off with building the right tools for developers, so to speak, really making sure they have the right APIs to work with, so it's actually customizable. I think you bring up a good point, though, about a potential out-of-the-box solution that specifically looks at document comparisons and versioning, and sees whether things are out of date.

Some of these things are on the roadmap, in terms of more general knowledge management capabilities for ingesting a bunch of documents. Versioning is just one thing; another is looking for specific types of data quality issues, and figuring out how you want to extract insights from these documents. Some of those aren't things we have right now, but some of them are actually on the roadmap.

Okay. So for organizations who are thinking about how they can do knowledge management better on their documents, do you have any advice on what they need to do to improve that setup?

The first thing, to be honest, from talking to a lot of companies, is just figuring out that you probably need some core modules around data processing, because I think a lot of companies still don't quite realize that to begin with. A lot of companies that come inbound realize they actually need some core modules around parsing and extraction to structure this data and make it available to the AI agents.

So the building blocks, honestly, are the table-stakes things to start with. And then the way you evolve beyond that is, specifically when you think about the stack needed to build AI agents, this idea of knowledge management, in my mind at least, is really: how do you actually build the right MCP server, so to speak, for an agent to interact with your enterprise knowledge base?

Using some of our stuff, you can get back these endpoints that you can just give to an agent to interact with your documents, with the right tools built in. So when we think about this idea of knowledge management, it's specifically tailored to how you create this toolbox, so to speak, for an AI agent, versus a more human-based form of knowledge management, where you're able to click a bunch of buttons on a UI and look at your files.

Knowledge management for agents is sort of different from knowledge management for humans, in that case?

I think so. It starts getting into the whole product experience as well as the API design. But yeah, I should have clarified this: when we think about this knowledge management layer, it's specifically about how you structure and process your documents for AI agent use, and not just human use.

Because if you think about human use, it's basically designing the right UI/UX and having the right APIs to call, so that you can go through your files, view them, and get some aggregate metrics. Here, it's: how do you provide all this information to an AI agent, and give it the right tools to act upon this data?

Okay, alright. Now, I think one of the big problems that people have worried about with agents over the last few months is around hallucinations and things going wrong. So how can you deal with mistakes happening? How can you be sure that you've got decent quality control in place?

That's a great question. Well, one, if you just don't have the right modules for document parsing, you're just going to get hallucinated data no matter how good your downstream LLMs are. So, I like to think we have some better modules that help users actually get more accurate data. It's not a hundred percent, to be honest; it's still probably in the nineties. There's some long tail of complexity, and there are still some aspects where it's not quite giving back the exact results.

And to make this knowledge management layer work for AI agents, there has to be a way of quantifying uncertainty and having citations to the sources. Quantifying uncertainty means the AI agent at least knows where it doesn't know, and can specifically detect spots where it requires a little bit of human review.

You absolutely need citations, particularly if you're trying to process a large number of documents and each document is super long. Every piece of extracted data should have some reference back to the source document. This at least gives the human the ability to verify it pretty quickly.
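
Here's a small sketch of what uncertainty-aware review routing can look like in practice: each extracted field carries a confidence score and a citation back to its source page, and anything below a threshold is flagged for a human. The threshold and record shape are illustrative assumptions, not LlamaCloud's actual schema.

```python
# Sketch of confidence-gated human review with citations attached to every
# extracted field. The 0.9 threshold and the field layout are assumptions.
REVIEW_THRESHOLD = 0.9

fields = [
    {"name": "total", "value": "$12,400", "confidence": 0.98,
     "citation": {"file": "invoice_88.pdf", "page": 2}},
    {"name": "due_date", "value": "2025-07-01", "confidence": 0.72,
     "citation": {"file": "invoice_88.pdf", "page": 3}},
]

for f in fields:
    cite = f"{f['citation']['file']} p.{f['citation']['page']}"
    if f["confidence"] < REVIEW_THRESHOLD:
        print(f"[HUMAN REVIEW] {f['name']}={f['value']} ({cite})")
    else:
        print(f"[auto-accept]  {f['name']}={f['value']} ({cite})")
```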

Okay, so you've got metadata being captured throughout the process, so you've got, I guess, an audit trail of what the agent did, in order that you can work out what happened when?

I think so. Some of this stuff, at the time of this recording, is actually being actively built, but we already have some things around uncertainty quantification. For instance, when you parse a super complicated table, there's a score on how confident we are, plus citations, so that you can do some sort of layout mapping from the parsed data to the source document. These types of things are useful because, later on, when you build an AI agent that automates this entire workflow and you look at a specific answer and ask where it came from, you want to be able to trace back visually to the source document.

Absolutely. Do you want to expand a bit more on when you might need that audit trail? Because I guess transparency of what agents do is going to become a big issue as they become more widely adopted.

I think so. Going back to the agent form factors I mentioned, there's assistant-based and there's automation-based.

For assistant-based, the human's already in the loop, and I think people have this behavior pattern encoded in them anyway. When I do deep research, I want to look at the citations to see where a number came from. Sometimes when I search in the Google search bar, the AI-generated answer is wrong and I want to verify what's going on.

So having citations for an assistant-based use case is important, because as people use AI, they already have these habits built in. And I think it's more challenging, but also especially important, once you start actually trusting agents to do more end-to-end automation. Because then you're going to need some ways to look at things after the fact: understand where the AI has hallucinated, batch review a set of data, specifically flag things for human review, and then have a process to correct and update the agent, or the way the agent has processed this information.

Okay. So you need that kind of feedback loop in order to be able to say, okay, it went wrong, and then next time we're gonna make sure that it does the right thing. 

Yeah. Because in an automation-based use case, if it's 99% accurate but not a hundred percent, then 1% of the time it's going to give you issues, right? So you basically need to have some sort of human review in that case.

Okay. I can imagine that's going to have some pretty big impacts on people's workflows, then, and on what their jobs are, if the agent's doing 99% of the work and then you're reviewing stuff. Is there a good new workflow, or are people just going to spend a lot of time trying to debug agents?

Yeah, actually, it is a good point. And thinking back to your question on the knowledge management piece, especially in terms of what agentic knowledge management entails: it probably needs to have some UX around human review and validation. Because if you're going to be a data processing layer for AI agents, not only do you want to make sure you have the right toolbox for AI agents to access, you also need some sort of proper UX to identify and review data, update it, and have that interface baked in as part of the system.

And you brought it up because that's definitely part of what we're building as part of LlamaCloud. Does that make sense?

Yeah, okay. So you want to have a nice experience for the human reviewers, because otherwise it's just going to be some sort of nightmare of trying to delve into all the stuff the agents spit out, like, oh, I was thinking about this, and then blah, blah, blah. And that's going to be tricky and probably dreadful.

Yeah, totally. I think human in the loop is in general important for agents, but especially in the part where you're dealing with a ton of unstructured data and doing agentic reasoning over it, you absolutely need a way to do human review and validation.

Alright, super. 

Now, earlier you mentioned the idea of an MCP server, so I'd love to learn a bit more about the Model Context Protocol. What is it, and what can you do with it?

Well, there's MCP, and there's also A2A from Google that was just announced, and I'm sure in the next month or two there will be protocols from other companies getting announced.

MCP is specifically a way for agents to interact with tools. This idea of agents that interact with tools, to be very clear, is not net new; people were building it basically since 2023. It's just that nowadays there's a standardized protocol. The nice thing about a protocol is that people can build against it without having to build a bespoke thing, and by building a server, it can plug into multiple clients instead of just plugging into a specific client. I think why people got really into MCP is because Cursor and Claude basically both were MCP clients.

So if you built an MCP server, it would just seamlessly plug into Claude and Cursor, and you could basically get immediate value from it. That is cool. Again, it's not a new idea, but the fact that there's a standardized protocol means server developers get way faster time to value, from being able to see how the stuff they're building is really making an impact on existing AI tools.

To give you a quick framing of how we fit in: we have the end-to-end tools to help developers build these types of AI agents over their data. We have the document processing piece, and we also have an agent framework. On the document processing piece, it probably makes sense to see it as an MCP server.

This is what I mentioned with knowledge management for AI agents: think of us as a knowledge management toolbox for your AI agents. Your agent can call out to us to parse and structure your data, get back information about it, and feed that information to an LLM to enable it to interact with the files.

As a framework, we help developers build agents, so we have actual integrations there too. You can build a client agent that can interact with any MCP server as a tool. You can also build a server agent that other people can call out to. I think one interesting thing about MCP and also A2A is that MCP seems to be a little bit more geared towards agents calling out to tools, whereas with A2A from Google, it's more about agents calling out to other agents. There are going to be some interesting, maybe collisions, but also differences between the two, and I'd be curious to see how it standardizes. I don't completely know yet.
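
To ground the MCP discussion, here is a minimal MCP server sketch using the official `mcp` Python SDK's FastMCP helper. The document-search tool is a toy stand-in; it is not LlamaCloud's actual server.

```python
# Minimal MCP server sketch using the official `mcp` Python SDK (FastMCP).
# Any MCP client (e.g. Claude Desktop or Cursor) could then call this tool;
# the document-lookup logic here is a toy stand-in.
from mcp.server.fastmcp import FastMCP

mcp = FastMCP("doc-tools")

@mcp.tool()
def search_documents(query: str) -> str:
    """Search the enterprise knowledge base (hypothetical stub)."""
    return f"Top match for '{query}': onboarding_guide.pdf, page 4"

if __name__ == "__main__":
    mcp.run()  # speaks the MCP protocol over stdio by default
```

Because the protocol is standardized, the same server plugs into any MCP-capable client without bespoke connector code, which is the time-to-value point Jerry makes above.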

Alright, so the general idea is really that you want to make it easier and standardized for agents to communicate with tools, or for agents to communicate with other agents. And so if you want to swap out different components, you're not going to have to write a ton of custom code every time; it's a case of...

Exactly.

...doing the same thing each time. Alright, so that sounds like it's going to make it easier to scale the number of agents that you can create.

Yeah, definitely. It's going to make it easier to scale, and then also way easier to integrate. I think that's the advantage of a protocol: you can just directly integrate with anything that implements MCP on the client side.

Okay. So if you've got existing bits of, I don't know, sales software or something, it's probably going to support MCP, and then you can just plug your agent into that.

Exactly. What this does is it probably lowers the barrier to building these types of connectors, because every company will just build their own MCP server, and having a protocol means the connector part is much easier to solve, since there's already a standardized interface for things to communicate with each other, at least in the hypothetical sense.

I don't know how it'll play out practically; different MCP clients might have different expectations of what the server actually gives back, and that might lead to a lot of complexity in API design. So I don't really know yet. But yeah, Salesforce, SAP, whatever: every workplace app will probably have its own server that agents can interact with. We'll have servers around document processing and knowledge management, making sure that you're able to access all the huge volumes of data available to you. And then this will just be easier to plug into any sort of AI agent front end.

Okay, this seems like very useful stuff. So I'd like to talk a bit about what you need to do in order to start building your own agents. First of all, what do you think are the most important skills that organizations need in-house if they want to be able to create their own agents?

Yeah, I think there are two. One is just general AI skills, which I think everyone should learn, whether you're a developer or not. I probably use ChatGPT and Claude to generate 50-percent-plus of my written content these days. I'm not ashamed of it; it's basically just how the world is moving.

And then if you're a developer, I'm obviously biased, but I would recommend checking out our documentation. We have a lot of quickstart tutorials, whether you're a beginner or a more advanced user, to either build an agent or a multi-agent system in five lines of code, or use the core orchestration framework to build very deeply custom agents that can interact with any services. We integrate with pretty much every LLM, vector store, embedding model, and tool out there, so there should be a decent amount of resources to get started.

In terms of general skills to learn, it kind of depends how deep you want to go. If you're mostly just looking to solve a specific task, then the quickstart stuff, using the higher-level abstractions, is fine, because you're trying to figure out how to solve a specific problem. If you're really trying to deeply understand AI a little bit more, I would try the build-from-scratch approach: try not to use the out-of-the-box modules, and see if you can reason from first principles and learn the techniques.

And if you want to be a proper AI engineer, that's probably a core skillset you should learn. There's a difference between folks that are a little bit more geared towards building these AI algorithms and folks that are adapting them for product, and the former probably need to understand the fundamentals a bit more.
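
In the spirit of that build-from-scratch advice, here is the exercise in miniature: a bare tool-calling loop with no framework. The `llm_decide` function is a placeholder for a real model call; everything about it is an illustrative assumption.

```python
# The "from scratch" exercise: a bare agent loop with no framework.
# `llm_decide` is a placeholder for a real model call that returns either
# a tool invocation or a final answer; here it is hard-coded for demo purposes.
def llm_decide(history: list) -> dict:
    # Placeholder: a real implementation would send `history` to an LLM
    # and parse its chosen action. We fake one tool call, then finish.
    if not any(m["role"] == "tool" for m in history):
        return {"action": "tool", "name": "add", "args": {"a": 2, "b": 3}}
    return {"action": "finish", "answer": "2 + 3 = 5"}

TOOLS = {"add": lambda a, b: a + b}

def run_agent(task: str, max_steps: int = 5) -> str:
    history = [{"role": "user", "content": task}]
    for _ in range(max_steps):            # the reasoning loop
        decision = llm_decide(history)
        if decision["action"] == "finish":
            return decision["answer"]
        result = TOOLS[decision["name"]](**decision["args"])
        history.append({"role": "tool", "content": str(result)})
    return "gave up"

print(run_agent("What is 2 + 3?"))
```

Writing this loop yourself, then comparing it to a framework's ReAct or function-calling agent, is a compact way to learn the trade-offs Jerry describes next.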

Okay, that's interesting. I guess the lower down the stack you go, the more core or fundamental the skills you need. But you mentioned the idea of a proper AI engineer. Talk me through: what's the skillset for a proper AI engineer?

Yeah, I think an AI engineer needs to deeply understand the different types of architectures and the trade-offs between them, and then figure out how to solve a problem from first principles while understanding those trade-offs. To me it's a negative signal if they just regurgitate abstractions that are in existing frameworks, because that means they bought into the framework but don't really understand the core techniques. It's really about understanding the limitations of models, then figuring out how to build the right algorithms around them to solve a problem, with the right set of trade-offs in mind.

Alright, yeah. I like the idea that you want to be able to solve things from first principles and get to a solution yourself, rather than just relying on frameworks.

To be clear, this is the core algorithm side. I think there's probably a decent chunk of developers who are more interested in adapting this to build product, which I also think is super fair and also very important. I just think there are two slightly distinct sets of skills.

Ah, okay. So a software engineer using AI is different from someone building a core AI product.

Yeah, like the algorithm side, right? And sometimes one person can do both; there are engineers that wear multiple hats. It's just that I think they're slightly distinct.

Alright. If you're doing both, then you probably don't have a lot of free time, because there's a lot to learn there, a lot of skills. Alright, wonderful. So, to wrap up, do you have any final advice on how organizations can make use of AI agents?

I can speak to the developer teams, because that's who we build some of the core products for. I would try to figure out which pieces of the architecture you really want to centralize, so that you can actually start building not just one agent but multiple agents on top.

For us, a lot of the core components are really around the foundational data layer, document processing and extraction, and then having the proper architectures that operate on top of this data. Different companies are at different stages of maturity. Some are just trying to get the initial use case out the door; some have a dozen-plus agents in production already. And what we've seen from these more advanced companies is that having the architecture in place really helps your developers go way faster in shipping these use cases, and provides that compelling case for why you should have developer teams building these things.

Alright, nice. Yeah, you've got to have a case for why you want developers building things, rather than just having them work on all sorts. Excellent. So, finally, I always want recommendations for people to follow. Whose work are you most excited about at the moment?

I read a bunch of different data sources from all different types of people. From a core model perspective, as a recent example, I follow the ReAct author, and he always has interesting thoughts on the state of model development and where foundational models are going. So that's just one example.

Alright, wonderful. Okay, thank you so much for your time, Jerry.

Thanks for your time.
