
High Performance Generative AI Applications with Ram Sriharsha, CTO at Pinecone

Richie and Ram explore common use cases for vector databases, RAG in chatbots, static vs dynamic data, choosing language models, knowledge graphs, implementing vector databases, the future of LLMs, and much more.
Aug 2024

Guest
Ram Sriharsha

Ram Sriharsha is currently the CTO at Pinecone. Before this role, he was the Director of Engineering at Pinecone and previously served as Vice President of Engineering at Splunk. He also worked as a Product Manager at Databricks. With a long history in the software development industry, Ram has held positions as an architect, lead product developer, and senior software engineer at various companies. Ram is also a long-time contributor to Apache Spark.


Host
Richie Cotton

Richie helps individuals and organizations get better at using data and AI. He's been a data scientist since before it was called data science, and has written two books and created many DataCamp courses on the subject. He is a host of the DataFramed podcast, and runs DataCamp's webinar program.

Key Quotes

Even from simply putting language models with vector DBs together in a very simple workflow, you can already get substantially far with AI. We've seen customers do that a lot. It's really important that you start getting into this generative space and start really leveraging it and build your ability to iterate and the ability to think about your data differently. Think about labeling. Think about actually measuring quality and so on. All of this is really only developed by doing. So the faster you can get to doing that, the better.

You can take GPT-3, GPT-3.5, you could take all the data that it was trained on, just put that data into a vector database, and use GPT-3.5 along with that vector database. And ultimately, you have better groundedness and better retrieval capability than GPT-4, which means that a weaker model with a powerful vector database, even just retrieving on data that the model was already trained on, can still do at least as good, if not better, than the state-of-the-art models.

Key Takeaways

1. Select a vector database that can handle both static and dynamic data efficiently, supports your scalability needs, and integrates seamlessly with your chosen LLM.

2. Achieve cost efficiency by using less powerful LLMs in combination with well-structured vector databases, leveraging the strengths of both to maintain high performance at a lower cost.

3. Invest time in collecting and preprocessing your data accurately, including parsing documents and handling metadata, to ensure high-quality embeddings and retrieval results.


Transcript

Richie Cotton: Hi Ram. Thank you for joining me on the show.

Ram Sriharsha: Hey, thanks a lot for having me.

Richie Cotton: Yeah, great to have you. So, uh, just to begin with, uh, can you talk me through what are the most common use cases for vector databases?

Ram Sriharsha: So there are a few big use cases. Uh, one, uh, people might not be familiar with is RAG, uh, Retrieval Augmented Generation.

So lots of generative, practically every generative AI application needs access to knowledge. This access to knowledge comes through interacting with vector databases as part of retrieval in a flow that's called RAG, Retrieval Augmented Generation. That's one large set of use cases. Another one is semantic search: anytime you're searching and you want to retrieve content, based on articles or documents or what have you, based on semantic relevance between query and document, you use vector databases.

Increasingly, we also see it used for labeling and classification use cases, where, you know, you have something that you want to label, and you want to look at neighboring points in this vector space, and use the dominant label of that neighboring set of documents or data, what have you, to be able to label this thing. So these are the three big sets of use cases, but broadly, uh, in some sense, uh, vectors are the lingua franca of machine learning, and we are discovering new use cases every day.
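As a rough illustration of the labeling use case Ram describes, here is a minimal Python sketch of nearest-neighbor labeling with a vector database. The embed() function and the Pinecone-style index (with a "label" metadata field on each stored vector) are assumptions for the example, not something stated in the conversation.

```python
from collections import Counter

def predict_label(index, embed, text: str, top_k: int = 10) -> str:
    """Label `text` with the dominant label among its nearest neighbors."""
    query_vector = embed(text)                    # embed the unlabeled item
    results = index.query(vector=query_vector,    # nearest-neighbor search
                          top_k=top_k,
                          include_metadata=True)
    labels = [m["metadata"]["label"] for m in results["matches"]]
    return Counter(labels).most_common(1)[0][0]   # majority vote
```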

Richie Cotton: Okay, that's cool. So we got Retrieval Augmented Generation and we got Semantic Search and then there's also Classification stuff.

Um, so for those, um, I think maybe Retrieval Augmented Generation is perhaps the most well known. Can you just talk me through, um, how that fits into, uh, the idea of a chatbot? 

Ram Sriharsha: Yes. So, uh, imagine that you're, uh, building a chatbot, uh, that, say, uh, allows users to, uh, just query your, uh, knowledge base. Uh, a good example is, uh, Pinecone has a chatbot that, uh, allows people to just search over, uh, customer support tickets or search over, uh, resolutions to certain questions that may, uh, need knowledge about Pinecone and things like that.

Now, if, if you're building a chatbot like this, you probably are using GPT-4 and so on. Now, GPT-4 is great. It's a great reasoning engine. It has, uh, great, uh, general world knowledge. It also understands language, but it doesn't understand your data, right? It doesn't understand, uh, for example, the customer support tickets, it doesn't understand, uh, your prior resolutions and so on.

What vector databases allow you to do is to put all this, uh, private knowledge base into a vector database. And then, uh, if you have a search query, you go to this vector database, retrieve candidate, uh, document chunks or, uh, parts of documents that are relevant, compose it into what's called a context, and you give this context to a large language model.

The large language model, uh, does reasoning over this context and extracts information out of it, and then hopefully answers your question. So that whole workflow is called RAG, and that's how it's used in chatbots, for example. 
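As a sketch of the query-time RAG flow just described (retrieve relevant chunks, compose a context, let the LLM answer over it), assuming a hypothetical embed() function, a Pinecone-style index that stores the chunk text as metadata, and an OpenAI-style chat client; the prompt wording and model name are illustrative only:

```python
def answer_with_rag(question: str, index, embed, llm_client, top_k: int = 5) -> str:
    # 1. Retrieve candidate chunks semantically close to the question.
    matches = index.query(vector=embed(question), top_k=top_k,
                          include_metadata=True)["matches"]
    # 2. Compose the retrieved chunks into a context block.
    context = "\n\n".join(m["metadata"]["text"] for m in matches)
    # 3. Ask the language model to answer using only that context.
    prompt = (f"Answer the question using only the context below.\n\n"
              f"Context:\n{context}\n\nQuestion: {question}")
    response = llm_client.chat.completions.create(
        model="gpt-4o",  # placeholder model name
        messages=[{"role": "user", "content": prompt}],
    )
    return response.choices[0].message.content
```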

Richie Cotton: Okay. So this is just in a way of providing extra information that's specific to your company or whatever, just in order to increase the accuracy of the answers.

Ram Sriharsha: Yeah. More generally, it's a way of providing, uh, relevant knowledge to the language model itself, because the language model doesn't really understand your knowledge.

Richie Cotton: Suppose, uh, your boss comes to you and says, okay, I need you to create a chatbot for our company. I want you to use, uh, some RAG. What's step one to creating a chatbot?

Ram Sriharsha: That's a great question. Step one is actually data. So, uh, first of all, presumably you have your data as maybe HTML pages, or maybe, uh, markdown documents, or what have you. So you collect your data together. You also have to decide whether this data is mostly static, which is, there's a corpus that you just want to serve, or is it dynamic, in which case you keep adding and deleting information and then you have to deal with that.

Uh, and, uh, hopefully your data is also at least partially labeled. In some sense, you have some questions and some answers that are relevant, some answers that are not relevant, and so on. Obviously, if you don't have labeled data, then that's an even more challenging problem. You basically have data that's not labeled and somehow, uh, you need to understand, uh, the quality of your pipelines.

So you start with data, you start with defining some metrics. Uh, do I want to give factually grounded answers? So you have to come up with certain metrics that define that. And then you, uh, decide on your language model. Now, again, most people probably, rightly so, will just choose the best language model out there.

Uh, which is what I would advise starting with, which is don't, don't start with optimizing for costs or optimizing for latencies and so on. Just choose the best model out there. Similarly, you want to choose a vector database that allows you to be flexible. For example, we talked a little bit about, is your data static or is your data not static?

Now, clearly some databases handle this better than others. You want to choose one that just handles whatever you throw at it. So you don't have to worry about that part. And then you start putting a pipeline together that takes your documents and converts them into vectors. Now, this pipeline can be as simple as just chunking your text into a certain set of passages, encoding each of these passages using an embedding model, throwing it into a vector database, and then retrieving it from there, and then putting it into your context.

So this gives you the most basic pipeline that you could build. And from there, you simply start tuning. I would probably tune for quality first: get the best quality that you can get, and then start figuring out, is it cost effective, is it not cost effective, and then what do you need to change in order to make it cost effective?
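A minimal ingestion sketch of that basic pipeline, chunking documents, embedding each chunk, and upserting into a vector database; the fixed character-window chunking is deliberately naive, and embed() plus the Pinecone-style index are assumed:

```python
def chunk(text: str, size: int = 1000, overlap: int = 200) -> list[str]:
    """Split text into overlapping character windows (naive baseline chunking)."""
    step = size - overlap
    return [text[i:i + size] for i in range(0, max(len(text) - overlap, 1), step)]

def ingest(docs: dict[str, str], index, embed, batch_size: int = 100) -> None:
    """docs maps a document id to its raw text."""
    vectors = []
    for doc_id, text in docs.items():
        for n, passage in enumerate(chunk(text)):
            vectors.append({
                "id": f"{doc_id}-{n}",
                "values": embed(passage),                        # dense embedding
                "metadata": {"doc_id": doc_id, "text": passage}  # keep raw text for RAG
            })
    for i in range(0, len(vectors), batch_size):                 # upsert in batches
        index.upsert(vectors=vectors[i:i + batch_size])
```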

Richie Cotton: You mentioned the idea of the difference between static data and dynamic data, where the latter sort of updates more frequently. Can you just give me an example of like, when you might have a static data set or when you might have a dynamic data set? 

Ram Sriharsha: Yeah, yeah. Static datasets are very common in webpages, for example, right?

Webpages don't change that often, uh, especially, uh, if you have, say, documentation about Pinecone that you want, and you want to search it. Obviously, documentation will change as we do new releases and so on. We might add new features and things like that, but it doesn't change that often. So you could imagine a workflow where you just take a set of documents, you just embed it.

Then you search over it. Uh, there are also use cases where, uh, things are very dynamic. For example, if you have a Notion page and you're editing Notion pages and so on, and you wanna search over it, as customers of Notion actually do, uh, using Pinecone. Now that is obviously far more dynamic. You, you're changing your pages quite often.

These, uh, create lots of, lots of writes. You also want to be able to be fresh, meaning you want to be able to query the things that you changed. Uh, so it's a very dynamic workflow. Similar dynamic workflows exist in sort of product recommendation use cases, where your product reviews might change, where your product definitions might themselves change.

Some products might be available, some, some might go out of stock. You need to be able to respond to that. You probably don't want to be searching over things that are not available. 

Richie Cotton: Um, so we'll be getting into details more in a moment, but I'm curious as to how do you know whether your chatbot is successful?

What would you track? What sort of metrics are going to say this is a good chatbot? 

Ram Sriharsha: This is a great question. By the way, this is, uh, this is not easy. Uh, in fact, uh, you, you have to really think about what you're really looking for. So, for example, you, you want, in some sense, factual answers, right?

So you, you want to retrieve facts. So you wanna know whether the language model is actually hallucinating or whether it is grounded in its, uh, retrieval, grounded in its answers. Uh, being grounded is not enough. We actually want it to be relevant, right? It could be giving actually correct answers which are just simply not relevant to the question you're asking.

So in some sense, you're looking for groundedness as well as, uh, factual relevance, uh, search relevance and so on. So it's like two or three of these metrics that you can define. Now the challenge is, of course, in figuring out, uh, how to quantify them and how to track them, right? It's easier to track if you have, uh, feedback.

If, if you actually have, uh, a way of collecting feedback, and to be able to use that to track whether things are, uh, trending in the right direction or not. Sometimes this is easy. Sometimes this is hard. Sometimes you're probably building applications that other people are consuming, so the feedback channel is not direct and so on.

Um, so sometimes you can, you can also try to do the following, which is you could try to use more powerful language models, uh, to extract questions out of documents to say that you give the language model documents. It's going to extract questions. Clearly, those questions are actually factually related to these documents.

Now you want to put them into a flow and decide whether your overall pipeline is actually getting those correct answers or not. Now, sometimes this works, sometimes this doesn't work, because language models are, it turns out that they're not as good as human beings in asking questions of documents and so on.

So in some sense, the questions are too simple, but still, it's better than nothing. So, you know, in these things, I always think about it in terms of: start from something, even if it's not ideal, and start kind of improving upon it. So the, the best thing you could do here is actually collect labeled data, actually ask people to, uh, give a thumbs up or thumbs down for you, relevant or irrelevant, uh, examples and so on.

But if you can't do that, start with something simple. 
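As a rough sketch of the "use a stronger model to generate evaluation questions" idea above, using the same assumed OpenAI-style client and any RAG answer function like the one sketched earlier; judging groundedness with another LLM call is itself imperfect, exactly as Ram cautions:

```python
def make_eval_questions(document: str, llm_client, n: int = 3) -> list[str]:
    """Ask a strong model for questions answerable only from this document."""
    prompt = (f"Write {n} questions that can be answered only from this "
              f"document, one per line:\n\n{document}")
    text = llm_client.chat.completions.create(
        model="gpt-4o",  # placeholder model name
        messages=[{"role": "user", "content": prompt}],
    ).choices[0].message.content
    return [line.strip() for line in text.splitlines() if line.strip()]

def grounded_answer_rate(documents: list[str], answer_fn, llm_client) -> float:
    """Fraction of generated questions whose answer an LLM judge calls supported.

    answer_fn(question) should run your own RAG pipeline and return a string.
    """
    scores = []
    for doc in documents:
        for q in make_eval_questions(doc, llm_client):
            answer = answer_fn(q)
            verdict = llm_client.chat.completions.create(
                model="gpt-4o",  # placeholder judge model
                messages=[{"role": "user", "content":
                           f"Document:\n{doc}\n\nQuestion: {q}\nAnswer: {answer}\n"
                           "Is the answer supported by the document? Reply yes or no."}],
            ).choices[0].message.content
            scores.append(1.0 if verdict.strip().lower().startswith("yes") else 0.0)
    return sum(scores) / len(scores) if scores else 0.0
```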

Richie Cotton: Okay. So that's interesting that you've got a mix of AI techniques and also human techniques there. So you can get humans, uh, to give you a thumbs up or thumbs down and you get, um, AI to ask questions about documents and sort of create more data for you.

Okay. I'm actually curious as to whether, um, humans giving thumbs up or thumbs down works for, say, a support chatbot. Because if people are asking questions, they don't necessarily know what a good answer is, which is why they're asking in the first place. So is that a valid technique?

Ram Sriharsha: Yeah, so, uh, it can definitely help you compare between two things.

So it can help you compare between, say, two language models outputting the answers to the same question. So, so while a straight thumbs up, thumbs down may not actually work, uh, it can definitely, you know, you can use it to compare between two things. It really depends on how you present it to the user.

Richie Cotton: Okay. All right. So you've got a range of options and maybe none of them are perfect. So you try a few things and see what's going on. I'd like to go back to the start of this where, um, you're collecting data. So, um, how do you go about collecting all this data that's going to provide the knowledge for your chatbot?

Ram Sriharsha: So even those data collection pipelines can get fairly complex. So we talked about scraping the web. You know, for example, you might have to scrape the web. You might have PDFs that you want to search over, and then you have to figure out how do you go from a PDF to something that the language models can actually understand, the retrieval engines can actually parse and store, your, uh, algorithms can actually chunk and embed, and so on.

That's already a complex pipeline. So, a lot of the time, uh, I expect, people spend in putting together these sorts of chatbots and these sorts of workflows is even in just the data collection and the data transformation. The quality of parsing clearly affects everything that comes after it. So you could have the best embedding models, the best language models, the best vector databases, and, uh, still you won't get high quality if your parsing is not good, right?

So, um, so I think that's, that's where I expect people spend a lot of time. 

Richie Cotton: Okay, so, um, you mentioned this idea of parsing data and I'm sure this is rather than just scraping absolutely every bit of data possible and just dumping it into the vector database, you've got to do some sort of pre processing there.

Um, can you explain a bit more about what this pre processing involves? 

Ram Sriharsha: Yes. So sometimes it involves, uh, you know, let's start with documents that are not complex, just text, right? So if you have text documents, you have to figure out, first of all, how do you break them up?

Uh, it's called chunking. Now chunking can either be at the passage level. It could be chunking sentences, uh, each sentence and then putting it into a vector database and so on. But even that is not sometimes enough because, uh, often depending on the complexity of the document. There is, uh, some important knowledge that's in, say, the header or the, um, uh, metadata about this document that is needed for actually understanding the document.

Often, the title of the document, maybe the first paragraph and so on, have a lot of information that subsequent paragraphs and subsequent sentences need for you to actually be able to do good retrieval. So, so just the art of going from a document to a set of vectors can get fairly complex. But that is the case even with, like, uh, text documents.

Now think about HTML, think about images, think about tables. So you're, you're, uh, as your data starts getting richer and richer, the complexity of how to go from that to embedded vectors so that retrieval can do its job well becomes far more complex.
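One way to act on the enrichment point Ram makes here is to prepend document-level context (title, section header) to each passage before embedding it, while keeping the raw passage in metadata for display. This is only a sketch; the document shape, field names, and embed() call are assumptions:

```python
def enrich_and_embed(doc: dict, embed) -> list[dict]:
    """doc = {"id", "title", "sections": [{"header", "passages": [...]}]} (assumed shape)."""
    vectors = []
    for s_idx, section in enumerate(doc["sections"]):
        for p_idx, passage in enumerate(section["passages"]):
            # The embedded text carries the title and header context...
            contextual_text = f"{doc['title']}\n{section['header']}\n{passage}"
            vectors.append({
                "id": f"{doc['id']}-{s_idx}-{p_idx}",
                "values": embed(contextual_text),
                # ...while metadata keeps the raw passage and its provenance.
                "metadata": {"title": doc["title"],
                             "header": section["header"],
                             "text": passage},
            })
    return vectors
```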

Richie Cotton: Okay. Yeah. I suppose thinking about like a product webpage, you've got pictures of the product, you've got tables of features and things like that.

Ram Sriharsha: Yes.

Richie Cotton: Okay. Uh, do you want to talk me through like what that involves then and how you go about turning that into something useful?

Ram Sriharsha: Yeah, that's actually not, not easy. Uh, so you need, that's why I said you need to come up with really good parsing. Sometimes it is, uh, it's figuring out, uh, not just taking passages and embedding them,

but having passages be enriched with, say, metadata about the document itself. Uh, sometimes it is also, uh, it might even involve giving passages to language models to add context, even before embedding. Sometimes it is embedding, uh, the same document in multiple ways. For example, embed, uh, sentences separately, embed the document itself separately, and then combine the two.

Sometimes it is both sparse and dense embeddings: embed, uh, the keywords in the document using sparse embeddings, embed the rest of the document using, uh, dense embeddings, and then be able to jointly retrieve it. So, uh, again, this depends really on the complexity of the document, the domain. For example, legal domains have keywords that matter far more than, you know, some other domains do.

And so on, and things like this. So it's fairly complex. And in some sense, it's still an art right now.
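One simple way to jointly use keyword (sparse) and semantic (dense) retrieval, if your stack returns two separate ranked lists, is reciprocal rank fusion. This generic sketch is not tied to any particular database API; dense_hits and sparse_hits are just assumed to be ranked lists of document IDs from any two retrievers:

```python
def reciprocal_rank_fusion(dense_hits: list[str], sparse_hits: list[str],
                           k: int = 60) -> list[str]:
    """Merge two ranked lists; documents ranked highly by either retriever rise."""
    scores: dict[str, float] = {}
    for hits in (dense_hits, sparse_hits):
        for rank, doc_id in enumerate(hits):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank + 1)
    return sorted(scores, key=scores.get, reverse=True)

# Example: reciprocal_rank_fusion(["d3", "d1", "d7"], ["d1", "d9", "d3"])
# ranks d1 and d3 first because both retrievers surfaced them.
```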

Richie Cotton: Okay, uh, that's very interesting. It reminds me a bit of in machine learning where you're doing feature engineering and it's like you've got all this kind of raw data and then you want to add sort of layers of sort of structure on top of that.

Ram Sriharsha: It is exactly like that right now. In some sense, we are missing that. What was really impressive about the revolution with large language models, and even prior to that with deep learning, was, uh, that a lot of feature engineering became automated and something that could be handled by algorithms. Uh, prior to that, the feature engineering was done by teams of people.

And there was a lot of bespoke feature engineering that we had to do. We don't do any of that today for language model training. Something similar has not happened for RAG and for what we are discussing right now in, in really knowledge-intensive tasks. So we're, we're kind of exploring and figuring out what that should look like.

So in the interim, we have all this art and a lot of complexity in how this is put together, uh, exactly because of that.

Richie Cotton: And is this something that's sort of coming soon, do you think? Um, some more sort of automation towards this, uh, processing?

Ram Sriharsha: I think there's a lot of work going on here to figure out how to automate this, how to really go beyond this sort of bespoke feature engineering to something that can be automated.

There's a lot of really good, interesting research, but nothing that I think is ready yet, so I think we're still a couple of years away. But there's a lot of hard work that Pinecone and everyone else included is doing in exactly these sort of areas. 

Richie Cotton: One of the other tricky steps, like, um, beyond sort of pre processing data seems to be around dealing with updates to data.

So you mentioned the idea that, um, sometimes data is dynamic, it's going to have to be refreshed a lot. Um, so when your data changes, what your chatbot says has to change, how do you deal with, um, this process of updating? 

Ram Sriharsha: If you think about what happens when you change data, sometimes you might have to re-embed a document.

Sometimes you might have to re-embed portions of a document. So again, this comes down to, for example, efficiency. So a lot of customers don't re-embed the entire document because it's expensive. So they just re-embed portions of it. And then you need to be able to know what vectors you delete from your database.

What vectors do you add? How do you make sure that all of this is done in a way that's consistent? So you actually delete something and add something else, but you don't want there to be a partial state in between that gives you wrong answers, right? So, uh, this is where the database-y parts of a vector database come in. This is what they are really good at handling: how do you do this sort of incremental addition, deletion, updates, and so on, and keep a consistent view of your data.

Uh, sometimes you're not even changing the embeddings, but you're just changing information about the embeddings, right? A good example is when your product goes out of stock and maybe temporarily, and you just want to say it's unavailable. And unavailability could just be a flag on a vector. I mean, people treat it as metadata.

You want to be able to transactionally say that, oh, this metadata has now changed, or the price of a particular product has changed. So when searching for it, you want to now reflect that. Uh, so this is where I think the dynamicity and all this comes in: the ability of your database to handle all of this seamlessly and free you up from worrying about that.
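A short sketch of the in-place metadata change Ram describes, flagging a product as unavailable or changing its price without re-embedding anything. The update-with-set_metadata call mirrors a Pinecone-style client; treat the exact method and field names as assumptions and check your own database's API:

```python
def mark_out_of_stock(index, product_id: str) -> None:
    # Only the metadata changes; the stored embedding stays as-is.
    index.update(id=product_id, set_metadata={"in_stock": False})

def change_price(index, product_id: str, new_price: float) -> None:
    index.update(id=product_id, set_metadata={"price": new_price})
```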

Richie Cotton: Okay, that's interesting. So some of the things you mentioned there, like saying, is this in stock or not, what's the price of the product? This sounds very much like standard issue database stuff. This is very much structured data. And I normally think of vector databases as being for unstructured data.

How do you deal with the mix of the two then? 

Ram Sriharsha: So while it is structured data, sometimes it actually affects search. So a good example is just what we were discussing, which is, imagine that I wanted to search a lot of products, but I'm only interested in products that are less than, say, $50. So immediately, the metadata about price is actually something that's important for the index and important for the vector database to know and understand so that it doesn't give the wrong answers, right?

So I would treat metadata as two different things: the ones that actually impact search and search quality, versus the ones that are maybe decorating the answer. So the things that decorate the answer, you still want to treat as a traditional database problem, and you still want to have a traditional database that kind of handles that for you.

Often customers use key-value stores, uh, other databases, traditional databases to do this. But then there's one set that actually affects search quality, and that the vector database has to be able to handle. In fact, that is what makes vector search challenging: it has to handle these database-y aspects pretty well for it to actually function as a vector database well.

Richie Cotton: I'm curious as to whether things like joins exist, uh, in a vector database context as well.

Ram Sriharsha: Uh, no, no. So joins don't, and, uh, joins traditionally are even, I mean, they're always hard even for traditional databases, right? For vector databases, I think you kind of do this sort of an implicit join, which is you put the metadata into the vector database, use it during search, and then you retrieve some candidates, and then you join outside of the database.

So that's like an implicit join. Uh, that's the best thing that we support today, and there is practically no, no vector database that supports anything else, for a good reason, because joins across these two very different types of data are very, very hard.

Richie Cotton: Okay, all right, uh, that's good to know that, uh, I don't have to worry about joins, it's all sort of taken care of implicitly.

Um, okay, so, uh, I'd like to talk about the, the LLM side of things as well. Now, um, because you mentioned that, uh, you probably want to pick like a decent, like one of the best sort of LLMs available when you start creating your chatbot, but then you might need to swap it out for something, um, cheaper to run.

Um, Can you think of when you're building an application, can you think of the LLM and the vector database separately? Or is there some kind of relationship between them that you need to worry about? 

Ram Sriharsha: So first of all, depending on the task at hand, you might be able to use cheaper, maybe less powerful LLMs.

So that's already something that, uh, one should be aware of. So you don't need GPT-4 for everything. You don't need the most powerful LLM for everything. Okay. It's really task dependent, but, uh, in some sense, the quality of your vector database and how scalable your vector database is also has something to say here.

So, one thing we have found, and we've written about it and so on, is you could take GPT-3, GPT-3.5. You could take all the data that it was trained on, just put that data into a vector database, and use GPT-3.5 along with that vector database. And you have better, uh, groundedness and better ultimate, uh, retrieval capability than GPT-4.

Which means that a weaker model with a powerful vector database, even just retrieving on data that the model was already trained on, can still do at least as good, if not better, than the state-of-the-art models. Uh, and we found this to be true even for some open source models as well. So clearly this means that if you use vector databases together with LLMs, uh, the sum of the parts is actually bigger, right?

In some sense, you can get more bang for the buck. By doing that, um,

but even otherwise, it's important for you to know that, uh, you don't, you don't always need the best LLM for every task. Now, in terms of cost, today the cost is dominated by large language models. So, uh, language models really dominate this cost. In fact, the bigger language models provide you bigger context and that's actually expensive as well.

But sometimes bigger context actually helps. So even understanding exactly what some workflow should cost you is pretty tricky. So you, you might, you might go with a bigger context, with a bigger language model and end up with a cheaper workflow. Sometimes, sometimes you can go with a smaller model with lesser context, with more data and a vector database and end up with a cheaper workflow.

But overall, I think what I can say safely is that the more you use vector databases in the loop and the more you kind of leverage them, uh, the cheaper it's going to turn out to be. And in fact, there's a lot of research that shows that you could even take a lot of what the language models already know and put that into the vector database.

And that's going to lead to an overall more economical workflow. So I think economics is going to drive us in that direction.

Richie Cotton: Okay. That's very interesting that if you make more use of vector databases, then you can go with a cheaper language model and your overall cost of creating a chatbot might be less.

Um, I'm curious as to when you need like the best, um, sort of LLM, or when a lesser, lower-powered, cheaper one will do. Do you have a sense of which you need for which tasks?

Ram Sriharsha: Yeah, I think it's mainly, uh, reasoning tasks. So, so the best LLMs are really good at reasoning and really good at, in some sense, paying attention to what matters. You know, even if you give an LLM a large context, it's not necessarily going to be good at picking out the things that matter from the things that don't matter.

Uh, this is, uh, this is usually the hard problem with giving LLMs the whole context. But we find that, uh, some of the, some of the, uh, powerful models distinguish themselves in that ability, okay? And they distinguish themselves in their general reasoning ability. That's also because they've been trained on a lot of data.

Um, so, but again, this is evolving as we speak. As we speak, there are newer models coming out, particularly open source models, that are increasingly challenging the proprietary language models and so on. So whatever I say here is going to be obsolete in three months.

Richie Cotton: Okay. Uh, yeah, actually just on that note, it just seemed like the best LLM just changes weekly.

Uh, and I guess you need to build your applications to make sure that it's possible to swap out the LLM in order to, you know, optimize it. Is there anything you need to do in order to ensure that's possible?

Ram Sriharsha: Yeah, so I think the more you can treat your LLM as, uh, in some sense, a black box that you can actually optimize away, the better, because that landscape is changing quite a bit.

Of course, you really, really need to understand prompt engineering because the prompt engineering is specific to LLMs and so on, and so you want to be careful about that. But if you can engineer your abstractions in such a way that you can swap out an LLM, that's, that's I think eventually a good thing to do.
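A small sketch of the "treat the LLM as a swappable black box" idea: keep providers behind one narrow interface so the rest of the pipeline never touches a vendor SDK directly. The OpenAI-style client call is an assumption; any provider can be wrapped the same way:

```python
from typing import Protocol

class LLM(Protocol):
    def generate(self, prompt: str) -> str: ...

class OpenAIChat:
    """One possible provider wrapper; swap in another class to change models."""
    def __init__(self, client, model: str = "gpt-4o"):
        self.client, self.model = client, model

    def generate(self, prompt: str) -> str:
        resp = self.client.chat.completions.create(
            model=self.model,
            messages=[{"role": "user", "content": prompt}],
        )
        return resp.choices[0].message.content

def answer(question: str, context: str, llm: LLM) -> str:
    # The pipeline depends only on the LLM protocol, so changing models is a
    # one-line change wherever the concrete provider is constructed.
    return llm.generate(f"Context:\n{context}\n\nQuestion: {question}")
```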

That said, while we see that the leaderboard is changing quite often, and like there is a lot of competition in some sense at the top for language models, one thing to keep in mind is that there is not a lot of private data. There are not a lot of really good metrics and really good data that the LLMs themselves have not seen.

So in some sense, the best indicator of an LLM's performance for your application is actually your data and your queries and trying it out, because I can tell you from, uh, a lot of benchmarks that we do and a lot of data that we have collected and so on, there is still a gap between open source models and proprietary models and so on, even though that gap is getting closed as we speak.

Because in this entire space, we are still missing really good metrics. We're still missing really good data sets that, in some sense, the models have no idea about. And the ability to generalize to that sort of data and those sorts of queries and so on is what matters for people at the end of the day.

Richie Cotton: Okay, so, uh, we've talked about, um, two of the components of, um, a chatbot, so we've got the large language model and the vector database.

So there's a third component in this, which is gaining a little bit of hype, which is knowledge graphs. And I'm wondering, where do they fit into this, uh, architecture alongside the LLM and the vector database? 

Ram Sriharsha: So, uh, knowledge graphs are basically telling you something about the entities that, uh, are referenced in a document and the relationships between them and so on.

Generally, we think of this as, uh, information that you use in constructing embeddings, right? Or information that you use, uh, post-retrieval. So usually they either belong in the ingestion pipelines that generate these embeddings in the first place, right? A good example is that some entity can be referenced by a name, uh, at the top of the document, for example, uh, but in the rest of the document, it could be called by a proxy.

So if you don't understand that the proxy actually refers to that particular name, you're not gonna be able to do retrieval properly, right? That's just one example. In that case, it, it means that the entity has to be somehow attached to every passage, or there has to be some way to resolve the entity to what's being referenced in the passage and so on.

Likewise, during search, to be able to rewrite a query in such a way that the query can now reference that entity, uh, explicitly. Uh, so generally I think knowledge graphs are complementary to vector databases and LLMs, and operate in the ingestion and search layer, where you want to reference that knowledge graph in constructing these embeddings.

You want to reference the knowledge graph in resolving queries. 
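A toy sketch of the entity-resolution idea: use a (tiny) alias table, the kind of mapping a knowledge graph would supply, to rewrite proxies into canonical entity names before a passage is embedded or a query is issued. The aliases and names here are entirely hypothetical:

```python
ALIASES = {
    "the company": "Acme Corp",   # hypothetical canonical entity names
    "the founder": "Jane Doe",
}

def resolve_entities(text: str, aliases: dict[str, str] = ALIASES) -> str:
    """Replace known proxies with canonical entity names."""
    for proxy, canonical in aliases.items():
        text = text.replace(proxy, canonical)
    return text

# Used on both sides of retrieval:
#   index.upsert(...)  with  embed(resolve_entities(passage))
#   index.query(vector=embed(resolve_entities(user_query)), ...)
```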

Richie Cotton: Okay. Uh, so it's, it's very much a complementary technology to the vector database then it's about understanding relationships between different components of that. So I presume this is going to be most useful for if you want to do difficult reasoning about things, is that correct?

Ram Sriharsha: It's, it's often very useful for, uh, domains where, where, uh, entities become important, or like the ontologies become important, right? So that's something that the language models themselves would not usually know. An alternative to this, by the way, is to try to fine-tune your language model to understand those entities and to, and to make those resolutions and so on.

Uh, but knowledge graphs are also a good way to deal with this.

Richie Cotton: So I'd like to talk about some of the other use cases beyond chatbots. Uh, so AI agents keep being hyped as like the next big thing, and I'm not quite sure whether it's happened yet, but it's probably coming soon. Um, how are vector databases used within AI agents?

Ram Sriharsha: So, uh, I think the, the best way to think about vector databases is in solving the knowledge problem for AI. Okay. That, that's really the best way to think about it. So if you, if you think about LLMs today, they do kind of two things. Actually, maybe three things. They're really good at reasoning.

They're really good at language understanding. Right, and obviously you need language understanding to do reasoning. But they're really good at that. And they're also capturing in their parameters, in some sense, world knowledge. What they don't have is access to your knowledge, or a way to actually use it.

Uh, the way I think about vector databases is vector databases are great at separating the knowledge part from this whole problem, which is, you really want your, the knowledge to be in a vector database. This could be world knowledge. This could also be private knowledge. To me, that's the logical step forward.

What does that leave LLMs with? That leaves LLMs with a really good understanding of language and a really good reasoning ability. So when people talk about AI agents and so on, they're optimizing for the reasoning ability, that ability to reason and the ability to use tools and things like that. What vector databases really do to fit into this whole equation is they solve the knowledge problem.

Like, where does the knowledge needed to act upon something come from? That's the vector database. Likewise, how do you, uh, manipulate that knowledge? How do you, uh, redact that knowledge? How do you make sure that these language models are not hallucinating and so on? I think the key to solving that problem is vector databases.

What vector databases don't solve or don't really affect is the reasoning part, which is the actual agent.

Richie Cotton: Okay. So, uh, in this case it's gonna be a very similar use case. The only difference is this time the LLM is calling some external tools.

Ram Sriharsha: Exactly, exactly, exactly.

Richie Cotton: Okay. Alright. So the other big thing that's being hyped is the idea of multimodal AI.

So things like, um, using audio and images and video. Um. And of course, you can store all these things in a vector database. I'm curious as to how exactly this works. Can you talk me through it? 

Ram Sriharsha: Yeah. So I think, again, there are multiple ways to deal with this. So you could think about storing images, embedded, into a vector database.

You can store text into a vector database already. During search, you want to retrieve both, right? So you want to retrieve, given a piece of text, you might want to retrieve images relevant to it. Given an image, you want to retrieve some text that's relevant to it and so on. So at the end of the day, it is again a retrieval problem, which is there is embeddings, you want to be able to convert your text or images into something that can search over that space and retrieve the relevant embeddings, and then map the embeddings back to text or images, what have you.

So the nature of the problem is still the same; the only difference is what is getting embedded and what's getting searched.

Richie Cotton: So it really seems fairly straightforward, you just have to have the right embedding to turn an image or video into that vector of numbers.

Ram Sriharsha: Exactly.

Richie Cotton: Okay. Um, all right. And do you want to talk me through some examples of how you actually go about using these?

Like where would you care about having images or video or audio stored in the vector database? 

Ram Sriharsha: Yeah. So, uh, I mean, today the dominant use cases we see are still text. So most people are in, uh, text and to some extent images. Uh, we often don't see video as much yet, just, just as a community, in terms of vector databases and so on.

Um, generally, I think a good use case for people is just, say, detecting near duplicates, right? So if you want to detect whether you've seen a near duplicate, uh, one of the best ways to do it is to embed images, put them into a vector database, take your current image, look for the closest neighbors, and see whether you're close enough.

If the images are close enough, then likely it's a duplicate. And there are people who use this sort of a duplicate detection workflow. Similarly, uh, there are people who use, uh, image search, going back to classification use cases, to be able to distinguish between a weed and, uh, a good plant, a plant that's, uh, something that should be there in the farm versus something that's not supposed to be there in the farm.
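A minimal sketch of that near-duplicate workflow: embed the incoming image, ask the index for its closest neighbor, and flag a duplicate when the similarity clears a threshold. embed_image() and the threshold value are assumptions to be tuned on real data:

```python
DUPLICATE_THRESHOLD = 0.95  # cosine similarity; tune on your own data

def is_near_duplicate(index, embed_image, image_bytes: bytes) -> bool:
    vector = embed_image(image_bytes)               # image -> embedding vector
    matches = index.query(vector=vector, top_k=1)["matches"]
    return bool(matches) and matches[0]["score"] >= DUPLICATE_THRESHOLD
```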

Richie Cotton: That's really fascinating, especially the idea of like using it for image classification, I suppose. Um, I haven't really thought of that as being like a search problem. If these things are similar enough, then they're probably the same.

Ram Sriharsha: Yes. Yeah. What's fascinating is that most of these classification and labeling use cases today are being cast into a search problem, which is, which is awesome. Uh, it's, it's efficient and fast and we have vector databases that do that really well.

Richie Cotton: Um, yeah, that's, that's really interesting stuff. Um, so I'd like to talk a bit about, um, how you actually go about implementing these things in your organization.

And I'm curious as to which team would generally be involved in working with vector databases. Because it's got data in the name, you think it ought to be the data team, but I'm not totally sure whether that's the case. 

Ram Sriharsha: First of all, if you look at people who are developing Gen AI applications, most of these are application developers, right?

So these aren't necessarily data teams or ML engineering teams and so on, uh, as they used to be. There was a time when, uh, semantic search workflows, for example, were put together by machine learning engineering teams or data teams. These days, any software developer who can do that, who can put together a search application or some sort of an application that actually connects LLMs with vector databases, can make a lot of progress, uh, even on small amounts of data, moderate amounts of data, building very compelling applications that, that, uh, leverage them.

So you're seeing more and more the application developer as the person who's actually using vector databases a lot, using language models a lot. Uh, so you don't need data teams to be able to do this. That said, your vector databases have to be hands-off for that to work, right? So these, these cannot be the same sort of databases that people used to work with 10 years back, where you really need to understand the database.

You need to set it up, you need to size it somehow, you need to figure out or work around its rough edges and things like that. So that doesn't work anymore. That's also why people are increasingly turning to these sorts of, uh, SaaS databases and so on, especially ones that don't require any tuning and essentially scale for you.

Uh, so I think that's what you're seeing more and more. Where we see data teams get involved, or where we see production engineering teams get involved, or ML engineering teams get involved, is once people hit massive scale. It's once, once you start this way and you're hitting scale: you now have to serve thousands or hundreds of thousands, maybe millions of customers, or your data volumes grow to terabytes of data and so on.

Now you really need to understand costs. You really need to understand, uh, you know, how to scale the system. Um, because even scalably ingesting terabytes of data is a hard problem from a software engineering side. That's where we see people usually put together data teams and kind of really get involved at that point.

Richie Cotton: That's interesting. So, um, the sort of application side of things is all software engineering teams, probably the engineering, um, part of the organization, but then data engineering is needed for that pre-processing and setting up the data pipelines as well.

Ram Sriharsha: Yes, yes, especially scaling them, right? When we talk about, uh, you know, ingesting large volumes of data, or, uh, having workflows that can scale and deal with dynamicity and all of this, now suddenly data engineering becomes pretty important. Uh, it's less important when you're building a small application and trying to proof of concept or show the value, or even for small workloads. So usually we see that there's a rapid adoption of small workloads from GenAI developers and people who don't really have teams of software engineers working on this.

It's usually one person or a small group of people. But at the same time, once people hit scale, then you have teams involved and, you know, now we are talking about really large scale applications.

Richie Cotton: What sort of skills are needed if you want to get involved in creating a generative AI application? 

Ram Sriharsha: The main things, I think, are, uh, the ability to put together, uh, you know, it's, it's, you're still putting together a stack.

You're putting together, uh, language models, vector databases, pipelines, and building something of value. So it's really the strong, uh, understanding of the different parts of the stack and how they fit together to solve this problem, uh, the understanding of where the cost comes from in this sort of a pipeline.

How do you go about optimizing for it without losing quality? And having a good understanding of the quality you're trying to hit, which is how do you think about metrics? How do you think about data? Uh, so it's, it's, uh, it's a new stack these days. It's, uh, people need to understand data, data metrics, how do you optimize, how do you prompt engineer?

How do you optimize the LLM to, to do its job well? Likewise, how do you do retrieval better? Uh, what do you retrieve? What do you store, and so on? So, uh, I think it's, uh, it's an emerging stack. You need a little bit of, uh, interest in understanding how, uh, embeddings are created, how, uh, vector databases work, how language models work, and so on.

There's a lot of information out there that people typically start with. My suggestion is start with the simplest thing, which is just put, put LLMs and vector databases together and start with something really simple. Then learn from what, what is working and what's not working. There's a lot of advice out there on how to put together the right workflows.

Pinecone has a lot of documentation, for example, on how to put them together. Uh, so there's a lot of starting points, but I would say that this is like an emerging skill set. It's like people have to figure out how to make all of this work together.

Richie Cotton: Yeah. So really just understanding what the different tools are involved, how they fit together, and then just what the use cases are, get some of the basics, how do you measure stuff. All right. Excellent.

Ram Sriharsha: There's also an area where we are, you know, in some sense, trying to lower the barrier, which is, the fact that you have to put all of this together and you have to get some expertise in all of these things is something that we are trying to figure out how to solve for. Like, how do you lower the barrier to entry so that people don't have to think about all this?

And it's not just us who are doing it. I think you'll see this happening over the coming years, that this whole workflow gets simplified and presented in such a way that people can just use it like they would use a model.

Richie Cotton: Uh, can you tell me what sort of innovations are happening in the world of vector databases at the moment?

Ram Sriharsha: I think the, the main innovations that are happening are, A, around cost and scalability. So, uh, a lot of vector, vector search research prior to a couple of years back was all about squeezing performance, but it wasn't worried so much about cost. Most vector database algorithms out there, most vector databases out there until Pinecone came along, and even Pinecone serverless and so on, were all in-memory databases for the most part.

And the cost associated with them is pretty dramatic. Uh, so there's a lot of research that we've done. There's a lot of research that, uh, the community is doing in terms of reducing cost, in terms of, uh, improving scale. In terms of dealing with the data management problems we talked about, how do you actually effectively update and delete and manage data when you have data sitting in vector databases?

Likewise, we talked about the fact that you cannot actually join with structured data, but you still have to deal with structured data. That leads to very unique challenges for vector databases. So there's a lot of research happening around how to solve that effectively. So on the vector database front, innovation is happening really around new types of indexing algorithms that can work on block storage, new types of indexing algorithms that dramatically reduce the cost of, uh, uh, maintaining this sort of data and the cost for a given quality and so on, at the same time, uh, dealing with the data management challenges.

Now, that's still, that's the effort focused around the vector database itself. But even around RAG, and even around how do you combine language models with vector databases, there's new innovation happening constantly. So the things that I'm particularly excited about are the ones that fundamentally rethink the language model architecture in terms of how do you really leverage vector databases the best way, in some sense, rethinking how language models should work.

There's a lot of research papers that try to attack this, some successfully, some partially successfully, some not, and so on. But, uh, what I'm particularly excited about is the fact that everybody's paying attention to it. And I feel like that's a very productive way to think about how this better fits into the language model architecture in the future.

Richie Cotton: Okay, so some of the research is around sort of very, um, targeted stuff. So I guess, yeah, it's quite, quite a nerdy sort of algorithmic thing. And then it also goes to the broader sort of technical solution as well.

Ram Sriharsha: Exactly. Yeah. Some very foundational ways in which language models are being fused together with vector databases. So that's, that's an interesting area of research as well.

Richie Cotton: And, um, with large language models, you mentioned there are some very long context window, um, large language models at the moment. So these can store an awful lot of text in a single prompt or response. How does that change the use of vector databases?

Ram Sriharsha: To me, it doesn't actually, uh, it doesn't obviate the need for vector databases, or it doesn't fundamentally change how vector databases are used, but it does change some, uh, economics around how they're used. For example, uh, you, you no longer need to stress a lot about how many candidates you retrieve from the vector database. Like, you don't have to optimize to the nth degree what you do there, because you could retrieve all candidates and you can actually give them to a language model and hope that it can actually pick out the relevant portions of your context, right?

Now, again, you cannot quite do this because, even, even now, language models are not great at sifting through context. Which is, you can provide them a really large context, but they aren't necessarily great at sifting through it, especially things that, uh, appear towards the middle of the context window and things like that.

But theoretically, you could. So, theoretically, it, it, uh, reduces, in some sense, the over-indexing on exactly what you retrieve from a vector database. You could retrieve a little bit more than you would normally retrieve, right? Uh, other than that, it doesn't fundamentally change anything. So you still have to rely on vector databases for retrieval, because there is no hope of feeding the entire corpus to language models.

Even if you fed the entire corpus to language models, the fact that they have an even infinite context window is not going to mean that they're going to produce a better result. So to me, context windows are useful when used correctly, uh, but they, they're not a substitute for retrieval.

Richie Cotton: Okay, so it sounds like the large language model still needs to solve the problem of extracting useful information out of a long context window before, um, you can actually really make use of them.

Ram Sriharsha: Yes, and there are fundamental reasons to believe that you can't just solve it within the LLM. You have to really solve it in a different workflow.

Richie Cotton: So, uh, do you have any final advice for organizations who are wanting to create generative AI applications?

Ram Sriharsha: I think the most important thing is to, uh, figure out use cases where you can actually get started and, and, uh, start simple, right? Because, uh, it's going to take time for language models to get really good at avoiding hallucination, for, uh, for, like we discussed, RAG to become like a packaged workflow that, that people don't even have to think about, right?

So today it's still an art. It's still a lot of moving parts. You kind of need to worry about it, but nothing, I think nothing should prevent you from getting started. And, and there are a lot of use cases where you get tremendous amount of value, even from simple pipelines. So even from simply putting language models with VectorDBs together in a very simple workflow, you can already get substantially far.

And we've seen customers do that a lot. So, uh, to me, it's really important that you start getting into this, generic space and start really leveraging it and, and build your kind of, in some sense, your ability to iterate and the ability to think about your data differently. Think about labeling, think about actually measuring quality and so on, because all of this is really only developed by doing.

Right? So the faster you can get to doing that, the better. 

Richie Cotton: Um, yeah, definitely a big fan of learning by doing, and I like the idea: just try something, you're going to have to iterate before you get something perfect. So yeah, get started.

Ram Sriharsha: Yes.

Richie Cotton: All right. Uh, yeah. Thank you for your time, Ram.

Ram Sriharsha: Thanks so much. Thanks for having me.
