Getting Generative AI Into Production with Lin Qiao, CEO and Co-Founder of Fireworks AI
Lin Qiao is the CEO and Co-Founder of Fireworks AI. She previously worked at Meta as a Senior Director of Engineering and head of PyTorch, served as a Tech Lead at LinkedIn, and worked as a Researcher and Software Engineer at IBM.
Richie helps individuals and organizations get better at using data and AI. He's been a data scientist since before it was called data science, and has written two books and created many DataCamp courses on the subject. He is a host of the DataFramed podcast, and runs DataCamp's webinar program.
Key Quotes
I think that's just the nature of AI. It requires a lot of experimentation, and there are multiple things to experiment on. Experimentation is the top of the funnel. You need to experiment on how you frame the problem, right? Because the framing will directly determine which technology you're going to pull into the stack to solve the problem.
I'm very excited about applications built on top of it. There are already really great applications that I have seen for consumer-facing and developer-facing products. But recently there are emerging applications that focus on having specialized agents solving specific problems really well, and having them coordinate with each other to solve a more complex problem. I think that direction is super exciting because we have seen from the AlphaGo experience that AlphaGo can learn and improve its Go-playing skills by playing Go with itself. That level of autonomous learning is going to get into the agentic world, and that will pump up the intelligence even more. So I'm super excited about that direction.
Key Takeaways
Enterprises should move away from a “one-size-fits-all” large foundation model approach, shifting to fine-tuned, smaller models for specific business tasks.
Keep up to date with the open-source AI model community; this is where the valuable leaps driving AI innovation forward are happening.
Before getting started with fine-tuning, clearly define your objectives and what you aim to achieve with the model. This will help avoid unnecessary complexity and ensure that your efforts lead to meaningful improvements.
Transcript
Richie Cotton: Hi Lin, thank you for joining me on the show.
Lin Qiao: Hi, thanks for having me. I'm super excited to have a chat with you today.
Richie Cotton: Yeah, this is going to be fun. So just to begin with, can you talk me through what are the most common uses of generative AI that you're seeing today?
Lin Qiao: Oh, that could take an hour by itself. This technology is so empowering and so disruptive. We have absolutely seen a lot of interesting, innovative applications and products built on top of our service tier. Of course, Fireworks offers an inference engine for gen AI models.
So I will bucket them in a few ways. One big bucket is assistants. We have seen people building medical assistants to address the shortage of doctors and nurses and help those medical workers become more productive. We have seen many people building educational systems targeting different cohorts.
For example, there are people building applications for students to learn, for researchers to do research, and for people to learn foreign languages. It crosses borders; there are various different applications in education. There are also legal assistants, which help people do case research, because lawyers are expensive, or help lawyers become more productive.
So I think that's a big chunk of applications, and then we see a lot of interesting chat applications.
Another big bucket is API generation, or code generation. It's not for end consumers; it's for another program. This is important because we are seeing the compounding effects of people building applications on top of not just a single model, but multiple models, multiple modalities, or from multiple modalities to additional APIs.
There are examples around assembling the totality of knowledge, and I can drill down more into that, because a single model has very limited knowledge by itself. Specific use cases are things like generating business workflows, generating SQL, generating charts about production usage, monitoring, observability, and so on.
So that's a huge bucket. Those are the areas across the board where we're super excited about what's happening.
Richie Cotton: Okay, I feel like if you continued, you could probably have listed every industry there is. But it's really interesting that you've got stuff that's aimed at end users, you mentioned things like productivity improvements and training, and then you've also got that sort of middleware layer, so it's for other people building technology as well.

Since today we're going to talk about getting things into products and into production, I'd like to know, just at a high level, what are the different steps for getting generative AI into a product?
Lin Qiao: First of all, I would compare applications built on top of gen AI with applications built before gen AI. There are a few differences. One is that pre-gen-AI applications were built on top of CPUs. That's heavily commoditized, the software stack is very mature, and the tools are abundant. So it's easy and low cost.
With gen AI, it has shifted, right? GPUs are new to most people, and GPUs are very expensive. GPUs are power hungry, power is expensive, power generates heat, heat requires cooling, and cooling is expensive. So the whole cost structure of productionizing applications on top of gen AI has changed significantly.
That's one. Second, between before and after, whether you're building on gen AI or not, one thing that doesn't change is that most of these applications are consumer facing or developer facing. They need to be very interactive, so latency is a critical part of the product experience. Holding the latency bar doesn't change before and after.
Another interesting thing that is different is that before gen AI, the reasoning logic was coded in the application itself, right? Or it was coded in the surrounding infrastructure. So it was deterministic: you could debug it, and the logic was there. With gen AI, the models have multiple layers of prediction, and they're probabilistic.
Probabilistic means that if you ask the gen AI model to produce something, it will do it, regardless of whether it's true or not. That directly introduces a new phenomenon, hallucination, which application developers never needed to deal with before. Now they're scratching their heads like, oh, this is bad, and I need to learn how to control or minimize hallucination in my product, in my application.
So those are the one invariant and two variants between the pre-gen-AI and current gen AI eras. Because of that, new production challenges are introduced for app developers. For the invariant, low latency: as we know, gen AI models are on the extremely big end of machine learning models.
Because the model is so big, the latency will be high, and not having a low-latency product experience will be a make-or-break scenario for app developers. It's extremely important, and we need to address that. Cost is also an important factor. Even if you have a viable product, the cost can be very high, and the nature of consumer-facing and developer-facing apps is that they scale quickly if you hit product-market fit. When you scale quickly, if you're losing money at a small scale, you're going to go bankrupt at a large scale. So controlling cost is extremely important for going into production. And also, to produce great product quality and content, you have to control hallucination, and learning the skill set for how to do that is extremely important.
So I think those are the three critical parts of bringing generative AI technology into production.
Richie Cotton: Lots to unpack there, lots of interesting changes. Certainly you mentioned the switch from using CPUs to GPUs. NVIDIA has done very well out of this switch, I think.
Lin Qiao: Definitely.
Richie Cotton: And then there's the idea of going from deterministic programming to probabilistic stuff. I feel there are very different reactions there.

Like, all the software engineers are panicking, and then you speak to data scientists and they're like, isn't that normal for code to be probabilistic? So yeah, very interesting changes and differences of opinion there. We'll maybe get back to talking about latency later on, but for now I'd like to talk about fine-tuning, since this seems like quite an important stage in making sure that you have high-quality results.
So, can you just give me a quick introduction to, like, why you might want to fine tune a large language model rather than just taking something off the shelf?
Lin Qiao: So here I want to talk a little bit about the process of model training, right? This is before fine-tuning: training a foundation model, or training a model in general. Before you train, the researchers need to be opinionated on what the objectives are for the end result they want to achieve.
That whole process will determine which kinds of problems you care about the most, which you're okay with, and which you don't care about at all. From there, you go and curate your training data. For the areas you care about the most, you want the highest-quality data and the most diversified data, and it requires a lot of care.
For areas where you just want some coverage and it's okay, you don't have to spend too much time, but the volume needs to be there. For areas you don't care about, you don't really actively source data to cover them. The end result of training will reflect that, right? That's interesting, because when people build foundation models, the concept of a foundation model is the basis, a general basis, for solving a lot of problems.
But a lot of problems in real life, in the enterprise for example, may be slightly different or very different from what the base model was built for. That means that after the model is trained, you need to drive alignment for that foundation model so it aligns better towards your problem statement.
That's so-called up-training, or instruction tuning, and then fine-tuning is built on top of that. So that's the reason to do fine-tuning: it's basically aligning a base model better towards a specific problem, the specific workload patterns or specific vocabularies you have within your company, for your application, for your new product, to deliver better results.
Richie Cotton: That's really interesting that you said you've got to think about the goal of what you want your model to do. It sounds obvious when you say it out loud, but I think it's interesting that a lot of these foundation models, you think of GPT and the Anthropic models, like the Claude models, are designed to be able to write any form of text.
You don't necessarily want that in a business context; maybe you just want to do sentiment analysis on financial statements or something like that, which is a much narrower focus. So, pretty interesting. Do you want to talk me through the process of actually fine-tuning a model, then?
So you've decided on your goal, you've got your base model, how do you actually go about fine tuning it?
Lin Qiao: The first thing is to create your evaluation data set. As you said, the gold standard: what is the ideal content you want to produce with a given prompt, right? That is one thing you need to establish. Without that, it's really hard to judge whether all the effort you put into tuning is reaching the goal, progressing towards the goal, or going the other way.
So that is very important to establish. The second step is to run your workload through whatever model you have and identify what the failure cases are. For the failure cases, there are multiple possible next steps. One is that you take the failure cases and manually label what the right outcome is, and that becomes your fine-tuning data set, right?
This is supervised fine-tuning. Or, with the failure cases, you can ask a stronger model, or ask the user, to give you a preference between two answers: do you prefer this or that? That builds a preference data set, and then you can use that to do preference tuning, preference-based tuning.
Or you can just have thumbs up, thumbs down in your UI. It basically asks the user to give you scores on whether they like the answer or not, and you use that to tune the base model. That's KTO, or there are many other kinds of tuning methods you can use, based on how close you are to your end goals.
Then you tune, and you still have your eval data set. You never tune with it; you have to keep it aside because that's your golden data set to judge with. Then you can see whether you are progressing in the right direction. Sometimes you fix one kind of problem, but another problem stands still or regresses, and you have to figure that out.
So far I've just talked about the mechanical part of fine-tuning, or tuning in general. But it's actually much deeper than that, because sometimes a failure case requires product design. It's a failure case because people haven't thought about how to handle it in the right way.
So it poses a question where even a product manager needs to come in and make a decision: what is the right action if you see this failure case? It's actually very interesting. I think fine-tuning is not just a mechanical process that machine learning engineers need to power through.
It is actually a process where the product managers or product engineers, along with the machine learning engineers, have to work together to make decisions.
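As a concrete sketch of the loop Lin just described (the helper names `generate` and `is_acceptable` are hypothetical stand-ins for your model call and your grading rule, not part of any specific platform), the failure cases from an eval run become either a supervised fine-tuning set or a preference set:

```python
import json

def build_finetuning_sets(eval_cases, generate, is_acceptable):
    """Hypothetical sketch. eval_cases is a list of {"prompt": ..., "ideal": ...} records,
    generate(prompt) calls the model you are testing, and is_acceptable(output, ideal)
    is your grading rule (exact match, rubric, or a human review step)."""
    sft_rows, preference_rows = [], []
    for case in eval_cases:
        output = generate(case["prompt"])
        if is_acceptable(output, case["ideal"]):
            continue  # only the failure cases feed the tuning sets
        # Supervised fine-tuning: pair the prompt with the corrected answer.
        sft_rows.append({"prompt": case["prompt"], "completion": case["ideal"]})
        # Preference tuning (DPO/KTO-style): record which answer is preferred.
        preference_rows.append({
            "prompt": case["prompt"],
            "chosen": case["ideal"],
            "rejected": output,
        })
    with open("sft_train.jsonl", "w") as f:
        f.writelines(json.dumps(row) + "\n" for row in sft_rows)
    with open("preference_train.jsonl", "w") as f:
        f.writelines(json.dumps(row) + "\n" for row in preference_rows)
```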
Richie Cotton: That's fascinating. I suppose, yeah, you can think of it as like, we just got to get a true false answer for like, is this a good answer or not? But then from the product point of view, you need to decide, what do you do if you get a bad answer? So it's going to have to be built right into your product rather than just being this sort of abstract theoretical thing.
Okay, that's interesting.
Lin Qiao: I forgot one thing, actually. One very important thing is that when you fine-tune, you also may need to make decisions, because part of fine-tuning is about how to align the model better with your problem, right? That's the delta part. But you also need to think about what you want from the base model.
It's very capable of doing many things. What are the interesting characteristics you want to preserve from the base model, right? Some base models are really good at chat, some are really good at long context, and all these different capabilities. One thing is that the model is also really good at forgetting during the training process.
If your training data doesn't cover the characteristics you really want to preserve, it will forget them; it will only learn the new skill set. So it's very important to also think through what you want the end result of the model to be, what kind of strengths you want to drive. Then you need to mix your labeled data for fine-tuning, for your specific use-case alignment, along with some other data sets that help preserve the other characteristics of the base model.
So blending that properly is extremely important.
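One way to act on that advice is a data-mixing step that keeps general-purpose examples in the fine-tuning mix so the base model's other skills are still exercised. A minimal sketch, with the mixing ratio purely illustrative and not a figure from the episode:

```python
import random

def mix_training_data(task_rows, general_rows, general_fraction=0.3, seed=0):
    """Blend task-specific fine-tuning rows with general-purpose rows (chat,
    long-context, etc.) so skills the task data never exercises are still seen.
    The 30% default is illustrative, not a recommendation from the episode."""
    rng = random.Random(seed)
    n_general = int(len(task_rows) * general_fraction / (1.0 - general_fraction))
    mixed = task_rows + rng.sample(general_rows, min(n_general, len(general_rows)))
    rng.shuffle(mixed)
    return mixed
```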
Richie Cotton: Do you have a sense of how much effort this all takes and how much benefit you're going to get from it? What's the sort of performance increase that you're likely to get, and how much effort do you have to put in to get that performance increase?
Lin Qiao: This is a really good question. Typically, when people want to increase quality, they can go down two paths, and they require different levels of effort. One path is prompt engineering, right? You can put instructions in the prompt to help align the model. That's one approach.
It's very quick, it's instant. You can put the instruction there, test it, and see how the model responds. Versus fine-tuning, which is a long, multi-step process. You have to define evaluation data, as I mentioned, collect and label data for fine-tuning, decide which fine-tuning approach you're going to use, run the fine-tuning, and make sure the fine-tuning actually runs.
Sometimes it OOMs, sometimes it gets stuck somewhere, and it also needs to converge, and all of this. Then finally you get a model. You need to test it again using your eval data, bring the model into a production setting, and send the prompts in. So it's a much longer process than prompt engineering.
We have seen that at the early phase, pre product-market fit, people have serious questions: should I even spend time on the whole nine yards of fine-tuning when I don't even know where I'm going? So people tend to use a much more powerful model so they don't have to worry about other production-related things.
Or they just prompt engineer their way out and validate that there's a path that works. After they hit product-market fit, all the other concerns like low latency and cost become an issue, and then they can spend more time on fine-tuning, right? We are seeing these patterns in terms of being most efficient going from pre product-market fit to product-market fit in production.
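The quick path Lin contrasts with fine-tuning can be as simple as scoring a few candidate system prompts against the same golden set, with no training step in the loop. A sketch reusing the hypothetical grading helper from above, plus a hypothetical `generate_with_system` model call:

```python
def score_prompt(system_prompt, eval_cases, generate_with_system, is_acceptable):
    """Pass rate of one candidate system prompt over the golden eval set.
    generate_with_system(system_prompt, prompt) is a hypothetical model call."""
    passed = sum(
        is_acceptable(generate_with_system(system_prompt, case["prompt"]), case["ideal"])
        for case in eval_cases
    )
    return passed / len(eval_cases)

# Trying a few instruction variants takes minutes, versus the multi-step
# fine-tuning pipeline described above.
candidates = [
    "Answer formally and cite the source document.",
    "Answer formally, cite the source document, and say 'I don't know' when unsure.",
]
# best = max(candidates, key=lambda p: score_prompt(p, eval_cases, generate_with_system, is_acceptable))
```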
Richie Cotton: That's interesting. So I suppose you've got a trade-off between: do I use this larger model, which is probably going to have higher performance out of the box but is going to cost more to run, versus do I use the smaller model and put the effort into fine-tuning to get the same performance, but then it's going to be cheaper to run in the long term. You've got that short-term versus long-term cost trade-off, then.
Lin Qiao: Right. The interesting dynamic in the industry, in the modeling space, is that model quality is also changing, right? The largest models keep improving in quality, but the small models, I call it the small model stack versus the large model stack, the small model stack's quality is improving very fast as well.
Almost every week there's a new small model being announced by various model providers, and they are getting really close to those large models on various benchmark results. That progress is very impressive. So it will be interesting to observe how these dynamics play out, I will say.
When customers come to us and they're still at the product-market fit stage, I will actually suggest that they try the large model first. But with these new advancements, I think the small models are becoming more and more appealing, even for the pre product-market fit stage.
Richie Cotton: Is there a proper word for those? Because "small large language model" doesn't seem to make any sense. It's just a...

Lin Qiao: Yeah, so I think we...

Richie Cotton: ...small model, maybe.

Lin Qiao: Yeah, they're small, but they're actually still very large. They're just small relative to each other.
Richie Cotton: Okay. Fair enough. Can you give us some examples of tasks where you can get away with using one of these smaller models, versus a task where you really do need that large, best-in-class model?
Lin Qiao: Yeah, so we have seen a lot of advancement in, for example, code generation. Recently Mistral announced Codestral, their code generation model. It's very impressive. And about a week or two later, a new code generation model, DeepSeek Coder V2, was announced.
It beats everybody; it demonstrates GPT-4o-level quality. Of course, I want to caution everybody that benchmark comparison is one thing and what's real is another thing. So I would caution everyone to actually test it out for yourself. And that's the beauty of the open-source community.
They are very generous in sharing their findings. We also have a lot of people using us, so we actually get a lot of feedback, and we're always at the forefront of enabling those cutting-edge models for folks to try out. So far, our observation is that in the coding space, the relatively small models, like 70-billion-parameter models, are getting really, really good.
And the corresponding 7-billion and 8-billion models are also trailing the 70-billion models' quality pretty closely, but they can significantly reduce latency and cost. So I think the small model space is getting more interesting now.
Richie Cotton: There's a bit more competition, and progress is being made fast. So, you said that you need to test your model on something to make sure it actually works. Is that a case of typing in lots of prompts and having a chat with the model, or is there something more thorough and robust you can do to test whether it works or not?
Lin Qiao: Yeah, so that goes back to your eval data set. I would strongly encourage the app builders or data scientists who are getting into the gen AI space to prioritize having their golden data set for quality testing, because the model space is fast moving. Whether you're on the big model stack or the small model stack, there will constantly be new advancements happening.
If you want to stay on top of all of this, you need to have your own data set to validate how a new model advancement is giving benefits back to you. Sometimes it doesn't, right? Sometimes it even regresses. I would say everyone's workload is different, so don't blindly trust the benchmark results.
Except for one thing: I have really good confidence in the benchmark that lmsys.org is running, because they have humans trying multiple models side by side and they collect the Elo scores. That is kind of subjective, but at the same time it's actually more objective than purely the benchmarking results.
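In practice, "have your own golden data set" often means a small regression harness you re-run whenever a new model ships, rather than relying on leaderboard numbers. A sketch under the same hypothetical helpers as above:

```python
def compare_models(eval_cases, generators, is_acceptable):
    """generators maps a model name to a generate(prompt) callable. Returns pass
    rates on your own workload, so a new release is judged on your golden set
    rather than on public benchmark numbers."""
    return {
        name: sum(
            is_acceptable(generate(case["prompt"]), case["ideal"])
            for case in eval_cases
        ) / len(eval_cases)
        for name, generate in generators.items()
    }

# e.g. compare_models(eval_cases, {"current": gen_current, "new-release": gen_new}, is_acceptable)
```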
Richie Cotton: So this is the Chatbot Arena, and you can see which ones are the top models. Okay. You mentioned this idea of an evaluation set. Suppose you're creating a chatbot or something. Is this going to be something like the top 100 questions your users are going to ask, where you need to see what the bot generates for each of those?
Lin Qiao: Every chatbot probably has a different focus and a different niche it wants to tackle. For example, in some cases a chatbot really cares about multilingual support and the specific languages it wants to cover, not just English. That's important for them.
For some chatbots, or assistants, they really care about long context. Context is not just the chat context of a multi-turn conversation; in the process of generating a response, the app will also feed RAG results as input to help the response be more relevant to the particular individual or particular scenario.
So for some chats, long context is very important. And for some chats, they also want to go beyond text-based chat and call out to other APIs to get additional information, for example to get search results, get stock prices, get weather information, generate images, and so on and so forth.
That is the direction we're moving in: to help people build a compound AI system where you're not just using a single gen AI model at a time. You want to be able to assemble multiple things together to solve a problem. We see that complexity in chatbots, so we're building the system to make that job easier.
So coming back to your question, it really depends. Again, if you're building a system to solve a specific problem, whether it's medical, legal, educational, or something else like a coding assistant, you need to have your golden evaluation set to sit on top of all these moving targets of models.
Richie Cotton: So really make sure you've got performance in the specific language you want and the specific industry or domain you want, and check that, rather than just trying to test it on generic queries. Okay. Now, you mentioned that these smaller models are catching up in terms of performance, and that's obviously going to save money compared to using a larger model.
Since a lot of companies are worried about the cost of putting generative AI into production, do you have any other money saving tips? What else can you do beyond using smaller models to make things cheaper?
Lin Qiao: Yeah, so coming back to why it's so costly, right? I briefly mentioned that the model size is just so big; there's a lot to process, and the computational cost is just high. That's one thing. But the other aspect is that the GPUs are very powerful, right?
Most of the gen AI models run on GPUs; some run on custom ASICs too. That hardware is very powerful as well. And the other side of cost is how well utilized that hardware is. Are you pumping computation out of it constantly? Of course, the ideal state is a hundred percent.
Of course, none of us can reach a hundred percent, but getting close to a hundred percent is best. In order to drive utilization, we need to take a deeper look at system resource consumption during inference time. The interesting part is that the model has multiple components, and each component is bottlenecked by different things.
Prompt processing is bottlenecked just by FLOPs, and the content generation part is bottlenecked by memory bandwidth. So if you think about scaling, increasing the throughput is what reduces cost, right? But you cannot increase throughput by addressing two bottlenecks at the same time.
Does that make sense? You can, but it's much harder, right? It's much easier to tackle one bottleneck at a time. That requires us to think about the runtime of inference differently: you have to scale different parts of the model differently. And sometimes you also need to go beyond one node.
You need to have multiple GPU nodes, each node with eight cards, eight GPUs, and you need multiple nodes to scale out to be most efficient. That's why we specialize in the best scaling algorithms to significantly reduce latency and increase throughput.
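As a rough illustration of the two bottlenecks Lin describes, here is a back-of-envelope estimate. All numbers below are assumptions for a hypothetical 70B-parameter model on a single high-end GPU, not Fireworks figures: prefill work grows with FLOPs, while single-stream decode speed is roughly capped by how fast the weights can be streamed from GPU memory.

```python
# Back-of-envelope only: real inference engines batch requests, cache KV state,
# and quantize weights, so production numbers differ substantially.
params = 70e9                # hypothetical 70B-parameter model
weight_bytes = params * 2    # fp16 weights, roughly 140 GB

gpu_flops = 1e15             # ~1 PFLOP/s of dense fp16 compute (assumed)
gpu_mem_bw = 3.3e12          # ~3.3 TB/s of HBM bandwidth (assumed)

# Prefill: roughly 2 * params FLOPs per prompt token -> compute (FLOPs) bound.
prompt_tokens = 2000
prefill_seconds = 2 * params * prompt_tokens / gpu_flops

# Decode: each generated token re-reads the weights -> memory-bandwidth bound.
decode_tokens_per_second = gpu_mem_bw / weight_bytes

print(f"prefill ~{prefill_seconds:.2f}s, decode ~{decode_tokens_per_second:.0f} tokens/s per stream")
```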
Richie Cotton: That's absolutely fascinating that, as you said, the prompting side of things is limited by compute and the generation side is limited by memory, so very different problems. So how much do individual companies or organizations need to worry about this infrastructure side of things, and how much is just taken care of for them by the platforms that they're using?
Lin Qiao: I think for whoever is building these gen AI applications, it depends on the skill set. We work with many product teams, whether they are startups innovating in the product space or product teams within incumbents or enterprises. The product teams really want to focus on product development.
They're focused on driving adoption and making their users happy, and they want someone else to handle the infrastructure problems. It's not just about low latency and cost efficiency; it's also about operations, because inference is 24/7. You have to make sure you have a high quality of service, and that requires a team to babysit the stack and so on.
So usually it's very important for the product team to have an infrastructure partner to help them. That's where we come in, right? Hey, we can be your infrastructure partner. But more interestingly, we're also seeing that ML infra teams would like to accelerate their progress.
They can better support the various product teams in their company that build on top of them by bringing in vendors, bringing in state-of-the-art technology, and blending that state-of-the-art technology into their existing infrastructure pieces. So we have been working with those big enterprise ML infra teams to help them move faster.
I think the fundamental reason is that across the entire industry, from the application stack to the platform stack to the infrastructure stack, everything is moving so fast. People recognize that they would rather leverage and build on top of other people's work instead of taking on the full stack by themselves.
And I actually have huge respect for the engineers making those decisions, because there's a kind of pride in being an engineer: hey, I am capable of doing everything. It's actually easy to say, hey, there are open-source libraries and repos we can use to build the entire stack ourselves.
We could have full control, but these leaders are just so sharp; they understand where they want to focus and where to bring in other expertise to help them. So it's a win-win. They can move 2x, 10x faster that way. We have seen a lot of those cases.
Richie Cotton: It sounds like the trade-off is: if you've got an in-house AI engineering team on hand, then probably they can do some of this themselves, but for everyone else you're going to want to partner with another company that's going to help out with that infrastructure side of things.
Lin Qiao: Yeah, and even with an in-house ML infra team, we have seen them actively integrating the best technology, from us for example, in order to support their product teams better.
Richie Cotton: So it sounds like there can be quite a few teams involved in creating an AI product or adding AI to a product. Beyond the product team, you mentioned there are some infrastructure requirements. Who else is needed? Which teams or roles need to be involved in creating an AI product?
Lin Qiao: We definitely see the product teams themselves, but usually it's also the ML infra team. The ML infra team, typically in digital-native enterprises, already exists to bring smaller machine learning models into production, and they're getting into larger deep learning models.
And gen AI is a huge leap forward because the model size is much bigger. It really needs beefier GPUs, the latest and largest GPUs, and that puts a lot of stress on their infrastructure. But typically, the ML infra team is the other stakeholder.
Richie Cotton: That's interesting. And before you get started, I suppose if your CEO says, okay, we need to get in on this AI game, we need to add AI to our products: what do you need to put in place before you can do this? What sort of processes need to be put in place? What infrastructure needs to be put in place before you can actually be successful at this?
Lin Qiao: It depends. AI is a very broad term. AI could mean you're doing simple classification, or it could be forecasting, ranking and recommendation, or gen AI. The characteristics of those workloads are quite different from each other, and we have seen that different companies are at different stages of this journey of AI evolution.
If we talk about the high end, the gen AI space, then for the big companies, and here I'm talking about Fortune 500 companies, they may either use a gen AI API provider to get things going, or they have already procured GPUs in house, in their cloud account or on premise.
Then they will bring in different software to run on top of those GPUs and serve gen AI for them, whether it's fine-tuning or inference.
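The API-provider route Lin mentions can be as light as a few lines of code against an OpenAI-compatible chat endpoint. This is a minimal sketch; the base URL, API key, and model id are placeholders to be replaced with values from your provider's documentation.

```python
from openai import OpenAI

# Placeholder endpoint, key, and model id: substitute the values from your provider's docs.
client = OpenAI(
    base_url="https://api.your-provider.example/v1",
    api_key="YOUR_API_KEY",
)

response = client.chat.completions.create(
    model="your-chosen-model",
    messages=[{"role": "user", "content": "Summarize this support ticket in two sentences."}],
)
print(response.choices[0].message.content)
```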
Richie Cotton: So basically you can get started quite quickly just using one of these API services for generative AI, but if you can afford the hardware, then you buy your own GPUs and you can start doing things. I'm curious as to what success looks like. There are a lot of companies that have gone, okay, let's add some AI features to our products.
How do you know if that's a good thing and if it's worked?
Lin Qiao: Again, there are different stages. I think that's just the nature of AI: it requires a lot of experimentation, and there are multiple things to experiment on. Experimentation is the top of the funnel. You need to experiment on how you frame the problem, right? How you frame the problem you want to solve is the framing, right?
Because the framing will directly determine which technology you're going to pull into the stack to solve the problem. So number one is testing out your framing. Then you need to test out which model you want to use: do you want to use a big model or a small model?
You may have some gaps either way, so then you need to experiment: do you prompt or do you fine-tune? I will say it's a bushy tree of experimentation that you need to try at that top-of-the-funnel stage. But once you get to good quality, and the product makes sense...
Then you go down to the next level of the funnel: early-stage production, right? In early-stage production, you need to worry about production quality metrics, including latency, and maybe or maybe not cost, because the scale may not be there yet.
But you want to start converting from experimentation code into production-quality code. Then you go into high-scale production, where cost will become an issue. So going down this funnel, at different stages you need to work on different aspects.
Richie Cotton: So there are quite a few stages to it. And once you've got things in production, have you got any concrete examples of a metric that says, okay, this is what you measure and this is how we know you've succeeded? Is it going to be about usage, or is it going to be about quality of results? How do you go about saying whether this is good or not?
Lin Qiao: Again, you can dissect what you want to validate. The most important thing is the product, right? For the product, you have users, end consumers or developers using it, and then typically it goes by the user engagement metrics. Let's again take a coding copilot as one example.
If you generate code, then the acceptance rate of that code describes the quality, or just people's interaction in general, right? Even if they decline, it still counts as an interaction, as opposed to people just ignoring it and keeping typing, like it doesn't matter, I don't care what you generated. So there are different ways to measure user engagement, and I think driving up user engagement is an important direction to focus on. Second is user adoption, right? More people are using it because they are fascinated by this new product experience and they want to get onto it.
So it's a natural expansion of the user base, and that's also very important. But that's all at the product level. Then you go one level down to validate the inference stack, right? The gen AI inference stack supports the product itself, and there it boils down to quality, but quality really depends on how you define your evaluation data set and how changing the inference stack affects it; how the inference stack helps you reduce latency, or whether latency regresses and affects user engagement; and how the inference stack helps you be cost efficient, right?
Across all these different areas, there will be multiple layers of metrics that a product team usually has to watch.
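For the copilot example, the engagement metrics Lin lists reduce to simple counts over an event log. A sketch assuming a hypothetical log format:

```python
def copilot_metrics(events):
    """events: list of {"user": ..., "action": "accepted" | "rejected" | "ignored"} records.
    Accepting or explicitly rejecting a suggestion counts as engagement; ignoring does not."""
    shown = len(events)
    accepted = sum(e["action"] == "accepted" for e in events)
    engaged = sum(e["action"] in ("accepted", "rejected") for e in events)
    return {
        "acceptance_rate": accepted / shown if shown else 0.0,
        "engagement_rate": engaged / shown if shown else 0.0,
        "active_users": len({e["user"] for e in events}),
    }
```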
Richie Cotton: I like that it goes from simple stuff like, are users actually engaging with this, right through to, have they had a great experience and do they actually enjoy the results. Have you seen any success stories from this? What's your favorite AI success story?
Lin Qiao: Oh, yeah. We have many assistant and coding copilot applications running on top of us, and we are also a leading provider in terms of our inference engine. Our latency is really, really good. When people migrate to us, their latency drops by five times and their user engagement immediately goes up. They show us the curve; it's strongly correlated. User engagement jumps two times, three times, just on the day they switch over to us. That's very, very encouraging. We have plenty of those examples. We also constantly help our customers upgrade to better models.
And they show us that, hey, with better models, it generates much stronger content, and that makes them happy as well.
Richie Cotton: Okay, that does sound like a happy story. On the flip side, are there any common mistakes that organizations make when they're trying to make use of AI in their products?
Lin Qiao: I think one challenge we see across this wave of adoption is that they may move into customization too early, where it's unclear what their objectives are. Then they're just busy going through the process of procuring data and fine-tuning, which takes a long time, and they get to a point where they're super frustrated.
It's unclear if they're moving the needle because the needle was never defined, but it also takes them a lot of time, and in the end it may not come out very strongly. Or the objective can be clear, but the process of curating the data and learning all the nitty-gritty details of how to fine-tune is too overwhelming.
Then they give up in the middle of doing that. That's where we, as a platform provider, want to make it super easy: to remove the tedious and time-consuming work and help our users focus on the intelligent part, where you can get the best ROI.
Richie Cotton: I like that: the biggest problem is people not planning because they don't know what they actually want the AI to do. So it seems like spending a bit of time up front figuring out what you want is going to be pretty important to make sure it actually works. Earlier you mentioned open-source models a few times. Do you have a sense of when open-source models are better than their closed-source equivalents, or vice versa? When should you prefer one or the other?
Lin Qiao: Yeah, this is a really good question. It's not 100 percent correct, but I will coarsely correlate open-source models with small models, and proprietary, closed-source models with big models; coarsely, that's correct. So there's no question the big models are better quality overall, because they're big, and they provide a sense of ease of use, because off the shelf you can have them do many things. So it's very easy to bootstrap on the big models, and I think that's why OpenAI did a great job; their popularity speaks volumes about it.
Then the obvious downside of big models is that, because they're gigantic, they will be slower and more expensive; there's just a lot of physics there. And of course, the advantage of a small model is that it's smaller, so it's much, much faster and more cost efficient.
But then the quality is not there, right? Quality has a gap. And the most important thing is: yes, latency and cost are very important for production and production scaling, but if the quality is not there, it doesn't make sense. So shrinking down the quality gap is very important before you can start to use small models.
Richie Cotton: Okay, so really you're juggling a few different competing criteria there around cost and performance, and you mentioned latency again, so the speed of generating output. Just to wrap up, what are you most excited about in the world of AI?
Lin Qiao: In the world of AI, I'm definitely excited about gen AI; it's a very unique category. Within gen AI, I'm very excited about applications built on top of it. There are already really great applications that I have seen for consumer-facing and developer-facing products. But recently there are emerging applications that focus on having specialized agents solving specific problems really well, and having them coordinate with each other to solve a more complex problem.
I think that direction is super exciting, because we have seen from the AlphaGo experience that AlphaGo can learn and improve its Go-playing skills by playing Go against itself. That level of autonomous learning is going to get into the agentic world, and that will pump up the intelligence even more.
So I'm super excited about that direction.
Richie Cotton: Yeah, that's pretty cool, just being able to learn by playing against yourself. It's cool that AI can improve itself. Mildly scary in some respects, but I think it's how a lot of...
Lin Qiao: We're not there yet. We're far, far away from that. But yeah, that direction is very interesting.
Richie Cotton: Yeah, absolutely. And do you have any final advice for organizations wanting to put AI into their products?
Lin Qiao: I've been working with many organizations, and they're very wise in talking to the experts in the area. I personally spend a long time talking with organizations. Partly that's for me to see what's important for them, but also, if I have any suggestions for them, I will interact directly with them and help them solve problems. I think that pattern tends to work out really well, especially when we're at the early stage of this evolution. And I would strongly encourage organizations thinking about adopting cutting-edge technology to build a rapport with technical experts in the space.
It can be a roundtable, it can be some kind of discussion, just to help you create a good mental picture of where it makes more sense for you to invest, what the commercial opportunities are for your business, and where you should put your biggest effort.
Richie Cotton: Okay, yeah, I love that idea of having a discussion about what you actually want, putting the people who are involved in the commercial side of things and the technical side of things together. That does seem incredibly important. All right, super. Thank you so much for your time; it was a great discussion.
Lin Qiao: Awesome. It was very nice talking with you too.