
Human Guardrails in Generative AI with Wendy Gonzalez & Duncan Curtis, CEO & SVP of Gen AI at Sama

Richie, Wendy, and Duncan explore the importance of using specialized data with LLMs, the role of data enrichment in improving AI accuracy, the balance between automation and human oversight, the significance of responsible AI practices, and much more.
Jun 23, 2025

Guest
Wendy Gonzalez

Wendy Gonzalez is the CEO — and former COO — of Sama, a company leading the way in ethical AI by delivering accurate, human-annotated data while advancing economic opportunity in underserved communities. She joined Sama in 2015 and has been central to scaling both its global operations and its mission-driven business model, which has helped over 65,000 people lift themselves out of poverty through dignified digital work. With over 20 years of experience in the tech and data space, Wendy’s held leadership roles at EY, Capgemini, and Cycle30, where she built and managed high-performing teams across complex, global environments. Her leadership style blends operational excellence with deep purpose — ensuring that innovation doesn’t come at the expense of integrity. Wendy is also a vocal advocate for inclusive AI and sustainable impact, regularly speaking on how companies can balance cutting-edge technology with real-world responsibility.


Guest
Duncan Curtis

Duncan Curtis is the Senior Vice President of Generative AI at Sama, where he leads the development of AI-powered tools that are shaping the future of data annotation. With a background in product leadership and machine learning, Duncan has spent his career building scalable systems that bridge cutting-edge technology with real-world impact. Before joining Sama, he led teams at companies like Google, where he worked on large-scale personalization systems, and contributed to AI product strategy across multiple sectors. At Sama, he's focused on harnessing the power of generative AI to improve quality, speed, and efficiency — all while keeping human oversight and ethical practices at the core. Duncan brings a unique perspective to the AI space: one that’s grounded in technical expertise, but always oriented toward practical solutions and responsible innovation.


Host
Richie Cotton

Richie helps individuals and organizations get better at using data and AI. He's been a data scientist since before it was called data science, and has written two books and created many DataCamp courses on the subject. He is a host of the DataFramed podcast, and runs DataCamp's webinar program.

Key Quotes

Large language models are really good at getting generic answers right. However, when it comes down to actually doing what we do in our day-to-day businesses, it often requires much more specialized knowledge.

We have moved almost 70,000 people out of poverty since we started this company. And what we do is we have an impact hiring model where we hire on the basis of impact.

Key Takeaways

1

Utilize your own data to fine-tune large language models for domain-specific tasks, ensuring they align with your internal policies and specialized knowledge, which generic models may not cover effectively.

2

During model development, analyzing dataset distributions—like gender, education, or vehicle types—helps uncover and mitigate biases that may otherwise propagate through the AI system.

3

Regularly evaluate and update your AI models with human oversight to ensure they adapt to new trends and maintain high performance, as data and business needs evolve over time.

Links From The Show

Sama

Transcript

Richie Cotton

Welcome to DataFramed. This is Richie. There are now many great AI models that perform well off the shelf, but for some business or scientific use cases, performance is still lacking. Behind every great model is a great data set, and if you want your computer vision model to reliably detect your product, or you want your call center AI to reliably answer support questions that your customers care about, then you're going to have to provide your AI with your business data.

Richie Cotton

Today, we're looking at the process of collecting and enriching your data, then feeding it to your AI for better performance. I also want to know about the data annotation workers who are the hidden stars of the AI revolution. I have two guests for you from the data annotation and model evaluation company Sama. Wendy Gonzalez is steering the ship as CEO, having moved into the role from COO.

Richie Cotton

She joined Sama from a background as a product lead at an IoT startup, and she's also spent time as a management consultant with stints at EY and Capgemini. In 2023, Wendy made WomenTech Network's 100 Executive Women in Tech to Watch list. Duncan Curtis is senior VP of product and technology. He's got 15 years' experience as a product manager.

Richie Cotton

He previously ran Google Play Games and also products at Zoox and Aptiv. Let's find out how to make great data sets for your AI.

Duncan Curtis

Large language models are really good at getting generic answers right. However, when it comes down to actually doing what we do in our day-to-day businesses, it often requires much more specialized knowledge.

Wendy Gonzalez

So we've moved almost 70,000 people out of poverty since we started this company. What we do is we have an impact hiring model where we hire on the basis of impact.

Richie Cotton

Hi there, Wendy and Duncan, welcome to the show.

Wendy Gonzalez

Thank you. Glad to be here.

Richie Cotton

Brilliant. So there are a lot of really great foundation large language models. So why do you need to use your own data rather than just an off the shelf model?

Duncan Curtis

Large language models are really good at getting generic answers right, because they've been trained on a very large corpus of data across the internet, or a large portion of it. However, when it comes down to actually doing what we do in our day-to-day businesses, it often requires much more specialized knowledge. And so what we find is that while models are really great for things like writing a generic email, they might actually really struggle if you wanted to do something within a specific domain, like referencing law or referencing, you know, HR policies.

Duncan Curtis

Or even better, what about referencing your own internal documents? So if you wanted to write an email out to your employees, but, oh wait, I need to make sure that it actually follows our policies: that's not going to be as easy to do with a generic model. We also see people doing something called distillation of models.

Duncan Curtis

So often when you see large language models come out, you hear like, oh, GPT-4.1 just landed on Monday. But they also release smaller versions. And the reason they release smaller versions is that they take that big model and eliminate certain parts of it, to get that model smaller while trying to retain as much of the bigger model's knowledge as possible.

Duncan Curtis

And the reason you want it smaller is because you want it to be faster, for when you're doing more real-time chat, but you also want it to be cheaper to run and able to run on cheaper hardware. So, I hope that answers your question.
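As a rough illustration of the distillation Duncan describes, here is a minimal sketch assuming PyTorch. The tiny teacher and student networks and the random batch are stand-ins for illustration; the pattern is simply that the student is trained to match the teacher's softened output distribution.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Stand-ins: in practice the teacher is the large pretrained model
# and the student is the smaller model you want to deploy.
teacher = nn.Sequential(nn.Linear(32, 256), nn.ReLU(), nn.Linear(256, 10))
student = nn.Sequential(nn.Linear(32, 64), nn.ReLU(), nn.Linear(64, 10))

optimizer = torch.optim.Adam(student.parameters(), lr=1e-3)
temperature = 2.0  # softens the teacher's distribution

x = torch.randn(64, 32)  # a stand-in batch of inputs

teacher.eval()
with torch.no_grad():
    teacher_logits = teacher(x)

student_logits = student(x)
# KL divergence between the softened teacher and student distributions
loss = F.kl_div(
    F.log_softmax(student_logits / temperature, dim=-1),
    F.softmax(teacher_logits / temperature, dim=-1),
    reduction="batchmean",
) * temperature**2

optimizer.zero_grad()
loss.backward()
optimizer.step()
```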

Richie Cotton

Okay, that makes a lot of sense: if you're doing something that's a little out of the ordinary, not a generic question in some sense, then that's when you're going to want to use your own data or get some domain-specific data. Maybe we can dig into some of these ways of making your model faster and cheaper as well.

Richie Cotton

Both sound quite useful as well, so we'll get into that later, perhaps. All right. I would love some concrete examples, real examples of where a company has made use of data enrichment to improve their own model and make things better.

Duncan Curtis

Yeah, absolutely. So a great example would be: we were working with a company in the HR space. One of the things they wanted to do was train a model to be able to understand what was in job applicants' résumés. So they had some matching that they wanted to do, where they had, I think, somewhere around 180 skills that they wanted to tie to people's résumés.

Duncan Curtis

But in those résumés, the same skill could be called a lot of things. So let's say "executive presentations" might be the tag that they wanted to use, but that might appear in someone's résumé as anything from "presented to the board", to "regularly present to my executive team", or it might be "speaks at conferences". And so what we would do in that case is enrich a set of résumés by tagging those different examples within the résumés and aligning them with that taxonomy, that list that they have.

Duncan Curtis

And then they can use that to train an AI, to basically find new instances of that going forward.
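One plausible way to sketch that phrase-to-taxonomy alignment is with embedding similarity. This assumes the sentence-transformers library; the model name, taxonomy, and phrases are illustrative, not Sama's actual pipeline.

```python
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("all-MiniLM-L6-v2")

# A tiny slice of a skill taxonomy and some résumé phrasings (made up)
taxonomy = ["executive presentations", "public speaking", "team leadership"]
resume_phrases = [
    "presented to the board quarterly",
    "regularly present to my executive team",
    "speaks at industry conferences",
]

tax_emb = model.encode(taxonomy, convert_to_tensor=True)
phrase_emb = model.encode(resume_phrases, convert_to_tensor=True)

scores = util.cos_sim(phrase_emb, tax_emb)  # phrases x taxonomy matrix
for i, phrase in enumerate(resume_phrases):
    best = int(scores[i].argmax())
    print(f"{phrase!r} -> {taxonomy[best]} (score={float(scores[i][best]):.2f})")
```

In a production enrichment workflow, matches below a confidence threshold would go to human annotators rather than being accepted automatically.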

Richie Cotton

Okay. So this is one of those high-risk use cases. Certainly under the EU AI Act, any kind of HR-like AI use case is considered high risk because of the risk of discrimination happening that way. You've got to be really, really sure that the AI is highly accurate.

Duncan Curtis

It's not just highly accurate; I completely agree. It's also not biased. I mean, we've seen things in the past, there were some very public ones, where it turned out that the AI had been trained on what previous good applicants looked like, and it was extremely biased against people of color. And so it was quite unfortunate that the way that model training was designed inherently picked up a set of biases there.

Duncan Curtis

And so one of the things that we do, not just when we're enriching data but as we're going through that process, is that we also work with our clients to show them the distribution of their data. So, in something like a self-driving car case, it might be as easy as saying: hey, yes, you've got a million vehicles, but you've only got 200 motorcycles.

Duncan Curtis

You really, you know, let's find some more motorcycles within your data, or you can collect some more, so that you can represent motorcycles as well. Now, in the HR use case, it might be: hey, you don't have a good representation of gender, you don't have a good representation of educational background. Are we biasing towards particular schools because we're seeing them a lot? You know, is your training data set up in that way?

Duncan Curtis

So we help with that as well for our clients.
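A minimal sketch of that kind of distribution audit, assuming the annotation labels sit in a pandas DataFrame; the counts and the 1% threshold are invented for illustration.

```python
import pandas as pd

# Stand-in annotation labels mimicking Duncan's example
annotations = pd.DataFrame({
    "label": ["car"] * 1000 + ["truck"] * 300 + ["motorcycle"] * 2,
})

counts = annotations["label"].value_counts()
share = counts / counts.sum()

# Flag classes below 1% of the dataset as candidates for targeted collection
underrepresented = share[share < 0.01]
print(counts)
print("Needs more examples:", list(underrepresented.index))
```

The same audit applies to an HR dataset: swap the label column for gender, educational background, or school, and look for skewed shares before training.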

Richie Cotton

Yeah, certainly. Even just crunching some numbers on the distribution of the outputs is going to give you a lot of transparency on what the model is doing: are you being fair? In general, if you're making your own model, or at least modifying your own AI, it sounds like it's quite an intensive process. So I'm wondering, is this just something for big businesses, or can anyone do this?

Richie Cotton

Maybe, Wendy, can you talk me through which industries or business types are making use of this?

Wendy Gonzalez

Yeah. So I mean, a lot of enterprises are leveraging these foundation models. It doesn't make sense to build your own: it's, you know, billions of dollars, lots of GPUs, lots of cloud computing. It makes no sense. They're really advanced, and you can find a way to apply them to your own use cases. And as Duncan said, you can do things like RAG and embeddings, which allow you to adapt the model to, you know, your own proprietary data for the right application.

Wendy Gonzalez

So there's a lot happening there. But then in some cases, the AI may be so core to your business. Some easy examples are self-driving cars: it is core to the hardware in your vehicle, the cameras, the sensors and their placement. So a lot of companies are building their own, or, you know, they might be leveraging, for example, Nvidia's platform, but they're still taking the time to build it because it's part of the hardware integration.

Wendy Gonzalez

In some cases, yep, companies are still building it. Another example might be things like e-commerce, right? If you're going online, your product catalog is key: the ability to search for products, and to do some of the data enrichment that Duncan had just mentioned, so that your products are searchable and you can recommend things appropriately. That's another example where it's so core to the business of actually selling the product that, even if you might be leveraging some models, a lot of companies are choosing to make those investments themselves as well.

Duncan Curtis

I think a good example there would be, say, ThredUp. Recently we were at a conference called Shoptalk, and there was a great presentation by a company called ThredUp, which basically allows secondhand clothes to be sold. So in their product catalog, each item is a unique SKU. Yes, they do have the categories, but I think it was Kendrick Lamar's Super Bowl performance that ended up making flared jeans a trend again, and it was like, oh my God, this is amazing.

Duncan Curtis

But they're using AI in a really interesting way, because their problem is different from what you're mentioning in really large businesses: they have a unique issue where each item being uploaded is custom. And so looking at how they can match those, and having an efficient system of data enrichment, as well as AI to help with that data enrichment as a first pass, is really important for that product and user experience straight out of the gate.

Richie Cotton

Okay. So those are quite wildly different use cases. Certainly I can see how, with a self-driving car, you don't want to be sending your video data to ChatGPT and asking, should I turn now? That would be a terrible idea. So you want to build your own solution. But the retail example also, yeah, I can see how you've got those different products.

Richie Cotton

They need to have, I guess, descriptions for everything, photos for everything, you know, and assistance that's just managing or generating content. I'd like to know a bit more about what data enrichment involves, because I guess in my mind it's like the sort of Amazon Mechanical Turk type situation, where you've got a lot of people just staring at photos and writing descriptions of them.

Richie Cotton

How accurate is that?

Wendy Gonzalez

Oh, data annotation. So data annotation and enrichment, it's really about tagging, basically, or enhancing unstructured data. It could be visual data, it could be text data; it's about adding additional attributes. And the reason why that's important is, number one, you're training a model to perform in some way. On self-driving cars it's pretty clear: you need to recognize lanes.

Wendy Gonzalez

You need to recognize traffic signs, you need to recognize pedestrians and other vehicles, to do things like collision avoidance. It also can be used to validate that your model is performing properly. So you need to know what good looks like: how is your model performing against what the expectations are? And so that's really what data annotation is about.

Wendy Gonzalez

And oftentimes, in a number of cases, it's not just about capturing the right sort of class or the right descriptions; it can be enriching that data. So I think a good example of this, just to take the parallel from the retail example, is to think of Spotify, or think of Apple Music, right?

Wendy Gonzalez

So they had to basically build a genome of what music is. And there are a lot of attributes, right? I mean, this is where I suddenly realize, when we're talking about all the different descriptions of, like, you know, K-punk: I wouldn't even have known what K-punk was, like, two years ago, probably, as an example.

Wendy Gonzalez

But there are so many different genres. And then, from the song itself, the AI needs to say, hey, where does it fit? And it's literally a genome. I mean, there may be like 50 or 100 different attributes that can make something a K-punk versus a K-metal versus a K-pop type of song.

Wendy Gonzalez

Right? Product catalogs run very similarly: different products, lots of attributes. So when it comes to being able to search, you know, recommend, identify, allowing the AI to perform as optimally as possible normally means tagging it, but oftentimes you need to enrich it.

Richie Cotton

Okay. Yeah. So it's not just writing descriptions; it's about, I guess, feature engineering stuff. Certainly with music: you've got the people who are in the band, and I guess the instruments they're playing, and there's probably some kind of chord progressions and tempos. Yeah, this could very easily become a very complex modeling situation, with just subtle differences between genres.

Duncan Curtis

And it's interesting, because the reason you want to have this depth of attributes, this genome, as Wendy was pointing out, is that it allows that search relevance or that recommendation, the end use case, to be so much more personalized. Whereas if you just started with, oh, it's music and it's K-pop and that's it, and that's all you had about it, it makes it a lot harder for that next recommendation of a really great song to find its way to a listener, or for a really great product.

Duncan Curtis

That's going to fit a need for someone, or for products to be matched near each other or shown together. It really powers a lot of that ROI, I'm going to say, for the user experience, where getting good recommendations is such a better experience than getting just something random or something merely near...

Duncan Curtis

What you are...

Wendy Gonzalez

Looking for, you know? And bottom line, it increases, you know, cart sizes and purchases online. So it's a core part of it. And that's really where we typically see AI being implemented: if it's going to be a core differentiator for your product, or it's going to help you save money.

Richie Cotton

Oh, absolutely. I have to say, a lot of the streaming music websites do this incredibly well, but many retail websites do it absolutely terribly. So there's the experience of searching for something where, well, I know what I want, but I can't see it in the results. So yeah, I can see this is incredibly important, and a lot of shopping websites really need to take note.

Richie Cotton

So hopefully anyone in the audience working on a website takes note: better search is a good thing. Suppose you're interested in this. I guess to start, you need to find yourself a data set. So where does the data come from? If you decide you want to get some data to feed into your model, what do you do?

Duncan Curtis

Data can come from so many places. Let's say we're taking a retail example: a lot of that information is going to be already there. It's the product catalog of all the images and descriptions that your vendors have uploaded. And so you've now got access to that data and you can enrich it from there.

Duncan Curtis

If we're talking about something like self-driving cars, you need to actually have a fleet of vehicles, you need to put cameras and LiDARs on them and drive them around and actually record data. So that's the raw data collection. But I think we should also touch a little bit on what kind of mix of synthetic data can also be part of your AI solution.

Duncan Curtis

When we look at how models are being developed, especially the large language models, we're now seeing synthetic data from the generation before being a large portion of the training data for the next generation. Because the data needs are exploding, they're running out of internet to continue to scrape, and so they're generating more content, more specific content. If you take Llama 4, for example: Llama 3 actually produced a large portion of the training data for Llama 4.

Duncan Curtis

And if we go to self-driving cars, for example, they're able to generate data using top-of-the-line game engines or custom ones. If you think of movies and video games, the top-of-the-line ones we have today are photorealistic, and so they're able to use those technologies to also simulate more hours of driving and see how the vehicle can react.

Duncan Curtis

It's the same with your product catalog: you might be going into a new category that you do not actually have information on. You can create synthetic data from your image generation as well as product catalog descriptions. You can actually build a synthetic data pipeline in order to have new training data ahead of, let's say, a new category that you're going to have on your website.
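A hedged sketch of such a synthetic-data pipeline: generate candidate descriptions with an LLM, then gate them before they reach training data. This assumes the OpenAI Python client; the model name, prompt, and checks are illustrative, and the automated gate is a stand-in for the human validation discussed next.

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def generate_description(product_name: str) -> str:
    # Ask the model for a candidate synthetic description
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # illustrative model choice
        messages=[{
            "role": "user",
            "content": f"Write a two-sentence product description for: {product_name}",
        }],
    )
    return response.choices[0].message.content

def passes_basic_checks(text: str, product_name: str) -> bool:
    # Cheap automated gates; anything failing (or borderline) would be
    # routed to a human validator rather than used blindly for training.
    return len(text) > 40 and product_name.split()[0].lower() in text.lower()

candidates = [generate_description("flared denim jeans") for _ in range(3)]
validated = [c for c in candidates if passes_basic_checks(c, "flared denim jeans")]
print(f"{len(validated)} of {len(candidates)} candidates kept")
```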

Duncan Curtis

And the funny thing about synthetic data, though, is that you have to be careful, like what we were talking about before, of baking inherent bias into your model. And so it's really important to have your synthetic data validated. Wendy, do you want to talk a little bit about that?

Wendy Gonzalez

Yeah, I was going to actually touch on the exact same thing: you can leverage a model to train a model, but a model training a model could carry forward biases from the previous model. So at the end of the day, you still have to know what good looks like. You still have to know how you evaluate your model.

Wendy Gonzalez

And even if you're leveraging something like synthetic data: is it in context, you know, is it realistic? Is it achieving what you need? Number one, you have to say, hey, what is it I'm trying to build in terms of my data set? What's missing? What can synthetic data support? Then second, is that synthetic data going to be of the quality that we expect it to be?

Wendy Gonzalez

What happens oftentimes, especially in image generation: you might have a phone, but maybe it's floating exactly one centimeter above the table. Right? There are these things where, you know, it is not perfect. And at the end of the day, the quality of the data is what matters most, whether it is captured, or synthetic, or generated by a model.

Wendy Gonzalez

You still need to know: how do I evaluate it? Is it accurate?

Richie Cotton

Okay, that makes a lot of sense. So in cases where you don't have good real data, or it's very expensive to collect, I think synthetic data is going to be incredibly useful. I heard a case a while ago, it was from the US Air Force, where they were trying to train their drones to do something.

Richie Cotton

The issue was, they'd only ever flown drones during the day, so they had no data on how well they perform at night. So they had to deepfake nighttime flying in order to generate a data set that they could then train on. Yeah, I like that idea. Certainly I can see how, if you're generating garbage-quality data from a model and feeding that into another model, it could get worse and worse and worse.

Richie Cotton

And that's not going to benefit anyone. Yeah.

Wendy Gonzalez

It's like a copy-of-a-copy kind of scenario. And we see it: synthetic data has good uses, or great uses, in edge cases. So imagine: nobody wants to capture in real time, you know, a stroller in the street, or an empty stroller, or a stroller with a baby in it. Or, I heard this actually from an auto company client, where they're like: you wouldn't think of this, but sometimes hogs, like, drop out of the sky and get in front of vehicles.

Wendy Gonzalez

That's probably not something you're ever going to catch, though it has happened before. It's probably not something you're going to catch, you know, in any regular real-life capture. So it's a great example of an edge case where it's like: okay, synthetic data makes total sense.

Richie Cotton

Okay, that feels like a very Texas scenario somehow. Yeah, I can see it can be very hard to capture that kind of data in real life, but you might need to deal with it at some point, so it's important to have it in a data set. So suppose you've got your data, some of it synthetic, some of it real.

Richie Cotton

What next? I guess you have to choose a model to use.

Duncan Curtis

Yeah, there are some really good benchmarks out there. But one of the really nice things about where we're at today is that it's actually very easy to just try a model out. And so being able to pick the right model for your use case can be as simple as actually running through a bunch of tasks of the type you want, and seeing what the performance of those base models is.

Duncan Curtis

For example, Claude is extremely good at language: writing emails and writing personalized responses. It far outperforms others, even more so than the benchmarking would show, for example. And so what it really comes down to is to test out the existing models for your use case and see how it feels, because at the end of the day, almost all of these come down to a customer experience.

Duncan Curtis

For a lot of these, you want to be able to get those intangible human elements of: that sounds right, that sounds better, that's not messing up in this way. Before you begin, you should also set up how you want to assess these. So if you're doing something like an image detection model, and you want to start with something like YOLO off the shelf, great.

Duncan Curtis

You can start with your metrics being, like: okay, do I care more about accurately detecting everything all the time, and I'm okay with detecting some things that are the wrong things? Or is this a case where actually it's okay if I miss some things, but I should only detect the thing that is the problem that I'm looking for?

Duncan Curtis

So you can set up those metrics for yourself before you begin your investigation.
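Duncan's trade-off is the classic precision-versus-recall choice. A minimal sketch with scikit-learn, using made-up detection results: y_true marks which images actually contain the object, y_pred what the model flagged.

```python
from sklearn.metrics import precision_score, recall_score

y_true = [1, 1, 1, 0, 0, 0, 1, 0]  # ground truth: object present or not
y_pred = [1, 1, 0, 1, 0, 0, 1, 0]  # what the model detected

# Recall: of the real objects, how many did we find? ("detect everything")
print("recall:", recall_score(y_true, y_pred))
# Precision: of our detections, how many were right? ("only flag real hits")
print("precision:", precision_score(y_true, y_pred))
```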

Wendy Gonzalez

Yeah, just to tack on to that, one thing I share often, as people are kind of agonizing over which model they should use: models change all the time. Okay, fair enough, a new one just came out, right? And there's going to be something that comes out in another two weeks, and probably an open-source version of it.

Wendy Gonzalez

Right, or an open-weight version of it, I should say. It'll happen. And the one constant, though, is model evaluation. You have to know: what are you trying to achieve with your model? What are the right outputs that I'm expecting? What are the appropriate parameters? And then, you know, the models are like widgets; they can come and go.

Wendy Gonzalez

And what we're actually seeing a lot more is that companies are looking for flexibility. They know that there may be a better model just coming out. So how do I ensure that, you know, I have the appropriate sort of forward and backward compatibility, and an ability to swap something out if something better comes along? Which inevitably it will, and probably already has today.

Wendy Gonzalez

You know, they're coming out like every single day. Literally.
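One way to picture Wendy's "models are widgets, evaluation is the constant" point: keep a fixed evaluation set and scoring function, and treat each candidate model as a swappable callable. Everything below is a stand-in sketch; real harnesses wrap API calls to different providers and use richer scoring than keyword hits.

```python
from typing import Callable

# A fixed eval set that outlives any particular model (contents invented)
eval_set = [
    {"prompt": "Summarize our refund policy.", "expected_keyword": "refund"},
    {"prompt": "Draft a meeting reminder.", "expected_keyword": "meeting"},
]

def score(model: Callable[[str], str]) -> float:
    hits = sum(
        ex["expected_keyword"] in model(ex["prompt"]).lower()
        for ex in eval_set
    )
    return hits / len(eval_set)

# Each candidate is just a function from prompt to completion; swapping
# models means swapping entries here, while the evaluation stays put.
candidates = {
    "model_a": lambda p: f"Stub answer mentioning refund and meeting: {p}",
    "model_b": lambda p: "Stub answer with no keywords",
}

for name, model in candidates.items():
    print(name, score(model))
```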

Richie Cotton

That's useful to know: you need to bear in mind that you're probably going to swap out your model at some point, probably fairly regularly. But there are so many models around that it's relatively easy to try different things. I guess the trick is just to make sure you've got some measure of how good it is, so if you swap it out, knowing you can make things worse accidentally, you have a way of measuring: is this going to be better on whatever the use case is?

Wendy Gonzalez

Yes, yes. Especially because it's all about cost, right? Certain models can be cheaper to run as the technology improves, so that's going to be a constant evaluation. Just like people think of cloud optimization; in a similar kind of context, it's always going to be evaluated for the ROI on the cost.

Richie Cotton

Okay, yeah. And I guess we're mostly talking about the inference cost; is that the primary expense?

Duncan Curtis

Yeah, that'll be your primary cost there, which is, as I was saying earlier, one of the reasons people choose smaller models. For example, you might need a larger model, you might be up in the several hundred billion parameters at the moment with your use case, and then something like a DeepSeek comes along, or, you know, two or three generations later, you can actually get away with a 10-billion-parameter model because it's more specific to your needs, and so you can really drop your inference costs over time.

Richie Cotton

I guess if you have more specific data, that's also going to allow you to use a smaller model compared to having a more general-purpose one. Like, do you need a larger general-purpose model, compared to a smaller, more targeted model, to get the same accuracy?

Duncan Curtis

That's a really good question. I think the size of model generally relates to the general capabilities you need. So if you just need it to kind of know how to talk and how to think about the world, okay, you can go smaller. And the other side of that is, when you say: is your data really specific?

Duncan Curtis

I would look at it more as: does the data you have, or can create, or, you know, as we talked about before, really encompass the entirety of your problem? So if I take an HR example: maybe I'm doing that résumé piece where I'm just matching keywords to a taxonomy, but do I need my model to do more about understanding, maybe, the history of someone?

Duncan Curtis

Or do I need it to understand all the different educational institutions in the world? Is that really captured within the data set that I've got, or how much am I relying on the LLM or the AI model to have that general understanding there? So I would say, the more complete your data is for your problem, the more you can generally go smaller.

Richie Cotton

Okay. So it sounds like this is somewhere a lot of people might fall over, because there are very subtle differences in questions. "What university did this person go to?" is very different from "Is this a good university?" or "Was it a good university at this particular time?"

Duncan Curtis

At that time? Yeah. And it's something where you could create or have that data available as part of what the model has access to. Or you could be like: hey, do you want your model going and checking the internet every time it's doing a check here? That inference cost can go much higher.

Duncan Curtis

If you need to check each university on each résumé that you're looking at, to go figure that out at the same time, it's interesting to think about the cost.

Richie Cotton

Absolutely. Okay. So I guess the last step is: how do you combine all your own custom data with the LLM? There are a few techniques for this. The simplest thing is just to use retrieval augmented generation, RAG, and there are fine-tuning options and a bunch of other techniques. So how do you choose among the different options to combine your data with the LLM?

Duncan Curtis

Yeah. With simpler models, say if you're using vision detection from YOLO or something like that, it can be very cheap for you to actually add that data and do a retraining or a fine-tuning of the model. But when we're getting up into that LLM space, fine-tuning is probably not the right approach.

Duncan Curtis

People think about it more from the sort of traditional, like, CV vision-model mindset, but it's much more about how you give the model the right context and the right access to the right data. So more in that RAG kind of approach, where you're like: okay, here's my document storage. And with the larger context windows we're seeing now, for some use cases we're actually seeing some clients that are able to incorporate the data the model needs into the prompt at inference time, depending, once again, on how large the data set is.

Duncan Curtis

So those are the two common ways that we've been seeing: prompts for smaller problems that only require a few documents' worth of context, versus RAG for ones where you're like, I've got millions of images or millions of documents that need to be included, or that are regularly changing. That's one of the other advantages of a RAG system: you're not fixed, like it stopped at a point in time.

Duncan Curtis

You can update the data sources as the world evolves, and you can keep your model continuing to perform there.
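A minimal sketch of the RAG pattern Duncan describes: retrieve the most relevant documents for a query, then hand them to the model as context at inference time. Retrieval here uses TF-IDF from scikit-learn for simplicity; production systems typically use embedding vectors and a vector store, and the documents and prompt format are invented.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# The document store can be updated as the world changes, with no retraining
documents = [
    "Remote work policy: employees may work remotely up to three days a week.",
    "Expense policy: meals during travel are reimbursed up to $50 per day.",
    "Security policy: laptops must use full-disk encryption.",
]

query = "How many days can I work from home?"

vectorizer = TfidfVectorizer()
doc_vectors = vectorizer.fit_transform(documents)
query_vector = vectorizer.transform([query])

# Pick the top-2 documents as context for the prompt
scores = cosine_similarity(query_vector, doc_vectors)[0]
top_docs = [documents[i] for i in scores.argsort()[::-1][:2]]

prompt = (
    "Answer using only the context below.\n\n"
    "Context:\n" + "\n".join(top_docs) + f"\n\nQuestion: {query}"
)
print(prompt)  # this assembled prompt is what gets sent to the LLM
```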

Wendy Gonzalez

Yeah, we see RAG and embeddings as definitely, I think, the predominant approach. I say that because, with supervised fine-tuning, for one, models change all the time, right? So as soon as you choose to fine-tune a model, it kind of limits your ability to transition to another model. The second component is that it takes a decent amount of expertise.

Wendy Gonzalez

Right. And a lot of these models are, you know, open-weight or they're closed, right? You don't really know what's happening in the background, to really understand those models and have the expertise to fine-tune them. In addition to the fact that it makes it a little bit harder to swap out, you might be losing some sort of ROI in terms of being able to transition to another model.

Wendy Gonzalez

We're seeing that RAG and prompts, like Duncan was saying, are really the predominant way most companies are thinking about it now.

Richie Cotton

So going back to your ThredUp example: any time someone uploads a new clothing item, you don't have to completely retrain your model. You just say, okay, I want to add it to a data store, and then that's done. Yeah.

Wendy Gonzalez

Let's pull it in. Exactly.

Richie Cotton

All right, cool. From that, I can see the big thing is how much you can leave these models to go and do stuff on their own. So when do you need a human involved? What can you automate and what can't you automate in these situations?

Duncan Curtis

I think automation is a really interesting topic. You should always ask yourself the question: what can I automate? Start with automation as your first place. But then you should be asking yourself: where do I definitely need humans involved here? Whether that's on the data enrichment side, whether that's checking the synthetic data quality, or whether it's in an ongoing manner.

Duncan Curtis

You kind of mentioned: how much can you just let them go? With the rate of change that we see in the world and in the data sets, having model evaluation or validation in an ongoing nature is a really great way to continue to keep your model performing at a very high rate.

Duncan Curtis

Let's take ThredUp as an example. Fashion changes, and maybe their models are performing extremely well now, but maybe they weren't actually that great at the start with that type of jeans again; they were the...

Wendy Gonzalez

Flare jeans.

Duncan Curtis

Flare jeans, right. Maybe flare jeans actually wasn't something their model was performing particularly well at, but now it's become a really popular thing because of that Super Bowl halftime show. And so what you could detect on an ongoing basis is: hey, what are the most popular categories people are looking for, searching for?

Duncan Curtis

And are we doing well on that data? So having a set of humans who can, on a regular basis, plug into your use case and how it's evolving, check that, and see how the model is performing, and then being able to do something with that response, like: actually, it's not performing really well.

Duncan Curtis

Okay, well, let's make sure that we go look at some of the data for flare jeans and actually go and enrich that, and now the model is back up to performing. So we see it more as an ongoing process, because you've found that business value; it's sort of like, hey, this is actually really valuable to customers.

Duncan Curtis

It's increasing my cart size, it's lowering cart abandonment, whatever business value you're gaining there. You've got to ask yourself, for each percentage of performance in that model: what's it going to cost me if I'm willing to accept 5% less, 10% less, even 1% less? Does each percent mean, you know, $1 million a year or $10 million, depending on the scale of the company?

Duncan Curtis

And so then you can offset that by keeping that human model validation going. Okay.
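A sketch of that ongoing monitoring loop: track per-category model accuracy alongside demand, and route categories that are both trending and underperforming to human review and data enrichment. The numbers and thresholds below are invented.

```python
# category: (search volume this week, model accuracy on a recent human audit)
category_stats = {
    "flared jeans": (12_000, 0.71),
    "sneakers": (9_000, 0.94),
    "rain jackets": (800, 0.68),
}

ACCURACY_FLOOR = 0.85    # below this, quality is hurting the experience
TRENDING_VOLUME = 5_000  # above this, the category matters right now

for category, (volume, accuracy) in category_stats.items():
    if volume >= TRENDING_VOLUME and accuracy < ACCURACY_FLOOR:
        print(f"Flag for human review and enrichment: {category}")
```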

Wendy Gonzalez

Yeah, yeah. Keeping that going, I mean, that's really the key: these models are experiential as well. Not only do they learn, but there are new edge cases, new data that comes up, like flare jeans, or whatever the latest and greatest trend or event is, and beyond that, the model needs to learn something from them.

Wendy Gonzalez

You need to make sure it doesn't go off piste. The model evaluation is really important, because data is constantly changing, and you need to know: is it still performing as we expected it to perform? Another simple example to think of is a self-driving car. You may have a car that works really, really well on, you know, Highway 101 here in California.

Wendy Gonzalez

But you take that and you put it in, you know, Rio de Janeiro, where you also have coastline and roads, and maybe it doesn't do so well. Plus it doesn't really understand Portuguese, or doesn't understand those types of cars. So there's going to be edge-case data that's going to be important, and given that these technologies are global, that's kind of the trick.

Wendy Gonzalez

An agriculture example is, you know, wheat in Kenya looks different than wheat in Russia. You can't just assume that the model that's been trained knows what the differences are. And that's the thing about these models: yes, they're trained on the entire corpus of the internet, but that assumes the internet is either unbiased or has every single piece of data on it for every application.

Wendy Gonzalez

All right. So that's the other way to think of it.

Duncan Curtis

I'd also mention, Wendy, that we think about automation in the same way as well. While we're focusing on that human validation element, we actually have automation built into our processes too. So if we take that ThredUp example: if we were doing data validation and saying, hey, do we have the right category for this new product item, we're not just going to sit there and have someone go, okay, let me look at the entire list of 10,000 options for what category this could be.

Duncan Curtis

No, we will introduce automation: we have an LLM at the beginning saying, hey, I think it's one of these three categories. And that allows us to capture that human insight in the most efficient way possible. So that's how we really think about that sort of human-in-the-loop automation.
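A sketch of that pre-filtering step: rather than a human scanning thousands of categories, a model proposes a shortlist and the human picks from it. The ranker below is a word-overlap stand-in for what would in practice be an LLM or classifier call; the categories and item text are invented.

```python
def propose_top_categories(item_text: str, categories: list[str], k: int = 3) -> list[str]:
    # Stand-in ranker: score each category by word overlap with the item text
    words = set(item_text.lower().replace(",", " ").split())
    ranked = sorted(
        categories,
        key=lambda c: len(words & set(c.lower().split())),
        reverse=True,
    )
    return ranked[:k]

categories = ["women's flared jeans", "men's denim jackets",
              "children's rain boots", "women's denim skirts"]
shortlist = propose_top_categories("vintage flared blue jeans, women's size 8", categories)
print("Ask the annotator to choose among:", shortlist)
```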

Richie Cotton

Okay, that certainly makes a lot of sense if there are severe consequences when the AI makes a mistake. You need both things: a software monitoring tool that says, okay, the AI has done something wrong, and then you're going to have to have humans as well, to verify that there has been a problem with the AI and make a decision on what to do next.

Richie Cotton

I'd like to know a bit about how you go about implementing these things. So suppose you say: okay, we need to have our own custom systems, we're going to use our own data in AI. First of all, who tends to be in charge of these projects? Is it the chief technology officer, the chief AI officer, the chief data officer?

Richie Cotton

Who tends to run this sort of thing?

Duncan Curtis

Yeah, it generally comes from the top down, in terms of the need for the business to say: hey, we need to be further ahead in AI. It's a very common thing that we're hearing, and I'm sure a lot of the audience are like: my boss or my board is asking me why we're not using more AI.

Duncan Curtis

That's certainly where it initially comes from. But in terms of those running the projects, it goes down the chain. We do see chief AI officer as a much more prevalent, much more common title now. And then it goes down to sort of your directors of engineering or directors of product, who may have a particular surface area, or part of a product or service, that they manage.

Duncan Curtis

And that's when the project starts becoming more real, because you've actually got people who are partially, if not directly, responsible for the P&L of a particular area. And they're like: right, okay, I know what I need to move from a business perspective; I have a target area. Let's say at ThredUp, for example, it could be something where you've got: okay, well, I'm on the recommendations team.

Duncan Curtis

Okay, well, I know that our recommendations are performing like this, and I need them to perform at a different level. Now I can get in and start saying: well, let's try to address this with AI. What kind of model? What kind of data are we going to need? Oh wait, I need human enrichment; I should probably, you know, reach out to someone.

Wendy Gonzalez

And with the fact that these tools make AI easier to adopt, we're also seeing this a bit more on the business side. So it's not unusual for a head of customer experience to say: hey, how am I going to integrate, you know, a chatbot omnichannel experience, right? So roles you might not have thought of before are reaching out.

Richie Cotton

Yeah, I think it's very true that stuff like this tends to start right at the top of the business. So yeah, there's always that CEO going: yes, let's have more AI. And then it has to filter down to all the different departments, and a lot of different departments end up having to contribute in some way.

Richie Cotton

I'd like to talk a little bit about Sama itself. I know you have B Corp status. So, just for the international audience, can you explain what a B Corp is, and why did you choose this approach? Wendy, do you want to take this one?

Wendy Gonzalez

Yeah, yeah. So B Corp stands for Benefit Corporation. At Sama, we are a public benefit corporation, and it's basically a designation that states that a company has both a profit purpose, right, but also a social or environmental purpose as well. So it means that you can do the double bottom line or the triple bottom line. And the B Corp certification group is basically the best globally known standard for how you evaluate companies that have purpose and profit.

Wendy Gonzalez

And so we are a B Corp. They have a really rigorous evaluation process that is based off of the United Nations Sustainable Development Goals. In our case, we are really focused on workers and impact. So part of our core social mission is to bring people from underserved communities into the digital economy by providing them not only training, but full-time employment.

Wendy Gonzalez

Not just on the basis of their, you know, work experience or their résumé, but on the basis of their impact. And then we provide the upskilling and training.

Richie Cotton

So, a business that's doing well but also has a social mission; I do like that. Can you describe some of the positive impacts you've had so far?

Wendy Gonzalez

Yeah, definitely. We're super proud of this. So we've moved almost 70,000 people out of poverty since we started this company. What we do is we have an impact hiring model where we hire on the basis of impact; that means household incomes that fall below the World Bank poverty standard. Really, the key here is that there are so many talented people, lots of just incredibly talented people.

Wendy Gonzalez

But they lack the opportunity. And so, Sama means "equal" in Sanskrit: we are really just trying to level the playing field by providing job opportunities to people who would face huge barriers otherwise, but are still really talented. So we open the door, but our great and talented workforce are the ones who walk through it.

Richie Cotton

That's very cool. And can you describe some of the jobs that these people are doing?

Wendy Gonzalez

Yes. So we're doing a lot in the data annotation and validation space, and data insights, but I'll give a couple of examples. We have people who are helping do annotation for self-driving cars. That can be, you know, complex sensor fusion, 3D LiDAR data, to help detect, you know, vulnerable road users or traffic signs.

Wendy Gonzalez

And it sounds fairly simple, but it's not. I would actually challenge anybody to run some of these: it requires, you know, depth perception, lots of complex taxonomy. In other cases we're doing validation. So an example that Duncan was using earlier is: you have a retail application, you have the images.

Wendy Gonzalez

But the images that were provided by your, you know, individual or small business don't include any descriptions. An LLM auto-generates the description, but it's written out of context, or has weird grammar, or is not exactly accurate: it says blue instead of green, or, you know, examples like that. And so the human in the loop might be there to help validate the reasoning, the context, the accuracy.

Wendy Gonzalez

And then they're also doing things really intelligently. So it's not just about, okay, let's annotate this, or annotate this edge case; it could be the evaluation of the data set. So, Duncan's example of, hey, there are not enough motorcycles. Or: we've noticed that there's a really common failure point with your model; it always fails when it's, you know, a red vehicle, and I'm making that up.

Wendy Gonzalez

But the idea is that they're evaluating and understanding the data, then providing insights back to our clients on how they can improve the accuracy of their models.

Richie Cotton

Yeah. I do feel like these are the unsung heroes of AI, because you've got all these rockstar AI researchers earning a fortune, but in terms of model quality, just getting correctly annotated data in the background is incredibly important. It probably has an equal impact, I think.

Wendy Gonzalez

Absolutely. I mean, it is still true to this day: it really is garbage in, garbage out. Data is king. You have to make sure you're using the right data and that data is valid. And that's the entire goal behind the social mission: there's a lot of business value being created, so we want to create a lot of employment and living wages along the way.

Wendy Gonzalez

And that lets a larger group of people participate in, yeah, the benefits of the digital economy.

Richie Cotton

All right. Wonderful. So actually, I have a tricky question, because this is one issue we tend to skirt around a lot when we talk about AI, but since you're running a socially conscious company, I wanted to ask you about it. A lot of AI, particularly generative AI, is incredibly energy intensive. So there's this trade-off between: do we go all in on AI and kind of burn the world, or how do you deal with that?

Richie Cotton

Can you talk me through how you think about these energy problems at Sama?

Wendy Gonzalez

Yeah, it's really interesting. We're actually part of science-based climate action, so we have a sustainability objective as well. We're carbon neutral in all of our North American offices, and we are working towards that with our locations in East Africa as well, tracking and measuring that too. It is really interesting; you can see the largest tech companies, right?

Wendy Gonzalez

The hyperscalers are literally doing things like buying Three Mile Island and building data centers in the billions. And we've seen some pretty interesting data points: even the average price of utilities and energy for the average person in Ireland is actually affected, because there is so much energy being used by the many tech companies based in that area.

Wendy Gonzalez

So I think it's an under-discussed topic that deserves more attention. Because at the end of the day, AI is going to get more commoditized, and the cost it takes to be built is going to come down, right? The GPUs are going to get cheaper, they're going to get smarter.

Wendy Gonzalez

And yes, they'll use less energy. But what happens on the flip side is that adoption goes up. What's the first thing Amazon says? We're going to be building more in terms of cloud computing, and we need more energy, right? Because a lot more people are leveraging AI. So I don't know that I have a good answer for: hey, what do we do to manage this challenge?

Wendy Gonzalez

I think it is something, though, that needs to be really raised, because at the end of the day, a lot of people believe, and I believe too, that economies are really driven by how intelligent they are, and by AI, right, how these countries use AI. If you don't have a policy view on how you're going to be leveraging it, it will, and is, contributing to the climate crisis we already have.

Richie Cotton

Yeah, I kind of agree with you on this; there are probably no easy solutions at all. But I do like the idea that legislators need to start thinking about this, and I guess any companies using it also need to think about what they're doing. Yeah.

Wendy Gonzalez

And some are, you know. I think some companies have really clearly stated net-zero approaches. Especially in this day and age, I think we're pretty hard pressed to see regulation being, at least here in the US, a top priority. But I do think the notion of both standards and talking about it is going to be important, because each company can choose to take their own actions towards this.

Wendy Gonzalez

Here's a challenge, I should say, with not having any policy or regulation: we saw the leaps forward in green technology take place when California set some really important emission standards, right? And then all of a sudden everybody was motivated to think really intelligently about green technology. I think there is a role for policy to play in this; whether that will happen in the next few years, probably not, maybe not here in the US.

Wendy Gonzalez

But I think it's a very important consideration, and we're seeing some companies lead the way: they're really pushing to get to net zero and challenging their suppliers, like us, to get to net zero.

Richie Cotton

Yeah, I like the idea that just one part of the world needs to come up with some regulations, and then it provides inspiration for other parts of the world. But, yeah, maybe this is a whole separate episode.

Wendy Gonzalez

It probably is.

Richie Cotton

Yeah, the audience will have to watch this space; we'll see what we can do. All right. So, just to wrap up: do you have any final advice for companies who are interested in doing data enrichment?

Wendy Gonzalez

One of the things we haven't discussed, though I think we've been discussing components of it, is this notion of responsible AI. Responsible AI isn't the same thing as, like, ethical AI. What it means is really: have you built the AI in a responsible way? And it's typically around four different pillars. It's around data governance.

Wendy Gonzalez

So what data are you using, where do you get it, and do you have the right to use it or not? All of that's coming into question, I think, with the litigation going on right now. But still: do you have the right data? Do you have the rights, the privacy and the security? And the way in which you can evaluate your model: how do you evaluate it?

Wendy Gonzalez

Right. Some high-risk applications require a different level of scrutiny, maybe some human insight, etc.: lending systems, safety applications, defense, cybersecurity, all those kinds of things. And so at the end of the day, developers are the ones who build this. So I'm definitely an advocate of making sure that those pillars of responsible AI, which are really just development best practices, are communicated. Because at the end of the day, it's the developer who's picking that model, who's building that model, that needs to be aware of these practices, so that you can really, sort of, measure twice, cut once.

Wendy Gonzalez

It makes sense to have a plan, have a way to evaluate it, or be able to do so, knowing that you're going to get to the right performance outcome. If you leverage some of these best practices out there, you can make sure that, you know, your bot, or whatever it is you're building, gets a red team, right?

Wendy Gonzalez

And doesn't say or do something inappropriate. There are ways in which you can build appropriate best practices into your development, much like you would for security, right? You don't want to leak your customers' data, so you're going to build an architecture with security in mind. I think of responsible AI development practices as the same thing.

Duncan Curtis

I'd add one more thing, and I love that, Wendy. Especially for some of your audience, depending on how far along they are in their AI journey: when you're thinking about this, do some research or talk to some experts like ourselves. That doesn't have to just be a plug for us; there are really good resources out there.

Duncan Curtis

There are people in the industry, people like ourselves, who can help, who have seen this and know what good looks like. We've been there, we've done that, for extremely large names and little-known companies, and we've seen what helps them drive success. There's help out there; you don't have to figure it all out by yourself.

Richie Cotton

Yeah, certainly both great ideas there. I like the idea of responsible AI: you don't expose yourself to business risks by doing something stupid. And the same with Duncan's point: if you don't want to do something stupid, talk to an expert. Have a think about what you do before you dive straight in.

Richie Cotton

Which is a shame, I always find; I always want to dive in. But yeah, it's much more sensible to think first and then ask for help if you need it. So finally, I always want recommendations for people to follow. I'd love to know whose work you admire at the moment. Who are you listening to?

Wendy Gonzalez

This has nothing to do with AI, but I've been listening to Michelle Obama a lot. That's nothing to do with AI; hey, I brought up the first thing that came to mind. Okay.

Richie Cotton

Okay, I actually thought it was going to be a capable recommendation.

Duncan Curtis

I mean, I haven't been following individuals as much, but I've been following trends, and one of the things I find really interesting is the introduction of, let's call it, more standard engineering practices as they apply to LLMs. We saw this with DeepSeek as it came out, where they took different innovative, not even necessarily super innovative, approaches; but instead of making the goal "how do we make the biggest model?"...

Duncan Curtis

...it was "how do we make the most efficient model?", and seeing different ways to tackle that problem. That's been super fascinating to me as I nerd out in that space.

Wendy Gonzalez

And I have a serious answer, actually; Michelle was the first one that came to mind, but Andrej Karpathy does a great set of series. I think for anybody who is really trying to understand more about AI, the technical concepts of AI, I would subscribe to his YouTube channel.

Duncan Curtis

It's excellent.

Richie Cotton

Yeah, both Michelle Obama and Andrej Karpathy are definitely great speakers, although on very different topics, and there are a lot of amazing foundation model researchers speaking about their work as well. And I do like the idea of making things more efficient, saving money. But on that note, I think we have to stop now.

Richie Cotton

So, yeah. Thank you both.

Duncan Curtis

Thanks, Richie.
