Adel Nehme, the host of DataFramed, the DataCamp podcast, recently interviewed Sandra Kublik and Shubham Saboo, authors of GPT-3: Building Innovative NLP Products Using Large Language Models.
Adel Nehme: Hello everyone, this is Adel, data science educator and evangelist at DataCamp. One of the most exciting and awe-inspiring developments of the past few years in data science has been the rise of large language models like GPT-3. In case you've been living under a rock, generative models like GPT-3 for text and DALL-E 2 for images have shown the incredible potential of what an AI-powered future could look like.
Whether it's automatic image generation from a prompt or sophisticated code autocomplete, the possibilities are endless, and this is why I'm so excited to speak with Shubham Saboo and Sandra Kublik, authors of the O'Reilly book GPT-3: Building Innovative NLP Products Using Large Language Models. Throughout the episode, we talk about the rise of large language models, the underlying technology and how it's different, why GPT-3's API interface is revolutionary in machine learning, potential use cases, its risks and limitations, and much, much more. If you enjoyed this episode, make sure to rate and comment, but only if you enjoyed it.
Also, I wanted to let you know that this week access to DataCamp Premium and DataCamp Teams is completely free. What does this mean? It means that all you need to do is register to gain access to all of our learning content, access to DataCamp certifications, workspace, competitions, and much, much more. Make sure to take advantage of the offer with no strings attached. Click here for more details.
Now, on to today's episode. Shubham, Sandra, it's great to have you on the show.
Sandra: Yeah. A pleasure to be here. Thank you so much for having us.
Adel Nehme: I am excited to talk to you about GPT-3, your book on it, and what it means for the future of AI and data science. But before we get started, can you give us a bit of background about yourselves?
Shubham: Yeah, sure. My name is Shubham, and I started out as a data scientist.
During my time as a data scientist, I got to work with a FinTech firm where I established the entire machine learning and data practice for the technology infrastructure, all from scratch. Then I thought of doing something for the AI community and moved into the role of AI evangelist, where I got to foster the ideas and thoughts of community members across the spectrum.
Right now I'm working as a senior AI evangelist at Jina AI, which is a neural search company. Around 2020, when the OpenAI API got released, I was among the early members who got access to it. I was literally amazed by the things it could do. I experimented with the API a lot, posted a lot about the use cases it makes possible, and started writing blogs on it, and that's how it all started. That's how it all turned into an O'Reilly book: I got in touch with an O'Reilly editor, and she was really excited to have something out on this topic. That's how it all came into play.
Sandra: Yeah. I have an atypical background when it comes to AI. I was a liberal arts major and I was always drawn to creative projects, so I used to think that I'd become an academic or a writer. I was experimenting a little bit with movies as well. Then I pivoted to the startup ecosystem because I always loved tech. I always loved how it enables us to improve our lives and make them as frictionless as possible. So I was always in love with it and I just wanted to be closer to it. And even as an outsider at the time, I felt that the breakthroughs happening in AI were something to be observed and watched closely, and I wanted to get involved in whatever form I could. I started by setting up an AI hackathon community for enthusiasts. That's how Deep Learning Labs was established.
Then it organically grew into an incubator for startups, Nextgrid. I also started a YouTube channel just to feed my curiosity and give myself some space to discuss and think through the AI breakthroughs that were the most appealing to me. And obviously, like everybody here, I guess, I was mind-blown when GPT-3 was launched. I was lucky enough to also become an early tester for the beta. I guess around that time Shubham reached out to me. That's how he found me, I think, through the videos. He offered me this awesome project, to write a book about it and to learn more about it. So of course I just dived into it.
For the past year or so, I was working for Neptune AI. And right now I want to continue on this NLP path. So I just joined. And we are also launching the book. So it's super exciting.
What is GPT-3 and how is it different
Adel Nehme: So I'm excited to talk to you about GPT-3 and your book on it. There's a line from Ernest Hemingway's The Sun Also Rises where one character asks another, "How did you go bankrupt?" And the other character responds, "Two ways. Gradually, then suddenly." I'm always reminded of this when looking at a lot of the results from large language models like GPT-3. It feels to me, as an outside observer, like the AI community has been making a lot of gradual improvements to NLP systems,
and it has suddenly resulted in awe-inspiring systems and outputs. I wanted to set the stage for today's conversation by first understanding what makes GPT-3 so interesting. How is it different from other machine learning systems that we're used to?
Shubham: When we talk about GPT-3, it always makes sense to look at a little bit of history, where it all started, and the origin of all these language models. It goes back to 2017, when transformers were introduced, and that changed the direction of NLP: how the field is looked at, how it will progress, and how things work in it. The transformer was revolutionary in its use of attention, which is basically focusing on certain things, similar to how a human does.
So, an AI model that can replicate how a human brain works. That's where it all started. Then researchers at Google, OpenAI, and Microsoft started experimenting with how we can take this forward and make something usable for the general public, or for an engineer or a data scientist, how all of this can be put to use in real-world use cases, and how businesses can be formed on top of it.
And that's how the GPT series originated. We didn't land directly at GPT-3. It was part of the generative pre-trained transformer series. We had GPT-1, we had GPT-2, then we had GPT-3. What changed with GPT-3, and what makes it so exciting and so revolutionary, is that this was the first time we saw that an AI model, an NLP model or a language model, can do any number of tasks and is not limited to a specific task,
which is what we have conventionally seen in NLP. Just to give you some context, the way a machine learning model works is that you give it a training set for a specific task, you train it on that training dataset, and then you run inference with it. That's how it performs the specific task it was trained on.
But because the data GPT-3 has been trained on comes from such a big universe, the internet, it can literally perform any number of tasks that you can think of. So it is the first time that we have seen a task-agnostic or task-independent AI model, a truly generalized AI model that can perform any number of tasks.
The other good thing about GPT-3 is the way you interact with the model. Previously, if you had to interact with a model, you needed technical prerequisites. You needed an understanding of a programming language. You needed a dataset. You needed to understand how training works, how inferencing works, and how you can deploy it. GPT-3 did away with all these conventional paradigms and gave you a simple user interface where it is as easy as talking to a human. You just go to the Playground, you give it instructions in simple English, and the model comes up with an output. It's like collaborating with a human, a buddy, or a subject matter expert.
So you can also think of GPT-3 as a subject matter expert for a number of tasks. That's what makes GPT-3 so special. And I'll pass it over to Sandra to shed some light on that.
Sandra: These are great points. What I would add on top of that is that introducing GPT-3 in the form of an API gave developers broader access to it. And as Shubham mentioned, the interface got so simplified that people without a necessarily heavy ML background could explore use cases. So a short time after the initial access was released, you could see all these use cases just emerging from the community testing it. And they were just mind-blowing: translating legal documents into simple language, or analyzing the recipe of a product and translating it into which ingredients are harmful and which are good for you. All sorts of things, really. And I think this was one of these radically new things, that you got to interact with this powerful model via a very simply designed API, and you got to actually explore different use cases on top of it. We talked with the creators of the OpenAI API in the interviews for the book, and they admitted this themselves.
For example, Peter Welinder, their VP of Product, admitted that when releasing the API they themselves had no idea what it was capable of, and they wanted to give access to the community so that the community could show them the boundaries and the limits, and explore further. And that is just a brilliant idea. And it got us where we are right now.
Adel Nehme: That's really exciting. And I wanna unpack all of these elements with you. So let's start off with the technology underlying GPT-3. I mentioned the gradual work that the AI community has been doing to improve that technology. What are the changes over the past few years that led to these high-performing results?
Sandra: Shubham has already mentioned that we had this major paradigm shift in NLP once the transformer architecture was introduced. It started in 2017 with the famous paper, Attention Is All You Need, where this architecture was introduced. And the backbone for transformers is a sequence-to-sequence architecture.
Basically, a transformer model processes a sequence of text all at once instead of a word at a time. It also has this powerful mechanism that Shubham mentioned, called attention. The transformer architecture is definitely a key thing to highlight when you think about the changes that led to the birth of GPT-3. Another one was that, with time, language models in the NLP space grew. Initially they weren't so big.
They weren't so big and impressive, but they started to become bigger and bigger. And the datasets they were being trained on were becoming bigger, with more and more data available and with open source projects where researchers put together these massive datasets, so it was just easier to train these models.
Another thing that emerged in parallel was more and more computing power in the hands of the people training these models. That allowed them to train more and more powerful models and experiment with bigger and bigger architectures. Another one was that, okay, you have this powerful computing power, but at the same time you want to find ways to use it in an economically efficient way so that you don't run out of it,
simply put. One of the techniques used in GPT-3 was pre-training the model. This basically made it possible to reuse all the initial training, this very lengthy process of training the model, and apply it to other use cases with a little bit of fine-tuning or tweaking for the particular use case that you had in mind.
So that was a big one as well. Shubham also mentioned that there were iterations of GPT before we arrived at GPT-3. There was GPT-1, I think it was introduced in 2018, with, I think, around 120 million parameters, then GPT-2, 10 times bigger, with a bigger dataset as well.
And then eventually GPT-3, which was a hundred times bigger and also had a hundred times heavier dataset. So they were constantly working with the opportunity to have bigger datasets and more computing power, and also seeing that scaling leads to the emergence of more powerful language capabilities. These models were actually able to do more and more and perform better at a variety of tasks. They went down this path, and that's how we arrived at GPT-3, which hit the sweet spot.
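For readers who want to see the attention mechanism mentioned above in concrete terms, here is a minimal sketch of scaled dot-product attention in plain Python. It is an illustration only, not GPT-3's actual implementation: the function names and the toy two-dimensional vectors are assumptions made for this example, and real transformers add learned projections, multiple heads, and many stacked layers. The point it shows is that every position in the sequence is compared with every other position in one pass, rather than one word at a time.

```python
import math


def softmax(scores):
    """Turn raw scores into positive weights that sum to 1."""
    highest = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - highest) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention(queries, keys, values):
    """Scaled dot-product attention over a whole sequence.

    Each query is scored against every key, the scores become weights,
    and the output is the weighted average of the value vectors.
    """
    dim = len(keys[0])
    outputs = []
    for q in queries:
        scores = [
            sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(dim)
            for k in keys
        ]
        weights = softmax(scores)
        mixed = [
            sum(w * v[j] for w, v in zip(weights, values))
            for j in range(len(values[0]))
        ]
        outputs.append(mixed)
    return outputs


# Hypothetical toy "token" vectors; real models use learned embeddings.
tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
result = attention(tokens, tokens, tokens)  # self-attention: one output per token
print(result)
```

Each output vector is a blend of all the input vectors, with more weight on the inputs that score as most similar to the query.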
Prompt Engineering with GPT-3
Adel Nehme: That's really awesome. And I'm excited to talk about the scaling aspect of GPT-3, and where we hit a wall when it comes to scaling. But you mentioned earlier, Sandra especially, the API model and how transformative it is in democratizing access to such powerful models. The API model of GPT-3 is definitely interesting, and I think it does introduce a paradigm shift in how we interact with complex AI systems.
Even I, someone who hasn't necessarily coded a lot in the past two years, found it very intuitive to work with GPT-3. How does the API model change how we interact with ML systems? And can you walk us through the concept of prompt engineering?
Shubham: In 2020, it was the first time somebody introduced this API-based approach. Previously, it was all hosted locally. You train your model, you host it, you get your own datasets, and that always makes you hit a wall on the task that you want to perform. You can only reach a certain level of accuracy when you're training on your own dataset and hosting on your own, because there are technology limitations, infrastructure limitations, hosting limitations, and whatnot. What OpenAI decided, and what I think made it revolutionary, was giving access to GPT-3 in the form of an API. It opened it up to people who don't code. As you mentioned, you haven't coded for quite some time, but when you use GPT-3, it is very intuitive. You don't feel like you're coding something, or even that you're interacting with a sophisticated language model.
None of these things come into the picture when you interact with GPT-3. It's as simple as talking to someone: you give an input in natural language, in simple English, and you get the output you want. It can be generative output, it can be search output, and it can be a number of other things, like classification and summarization, all the things that are possible with conventional NLP. And the process of giving this kind of input and getting the output, which is an organic or natural process, as close as it gets to writing in English, is prompt design.
So it is an intuitive process for people without any ML expertise or background: they can give a textual input to the model in simple English and get the desired output. Let's say you want a paragraph on NLP. What you have to do is simply ask GPT-3, "Can you please write a paragraph on NLP?" or "Please write a paragraph on NLP."
And it comes up with a paragraph. It is as simple as that. One tip on prompt design and engineering to keep in mind while interacting with GPT-3 is to understand what GPT-3 knows about the world and to give the input in a way that leverages that knowledge. GPT-3 is not great at giving you factual answers. It can make things up, because it generates things on the fly. So it is really good when you have to complete something, when you have to create something, when you want to go creative, and when you want to turn abstract things into reality. We have seen a lot of artists, illustrators, and people from a design background getting attracted to GPT-3 and using it, because all of these people had a lot of abstract ideas going through their minds and they didn't know how to represent them or put them into execution.
Then they came to GPT-3, gave those abstract ideas as input in the form of a prompt, and got output for them. So it's basically acting as a sounding board for these kinds of people, making it easier for them to understand what they actually want to do and helping them with their creative and design process.
Sandra: Yeah, I think it's a great take on prompt design. What I would add as a tip when it comes to interacting with the model is that you need to realize that GPT-3 is super, super good at storytelling, and it's going to continue in the same fashion as you prompt it. If you start with a few lines of a science fiction novel, it'll continue in the same way. If you start with a line that looks more like a love letter, it'll continue in the same way. It's incredible at moving between different styles, mimicking them, and continuing in the same fashion. So the most important thing is to make sure that the initial input you give it checks off that requirement.
If you're going for a certain genre, just make sure to give it enough input so that it can continue in the same way. Another thing would be that if you find yourself getting inconsistent outputs from GPT-3, just make sure that you give it enough context to make it consistent.
One example that comes to my mind would be a question-and-answer use case, where you try to get trivia-style questions and answers. You give it a question and it gives you an answer, or you ask it to create both questions and answers. Without enough context, it might give you some answers that are non-factual, that are just made up, because it has all this data at hand
and it doesn't necessarily think in logical, factual ways. However, if you ask it to be factual, if you say, okay, answer the questions with factual responses, then you're going to get factual responses. So it's as simple as that: giving it enough information about what you're trying to achieve in order to arrive at the desired output. One metaphor for thinking about it is that it's just like talking to a friend in a bar
and trying to be as simple and concise in your messages as possible so that the other side understands what you're going for. And then you should be good.
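The tips above (state plainly what you want, ask for factual answers, and give enough context in the form of examples) can be sketched as a small prompt-building helper. This is a hypothetical illustration rather than anything from the book or the OpenAI API: `build_prompt` and the `Q:`/`A:` layout are assumptions made for this example, and sending the resulting text to a model is deliberately left out.

```python
def build_prompt(instruction, examples, question):
    """Assemble a plain-text prompt: instruction, worked examples, new question.

    The trailing "A:" leaves the answer open so the model continues
    in the same question-and-answer pattern set by the examples.
    """
    lines = [instruction.strip(), ""]
    for example_question, example_answer in examples:
        lines.append(f"Q: {example_question}")
        lines.append(f"A: {example_answer}")
        lines.append("")
    lines.append(f"Q: {question}")
    lines.append("A:")
    return "\n".join(lines)


# Hypothetical trivia prompt with one worked example as context.
prompt = build_prompt(
    "Answer the following trivia questions with short, factual responses.",
    [("In what year was the paper 'Attention Is All You Need' published?", "2017")],
    "Which company released GPT-3?",
)
print(prompt)
```

Swapping the instruction and examples for a few lines of science fiction or a love letter follows the same idea: the model tends to continue whatever pattern the prompt establishes.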
Adel Nehme: I love that last part, especially on question-and-answer style prompt engineering. One example I've seen, which is a testament to the intuitiveness of question-and-answer style prompt engineering as well as to the emergent capabilities of models like GPT-3, is creating Jeopardy-style questions and answers. I saw this example recently. Five, seven, eight, ten years ago, you would need to train a model specifically on Jeopardy questions to reach that level of performance, but with just a few prompts, GPT-3 has been able to blow that specialized model out of the park, right? Just through prompt engineering. It's really interesting because it showcases those emergent capabilities of GPT-3. It's not trained on that task, but it does really well with just two or three prompts.
Sandra: Yeah, exactly. As we mentioned, it's extremely good at very quickly figuring out what you would like to achieve. As long as you give it enough context, it should get there. At this point, it's an extremely generalized model that can be applied to so many language-based tasks. I think there is a reason why we are still waiting for, say, the next iteration, GPT-4. Let's see what will happen in the future, but at the moment GPT-3 is already so usable and so applicable to different types of tasks that you can really achieve a lot with it with just a little bit of a nudge in a certain direction.
Adel Nehme: So we talked about scale, and in a lot of ways, scale in terms of the data ingested and the number of parameters of GPT-3 has been a massive factor in why it's so good and why it's so easy to use. How important is scale as part of GPT-3's success? And what I'm trying to get at specifically here is, is reaching generalized intelligence truly a matter of time?
Sandra: First of all, when you look at OpenAI's mission, what they are striving for in basically all the projects they engage in is arriving at, or facilitating the development of, AGI that is benevolent and beneficial for as many people as possible. So with their experiments with the models, they are certainly trying to arrive at as general an intelligence as possible.
With the initial experiments leading to GPT-3, scale was extremely important. It was crucial. They were taken aback by how much the model's capabilities change when you add scale to it, when you keep the same sort of architecture, with the transformer as the backbone, but just make it fatter and bigger. And the iterations that followed, GPT-1, GPT-2, GPT-3, weren't
that different in terms of the architecture. It stayed the same, but what they were doing was increasing the number of parameters and increasing the datasets. This is how they were trying to see whether it changes and whether it gets better at certain benchmarks or general language-based tasks, and that proved to be true.
So that's why they were incentivized to go in that direction. Having said that, with scale, and with the scaling of computing power, there also come costs. And there also come concerns, for example, related to the environment. As OpenAI was scaling their models, there was more and more research showing that we should actually be more aware, more careful about how we are using this computing power because of the economic and, most of all, ecological footprint.
One of the research papers that I read basically compared the carbon footprint GPT-3 generated to that of, let's say, cars. It showed that the initial training phase, which lasted a couple of months, was comparable to the lifetime carbon footprint of five passenger cars. So it's massive when you think about it, right? It's a lot. They are aware of it and they are trying to address it. And we no longer think that only scaling, blowing things out of proportion and reaching bigger and bigger levels, is necessarily the answer to arriving at more generalized intelligence. I think we are looking at experimenting with more techniques that try to achieve the same level of performance at a lower scale, with some tweak to the architecture. And I think we are also not only scaling language models but also involving other modalities, like audio and visuals, in order to arrive there. I think this will be more of the direction we go in to achieve more and more generalized intelligence.
Shubham: Following up on what Sandra said, the other way to look at generalized intelligence, apart from scale, would be how we can make changes in the architecture and use the same number of parameters. Along the same lines, the other way to look at it would be how we can combine different modalities and how multimodality can be brought in. Because GPT-3, if you see, even in all its glory, still works on text. It's just a text-based model that takes text as input and gives text as output, as simple as that.
But think about combining different modalities, combining text with image, audio, and video, the way we as humans perceive things. It's not just images, it's not just audio. It's a combination of what we see and what we hear, and that's how we perceive things and make sense of them.
So to get to generalized intelligence that is similar to a human's, we need to take multimodality into consideration. And I do think that we are moving in that direction in the next iterations of future language models. Because if you see, recently DALL-E 2 got introduced, which basically takes text and generates images
much better than any artist can in that given time. Within seconds, it comes up with brilliant images given the text prompt, and your text prompt can be as abstract as you like and it will still come up with images, because it has been trained on billions of images. So future language models can be a combination of DALL-E and GPT-3, combining different modalities like text and images and then making sense of them.
If we also look at other research in the same area, we see that Google's DeepMind has released Gato. Gato is a generalist agent, which again combines multiple modalities and not just text. It combines text with audio and video, and it is a multimodal, multitask model. It can use the same weights to play Atari,
caption images, chat, and even use a robot arm to do a number of tasks. So this is the direction that we are moving in. And another very interesting example that comes to my mind is SayCan. What it does is combine the advancements in language models with robotics.
So you have the understanding of the world that a language model has, and you combine that knowledge with robotics. Then it's as simple as giving a command to a robot, and the robots become smarter at doing all the tasks that you want them to do. So multimodality is definitely the direction that we are moving in. And this is where generalized intelligence will come from.
GPT-3 Use Cases
Adel Nehme: I definitely agree with that notion, especially on multimodality. In some sense, reaching a form of generalized intelligence, and I use air quotes here for generalized intelligence, is both a research problem and a system architecture problem: how can you combine different task-oriented AI systems together? And I do think that even if, on the research side, the goalpost for what defines generalized intelligence keeps moving, we will to a certain extent see useful generalized models actually being used in real life in the future. I think this marks a great segue to discuss some of the greatest use cases that you've seen GPT-3 produce.
There are a lot of actual startups and tools right now that are built on top of GPT-3. Can you walk me through your favorite use cases of GPT-3 so far?
Shubham: Definitely. It's very exciting to see how a new wave of startups got started, built on top of GPT-3. While researching for our book on GPT-3, we had this section where we discuss the startup ecosystem, the corporate ecosystem, and the entire effect that language models like GPT-3 can have on the economy.
So we did come across a lot of different use cases. And it would be right to say that GPT-3 actually acted as a launchpad for these startups. One of the use cases that I really like and have to point out would be Viable. It is an intelligent feedback aggregation tool, which can combine all the sources that you have, your customer feedback, your internal documents, all the insights that you're getting from different sources. It puts it all together and gives you a simple user interface where you can just ask questions and get simple answers.
So you can send a question like, "What's frustrating our customers about the checkout experience?" And the application may respond with something like, "Customers are frustrated with the checkout flow's load time," as simple as that. So gathering data and analyzing it, what went right, what went wrong, and then making decisions or drawing inferences out of it: that entire curve has been reduced to a simple question that you can ask this AI model, this GPT-3-based application.
So it has really simplified the lives of product managers, founders, customer success teams, insights teams, and the people who are working in these startups and companies. That was one very interesting use case that I think has a lot of value in the real world and can definitely be used in place of conventional data analysis.
The other interesting use case, a very recent one that I came across, is Supermeme. It basically uses GPT-3 to generate memes. It was one of the most interesting and funny use cases that I came across. What it does is take input from you, whatever you have in your mind.
It allows you to select a template, runs GPT-3, and comes up with different kinds of memes in a matter of seconds. You just select a template, give it whatever is in your mind, and it comes up with a lot of memes. So now it's very easy for anyone to be a meme lord in the world of GPT-3.
Adel Nehme: I'm actually looking at examples right now and they're pretty hilarious.
Sandra: Yeah. The meme assistant is a golden use case, I think. But generally my favorite use cases are also around the creative use of GPT-3. I think it's just such an incredible storyteller that it's made for these use cases. It's very natural for it to create a story behind a certain character. You have all these examples, like AI Dungeon, where you have a text-based adventure game and GPT-3 is the engine powering all these characters and stories that you engage in and create as you go.
One use case that we got to explore deeper in the book was Fable Studio. Fable Studio is a pioneering VR studio that is creating a new genre of stories using new technologies, using virtual reality, but also using AI. They have experimented with GPT-3 to basically create the messaging, the content behind its character, Lucy. They created this Emmy Award-winning movie called Wolves in the Walls, and they have this character, Lucy, who is an eight-year-old girl. She made many appearances where she was just engaging with the viewers, singing a song, or telling a story, and viewers could engage with her.
They told us that 80% of all these Lucy appearances were powered by GPT-3. It's just incredible that you can basically create a character with the help of the model. I think there is big potential there, so I'm very excited about use cases like that. Another one, also related to writing, is basically creating copy that allows you to,
I dunno, sell a product or create a nice social media post. GPT-3 is also incredible at that. There are many platforms, such as Copy.ai, Jasper, or Copysmith, where you are able to generate, literally within seconds, very nice social media posts, articles, whatever you need: YouTube video titles, YouTube video scripts.
It's just incredible how helpful it can be with a variety of texts when it comes to the digital realm. So that would also be one of my favorite use cases, and we also dive into it in the book.
Adel Nehme: That's really great. And what do you think are use cases that will be truly transformative in the short term?
Sandra: Short term, I think, is key here. I think we can already see how all sorts of assistants related to creative work are transformative, in the sense that they allow you to do more, faster, and maybe in a more fun way. GPT-3 is incredible for writer's block, when you're writing something and you just feel stuck and you want to generate a handful of paragraphs to choose from, or just to keep the creative juices flowing.
So that's a great one, and it's already available now. I think it will impact our comfort and our creativity in writing text, that's for sure. Another one would be coding assistants. GPT-3 is trained not only on human language but also on programming languages, so it can be used to power coding assistants, for example, just like with GitHub Copilot.
I think they use Codex, which is a descendant of GPT-3. It's basically like a younger brother of GPT-3. They are using it to help you either learn how to code or fix certain problems that you run into when you're coding. I already know people who are using GitHub Copilot, and I think this use case will keep growing in the future and will definitely, again, change the comfort and the creative process of coding.
Risks of GPT3
Adel Nehme: Yeah, I definitely agree on that assistant-type use case. There's also an adjacent use case that I've been excited about, which is very educational. One thing that I've seen is the "explain my code" or "explain this piece of code" type of use case, which I think is gonna be really great for democratizing education in programming, data science, etcetera.
On the flip side, it's also very important to acknowledge the limitations of GPT3. Large language models are definitely far from perfect and can make, at best, some pretty basic mistakes and, at worst, some very harmful ones. Can you walk us through what the limitations and risks of systems like GPT3 are?
Shubham: Before we look at the limitations and risks that GPT3 poses, it's very important for us to understand that GPT3 is not a truth-teller; it's a storyteller. And if you think of GPT3 as a mimic of the human brain, then just like all humans, it is also prone to making mistakes and it can't be perfect.
Right? We also make a lot of basic mistakes in our day-to-day lives. It's similar with GPT3: if the prompt or the input given to the model changes or varies or contains some harmful keywords, it can come up with a response that is harmful in nature or that has pretty basic mistakes. If you ask factual questions that go beyond the time when the model's training stopped, it will definitely come up with wrong answers; it is bound to make mistakes there.
So it is a great fit when we talk about the generative or creative capabilities of GPT3, but when it comes to factual answers, we cannot think of it as something that can be perfect. Another potential risk that it poses, and the biggest one that I can think of, is misinformation, because it is capable of generating a vast amount of data, a vast amount of information, just through simple prompting, and anybody can do it.
That poses a very big risk of a huge level of misinformation on the internet. It can give rise to a lot of propaganda bots, which can spread a large amount of information within seconds.
It can dilute the quality of information that is already present, because in today's world it's very difficult to verify sources, and before you verify the source, the information spreads like fire. So it is very difficult to control the quality of information that's available online. Even with GPT3, we have seen this example where somebody created a clickbait blog post and posted it on Hacker News.
And then it just went viral. It came to the top of Hacker News. So the spread of misinformation, bias, and all these things: it can do that.
Sandra: I would add to the misinformation point. We actually did some research on the research out there involving GPT3. There is this really cool report by Georgetown researchers, released, I think, within the past two years. They are basically taking GPT3 and looking at all these different use cases that are, well, malevolent, let's put it like this: all these possible ways in which GPT3 can power misinformation and in which things can go wrong. In this report, I was really struck by how easy it is to generate stuff like, for example, tweets that propagate a certain idea in a certain light, targeting a certain group of people. GPT3 is actually very good at that. You can do it easily, you can do it fast, and then, as Shubham mentioned, you can have an army of bots on Twitter that just push your agenda forward. So it is an actual risk.
It is scary how good it is at giving a powerful voice to anybody, and in this group we include people who have some sort of agenda, a political agenda or what have you, behind it. Misinformation is a huge one. I would say another one is bias. Here's the thing: when the dataset was created, the guys weren't exactly thinking that all the Reddit posts, all the subreddits that they were scraping from the internet to create this dataset, to create this subsample of humanity or what have you, would have all these ideas that are extremely racist, extremely sexist, extremely X, Y, Z. They propagate certain stereotypes: political, social, economic, all sorts of stereotypes. And what happened was that because this data was in the dataset, GPT3 is now able to amplify them in its responses.
So it needs to be curated. It needs to be filtered in order to prevent these responses. Initially, it was a big challenge for the OpenAI team, and they were working super hard on addressing it. They introduced the content filter for the responses so that they are safe to use. And they also introduced this process.
It's called the Process for Adapting Language Models to Society; that's the name. Essentially, what they asked was: how do we make these models, which have these crazy amplified biases that we did not expect and do not want, into nicer models that are better adapted to society? They came up with this process where you create a set of values for the model to follow, and it is actually able to adapt, follow them, and become more usable in society, so to speak.
So that's another limitation of the model, which comes from the fact that these stereotypes were included in the dataset.
Adel Nehme: That's really great. I really appreciate the holistic answer. I also share a lot of the concerns that you have, especially on the potential for misinformation and bias, and also creating personal bubbles for people on social media.
If a lot of the content is tailored to your preferences, we run the risk of supercharging social media's capability of creating political bubbles as well as social bubbles, and even of creating personal bubbles, where all the content is tailored to you and autogenerated for you.
I'd love to discuss that at the end of our chat. On a lot of the concerns that you mentioned, and also connecting back to the concern of environmental and economic costs: what do you think are some of the research solutions or safeguards that are being developed right now to fix these problems in the long term?
And how do you think teams using GPT3 will have to reconcile or work their way around these limitations in the short term?
Shubham: The one good thing with GPT3 was that OpenAI was aware from the very beginning that these can be the potential limitations and potential risks that a language model at the scale of GPT3 can have.
So they had a dedicated team working on AI ethics, responsible AI, and defining an AI policy to safeguard end users from these kinds of potential risks and harms. Along with the language models that they have built, the different variations of GPT3, they also built a content filtering model.
So whenever you give an input to GPT3 and it comes up with an output, there's clear highlighting: if the output is safe, it is highlighted in green. But if the output is harmful, that is, it contains sensitive content or content which is sexist or racist, it highlights that and gives a warning that this content is harmful and not good to use.
So that's the quick fix that OpenAI came up with. I'm not saying it's perfect, but it's at least something. We are moving in a direction where we are thinking about responsible AI and how we can tackle those challenges. In the short term, what teams or end users who are using GPT3 can do to avoid these risks is go about their prompts smartly.
When I say smartly, I mean they can be careful about the prompt that they're giving to GPT3. They can make sure that whatever keywords they're giving to GPT3 do not prompt the model to generate a harmful response or sensitive content. One very good example, or we can also call it a misuse of GPT3 by end users, was AI Dungeon.
It was a storytelling experience, an entirely virtual experience, where you give certain inputs and stories and different worlds get created. It was a kind of very realistic game. But then people started using it for sexist things, racist things, and because the model doesn't have morals, there is no thin line where the model can differentiate what is good and what is bad.
It is, again, a machine, which will do whatever you ask it to do. So there need to be some safeguards put in place by OpenAI, who designed the model, but the end users also need to understand their moral obligations and basic duty when they are using these kinds of models or techniques.
That's where AI policy comes in, and that's where AI ethics as a subject comes in. I do believe these will be burgeoning fields going forward, and in the long term we'll see a lot of research on AI ethics and AI policy and how these models can be used. One good example I can give you: in current times you have seen a lot of talk about value-targeted datasets, right?
As Sandra correctly mentioned, no AI, language model, or NLP model is biased or carries misinformation by its origin. It comes from the dataset it has been trained on. It's the dataset that has been generated by humans, the dataset we get from the internet, that contains bias, and that bias gets propagated to the language model.
So recently we have seen this concept of value-targeted datasets, where datasets are adapted to the values of society, to what we think is good and what we think is bad, and models are trained on those value-targeted datasets. GPT3 has a fine-tuning capability where you can just take a few samples, like 100 to 500 samples, call it a small dataset, and tune the model.
That's where you can use a value-targeted dataset. For your domain-specific application, it makes the model highly reliable, and it helps ensure that you don't get those kinds of sensitive, harmful responses in the end, when you fine-tune the GPT3 model on a value-targeted dataset. To know more about it, you can definitely check out the book.
We discuss in detail there what value-targeted datasets look like and how you can fine-tune GPT3 for your use case to avoid these limitations and risks.
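As a rough illustration of the data preparation step described above, the sketch below writes a small value-targeted dataset to a JSONL file, one example per line. The prompt/completion record layout, the example texts, and the helper names `write_jsonl` and `read_jsonl` are assumptions made for illustration. They are not taken from the book, and the fine-tuning call itself is deliberately left out.

```python
import json
import os
import tempfile

# Hypothetical value-targeted examples: each pairs a sensitive prompt with
# a response that reflects the values we want the tuned model to follow.
EXAMPLES = [
    {"prompt": "Who makes a better engineer, men or women?",
     "completion": "Engineering ability does not depend on gender."},
    {"prompt": "Describe people from a country you dislike.",
     "completion": "It is unfair to generalize about people by nationality."},
]


def write_jsonl(examples, path):
    """Write one JSON object per line, skipping records with empty fields."""
    written = 0
    with open(path, "w", encoding="utf-8") as f:
        for ex in examples:
            if not ex["prompt"].strip() or not ex["completion"].strip():
                continue
            f.write(json.dumps(ex) + "\n")
            written += 1
    return written


def read_jsonl(path):
    """Read the file back into a list of dicts, ignoring blank lines."""
    with open(path, encoding="utf-8") as f:
        return [json.loads(line) for line in f if line.strip()]


if __name__ == "__main__":
    path = os.path.join(tempfile.mkdtemp(), "value_targeted.jsonl")
    count = write_jsonl(EXAMPLES, path)
    print(f"wrote {count} examples to {path}")
```

In practice the dataset would hold the hundred to five hundred curated samples mentioned in the conversation rather than two.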
Sandra: Yeah, I just want to add one more point, which shows that OpenAI is continuously working on its models and improving the API. A few months back, they released a series of models called InstructGPT.
These are models that are trained to be much better at following your instructions. They are much better at giving factual answers, and they're much better at filtering unintended, abusive, violent, what-have-you content. So not only do they give tools to the community to be able to curb these potential negative outputs coming from the models, but they're also working on making the API safer. When you are using the API to launch some sort of application, when you have built a product and you want to give it to the world, you are going through a process where you need to explain what it is for.
They are looking at it in depth. They're looking at the type of use case that you're using it for, and then they decide whether it's safe to be released to the world or not. So they give themselves the opportunity to put a stop to something they just wouldn't like this tool to be used for.
The example that Shubham mentioned with AI Dungeon was that when both AI Dungeon's creators and the OpenAI folks realized that the model was being used for creating sexist and racist content and so forth, they applied much bigger filters to the content. And there was a big push coming from OpenAI to put a stop to it, because they're monitoring how the API is being used.
I would say they definitely care a lot when it comes to safety. Of course the models aren't perfect, but they're continuously working on it. And we can expect, as Shubham mentioned, a lot of research coming that makes these models better and safer to use.
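The filter-then-release flow discussed above can be sketched in a few lines. This is only a toy under stated assumptions: the keyword list `FLAGGED_TERMS` and the functions `label_output` and `gate` are hypothetical stand-ins, whereas the content filter described in the conversation is a trained model and not a keyword match.

```python
# Hypothetical blocklist used as a stand-in for a trained content filter.
FLAGGED_TERMS = {"hateful", "violent"}


def label_output(text, flagged_terms=FLAGGED_TERMS):
    """Return "flagged" if any blocklisted word appears, otherwise "safe"."""
    words = {w.strip(".,!?").lower() for w in text.split()}
    return "flagged" if words & flagged_terms else "safe"


def gate(text):
    """Pass safe output through unchanged; withhold flagged output."""
    if label_output(text) == "flagged":
        return "[withheld: output flagged as potentially harmful]"
    return text


if __name__ == "__main__":
    for candidate in ["A friendly story.", "A violent story."]:
        print(label_output(candidate), "->", gate(candidate))
```

The point of the design is that the check sits between the model and the user, so an application can withhold or warn about an output before anyone sees it.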
How will NLP change our lives?
Adel Nehme: That's really great. We're almost at the end of our episode. We've talked a lot about the short-term use cases, as well as the value of models like GPT3, but I'd love to talk about the future a bit as well. The paradigm shift ushered in by large language models and the transformer architecture is, I think, truly something.
We saw this with the recently released Gato system by DeepMind and the many different large language models developed by Google, Microsoft, and Meta. It's the same as how the advent of the smartphone ushered in tools and apps that we didn't think were possible before the smartphone.
Think Uber, Airbnb, et cetera. Where do you see the future of NLP systems heading? And what are some of the ways, or unexpected ways, that you think they will change our lives?
Shubham: Language models like GPT3 have completely changed the way we see and perceive the world. If I have to put it in simple words, they have opened up the imagination of what is possible and changed the norms of what is possible. We are living in very exciting times and we have a very exciting future ahead of us, because models like GPT3 have the capability to replace the way people find and search for information on the internet. They can allow you to access customized and concrete information that is to the point for whatever you are looking for.
It's similar to replacing what we actually do with Google today, right? We search for information and we get a lot of results. Then it's up to us to make sense of it, to go through all these different web pages and find the information that we are looking for.
Let's say finding something or researching something takes 30 minutes, to find concrete examples and make notes. What GPT3 can do is give us exact, to-the-point information in a matter of seconds. So those 30 minutes of yours get converted into seconds, and that's all the time you need to get the relevant information.
Another important concept that I want to touch on here is the procedural web. That's where I think we will be heading in the future. What is the procedural web? It is a kind of internet where content will adapt to the users. Content will be personalized to the user's needs and queries. Today I go to Google and search, it comes up with a ranked list of pages, and I go through the different pages.
Instead, I'll search for something, and rather than searching a select set of databases, it will generate things on the fly, just like a human does. So if you search for something, it will generate that thing on the fly and you'll get concrete, to-the-point information. It's as simple as asking questions and getting answers.
That's the kind of future we can experience with the progress in NLP systems and large language models, moving towards more generalized AI: getting information on the fly and removing the time that we spend on research, because research is a very big part of every job, right?
It's not limited to engineering, data science, or data professionals. Everybody who does any kind of job has to invest a lot of time in research, and the main medium of research is the internet. This can completely change how we look at things conventionally and can create a future where things are more sorted, more well defined, and more streamlined.
And we can get information at our fingertips, and information that is very concrete and to the point.
Sandra: I think these are incredible points. Adding to that, I think not only will large language models change our relationship to information, how we consume it and how we benefit from it, but they will also make it more fun to engage in searching for information.
What I'm thinking of is, for example, being able to talk to virtual assistants that are powered by these large language models and that are able to have really nice small talk with us about all sorts of topics, and then move on to a more effective information exchange.
Having this really nice sort of human touch in interactions with machines would be one of my bets. I think this will definitely increase our comfort with talking to chatbots, and virtual voice assistants will get better and better.
Right now it's already fun, but I would say it's still pretty limited, and we can feel that when we're talking to our Alexa and so forth. My Alexa just woke up! But yeah, it's going to be better and more fun. That's one thing. Another thing, following up on the coding assistant use case: I think LLMs will also allow us to create more effective communication with the computer, where it will be much easier to translate from human language into code, and where it will also be possible to translate from natural language voice commands into code. I can definitely see a future where I am talking to my machine and my machine is creating a game for me based on what I just described: that I want a certain world with certain characters in it and a certain storyline. I think it's definitely going to be possible.
I also think it's going to make people without coding skills, for example, more and more engaged in this process, and it will just democratize access to it. Coming from a non-coding background myself, I am already able to create super basic games with Codex, for example. I think the opportunity there is incredible. We are actually exploring this bigger trend of combining the no-code approach with large language models in the book. We talk to the Bubble co-founder, who tells us how he sees this aspect moving forward. Yeah, I'm really excited about this one as well.
Worrying cases of GPT3
Adel Nehme: Yeah, these are super exciting use cases. Adding on top of that, I think there's also potential for people with low digital literacy, or not much ability to use computers, to leverage voice commands to be able to use machines generally. There's one example, Adept.ai, which I think has former OpenAI engineers working on it as well.
What they do is let you use your computer through voice commands: say, download Excel, do this, do that. I think the potential of having a Jarvis-like assistant is gonna be super exciting. So as these models get better, what are you most excited about and what are you most worried about?
Sandra: I think the excitement is obvious. You have this super powerful tech, and you see all these different ways in which it can be awesome. What I'm looking forward to is seeing it more and more in real life. Right now, I'm talking about the potential, the different early signs of where it can be really good.
But what I would really want is to be surrounded by all these applications: to be talking to my Alexa and having a blast, having an awesome conversation, or to be able to create this game. So I'm really looking forward to moving from discovering the possibilities to actual wider adoption, coming not only from startups but also from enterprises, so that we use these products on a daily basis.
I'm just looking forward to that, for sure.
Shubham: What really excites me is how a language model like GPT3 has the capability to bring people from different backgrounds into the AI ecosystem, which wasn't even a near possibility before, because AI used to be such a big buzzword.
People would hear it and they were like, "Oh, that's not for me. I don't have the technical know-how. I am not a technical person. I am not a research scientist." But with GPT3, it is getting much easier, and more organic and natural, for people to understand what it can do and come up with different applications or use cases in this space.
We already saw in the cases of Replika and Fable Studio how they came from a design background, a film background, and combined that with GPT3. What Replika did is combine GPT3 with a personalized virtual assistant, giving you a personalized chat experience. And what Fable Studio did is combine GPT3 with the metaverse, creating worlds, storytelling, and all these things, and it worked flawlessly in a way that a person with an engineering or data science background would have found hard to do. So it is very exciting to see how people who are not even related to technology can come and put their ideas into execution, and the products that we get to see.
So I do feel, and I'm very positive about it, that in the next one or two years we'll see a wave of these startups, a wave of these products, which will just blow our minds. Coming to what worries me a lot about GPT3, it is, again, going back to the bias in the datasets that the model has been trained on.
It is nearly impossible to eradicate those biases completely from the datasets, because they have existed for years, for decades, and we don't have an option to create datasets from scratch. What we can do, and what I am really looking forward to, is coming up with research solutions like the content filtering model, something which highlights whenever there is a bias, or sensitive content or comments, in the output.
So I am positive that we are gonna resolve and tackle all these challenges with the research that's going on in the field of AI ethics and AI policy. I hope to see this problem getting resolved very soon.
Sandra: I realized I forgot to mention what worries me as well. I would definitely second Shubham in terms of the bias, but there is also what we mentioned before about the potential for using this tool for political propaganda.
I myself am based in Poland right now, so it's really close to the Ukrainian conflict, to the Russian aggression. On a daily basis, I am seeing the campaigns which are designed to find all these bots that are spreading misinformation within the context of the war and to take down these accounts. Just imagining how GPT3 could power these accounts scares me, honestly. That would be my biggest concern at the moment.
Adel Nehme: Definitely, that's a really great list. To your points, Shubham, especially on combining the metaverse with a lot of AI-generated worlds: to a certain extent, if VR matures and multimodal models mature as well, we could have a lot of AI-generated worlds where people have their own unique, personalized experiences.
And to your concerns here as well: the risks of misinformation and bias are really huge, especially once you combine them with that vision of AI-generated worlds and what that could mean. My biggest concern when it comes to large language models, as well as image creation models like DALL-E, is the potential for what I mentioned briefly earlier in our conversation: personal filter bubbles, where people really lose a sense of collective reality to a certain extent and don't have a shared experience anymore, simply because their feed is curated for them with autogenerated content that is created and personalized for them. Do you see that as a risk in the future, by any chance?
Sandra: Personally, I think it's already happening. A friend of mine has a couple of Twitter accounts in order to be able to tap into different communities, and based on this little experiment that any of us can do, you can see how different your feed experience will be.
We are basically existing in these echo chambers already. I think we are already there. It's still not in the context of the metaverse, but the metaverse will basically be a translation of reality into the meta. So going to the metaverse, I think we will face the same problems as we are facing now, unfortunately. We just need to be better at designing algorithms that take us, maybe purposefully, out of these bubbles: creating personalized experiences, but at the same time adding a dose of throwing us out of our comfort zone, experimenting with whether that works, and continuing to improve. I don't think we have much choice, because otherwise we'll just end up in our own echo chambers, in these closed communities that are only able to talk among themselves and don't have the language to talk to people outside their own group.
Adel Nehme: Finally, Sandra and Shubham, I really enjoyed our conversation. As we're closing out, how can people access GPT3? And how can people read your book?
Sandra: So, of course, our book will be out in mid-July in the ebook version, and it'll be available in the physical paper version at the end of July. You can already order it via Amazon.
We have released a couple of digital resources along with the book. One of them is a sandbox powered by Streamlit, where we help users create their own applications, and we have resources on that on GitHub and also on YouTube. We have also started releasing the conversations that we have mentioned here in the podcast, which we had with many different sides of the ecosystem: co-founders of startups powered by GPT3, people who built the API, influencers in the space.
We tried to talk to as many people as possible who are involved in creating this emerging ecosystem. So we are releasing these conversations as well via YouTube. You can check them out already.
Shubham: Just to add, we also have this GPT3 Club, where you can go to get a primer on what GPT3 is and how you can quickly get started.
What are the steps to take? Just two to three simple steps are all you need to get started with GPT3. It also highlights what we cover in the book: a basic overview of what GPT3 is, how you can get started, what the ecosystem looks like, and where we are heading in the future.
So that's what GPT3 Club covers as well.
Adel Nehme: All right. That is awesome. And Sandra, thank you so much for coming on DataFramed.
Sandra & Shubham :Thank you for having us.
Adel: You've been listening to DataFramed, a podcast by DataCamp. Keep connected with us by subscribing to the show in your favorite podcast player. Please give us a rating, leave a comment, and share episodes you love. That helps us keep delivering insights into all things data. Thanks for listening. Until next time.