Industry Roundup #4: O3 & O4-mini, Llama 4’s Rocky Release & Google’s Agent Ecosystem

Adel is a Data Science educator, speaker, and VP of Media at DataCamp. Adel has released various courses and live training on data analysis, machine learning, and data engineering. He is passionate about spreading data skills and data literacy throughout organizations and the intersection of technology and society. He has an MSc in Data Science and Business Analytics. In his free time, you can find him hanging out with his cat Louis.

Richie helps individuals and organizations get better at using data and AI. He's been a data scientist since before it was called data science, and has written two books and created many DataCamp courses on the subject. He is a host of the DataFramed podcast, and runs DataCamp's webinar program.
Key Quotes
If you reach a point where you have systems that are both great at math and great at coding, and they are operating at near human genius level, and then you're able to build agentic use cases where you can have recursive self-improvement, I do see a future where you reach really strong intelligence that may even go beyond human intelligence. I do agree with the limiting factors being energy and money, but if the opportunity is there, I cannot see governments and corporations not putting in all their capital to reach that point.
There is a strong incentive to rank well on benchmarks, because that's one of the measures people use to choose which model they should be using. Just because the benchmark score was high doesn't mean it's going to be good for what you want to do. Benchmarks are kind of like a shopping tool.
Key Takeaways
The 2025 Stanford AI Index shows geopolitical divides in AI development and attitudes, with the U.S. dominating model output, China gaining ground in performance, and Europe lagging in both capacity and optimism.
Responsible AI is being deprioritized in the rush to deploy frontier models, raising concerns that safety and alignment may fall behind as competitive dynamics intensify.
Google’s new agent ecosystem, including the Agent Development Kit and Agent-to-Agent protocol, reflects a shift toward multi-agent systems, emphasizing standardization, ease of use, and collaboration among intelligent agents.
Links From The Show
Transcript
Adel Nehme: All right. All right. All right. We are live, Richie, how are you?
Richie Cotton: It's good. It is a sunny day here in New York. I'm glad winter is finally over and yeah, feeling summery. How about yourself, Adel?
Adel Nehme: Yeah, feeling summery as well. It's pretty sunny here in Belgium, and it's Easter weekend, so I'm excited for the three-day weekend. Do you have any Easter plans, Richie?
Richie Cotton: Yes, certainly. Only a two-day weekend here, but yeah, the plan is mostly eating chocolate and probably going for a nap afterwards.
Adel Nehme: That sounds exciting. A lot of chocolate. I am personally going to a rave, not a traditional Easter plan, but we're still going to do something, so I'm excited for that. It's going to be a day rave out in the sun, so I'm pretty excited. Hopefully I'll be well recovered by Tuesday, but we shall see.
So, Richie, what are we gonna be talking about today?
Richie Cotton: All right. So, to begin with, we're gonna talk about some of the new releases from OpenAI, from Meta, and from Google. So lots of exciting tech just coming out or announced. And then after that we're gonna go global: we're gonna talk about the AI Index report that just came out of one of the Stanford institutes.
And then we are gonna talk about whether or not AI's gonna kill us all. Just to top it off, a cheery story to wrap up.
Adel Nehme: Yeah, very...
Richie Cotton: So, OpenAI have released the o3 and o4-mini models. Can you just talk us through what's new in these models?
Adel Nehme: So if people remember, o3 was actually announced in December during the 12 Days of Christmas event, but it was still in preview and not released. I think then, either at the time or a bit after, they released o3-mini, and it seemed to be doing relatively well. But when they showed o3 during the 12 days of Christmas, it seemed to have really good capabilities on reasoning and math, and it even did pretty well on the ARC-AGI benchmark.
And a lot of people started saying, oh, this could be AGI. So now people have access to it. You can access it in ChatGPT, and I think it's going to be accessible soon via the API. In a nutshell, o3 has much better reasoning capabilities, and a couple of defining features seem to give it a lot of this improved reasoning capability.
One is native tool use: o3 is able to do web searches, do image generation, and interpret images, so it's actually able to reason with images within the chain of thought, and it can use a code interpreter to execute code while reasoning. One example that was shared was reasoning about an astrophysics research poster.
It actually took the picture of the research poster, and within the chain of thought you could see the model zoom in on certain parts of the poster to come up with an answer when prompted. Or, for example, it has the ability to take a blurry image, or an image that's upside down, and manipulate it using Python to better understand and reason with the image.
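For listeners who want to picture that, here is a minimal sketch of the kind of image clean-up the model can run in its code interpreter; the file name and the specific transforms are illustrative assumptions, not what o3 actually executes.

    # Illustrative only: the sort of Python an o3 code interpreter might run
    # to fix up an image before reasoning over it again (Pillow library).
    from PIL import Image, ImageEnhance, ImageFilter

    img = Image.open("poster.jpg")                  # hypothetical input file
    img = img.rotate(180, expand=True)              # flip an upside-down image
    img = img.filter(ImageFilter.SHARPEN)           # reduce blur slightly
    img = ImageEnhance.Contrast(img).enhance(1.5)   # boost contrast for readability
    img.save("poster_cleaned.jpg")                  # re-inspect this cleaned version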
This is quite exciting because it opens up a slew of different use cases. Think, for example, about going from a rough wireframe to code generation and being able to iterate on it. That's quite interesting, and I think there are going to be better use cases for vibe coding with these models down the line.
And this definitely translates to better benchmarks, right? OpenAI's o3 and o4-mini, and we're going to get to o4-mini, do crush a lot of benchmarks on coding, math and the sciences, and visual reasoning especially, and that's pretty interesting to see. But we're definitely going to have to wait and see how these models behave in the real world.
I've been using them a bit over the past couple of days. They're definitely better than GPT-4o, definitely better than GPT-4.5 and o1, but I'm still encountering errors here and there and still have to fix stuff. It's not the AGI panacea we're all hoping for, or, if you're in Belgium, according to the Stanford AI Index report, maybe not hoping for. So we'll see.
That said, o3 is now the frontier model from OpenAI. Greg Brockman was on the live stream saying that every now and then they release a model that is a foundational leap: GPT-4 was that foundational leap, and o3 is that foundational leap as well. So we'll see. What's also really interesting, and you can see here some of the reasons why OpenAI took so long to release this model, is if you remember the showcase of o3 during the 12 days of Christmas event.
It was quite expensive to run. I remember o3-high, the high-end version of that model, cost around $3,000 to run inference on. So unless you are a multi-billionaire, you're not going to be able to use that model every day. But now, on the API, at least from what they shared, it's much more affordable.
So o3 apparently costs $10 per 1 million input tokens, $2.50 per 1 million cached input tokens, and $40 per 1 million output tokens. It's definitely not cheap, but it is definitely not a $3,000 inference price tag either. Pricing is complicated with reasoning models, though, because the cost depends on how long the model thinks: the output tokens you're billed for also include the reasoning tokens, so you're not just paying for the final answer.
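To make that pricing concrete, here is a rough back-of-the-envelope sketch using the per-token rates quoted above; the request sizes, and especially the hidden reasoning-token count, are made-up examples, since reasoning length varies per request.

    # Rough o3 cost estimate for a single request.
    # Rates are the API prices mentioned in the episode; token counts are hypothetical.
    INPUT_PER_M = 10.00    # USD per 1M input tokens
    OUTPUT_PER_M = 40.00   # USD per 1M output tokens (reasoning tokens bill as output)

    def o3_cost(input_tokens, visible_output_tokens, reasoning_tokens):
        billed_output = visible_output_tokens + reasoning_tokens
        return (input_tokens / 1e6) * INPUT_PER_M + (billed_output / 1e6) * OUTPUT_PER_M

    # e.g. a 5k-token prompt, a 1k-token answer, and 10k hidden reasoning tokens
    print(f"${o3_cost(5_000, 1_000, 10_000):.2f}")  # about $0.49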
And this is also where o4-mini comes in, which looks like a much smaller, more cost-effective version of their next-generation model, relatively cheap with good performance. Which makes me wonder how good a fully scaled o4 will be. I think we'll find out in the next couple of months or so. So that's essentially it in a nutshell. Richie, I'd love your take on a couple of things.
So, according to Noam Brown, a researcher at OpenAI who shared a graph that we'll link in the description below if you're listening in, scaling inference and scaling reinforcement learning both seem to be holding up. So if the scaling laws continue, simple question.
Easy one here for you, Richie. Are we gonna achieve AGI this way?
Richie Cotton: Let's go with the simple questions to open up. Are we gonna achieve AGI? Alright, so there are many different measures of what constitutes AGI. Some of them we've hit already, some of them we're not gonna hit. I think the more important question is: is performance gonna continue to improve, and are those improvements actually gonna benefit users, either individuals or companies that want to make use of these things?
So at this point, I'm pretty much convinced that, given the level of investment and the number of very smart people working on AI, we have the capability to create a better-than-human artificial intelligence, the idea of an artificial superintelligence. So for me, there are two limiting things. First of all, the economics.
Is the money gonna run out? Is all this venture capital money gonna run out before we can get to that point? We've seen this with self-driving cars: we kind of know how to make a level-five, completely autonomous car, it's just so expensive at the moment that the economics aren't quite there, and a lot of the self-driving car startups have sort of disappeared.
So that's one sort of blocker. The other big blocker is gonna be around usage. Generative AI is expensive to run already, it's very energy intensive, and these reasoning models are an order of magnitude more expensive. Once you bring in agent use cases, the power grid infrastructure just isn't there in most countries to run these sorts of things. So basically money and energy are gonna be the limiting factors, not our ability to create artificial superintelligence.
Adel Nehme: Generally I agree with that intuition. I think the canary in the coal mine here is whether we're able to arrive at systems that can improve themselves. Maybe I'll take a step back. If you reach a point where you have systems that are both great at math and great at coding, and they are operating at near human genius level, and then you're able to build agentic use cases where you can have recursive self-improvement, I do see a future where you reach really strong intelligence that may even go beyond human intelligence. I do agree with the limiting factors being energy and money, but if the opportunity is there,
I cannot see governments and corporations not putting in all their capital to reach that point. And we'll see with the AI 2027 report, which essentially touches on a lot of the dynamics we're discussing here. But it is quite interesting. I think the main question for me is: will we keep seeing these scaling laws hold, right?
Because a lot of the assumptions we're discussing, about money, energy, investment, and so on, are predicated on the assumption that scaling laws will continue. And I guess we're going to have to see whether we need algorithmic innovation on top of these scaling laws.
Richie Cotton: Actually, I have a question for you about the OpenAI releases, and it's kind of related to this. There are so many different models available from OpenAI now; do you think o3 is the one that's gonna take over for everything? Has it made some of the older models like GPT-4 obsolete?
Adel Nehme: No, I actually still use GPT-4o quite a lot for, you know, maybe the number one use case for generative AI, writing emails, and for daily routine tasks that don't require reasoning. I still use GPT-4o as my default model. But I do think that OpenAI has reached a Google Gemini level of confusion in their user interface, if not more.
Like, I'm looking at the ChatGPT interface here. We have o3, GPT-4o, GPT-4o with scheduled tasks (who uses scheduled tasks?), GPT-4.5, o3-mini, o4-mini, o4-mini-high, and GPT-4o mini. And GPT-4 is leaving on April 30. Rest in peace, GPT-4. So we have, what, like 6, 7, 8 models that you can choose from. And I think this is going to change; Sam Altman has hinted at this.
I think what will happen is that you'll just have one model, but behind it an orchestration engine that assigns the best model depending on the task, and you'll probably have a developer mode where you can still choose your favorite model based on the task. I think that's probably going to happen.
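As a rough illustration of that orchestration idea, here is a purely hypothetical sketch; the model names and the keyword-based routing rule are invented for illustration and are not how OpenAI actually routes requests.

    # Hypothetical router: pick a model per request based on the task.
    def route_model(prompt: str) -> str:
        reasoning_hints = ("prove", "debug", "step by step", "analyze")
        if any(hint in prompt.lower() for hint in reasoning_hints):
            return "o3"        # heavier reasoning model
        if len(prompt) > 4000:
            return "o4-mini"   # long input that still needs some reasoning
        return "gpt-4o"        # cheap default for routine tasks

    print(route_model("Write a short thank-you email"))            # gpt-4o
    print(route_model("Debug this race condition step by step"))   # o3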
But yes, indeed, the naming convention is a mess. And I don't think o3 will be the go-to model for everything. I think it will be the go-to model for developers and the go-to model for researchers. But if you are a casual user of AI, I don't think it needs to be your default.
Richie Cotton: Okay. That makes a lot of sense, to use the simplest model that you can get away with, because it's gonna be cheaper and easier. Also, yeah, I have not used scheduled tasks. If you wanna schedule stuff, Airflow's the way forward, and there is a new Airflow release, Airflow 3, coming out, I believe, very soon. So I don't think we're gonna talk about that here, but yeah, exciting news on the data front at least.
Adel Nehme: Yeah, I mean, here you're talking about data scheduling, but really GPT-4o with scheduled tasks is just a glorified alarm, a glorified reminder tool. That is it. Just use the reminders on your phone. We talked about o3 and o4-mini. Let's talk about additional releases,
Richie. So it seems like Llama 4 isn't as good as people hoped, and there's talk about Meta botching the release here. Richie, do you wanna talk us through what happened?
Richie Cotton: I feel slightly bad for Meta, because you spend many billions of dollars creating this fancy new model, the benchmarks aren't quite there, and suddenly half of social media is just complaining about it. So it's a little bit unfortunate on that part. There have been some accusations that, because the performance wasn't quite there, Meta were trying to game the benchmarks.
Because you train these large language models against a lot of internet data, you can train them against what's publicly known about the benchmarks, and that gives you better performance on the benchmark without actually benefiting real users. So, a little scandal there.
But the bigger story is that Meta has a fundamentally different attitude, or business model, for developing these large language models compared to, say, OpenAI or Anthropic. It is existential for those companies to have the best model; for Meta it's just kind of a side gig to help out the social network, and then they give it away.
So they're monetizing it less, but they're still providing stuff that's available for free to the world, so kudos to them, it's very cool stuff. Now, there is one other complaint, which is slightly more legitimate, in that the Llama 4 models come in different sizes, and the smallest one is called the Scout model.
Now, this is actually still a pretty big model, a 17-billion-parameter model, and there has been some grumbling from researchers. Max Ramon, who is a future DataFramed guest, was saying that as an individual it's really hard to work with this stuff, because you need multiple GPUs just to fit it in memory.
The sweet spot at the moment for individual researchers and smaller-scale labs is stuff you can fit on a single GPU: like 4 billion parameters is kind of ideal, maybe 8 billion parameters at a push. So these smaller models are much more useful for researchers, because you can play around with the stuff.
It's a bit faster and cheaper to do.
Adel Nehme: And speaking of that problem, to give some context, I tried playing around with fine-tuning Llama 4, for example. You do need quite a bit of budget to run GPUs here. You can't just use a Google Colab with free TPU access, for example, to fine-tune it, contrary to Llama 3 or Llama 2.
So it definitely limits your ability as an individual to create useful new variants based on this model, which is truly what the spirit of open source is about. And that's where the next complaint comes in: there are also grumblings about context length. Do you wanna talk to us a bit about that?
Richie Cotton: Oh yeah. So the other thing around this is that Llama 4 Scout claims to have a 10-million-token context length, so it can remember basically everything that you throw into it if you want it to work on dozens of documents at once. Actually, I have no idea how many documents 10 million tokens is, but it's a lot of documents you wanna throw at it.
Adel Nehme: It's most likely like 14 books, if not more, because I think 2 million tokens is like five Harry Potter books.
Richie Cotton: I love that you have a statistic on that.
Adel Nehme: Because I did a code-along on the Gemini model, which has the 2-million-token context window, and I tried to run the math. Yeah.
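For anyone curious, here is a rough version of that math; the words-per-book and tokens-per-word figures are common rules of thumb rather than exact counts, and they come out higher than the Harry Potter comparison because Harry Potter books run well over 100,000 words.

    # Rough estimate: how many books fit in a given context window?
    WORDS_PER_BOOK = 100_000     # a typical novel (rule of thumb)
    TOKENS_PER_WORD = 1.3        # rough English tokenization rate
    TOKENS_PER_BOOK = WORDS_PER_BOOK * TOKENS_PER_WORD   # ~130k tokens

    for window in (2_000_000, 10_000_000):
        print(f"{window:,} tokens is roughly {window / TOKENS_PER_BOOK:.0f} books")
    # 2,000,000 tokens is roughly 15 books
    # 10,000,000 tokens is roughly 77 books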
Richie Cotton: Okay, so it's big. But another DataFramed guest was complaining that models actually aren't trained on any kind of input data that's that long. So if you try and use the 10-million-token context window, you're probably gonna get garbage results, just because the models aren't optimized for that.
There aren't many good pieces of input data that are that long. So it's probably gonna give you garbage results, and it's more of a vanity metric than anything else.
Adel Nehme: So maybe a couple of discussion points and questions I have for you here, Richie. If it's not a reasoning model, are we nearing the end of continual gains in LLMs?
Richie Cotton: Well, I mean, there are enough companies trying to make better models, and the competition is incredibly fierce. It's gotta be brutal being a frontier model creator. So yeah, we're gonna see gains, but there are very much limits in terms of, can you generate better marketing copy?
This stuff kind of just works already. This is my main use: I hate writing marketing copy, so I've outsourced that to AI. So yeah, you can't get that much better at that. One thing I have noticed is some fun story-generation apps, where you can do choose-your-own-adventure with AI, and those kind of work better. And the benefit there is not that the LLMs are getting better at reasoning; it's more that they remember stuff that you said, so they don't just forget characters that were in the story two minutes ago. So a long context window and better memory handling is a really great feature for that fun use case.
And there are benefits like that. It's not reasoning, it's just other parts of the pretend brain. Memory is very different from thinking.
Adel Nehme: I think that's something important here, right? Because performing well on benchmarks is one thing, and the ability of a model to be useful in the real world is something else. I don't think benchmarks, for example, measure memory; they measure the ability to solve problems. There may be specific benchmarks that measure memory, but generally, reasoning benchmarks or math benchmarks or coding benchmarks don't necessarily do that.
But indeed, as you mentioned, one of the things that makes GPT-4o image generation so useful is the ability to retain context, to edit an image and iterate on one particular image. The more large language models have the ability to keep that context, be cohesive, and provide value for users however they query them, the better; that, I think, goes a much longer way than the ability to just do really well on benchmarks. But that said, the vibes on Llama 4, beyond what we're discussing, have not been so positive either. So it seems like it's a losing battle on both sides here.
Maybe a last one, given the discussions around the accusations. And this is not just a Meta problem; it was also slightly alluded to during the 12 days of Christmas announcement of o3, when o3 did really well on the ARC-AGI benchmark. Are LLM providers trying to game benchmarks, and is this the game that we're seeing today?
Richie Cotton: Almost certainly. There is a strong incentive to rank well on benchmarks, because that's one of the measures people use to choose which model they should be using. I mean, obviously, if you're a product manager or an engineer, you wanna try lots of different models and see how well each performs on your own use cases.
Because just 'cause the benchmark score was high doesn't mean it's gonna be good for what you wanna do. Benchmarks are kind of like a shopping tool: how well does this model perform? Oh, maybe I'll take a look at that. So yeah, there's obviously a strong incentive for all these model-creation companies to do well on benchmarks.
So we're likely to see scandals like this thing with Llama 4 over and over again.
Adel Nehme: Yeah, I'm not surprised here either. So, third story of the day, and I think this is the last one covering technical releases: Google and its new AI agent tooling ecosystem. Richie, do you wanna walk us through it?
Richie Cotton: Absolutely. So Google had their Cloud Next conference, and they made some announcements around agent tooling at the same time. Agents are obviously the hottest thing this year in AI, and there were two releases. The first is the Agent Development Kit, which is an open-source framework for creating agents.
Now, there are a few of these around already. LangChain was arguably the first LLM development framework, and it has added agent features; then there's LlamaIndex, which has been around for ages and also has agent features. There are a few others too.
Microsoft's got AutoGen, and you've got other tools like CrewAI. There's a ton of these things, so it's a very crowded space. But if you are working on the Google side, like if you are primarily working with Gemini, you probably want to use the Agent Development Kit. And yeah, let's see how it pans out.
Obviously it's just been released, it's early days, but it's worth taking a look at if you are building agents. Now, the other thing, which may be more exciting, is something called Agent2Agent. It's actually a protocol rather than a framework: it's about standardizing communication between two agents, as the name suggests.
I think the closest thing that exists already is the Model Context Protocol from Anthropic. That is about standardizing how agents communicate with tools, and Agent2Agent is about how two different agents talk to each other. I'm not quite sure what the use case is yet, but I presume in general, when you have agents, you're probably gonna have one agent that does stuff, and then another agent that monitors the quality of the original agent's work, because
obviously they get things wrong, and so you wanna make sure you have some kind of quality control in place. So it's probably gonna be about standardizing communication in that kind of multi-agent setup. And obviously you can have more complicated setups: say you want an AI scientist, then there are lots of different things you want it to do, like background research.
You've got an agent for that, and you've got an agent to come up with new hypotheses, and all these sorts of things. Different stages of your workflow are gonna require different agents, and you can get very complicated setups, so this is gonna be an important tool for that.
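To picture that worker-plus-reviewer pattern, here is a minimal, framework-agnostic sketch in plain Python. It is not the actual Agent Development Kit or Agent2Agent API; both "agents" are stubbed out as simple functions just to show the shape of the quality-control loop Richie describes.

    # Conceptual sketch of a multi-agent quality-control loop.
    def worker_agent(task: str) -> str:
        # In a real system this would call an LLM to draft an answer.
        return f"Draft answer for: {task}"

    def reviewer_agent(task: str, draft: str) -> tuple[bool, str]:
        # In a real system a second LLM would critique the draft.
        approved = "Draft answer" in draft          # toy acceptance check
        feedback = "Looks fine." if approved else "Please add sources."
        return approved, feedback

    def run(task: str, max_rounds: int = 3) -> str:
        draft = worker_agent(task)
        for _ in range(max_rounds):
            approved, feedback = reviewer_agent(task, draft)
            if approved:
                return draft
            draft = worker_agent(f"{task}\nReviewer feedback: {feedback}")
        return draft   # give up after max_rounds and return the latest draft

    print(run("Summarize this astrophysics research poster"))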
Adel Nehme: I'm really happy that Google is taking the lead on building these types of ecosystems, frameworks, and protocols, because we will reach a point where you have quite a few different AI agent frameworks and tools working in the wild, right? And having standardized communication protocols makes development much easier.
So maybe, given this, and given that the tooling on the agents side especially is maturing, and that protocols like the Model Context Protocol are also maturing, what do you think this means for developers and organizations, Richie?
Richie Cotton: If agents are not your primary thing, so you're not a tech company, then you don't really want to care too much about the tech stack. You just want it to be easy, because you want to care mostly about your business processes: how do I automate these things, how do I do stuff, how do I achieve things
with these agents, rather than worrying too much about building them. So having agents that are really, really easy to create is gonna be huge, and that means better tooling and better middleware. We really need to reach a stage where creating an agent is as easy as creating a dashboard.
Adel Nehme: That will be an exciting time. I mean, there are already low-code tools for building these types of mini AI agents, to a certain extent, that perform tasks. But indeed, if it becomes really streamlined to build agents that can communicate with other agents relatively easily, without having to think about it, that will be super exciting.
And maybe also a really easy one here. Where do you see the AI agent space going in the next 12 months?
Richie Cotton: You can't give me these big questions. Let me just get my crystal ball. So there are only a few things you care about. Basically, at the moment you can do really simple, narrow agents, and for those it actually is as easy as creating a dashboard. But if you wanna do something really hard, like, okay, I'm creating an AI doctor, or I'm creating an AI stockbroker, that's like crazy research, and it doesn't work very well at the moment, or it's not that reliable.
So basically, the way we're going is: more complex in its ability to reason, so it can do broader tasks; faster as well, because sometimes you don't want something to be thinking for a minute before it performs the task; and then also more reliable, because sometimes you want it to be correct pretty much a hundred percent of the time, not just 90% of the time.
Adel Nehme: Okay, so better reliability, better ease of creation, and just more raw intelligence.
Richie Cotton: Yeah, yeah. Smarter, faster, more reliable.
Adel Nehme: So I think this wraps up this story. Next up, we're gonna talk about the 2025 AI Index report. Maybe give us a bit of background on the report, Richie, and I'll go a bit deeper into the results.
Richie Cotton: Yeah, sure. So this comes out of the Institute for Human-Centered AI at Stanford University, an institute co-founded by Fei-Fei Li, of ImageNet data-collection fame. And yeah, it's just a great piece of thought leadership about what is going on with AI all around the world. So yeah, you've gone into it in depth. Tell us more.
Adel Nehme: So personally, it is one of my favorite reads of the year; it and the State of AI Report by Air Street Capital are definitely my two reads of the year. A lot of the results are not surprising, right? Models are getting better, they're becoming more intelligent, and you see this in the benchmarks. I think we experience this every day: AI models are becoming more and more useful as well, so they're able to take on even more tasks than the previous year. Adoption is increasing, there is a lot of excitement around adoption, especially by businesses, and a lot of money is going in.
Everyone is investing in AI. One of the things that I am particularly thinking about these days, especially when you consider the state of the world and the state of politics today, is the geopolitical angle of AI. Currently, the USA is leading on notable model releases, according to the report.
So, 40 notable models came from the US according to the report, 15 from China, and three from Europe. Europe is pretty far behind here; as a resident of Europe, that makes me sad. But China is closing the gap on performance, as judged by the benchmarks. We can definitely see this with DeepSeek R1, and I wouldn't be surprised if we see DeepSeek R2 in the next couple of months.
And there are definitely more publications and patents coming from China. There were also notable model launches from regions such as the Middle East, Latin America, Southeast Asia, and so on. And looking at the geopolitical dimension here, taking a bit of an aside, regardless of what you think of the latest craziness when it comes to the tariff policies and what's going on there:
if you look behind the tariff policy, and you listen to a lot of folks close to the presidency talking about it, a lot of it seems to be related to shoring up the semiconductor industry and to better compete and solidify the US's advantage in AI. So this is definitely something to look out for in the next couple of years, and we're gonna touch on AI 2027, which heavily emphasizes the geopolitical competition between the US and China on AI. But definitely something very interesting. Now, one thing in the report that surprised me is
the degree of excitement and skepticism about AI across different nations. In countries like China, Indonesia, and Thailand, the population, and I'm not talking about governments or officials, is pretty AI-enthusiastic. They seem to generally have a positive attitude towards AI, in the sense that they think it is beneficial.
On the other hand, Western countries actually have a less favorable view towards AI. For example, Belgium, the Netherlands, and the US are some of the most AI-skeptical nations. Not surprised about the US. In Europe, France and Germany had the biggest increases in AI optimism, which is pretty interesting.
And India had the biggest drop compared to last year, which is also quite interesting. One other thing the report covers is data from the Raspberry Pi computing education research center, which tracks countries that provide AI education across the schooling system.
There are about 20 countries here: Britain, Australia, most of Eastern Europe, Saudi Arabia, South Korea, Argentina, and a few countries I hadn't expected, including Kazakhstan and Ghana. So, good on these countries for providing AI education. It may not seem as important today, right?
But we'll definitely see generational gaps being created over the years to come between these different countries if they do not invest in AI education. Maybe a shameless plug here to check out DataCamp for Classrooms, which is free for teachers. Now, the responsible AI situation is a bit sketchy.
That's another one that caught my eye. There are many new, really good responsible AI benchmarks, but not all large language model providers are actually testing their models on these benchmarks before release. And this comes back to AI 2027, which we'll touch on; I'm very excited for that discussion.
But the increasing arms race around AI, whether it's between the US and China or between OpenAI, Anthropic, and all of these different frontier model providers, is creating dynamics where safety is not as important as it used to be, and now it's more about deployment and making sure that these frontier models drive value for users, right?
Now, personally speaking, I do not think doomism is necessarily a good path forward when it comes to the responsible AI discussion. A lot of the discussion seems to be couched in the idea that if you release a large language model today, it's gonna destroy everyone. But responsible AI is definitely extremely important, and the report does show notable increases in AI-related incidents, right? So, as AI becomes even more embedded in workflows, businesses, and daily life, and as AI-related incidents increase, there may or may not be one big incident that will change people's minds, I think, in the next two years.
So that's essentially, in a nutshell, the overview of the report. I don't know, Richie, did you read it? What also stood out to you?
Richie Cotton: Yeah. So those cultural differences are very fascinating. I'd say the fact that Asian countries are more enthusiastic about AI doesn't surprise me, but the particular countries do: I hadn't realized that Thai people and Indonesians would be the most enthusiastic about AI. I would probably have guessed Japan, actually, because Japan has a big culture, historically at least, of being enthusiastic about robots.
So that would've been my guess. But yeah, there's certainly a lot of skepticism around AI in many Western European countries, I think.
Adel Nehme: Yeah, and I think it was quite surprising for me to see such skepticism about AI in the West especially. But that, I think, shows the myopia that I have about AI, because, you know, I'm relatively in the know when it comes to the industry: I use AI every day, and I talk about AI for a living.
So it's relatively exciting for me. But the data here shows that if AI veers toward, you know, and I'm not saying intentionally, some form of externality or unintended consequence, which we will certainly have when it comes to AI, and AI is increasingly seen as more of a force for harm than a force for good,
I do think we may have a social reckoning when it comes to AI, especially in the West. A really good example comes from social media. If you remember, in the early 2000s and 2010s, social media was often looked at as a generally net positive thing. And then in 2015, 2016, the Cambridge Analytica scandal happened.
Now the narrative is that social media is rotting people's brains. Jonathan Haidt spoke recently about the anxious generation, right? And today you see schools banning phones in the US, and you see Australia pushing for age verification on social media. I'm not saying this is a bad thing, right? I am just saying that you see a reversion to the mean.
And I think the backlash on AI, given that we are experiencing the growth of AI, and potentially the backlash to AI, on a much more compressed timescale, could be even sharper if not well managed, depending on how the backlash goes. And it leads into the discussion we're gonna have on AI 2027: the prisoner's dilemma of do we accelerate or do we slow down becomes very real when your competitors also have to make the same decision. If the United States, or Europe, or the West slows down but other countries do not, then you'll be in quite a disadvantageous position compared to your competitors. So a lot to unpack, a lot to think about, and not necessarily a lot of clear answers here.
Richie Cotton: Absolutely. So yeah. AI is gonna have some big geopolitical implications, and I think that leads nicely into the next story about the AI 2027 project.
Adel Nehme: So I've been harassing the team all week, sending this research project to a lot of folks, and I'll caveat first, before anyone jumps in on the comments or, you know, reaches out to say that I'm being too pessimistic: I am generally not an AI doomer.
To explain where I come from, I'm actually quite optimistic about AI, but I am cognizant of what exponential growth looks like. To give a very real example of what exponential growth looks like, I would like to remind you of what happened in 2020, when the coronavirus, which had infected thousands of people one month, suddenly became a global pandemic within a few weeks, right?
That's what exponential growth looks like. So maybe I'll give some background on AI 2027. It is a highly in-depth research paper written by, and I hope I'm pronouncing the names correctly here, Daniel Kokotajlo, as well as Scott Alexander, Thomas Larsen, Eli Lifland, and Romeo Dean. Some of them are previous OpenAI researchers, some of them are responsible AI researchers, some of them are policy makers, and so on.
Definitely one of the more interesting reads I've had in a while in this space. Essentially, the premise of this research project, this research paper, is that all of these frontier lab CEOs, whether it's Sam Altman or Dario Amodei, talk about how AGI will come in the next five years and about how scaling laws will continue; we touched on this earlier in our discussion of o3. So let's make this basic assumption. Let's assume that scaling laws will continue and that reasoning models will continue to improve at the pace they are improving at today.
In that scenario, we could see systems that improve themselves, the technical term being recursive self-improvement, in the next two years, by 2027, which is what AI 2027 is about. And if that happens, we would be entering real uncharted territory. Imagine systems that are near human genius level in math and coding being able to improve and build on their existing architecture and build new, more efficient, more intelligent architectures. That could lead to what a lot of people call an intelligence explosion, where the rate of progress of AI systems becomes so explosive that we go from AGI to artificial superintelligence, and the researchers behind the scenes are left struggling to keep up with all of the research that these models are putting out.
So the paper describes a scenario-based forecast that starts in 2025, I think, or late 2025. It describes a hypothetical, fictional competition between two AI giants: OpenBrain in the US, and I wonder who OpenBrain is based on, and DeepCent in China, and I also wonder who DeepCent is based on.
They are competing for the first AGI, the first artificial superintelligence system. And what's important here, a couple of assumptions to make, which are reasonable assumptions: whoever is first to artificial superintelligence arguably wins the future, because you have the most intelligent, most capable model at your fingertips.
And if you wanna really think nefariously, in sci-fi terms, you can potentially do cyber attacks on your competitor that are absolutely insane, things you can't even fathom. So I won't go into the depths of the paper here, but the paper essentially ends in 2027 with the arrival of an extremely strong artificial superintelligence, or AGI, that is misaligned, where researchers have uncovered that this system is misaligned. And upon the arrival of this system, the forecast ends with two possibilities.
Either we slow down or we accelerate. I'm gonna leave the link in the description, and we'll let you read what these two scenarios look like. But neither of them is necessarily the best scenario, both of them have a lot of difficulties ahead, and one of them is definitely worse than the other, right?
But some important context here, set by the paper: at that point, if you have developed an AGI that is better at politics than any politician, better at tech than any engineer, or better at AI research than any AI researcher, then surely it will be so embedded in daily workflows that politicians, decision makers, and leaders use this AI every day, numerous hours a day. And in that sense, a misaligned system would have networked access to everyone in power and would be able to influence how things go, because it would be able to connect the dots between politician A, politician B, and so on. And if you think AI will not be used by politicians, I would take a step back here and remember that politicians are people; they need support in managing complexity. And indeed there is some reporting, albeit I'm not sure how credible, and I'm not gonna take it at face value, that some of the tariff policy may have been set with the help of ChatGPT.
So I'll link it in the description below, but again, I'm not gonna say that this is a hundred percent real, right? With these kinds of assumptions and this discussion set up, you would think that the natural answer is to slow down and better understand how to build a more aligned system.
But the dynamics here really resemble a classic prisoner's dilemma. Let's say the first company, company A, slows down; then your competitor is incentivized to go faster, because they will be first to artificial superintelligence. If you decide to slow down, you forgo your arrival at artificial superintelligence. And if your competitor slows down, then you have the incentive to keep going, because you have the chance to arrive at ASI before them. What I really enjoyed about this paper is that it really paints that prisoner's dilemma and overlays it on current geopolitical dimensions and current geopolitical discussions.
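To make that race dynamic concrete, here is a toy payoff matrix with made-up numbers; it is not from the AI 2027 paper, just the standard prisoner's-dilemma structure Adel is describing.

    # Toy payoff matrix: (your payoff, rival's payoff) for each pair of moves.
    payoffs = {
        ("slow", "slow"):             (3, 3),  # both slow down: safer for everyone
        ("slow", "accelerate"):       (0, 4),  # you slow down, rival reaches ASI first
        ("accelerate", "slow"):       (4, 0),  # you reach ASI first
        ("accelerate", "accelerate"): (1, 1),  # both race: riskier outcome for both
    }

    def best_response(rival_move: str) -> str:
        # Choose the move that maximizes your own payoff given the rival's move.
        return max(("slow", "accelerate"), key=lambda m: payoffs[(m, rival_move)][0])

    for rival in ("slow", "accelerate"):
        print(f"If the rival chooses {rival}, your best response is {best_response(rival)}")
    # Accelerating is the dominant strategy, even though (slow, slow) is better for both.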
And for the first time, maybe, or maybe I'm just not well read enough on this topic, it really created a relatively realistic picture of what the nightmare scenario could look like with AI. So I highly recommend everyone reads the paper, not to be a Debbie Downer here and ruin your weekend, but to get a gauge of at least one potential world we could be headed toward.
Now, that said, I don't wanna be an alarmist. There are many critics of this paper. We cannot confidently say that scaling laws will continue. Even if they do, we cannot confidently say that this will lead to recursive, self-improving systems that will uncover new AI architectures within days or weeks, or be able to emulate the brain, and so on, let alone artificial superintelligence.
Thirdly, as you mentioned earlier in our discussion, there will always be a limitation on compute and energy. And fourthly, let's say we do develop AGI; there are also a lot of benefits to it. We're probably gonna be able to cure cancer or do a lot of wonderful things. So I'm definitely not a doomer, but it is a very interesting paper, I would say.
So that's my spiel. Richie, any thoughts about AI 2027 when you read it?
Richie Cotton: Yeah, so I have to say,
Adel Nehme: Honestly, after reading AI 2027, I'll take that any day of the week over these scenarios.
Richie Cotton: Yeah, I mean, all this AI safety stuff is actually really, really fascinating, and there are some very well-defined scenarios for what can go wrong. There are researchers, you know, smart people, who have been thinking about this sort of stuff for decades, and we know specific things that we shouldn't do if we want to avoid a really bad outcome.
There are obvious things, like you don't want AI that is gonna help people do cybercrime. But at the same time, pretty much every spy agency and every cybercriminal organization is trying to find exploits in software, making use of AI to do that. So it's almost certainly happening, just behind the scenes.
You're just not very well aware of it. There are other things, like using AI to find novel viruses and things like that for bioterrorism. It's like, well, you do want smart reasoning AI for medical researchers to find cures for existing diseases and for chemists to do cool things.
But at the same time, if you are a bioterrorist, then you probably wanna use the same tools to murder people. So it is a double-edged sword, and there are gonna be some very thorny edge cases; it's gonna be subtle differences between whether we have a really great future and whether we have a really terrible future.
The stakes are pretty high in getting this right.
Adel Nehme: Yeah, and this comes back to one of the episodes that we had with Ian Bremmer and Jimena Viveros, on the need for a global AI governance framework, at least to hinder some of the potential harmful consequences of AI. But given the race dynamics that we're in today, I think we need a much more peaceful world to be able to achieve these types of frameworks and treaties.
Final thing, Richie. Any final note on AI 2027 to share with our audience today?
Richie Cotton: Yeah, I mean, just to follow up on that last point. One of the recurring themes throughout this episode has been the rivalry between the US and China, and over the last decade we've seen a kind of unraveling of globalization, with US-China competition as the biggest part of that. So having organizations from the two countries compete on AI is good in that it
speeds up progress with AI and it gives different viewpoints. But at the same time, if we do end up with a situation where the United States and China really fall out, then that's gonna make existing conflicts, like Russia and Ukraine, or Israel versus most of the rest of the region, look like small fry compared to US-China.
So that's the big geopolitical worry I suppose.
Adel Nehme: Yeah, let's hope that it does not reach that point. Now, Richie, anything else before we wrap up? Any plans for the weekend?
Richie Cotton: well, I was gonna just eat chocolate for Easter, but actually now I think I'm gonna go buy a bunker somewhere very remote.
Adel Nehme: Well, for those who are not aware, I have been on the prepper train for a long time. So it's not a bad idea, but I would still put a low probability on it as of today. Okay, cool. Richie, always a great chat. Thank you so much.