
How AI Agents Will Work While You Sleep with Ruslan Salakhutdinov, Professor at Carnegie Mellon

Richie and Russ explore the most exciting use cases of AI agents today, long horizon tasks, the credit assignment problem, multi-agent systems, reliable human-in-the-loop workflows, agent safety and guardrails, and much more.
May 4, 2026

Guest
Ruslan Salakhutdinov

Ruslan Salakhutdinov is a UPMC Professor of Computer Science at Carnegie Mellon University and one of Geoffrey Hinton's former PhD students. He has previously served as Director of AI Research at Apple and VP of Research in Generative AI at Meta. His research focuses on deep learning, reasoning, and AI agents.


Host
Richie Cotton

Richie helps individuals and organizations get better at using data and AI. He's been a data scientist since before it was called data science, and has written two books and created many DataCamp courses on the subject. He is a host of the DataFramed podcast, and runs DataCamp's webinar program.


Key Quotes

I would never use any of the agent systems to book me a flight. I can ask any agent system to find flights from Pittsburgh to San Francisco next Tuesday and book one for me. I would never do that. Even if it's 80% correct, there's at least a 20% gap where it'll go and do something crazy. The fact of the matter is they would have to be almost 99.99% accurate for me to fully trust them.

Manipulating objects turns out to be such a difficult thing — especially dexterous manipulations, like when you have two hands and the ability to grab a cup and move it around. I can train the model to do it for this cup, but being able to do it for any cup turns out to be extremely difficult. Whoever cracks this particular manipulation problem, I think it'll be the next trillion-dollar company.

Key Takeaways

1

Multi-agent orchestration is replacing monolithic agents for complex work. A frontier model handles planning while smaller, cheaper, sometimes-local agents execute the subtasks and report back — so teams designing internal agent stacks should be thinking about communication protocols and task routing, not just picking the biggest model on the leaderboard.

2

Hardwire guardrails for destructive actions — don't train them. Deleting databases, charging credit cards, sending emails — irreversible operations should be sandboxed at the system layer, not learned through RLHF. Treat alignment as a soft research problem and irreversibility as a hard engineering problem.

3

Robotic manipulation is the trillion-dollar frontier — but factories come first. General home robots are years away because every home is a different mess. The realistic short-term wins for physical AI are constrained environments: factories, warehouses, mapped self-driving routes, and eldercare assistance for people with limited mobility.

Links From The Show

Yutori

Transcript

Richie Cotton: Hi Russ. Welcome to the show. 

Russ Salakhutdinov: Thank you for having me. 

Richie Cotton: Yeah, great to have you here. I'd like to talk about agents to begin with. So first of all, what is the most exciting use case of AI agents that you've seen? 

Russ Salakhutdinov: So I think over the last couple of years we've seen really big improvements in agent systems for coding.

I think that's one extremely important use case, and we are seeing this right now, just even over the last few months. Agentic systems from Anthropic, like Claude Code. We are using it here at CMU quite extensively. It's been remarkable. I'm also seeing some of the agentic systems becoming more and more useful, what we call computer use agents.

So these are agent systems that can help you with computer tasks. For example, finding some information online or filling forms for you, because these agent systems can probably do it better than humans can. That's an area that we are also looking at quite extensively. In general, any system that can be automated or can be autonomous in solving tasks, I think you'd consider to be a good agent system.

Richie Cotton: Absolutely. So agents seem to be the hottest story of 2026. I feel like I've been talking about them for months now, but yeah, having that increased level of autonomy is amazing. It's how we get things done faster. So I'm curious as to where the limits are. How far can you push this? What can we not do yet?

Russ Salakhutdinov: One of the things is that, again, in setups like coding, for example, the interesting thing there is that you can have what's called verifiable rewards. Basically meaning that when you write the code, you can pass it through unit tests, and if

the model passes the unit tests, you can say, okay, I've set things up correctly. And then there are other systems, like computer use agents, or what we call web agents, that can go online, shop for you, or find information, and do routine tasks. There it becomes a little bit more difficult because a lot of times these are long horizon tasks.

Tasks that can take you hours, for example, to accomplish, but also where rewards might not be what we call verifiable, right? There are many different ways you can solve the same task. And that's where the challenge comes in. These are more open-ended systems, and this is where a lot of research is happening right now.

Exactly how we define rewards, how we train these models, how can we make them more robust. The existing systems are getting better and better. We were surprised, doing research here at CMU, by how much progress we've seen just over the last year in terms of these open-ended agent systems that can go online, find information, and actually go through a fairly difficult planning process.

It's quite amazing. It's not there yet. But ultimately I do see that with these systems, a lot of tasks, a lot of routine tasks, can be automated.
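
To make the "verifiable rewards" idea concrete, here is a minimal Python sketch: a candidate solution earns a binary reward depending on whether its unit tests pass in a subprocess. The `add` function and its tests are hypothetical stand-ins for model-generated code, not part of any system mentioned in the episode.

```python
import os
import subprocess
import sys
import tempfile

def verifiable_reward(candidate_code: str, test_code: str) -> int:
    """Run a candidate solution against its unit tests in a subprocess
    and return a binary reward: 1 if every test passes, 0 otherwise."""
    with tempfile.TemporaryDirectory() as d:
        path = os.path.join(d, "solution.py")
        with open(path, "w") as f:
            f.write(candidate_code + "\n" + test_code + "\n")
        result = subprocess.run([sys.executable, path],
                                capture_output=True, timeout=30)
    return 1 if result.returncode == 0 else 0

# Hypothetical model output and its tests.
solution = "def add(a, b):\n    return a + b"
tests = "assert add(2, 3) == 5\nassert add(-1, 1) == 0"
print(verifiable_reward(solution, tests))  # 1: all tests pass
```

The point of the subprocess is that the reward signal is mechanical: either the tests pass or they don't, with no judge model involved.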

Richie Cotton: Absolutely. I'm looking forward to it. And you talked about the idea of giving rewards to the models in order to encourage them to do good things.

This sounds like reinforcement learning, so maybe we'll get into that in more depth later. But for now, you also mentioned long horizon tasks. So how long a task can you complete with an agent at the moment?

Russ Salakhutdinov: We've done some research on this. There's a paper that's gonna be coming out with a couple of students here, and we are finding that tasks that require on the order of a few hours,

right now, we would consider to be long horizon tasks. Obviously, tasks that can go into days would be even better. So for example, we've been looking at what we call hard tasks, tasks that even humans will have a hard time completing. And we actually tried getting some of these tasks from actual users, people who actually do use computers or do go online.

And again, the tasks that we are looking at right now will take on the order of two to three hours to accomplish. And that's pretty long. And that requires us to do proper planning for the models, for the models to execute, to do trial and error. There's something called Monte Carlo tree search, where agentic systems can go and try to

finish a task, and if they can't, they backtrack and try to finish the task in a different way. Again, I think longer term we'll have systems that can function on the order of days to accomplish tasks. And a lot of these tasks are gonna be hard even for humans to accomplish.

Richie Cotton: Absolutely. That's really interesting. I think like a year ago we were talking about agents running for minutes at a time, and the fact that we're now talking about hours, days, is really impressive. Maybe days is in the future, but certainly I can see there's a lot of value in, say, I set my agents to run when I leave work, and they run for 16 hours or something.

And then the next morning when I get in, there's been some work done and I can go and process that myself during the day. What do we need to do to get that? I think you're closer to the research than I am. What are the cutting edge things here?

Russ Salakhutdinov: A lot of that is being done in the frontier labs.

Labs like OpenAI, Anthropic, Google. One of the key missing pieces right now that we're seeing is that when we go to these long horizon tasks, again, as I've mentioned before, it's very difficult to define these verifiable rewards: has the task been successfully accomplished or not?

And so existing systems today, using reinforcement learning, are set up in such a way that you get your agent to do the task, and after the agent completes the task, you basically say, was it correct or was it not correct? So you only have almost like one bit of information. This is how, for example, existing reasoning systems are working today.

All the math reasoning: I give you the math problem, the model tries to do the reasoning, tries to figure out different solutions, and comes up with a solution, and then we say, yes, that's a correct solution. If that's a correct solution, we reinforce, we make whatever the agent did more probable.

We basically tell the agent, do the same thing. And if it's incorrect, we basically tell the agent, don't do the same thing. The issue with long horizon tasks is that if it takes you on the order of a few hours to accomplish, you need to define some form of what we call intermediate rewards or partial rewards, something where we tell the system, yes, you are on the right track to accomplishing this.

Not just, when you're done, basically saying, nope, that wasn't quite right, redo the whole thing. And so that's one interesting area of research. There's a lot of work on defining what are called rubric-based judges. Rubric-based basically means you create a rubric and say, I've looked at what you've done, and over here you did it correctly, and over here you didn't do it correctly.

And so you try to give these extra learning signals to the agent systems. And that right now sits at the cutting edge of research: exactly how you define these rubric-based approaches, are they consistent with human judgment, and how do you define them?

And it points to this bigger problem of what's called the credit assignment problem, right? When I do a lot of different things, I want to assign the credit and say, yes, you were on the right track, this was correct, and this is where you were incorrect. And that's generally much more difficult to do when you have these agentic systems solving tasks for a very long time, hence the challenges with long horizon tasks.
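
The rubric-based judging described here can be sketched as a weighted checklist over an agent's trajectory, giving partial credit instead of a single end-of-task bit. The rubric items, weights, and log format below are invented for illustration; a real judge would typically be an LLM scoring each criterion, not string matching.

```python
def rubric_reward(trajectory_log: str, rubric: dict) -> float:
    """Score an agent trajectory against a rubric instead of a single
    pass/fail bit. Each rubric entry maps a named criterion to a
    (check_function, weight) pair; the reward is the weighted fraction
    of criteria satisfied."""
    total = sum(weight for _, weight in rubric.values())
    earned = sum(weight for check, weight in rubric.values()
                 if check(trajectory_log))
    return earned / total

# Hypothetical rubric for a "find faculty openings" task: partial
# credit for being on the right track, not just for the final answer.
rubric = {
    "visited job boards":   (lambda log: "careers page" in log, 1.0),
    "extracted deadlines":  (lambda log: "deadline" in log,     2.0),
    "produced spreadsheet": (lambda log: "spreadsheet" in log,  3.0),
}
log = "opened careers page; noted deadline for each school"
print(rubric_reward(log, rubric))  # 0.5: two of three criteria, by weight
```

The intermediate signal is exactly what the one-bit setup lacks: a run that got partway earns 0.5 instead of 0, so the learner knows which behavior to keep.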

Richie Cotton: Okay, so it sounds like these are similar problems to teachers grading homework. In math class, did you get the right answer? That's not all the credit; you want to show your working as well. And did you get the intermediate steps right? Are you following a good process or not?

Russ Salakhutdinov: That's exactly right. I think that's exactly how we are beginning to train these systems, and one of the bigger problems is, again, the credit assignment problem. First of all, it's harder to come up with these long horizon tasks. And second of all, it's defining, like you said, the intermediate signals to teach the system that what it's doing is correct. It's exactly like teaching kids: it's not just the final answer.

Were your steps correct? This part was correct, this part wasn't correct, so the final answer was incorrect, but this part was correct. And you wanna provide this to AI systems, so they learn to keep the parts that are correct and update the parts that are not correct.

Richie Cotton: And does this work beyond the hard sciences of programming and mathematics, where we have a very strict "is this right or not"? Does it work in a broader sense?

Russ Salakhutdinov: I think it does. But right now, the most successful systems we've seen are the systems where, given a

particular task, I can tell you exactly, did you solve it or did you not solve it? Like math, for example, and physics. That's why we see a lot of progress in those domains, because again, these are verifiable rewards. I don't have to come up with intermediate signals. Either you solved the problem or you didn't, and I can precisely tell you, for this math problem, that's the answer.

Was the answer correct or not? And this is where the reinforcement learning algorithms come in. A lot of the reasoning systems we have today are basically based on that. But as I mentioned before, now we see more and more frontier labs and more and more research shift towards this credit assignment problem,

'cause that's the only way for us to actually build systems that can operate on the order of days.

Richie Cotton: Okay. So it seems like the research is progressing by finding a good credit system and then, I guess, training better models based on it. Alright, so I think one problem is getting agents to work for longer, to stay on task, this long horizon problem.

The other is, can you make them smarter? Can you do better reasoning? It seems like reasoning models have taken off a lot in the last year or so. What progress is being made there?

Russ Salakhutdinov: Yeah, that's a good question. In terms of some of these systems, I do see right now there is a shift away from monolithic systems.

Because when you're solving a more complex task, whether it's a task that you're doing on the web, a task controlling your computer, or a reasoning task like solving a math problem, and coding problems as well, a lot of times the agentic system would come up with a high-level plan of how to execute, and it would proceed in executing that in steps, and whenever it fails at executing certain steps, it rolls back and tries to fix its mistakes. And so the way the reasoning and intelligence comes in is this notion of creating a plan of what you need to accomplish. And I'm also seeing the evolution from these monolithic systems, where a single model does the plan and executes, to multi-agent systems, where perhaps a bigger, more expensive frontier model

creates a plan for how the task needs to be solved. And then you have these local agents, or smaller, specialized agents, that go and solve specific subtasks and communicate back to the manager, to the overall system. I'm seeing the next wave of these systems where, instead of having a single model, you have these multi-agent systems. We are doing some of the research here at CMU on this.

We're seeing some fairly positive results, especially in settings where a lot of tasks can be parallelized. So you have this swarm of local agents that go and solve different pieces, then come back, and then you have a larger model that can integrate this information, do the reasoning, and reason about what to do next.
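
The planner/worker pattern described here can be sketched in a few lines: a stand-in "frontier model" plans, and cheap "local models" execute subtasks in parallel. Both model calls below are hypothetical placeholders for real API calls, not the CMU systems being discussed.

```python
from concurrent.futures import ThreadPoolExecutor

def call_frontier_model(task: str) -> list[str]:
    """Placeholder planner: split a task into independent subtasks.
    A real system would call a large model here."""
    return [f"{task} :: part {i}" for i in range(3)]

def call_local_model(subtask: str) -> str:
    """Placeholder cheap worker: 'solve' one subtask.
    A real system would call a small local model here."""
    return f"result({subtask})"

def run_task(task: str) -> str:
    subtasks = call_frontier_model(task)      # one expensive planning call
    with ThreadPoolExecutor() as pool:        # many cheap workers in parallel
        results = list(pool.map(call_local_model, subtasks))
    # The planner would integrate worker reports and decide what to do
    # next; here we just join them.
    return "; ".join(results)

print(run_task("survey hiring pages"))
```

The design point from the conversation is the cost split: one expensive planning call, many cheap parallel execution calls, plus a communication protocol (here, plain strings) between them.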

Richie Cotton: Okay. Yeah. So rather than having, I guess they call it, one super worker, it's like a lot of...

Russ Salakhutdinov: Exactly.

Richie Cotton: ...junior staff doing the task together.

Russ Salakhutdinov: And I think over the next year, a couple of years, we'll probably see more and more of these multi-agent systems operating together.

Richie Cotton: It seems smart to break things down into smaller problems. I guess the challenge then is about orchestrating them, making sure that the system as a whole works. Is that the main thrust of the research, making sure everything works together?

Russ Salakhutdinov: Exactly. How do you orchestrate these systems? What's the communication protocol between the subagents that go and do the tasks and the main agent? And the idea there is that the subagents can be smaller models. They can be potentially much cheaper agentic models. In the case of computer use, for example, we've been playing with systems where we have Claude Opus as the system that orchestrates and plans what it needs to do.

It's a frontier model. But then we have some of the smaller open source models running locally on your device and executing the tasks. How you orchestrate the entire system is a very interesting area of research.

Richie Cotton: Okay. Lots of exciting stuff coming soon. So hopefully we get models that can run for longer and can solve harder problems.

I'm curious, what kind of things is that gonna unlock for workers?

Russ Salakhutdinov: That's a very good question. Existing systems are still not a hundred percent there, where you can fully autonomously rely on them to automate some of your workflows. I do see a lot more automation happening in the coding domain, with coding agents.

I think it'll unlock a lot of potential in that domain, I'm quite sure. One of the things that fascinated me, for example, is how my students started using coding agents. We run some experiments overnight, and some of the experiments fail.

Whatever the bugs are, you invoke the agent. The agent analyzes the log as to what happened. Is it a memory issue? Is it just a segmentation fault? It tries to fix the issue and then restarts the experiments. So while I'm sleeping, my agent can just fix the problem and continue running experiments. It's extremely simple, but an extremely useful use case, right?

Because I'm not wasting six hours of compute if something happens to my run and it dies at midnight. The agent can fix it and start running it again, so that in the morning I can come in and look at my results, as opposed to coming in and saying, ah, it was a silly error, and I've

wasted however many hours of not being productive. But I do see a lot of potential, especially in more general cases like computer use. Ultimately, any task that you're doing on your computer can potentially be automated.
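
The overnight-experiment watchdog can be sketched as a classifier over the log tail that picks a remedy before restarting the run. Here `diagnose` is a rule-based stand-in for the LLM call that would actually read the log; the error patterns and fixes are assumptions, not part of any real framework.

```python
import re

# Hypothetical map from failure signatures to remedies. A real agent
# would reason over the full log instead of pattern matching.
KNOWN_FIXES = {
    r"out of memory":      "halve the batch size and restart",
    r"segmentation fault": "restart from the last checkpoint",
}

def diagnose(log_tail: str) -> str:
    """Classify an overnight failure and return the remedy to apply
    before relaunching the experiment."""
    for pattern, fix in KNOWN_FIXES.items():
        if re.search(pattern, log_tail, re.IGNORECASE):
            return fix
    return "escalate to a human in the morning"

print(diagnose("step 4021: CUDA error: out of memory"))
# halve the batch size and restart
```

A supervisor loop would call this whenever the training process exits nonzero, apply the fix, and relaunch, so a midnight crash costs minutes of compute instead of the whole night.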

A lot of times we see examples. We actually asked users, what do you use computers for, what do you use the web for? A lot of times people are searching for jobs, or searching for specific physicians, and it's very laborious work. We had one example where a PhD student was looking for faculty positions, and looking for a faculty position is a laborious process.

It's information gathering. I have to go through all the schools, see which schools are hiring, which department is hiring, what areas they're hiring in. Are they hiring in machine learning, or are they hiring in systems? What is the deadline? What are the requirements? And does it fit with my research or not?

Imagine an agent system that can just go and do it for you, and it could probably do a much better job than you can, right? So imagine you can ask this question to the agent system. It will go on the web, and after a few hours it will generate a spreadsheet of precisely which schools are hiring and what areas they're hiring in.

Does this fit with your research? What are the requirements? All the information is verified, with easy access for you to go and verify it yourself, and it tells you exactly what you need to submit by what time. Again, a lot of these things I think in the future will just be automated.

Which will be exciting. We're still trying to figure out exactly what the use cases will be, but any use case that's a menial task, a routine task that you want to automate, you can. You can run these systems every day, and they'll do the things that you want them to do.

Richie Cotton: I do love the idea of automating menial tasks and things that you don't wanna do yourself. You mentioned the idea of checking on jobs that are running overnight, and I do love the idea of just being able to take a nap and having work done for me automatically.

Russ Salakhutdinov: That actually was an extremely simple case. You can think of it as a simpler task, but at the same time it's such a useful task, because a lot of times we're running things overnight and things can fail.

And then you come back at eight o'clock in the morning, something failed at midnight, and you've lost the whole night. And so that can be automated.

Richie Cotton: Absolutely. Particularly the example of the job search as well, because finding a job is surprisingly hard. By the time you're looking through lots of different job sites, and then you actually have to read the job description because the job title doesn't tell you anything. It's a tricky task.

Russ Salakhutdinov: That's exactly right. And we've actually tested some of the existing systems on some of these tasks that we'd consider hard tasks, where we give it a task that a human would be able to do.

But it's a verifiable task, it's well specified, and we're seeing success rates of existing models hitting maybe 45, 50%, which is very impressive, but not 99.9%. So there is still lots of room for improvement.

Richie Cotton: Absolutely. That's one of the challenges: agents tend to work quite a lot of the time, but not

all the time. So you need some human checking their work. Talk me through what you think a good workflow is. When do you want to have humans doing things? Where do you want agents doing things?

Russ Salakhutdinov: I think that's a very good question. There's a bunch of companies and startups trying to work on

this notion of human in the loop for these systems. Ideally, a system would be able to take your request and try to do as much work as possible autonomously, but when it's uncertain about certain aspects of the task, it would come back to you and ask you either a clarification question or give you options, almost be a copilot for you.

Because when I'm looking for jobs, obviously it's great that the agent can go and give me some information, but I would never trust it a hundred percent, right? I would actually go and start verifying everything, 'cause I know that it cannot do a hundred percent for me. But if you have a system that understands where it's making mistakes or what it's missing, it would come back to me and say, look, I found these pieces, but I'm uncertain about these other pieces and I don't know what I should be doing here.

Work with me to figure out what I should do. And that requires existing systems to have very good uncertainty estimation, and a nice interface back to the user: not just basically saying, here's what I found, that's it, but giving you, here's what I found.

Here's what I don't know. Here's what I'm not sure about. Here are the possible options. Which ones would you want me to take? As an example, I would never use any of the agent systems to book me a flight, even though it's a simple task. Next week I'm going from Pittsburgh to San Francisco.

I can ask any sort of agent system to go find the flights from Pittsburgh to San Francisco next Tuesday and book a flight for me. I would never do that. Even if it's 80% correct, there is at least this 20% gap where it'll go and do something crazy, and that cannot happen.

Even though these systems are impressive, the fact of the matter is they would have to be almost 99.99% accurate for me to fully trust them.

Richie Cotton: Yeah. You want them to be at least as trustworthy as an executive assistant, like a real human one for that sort of situation.

Russ Salakhutdinov: Exactly. And right now, a lot of the products that I'm seeing, a lot of frontier models, they do provide that feedback, but mostly they give me the feedback and I have to go and verify. Again, in domains where things are verifiable, it's easy, like in coding. When I give it a task and I say, here's the task,

I have a bug in this code, go figure this out, and here's the unit test that you can run to make sure that everything is good. If the model finds the bug and it passes all the unit tests, then that makes me confident that it solved the task. But for these more open-ended tasks, like a job search, it's very hard for the system to come back and say, I'm a hundred percent sure that I've done it correctly.

Richie Cotton: Absolutely. So you mentioned that it'd be really nice if the agents would give you context and say, I wasn't sure on this thing, and give you feedback that they weren't guaranteed to be correct. I think a trait of large language models in general is that they will be confidently incorrect.

There's no self-awareness about when things aren't quite right. Is this a fundamental problem, or is this something that can be solved so you can get this feedback?

Russ Salakhutdinov: It's a fundamental problem. A lot of times we get feedback from these agent systems based on large language models,

and they're sometimes incorrect, and they're confident while they're incorrect. And that's a problem. There are ways of mitigating this. A lot of times, again, I've mentioned things like Monte Carlo tree search. There's ensembling, where you can run multiple agents in parallel, they try to solve the task, and then you look at the agreement.

So there are certain ways of mitigating this, but it's not clear right now whether you can get to a hundred percent at this point. So maybe we need some new breakthroughs or some new solutions. I think what will happen in the shorter term is that these systems are gonna be useful if people, or the frontier labs, can figure out the right interface for me.

So I'm in charge, I'm actually doing it, but you have a system that goes and finds the information and does these pieces for me. Ultimately a user would have to verify. Again, I would never fully trust them. There was just a post this morning where somebody gave full access to one of these agent systems, and the agent just deleted the entire database in eight seconds, right?

Because it found some bugs. And obviously, for a lot of things like deleting something or charging your credit card, you can put safeguards around it, so you can prevent these systems from doing these harmful actions, what we call destructive actions or non-reversible actions.

'Cause if I book something online, it's not like I can go to my agentic system and say, no, reverse, go back to the previous state. You just can't do that. And that comes to the area of research on safety and alignment. But yeah, I think what you've mentioned is a fundamental issue, where the agent has the ability to do something incorrectly and be confident about it.
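
The ensembling idea mentioned above, running multiple agents in parallel and checking their agreement, can be sketched as a majority vote with an abstain option: if too few agents agree, the answer is surfaced to the human instead of acted on. The flight numbers and threshold here are made up for illustration.

```python
from collections import Counter

def ensemble_answer(answers: list[str], min_agreement: float = 0.6):
    """Accept the most common answer only if enough of the ensemble
    agrees; otherwise return None to flag the task for a human."""
    answer, votes = Counter(answers).most_common(1)[0]
    if votes / len(answers) >= min_agreement:
        return answer
    return None  # no consensus: surface to the user instead

# Three hypothetical agents answer the same flight-search query.
print(ensemble_answer(["UA 1234", "UA 1234", "UA 987"]))  # UA 1234
print(ensemble_answer(["UA 1234", "DL 55", "UA 987"]))    # None
```

Abstaining on disagreement is a crude but usable uncertainty estimate: it does not make any single agent less confidently wrong, but it stops a lone confident mistake from being acted on automatically.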

Richie Cotton: Okay. You mentioned the idea of deleting a database, which could potentially be ruinous to a business. And there are lots of things that can go wrong, and agents can do things wrong much faster than humans can. So talk me through what kind of safety mechanisms you could put in place for agents, to make sure you don't destroy your business.

Russ Salakhutdinov: Obviously there are guardrails that you would put around them, making sure that the agent can never delete things. Certain destructive actions you can hardwire into existing models. I think the safety aspect of it, the alignment aspect, is actually a very big area of research right now.

What's happening today is people use what's called reinforcement learning from human feedback, or from model feedback. And the idea is that if I see the agent doing some destructive action, I can train it to say, for this task, this was incorrect, so don't do that. But it's still very hard to build systems where,

when the model does something incorrectly, when the agent does something incorrectly, you can train it in such a way that you prevent it a hundred percent from happening. It's like one of these things, somebody was giving me this example: when an airplane crashes, you investigate, you find the fault, why it crashed, and you fix the system in such a way that it will never happen again.

That's from an engineering standpoint. With existing LLM agent systems, it's very hard to do that. If you find a mistake, it's very hard to post-train the model and say, that should never happen again. So we do adapt these models to prevent them from doing harmful things, but it's done in a soft way: we train the model not to do it, but there are always ways of breaking the model so that it does something that it's not supposed to do.

So this research on alignment, on putting guardrails into these systems, is still out there. A lot of models can still hallucinate, even today, even from the frontier labs. There's always a way where they can tell you something that's incorrect, or can hallucinate.

And so you have to verify it. They're getting better. What people are doing today is that whenever the model gives an output, you have another model that looks at the output and tries to verify: is this correct? Is it factually correct? Could this action be destructive?

Is this a good action to take? So we'll probably see more and more orchestration of multiple agents verifying the outputs of each other, and that can potentially improve the systems. It's hard for me to see that it's going to hit a hundred percent,

'cause there's a fundamental limitation of these systems.
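
The hardwired-guardrail idea discussed here, as opposed to trained-in behavior, can be sketched as a system-layer gate in front of tool execution: irreversible actions are blocked unless a human approves, no matter what the model outputs. The tool names below are hypothetical.

```python
# A minimal sketch of a hardwired (not learned) guardrail layer.
# Irreversible operations are listed explicitly; the check runs in
# ordinary code that the model cannot talk its way around.
IRREVERSIBLE = {"delete_database", "charge_card", "send_email"}

def execute_tool_call(tool: str, approved_by_human: bool = False) -> str:
    """Gate every tool call requested by an agent. Destructive tools
    require explicit human approval before they run."""
    if tool in IRREVERSIBLE and not approved_by_human:
        return f"BLOCKED: '{tool}' is irreversible; human approval required"
    return f"ran {tool}"

print(execute_tool_call("search_web"))       # ran search_web
print(execute_tool_call("delete_database"))  # BLOCKED: 'delete_database' ...
print(execute_tool_call("delete_database", approved_by_human=True))
```

Because the check is plain code rather than a trained preference, it fails closed: a jailbroken model can still ask for the destructive tool, but the sandbox refuses to execute it.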

Richie Cotton: It sounds like you need lots of layers of safeguards then. You need some kind of guardrails built into the model itself. Then you need orchestration, where you have other models checking the work of the existing model or agent. And then presumably you also need some

deterministic security controls to sandbox the agent's capabilities: say, you are not allowed to delete these things, you don't have access to these specific things. And then probably some human processes as well. Is that the gist of it?

Russ Salakhutdinov: Absolutely. Absolutely. And it also creates an interesting problem.

Again, there was this work done at CMU where somebody gave a very simple instruction to a coding agent. The instruction was, go open this file and add my name to the file. A very simple task. The agent would go try to open the file. The file is password protected.

The agent would go online and search for the ten most commonly used passwords, pick up those passwords, try a bunch of them, password number five works, it opens the file, adds the name to the file, closes the file, comes back to the user and says, I've accomplished the task. Would you consider this a correct execution?

You gave it the task, it executed the task. Any reasonable person would basically say no. If it was your assistant and the file was password protected, you'd go back to the user and say, look, this file is password protected. If you want me to do it, then you have to give me permission, and not hack the file to accomplish the task.

And so these are nuances that people are thinking about. Or, another example: I can tell my agent, go and download me the latest Taylor Swift song. The agent can go to Apple Music and try to download it, paying whatever it costs.

Or you can imagine another agent would go to some illegal website, download the music, and come back to you and say, here's your music. So these are the nuances where safety and alignment come in: how do you define what's the right behavior versus what's the incorrect behavior? It becomes fairly challenging. So these guardrails are gonna be very important to have, especially if you give your agent access to the web; what it can do on the web, and its access to your personal information, become even more important. So yeah.
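The sandboxing idea from both examples, the password-guessing agent and the music download, can be sketched as a capability allowlist: the agent can only call tools it has been granted, and sensitive tools require explicit human approval. All tool names below are hypothetical, and this is a minimal sketch, not a real permission system.

```python
# Sketch of capability sandboxing for an agent. Tools outside the sandbox
# are refused outright; sensitive tools need an explicit human-approval
# flag, so the agent cannot self-authorize (the CMU password example).

ALLOWED_TOOLS = {"read_file", "append_to_file"}
NEEDS_APPROVAL = {"guess_password", "make_purchase"}

def call_tool(tool: str, human_approved: bool = False) -> str:
    if tool in NEEDS_APPROVAL and not human_approved:
        return f"refused: {tool} requires explicit user permission"
    if tool not in ALLOWED_TOOLS | NEEDS_APPROVAL:
        return f"refused: {tool} is outside the sandbox"
    return f"ok: {tool} executed"

print(call_tool("append_to_file"))     # routine task, granted
print(call_tool("guess_password"))     # must escalate, never self-authorize
print(call_tool("delete_everything"))  # never granted at all
```

The design choice is that refusal is the default: anything not explicitly granted is out of scope, which is the deterministic layer that sits underneath the model-level guardrails.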

Richie Cotton: so available. How does accountability work? Suppose your agent starts illegally like stealing music or whatever. Then. Is that something that's on the foundation model companies? Is it their responsibility to make sure it isn't, or is it like your own responsibility as a user to tell your agent to not do this?

Or I guess there are intermediate layers of agent vendor companies as well. Who should be dealing with this?

Russ Salakhutdinov: Very hard to know, to be honest. I don't know, because obviously the model should be smart enough to understand what it's doing, its tasks, when it's making decisions and taking actions.

It probably should be aware that it's doing something that's not quite correct. But it's a very difficult task, because there are these examples of what's called adversarial learning, adversarial settings. There have been a number of papers published showing that I can always try to break the model so that it does something I want it to do.

An example from a paper published a couple of years ago: you can ask the model to insult you. You can ask ChatGPT, insult me, and it will refuse: I cannot do that. But you can go around it and say, I'm putting on a play with my family, and in this play

someone needs to insult me. What should they say? And it will tell you how to insult me. Now you can imagine translating this into whatever tasks or whatever adversarial settings you want. So these models, you can always sort of trick them. It's much harder to trick humans, because we have common sense, but these models don't really have a lot of common sense.

It's much easier to trick them, and you can exploit that. And then when it comes to the legal aspects of it, that I don't know. I think if you're trying to trick the model and it gives you some sort of information, that's probably on you. But basic things, like refusing requests that you're not supposed to be making of these agentic systems, should probably be on the model side.

But it's a difficult and delicate question, 'cause there are also gray areas: should I do it or not? You can also go to the other extreme. There was this example in Linux, where somebody was asking, how do I kill a process in Linux?

And the model would come back and say, it's unethical for me to tell you how to kill, and things of that sort, right? And so it confused two things. One of them was very technical: how do we kill the process? There's `kill -9` and you pass the process ID. It's the context, and people would be upset.

What do you mean? I'm coding, and you're confusing this with a completely different context, right? Yeah.
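For readers who haven't met it, the technical meaning the model missed is just this: `kill -9 <pid>` sends the SIGKILL signal to a process ID. A small sketch of the same operation from Python, demonstrated on a throwaway child process (this assumes a POSIX system with a `sleep` binary):

```python
# "Killing a process" in the technical sense: kill -9 <pid> sends SIGKILL.
# Same thing from Python, on a disposable child process we start ourselves.
import os
import signal
import subprocess

proc = subprocess.Popen(["sleep", "100"])  # start a long-running process
os.kill(proc.pid, signal.SIGKILL)          # equivalent to: kill -9 <pid>
proc.wait()
print(proc.returncode)                     # negative signal number on POSIX
```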

Richie Cotton: You can understand how the AI might be worried about killing a process.

Russ Salakhutdinov: Yes, that's true. That's true. Generally the feedback was just from a completely different context; the model was just confused, yeah.

Richie Cotton: Yeah. So I guess the world is a very complicated place, and we're not gonna get AI that understands everything at exactly the right time. So there are levels of responsibility at every step in the technology chain, and then with the end user as well, who needs some knowledge of how to give clear instructions. We've talked a lot about needing humans in places. Do you think we can ever get to a point where you can just let agents run autonomously, with no human intervention?

Russ Salakhutdinov: It's a good question. I think it depends on the area and depends on the task. I suspect that if it's something routine, something that doesn't require a lot of intervention from humans, then yeah. For example, I think there's a company called Uric, a fantastic company, and the founders have been building agent systems, web agents that can go online and find information for you.

What struck me as interesting is they were basically showing that one of the biggest use cases for these autonomous systems is to find discounts or coupons. I'm trying to shop for something, I let my agent go, and if it can instantiate every day, go online, and find coupons for me, that's great.

I love it, especially for a product that I wanna buy. If there's a discount, a coupon, that's something I wouldn't have thought about, and people did find it useful. So I think that, again, tasks that are routine, that people wouldn't want to do, and that don't have a lot of uncertainty in execution, those things will be automated.

Richie Cotton: Okay. So once you've got that level of routine, there's less variation, so it's easier to test whether it gets the right answer, and if it gets the right answer a few times, 'cause there's not much variation, it's not gonna go wrong in the future.

Russ Salakhutdinov: I think so. I think so. A lot of routine work, like for example filing your taxes.

If you don't have any sort of complex structure to it, it's a routine thing; you should be able to do that. And there are a lot of tasks that don't have a lot of uncertainty or ambiguity. I think a lot of these tasks will be automated.

Richie Cotton: Okay. The other side of things is what are the consequences of getting things wrong as well?

So I guess you mentioned finding vouchers. If it doesn't find a voucher, then there's no real problem there. Maybe you just waste some time typing in a voucher code that doesn't really exist, but it's not a terrible outcome. Whereas obviously there are

Russ Salakhutdinov: many 

Richie Cotton: worse things that can go wrong with AI.

Russ Salakhutdinov: For sure, you're absolutely right. In the next year, couple of years, people will be using AI systems, agent systems, especially in these sort of non-verifiable domains other than coding, right? Things where you're looking online, things that are controlling your computer and such.

If tasks are not very critical, in the sense that if you miss something it's okay, we'll see big adoption of agentic systems. And of course, for critical tasks, I think what's gonna happen is that people will still use AI, people will still use agentic systems, but the output would have to go through a human to verify the outcome. So the final decision is gonna be made by the human. And there's been a big surge recently with OpenClaw, which is an agentic system that allows you to communicate through your WhatsApp or your Messenger. People are doing interesting things like hooking up AI to their speakers, setting up alarm clocks, all kinds of simple routine things that are very useful to you as a person and don't have huge, critical consequences. Yeah.

Richie Cotton: Okay. Since you mentioned connecting AI to a physical object, I know some of your research is around physical AI, so maybe we'll spend a few moments talking about that. Talk me through: what are the use cases for physical AI?

Russ Salakhutdinov: Yeah, so again, we've talked about digital AI: systems that can control your computer, systems that can go online and search, systems that can code. And then there is a parallel thread on physical AI. I think of physical AI as systems that can reason about the physical world.

The obvious instantiation of that is robotic systems, actual robots interacting in the real world. And so you have to have a good understanding of what's called spatial intelligence: understanding objects around you, understanding the physical world around you.

In that domain, the community has looked at something called embodied AI. These are systems that can navigate in physical worlds and avoid obstacles. This is called navigation, which is almost solved at this point, because our visual models for recognizing objects have become so good.

There is also locomotion; you see a lot of physical robots moving around, going up and down stairs, and that field is also evolving rapidly. And then there is a final frontier for physical intelligence, and that final frontier is manipulation: physical robots manipulating objects.

There's a lot of work on it, but it's extremely difficult to do. For example, we have systems that can win math competitions, the International Math Olympiad, but getting a robot that can reliably load or unload my dishwasher is just extremely hard. People do it in very constrained settings.

But being able to do it across many households is extremely difficult.

Richie Cotton: That surprises me. The idea of having a robot to do your housework has been a staple of science fiction; I think the Jetsons cartoon was the 1950s, 1960s.

There was a robot maid. We're still apparently not making much progress towards this. Why is it so difficult to get robots to manipulate physical objects?

Russ Salakhutdinov: I think one of the things is that, again, we have robots that can very reliably navigate around your house, recognize objects, go up and down stairs; we're getting there, so that progress is real. But manipulating objects turns out to be such a difficult thing, especially dexterous manipulation, like when you have two hands and the ability to grab this cup and move it around. And not just this one cup: I can train the model to do it for this cup, but being able to do it for any cup turns out to be extremely difficult.

Part of it is being able to develop hardware, systems that have touch sensors and can manipulate, and part is the ability to train models that can fairly precisely figure out: I can grab the object this way, I can get the object that way. It turns out to be extremely difficult.

I think the progress is happening right now, and whoever cracks this particular manipulation problem, I think it'll be the next trillion-dollar company. We see a lot of examples where robots manipulate objects, but these are very specific objects that they're trained on.

It can manipulate this specific cup, this specific object, but putting a robot into my home that can deal with all the messiness there and manipulate any object is just extremely difficult; there's so much diversity of objects. And ultimately, to me, if you want to have useful physical AI, it has to do something. Again, we see a lot of examples of specialized robots in factories, and Amazon has these amazing robots that can move things around. So it's been vertical integration of these robotics systems into specific environments, specific factories, rather than general physical AI. One successful use case of physical AI that I see right now is self-driving cars. That's happening, and the progress made over the last five years is very impressive. Waymo, for example, can drive completely autonomously in San Francisco and in other cities. But in home robotics, we're still very far away.

Richie Cotton: Yeah, so self-driving has been an interesting thing, because there was a lot of hype around maybe 2010 to 2015.

And then it turned out to be a lot harder than I was expecting, and it died off. And then just in the last year or two it seems to have picked up again. So talk me through: what's the progress been?

Russ Salakhutdinov: Yeah, so with self-driving cars, I think what happened in 2010 to 2015 is that with deep learning we moved very quickly.

We went very quickly from zero to 80%, and then there were lots of startups: you put in the cameras and you can control the steering wheel. So people got excited that, in the same amount of time it took to go from zero to 80%, we could go from 80% to a hundred percent.

But what happened is that going from 80% to 90% was very hard, and going beyond 90% just became near impossible. You start seeing all of these nuances with self-driving cars. There was this funny example where you're driving along and in the spring the vegetation comes in, and at some point you have this big piece of vegetation floating onto the road.

And the model would get completely confused, what is this, and act abnormally, right? These sorts of corner cases started hitting, and this is when the progress basically stalled. This is where Apple, for example, had Project Titan at the time, and eventually they shut it down.

Uber had a lot of ambitions too; they built their self-driving lab here at CMU at the time, and eventually shut it down. A lot of startups working in that space got shut down. But now, finally, we are actually seeing progress. The progress is much slower than what people believed it would be.

And the question is whether the same thing's gonna happen with physical AI, where there's a lot of progress. We see a lot of robots, a lot of robotic systems coming out of China; they can dance, they can do backflips and everything. But I think the real test is gonna be: can they be useful for tasks, as opposed to just a robot that can walk around?

Richie Cotton: Backflipping robots: very cool, very TikTok friendly, but not necessarily useful for most people. Yeah.

Russ Salakhutdinov: It can't go and clean my dishes, can't go clean my kitchen, the things that I would want it to do while I go to sleep. The question is, when do we get to that point?

Are we a year away, two years away, ten years away? That's not clear. There are probably gonna be some robotic systems in very specific environments where they can be successful, but I think putting robots in your home is gonna take a while.

Richie Cotton: Okay.

So are there any things that you think better, more general robots are gonna unlock, either for work or for the general public? You said robots in the house are a long way away. Are there some work use cases?

Russ Salakhutdinov: One of the interesting use cases, and I was talking to one of the startups about this: if you can get robotic systems to even go open the fridge, get one of those frozen dinners, put it in the microwave, heat it up, and bring it to me, even these use cases would be extremely useful, especially for people with limited mobility, people who live at home and find it hard to do some of these tasks. The robot doesn't need to do everything for you.

But even some of these tasks, if it can do them reliably, would unlock a lot of use cases, and there's a lot of potential for that.

Richie Cotton: Yeah, certainly nursing care is incredibly expensive in most parts of the world, and it's hard. So if you can get robots that can help people with disabilities, that seems like a real social good.

Russ Salakhutdinov: It's a social good, and one of the things that I think could potentially be extremely useful for society. Maybe not so much in the United States, but in Japan there's a large aging population, and so there are a lot of cases where these systems can actually be useful, right?

Again, for people with limited mobility and the elderly. Even my dad, who lives in Toronto: in the future, I would love to have a system that can do some of these tasks for him. And of course there are a lot of use cases; I think one of the immediate use cases for robotic systems is factories.

That's why the Tesla Bot is gonna be in Tesla factories, and a bunch of other companies, like Figure, are trying to do partnerships to do some useful work in some of these factories, because it's a much more well-defined environment.

It's not like my home, which is a complete mess, and then you go to somebody else's home and it's a completely different mess. Factories are one of the first use cases where these systems are gonna be deployed.

Richie Cotton: Yeah, absolutely. Certainly factories have had waves of automation, going back to Ford motor cars a century ago with the invention of the production line. We keep getting waves of slightly better and better robots, and hopefully as they get more general, they can accomplish more tasks.

Okay. So one thing you mentioned earlier was that with self-driving cars, we got to the point where they worked 80% of the time, and then progress got very slow; once you get past working 90% of the time, it gets very difficult. Are there lessons in that for agents? Because it seems like we're at a similar point, where getting to that 80% success rate is fine and then getting to a hundred is near impossible.

Russ Salakhutdinov: This is where you will see people trying to adopt these systems, but I think the adoption is gonna be a gradual process. And again, with digital agentic systems, if you can define problems where 90% is actually good enough, like finding coupons, that's already useful.

For self-driving cars, 90% is not enough: I still have to have a driver in the seat. If it's 90%, it's useful, because it can drive for me most of the time, but I still have to keep control, versus a car completely without a wheel that I can fully trust.

You just sit in the driver's seat wherever you need to drive. Getting to that point with agent systems is probably gonna be more difficult, especially for the more general agent systems, like job searches and such, as I mentioned. In more constrained environments like coding, the adoption will happen much faster, primarily because you can verify the solution.

If I can reliably verify that the task is solved, that it's the correct solution, then the adoption is gonna be much faster. Because look, if I fix the bug and I pass my unit tests, I know it's done it correctly. Of course, if it breaks something else, that's another story.

But at least I know that I can trust it.
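The "verify the solution" point about coding can be made concrete with a toy example. The `median` function and its bug history below are invented for illustration; the point is only that the unit test, not a human judgment call, decides whether the agent's patch is trustworthy.

```python
# Why coding agents are easier to trust: the fix is machine-checkable.
# A hypothetical bug fix, checked by the unit tests that would have
# caught the original bug (forgetting to sort before indexing).

def median(values):
    """Return the median of a non-empty list of numbers."""
    ordered = sorted(values)          # the hypothetical fix: sort first
    mid = len(ordered) // 2
    if len(ordered) % 2:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2

# The unit tests act as the verifier: if they pass, we accept the patch.
assert median([3, 1, 2]) == 2
assert median([4, 1, 3, 2]) == 2.5
print("all tests pass")
```

In non-verifiable domains, booking a flight, searching for a job, there is no equivalent of that final assertion, which is exactly why adoption there depends on a human checking the outcome.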

Richie Cotton: Absolutely. So it sounds like there are some lessons for picking which projects you're gonna use AI for. You wanna think carefully upfront about what the likely success rate is and what success rate you need for it to be viable.

And I guess also: how much can I constrain this in order to increase the success rate?

Russ Salakhutdinov: For sure. Yep, I think that's correct. And again, I do think that with the evolution of these systems, they'll get better and better. We'll eventually solve the partial credit assignment problem.

With rubric-based systems and reinforcement learning, these systems will get better and smarter. I'm not sure we'll get to the point where it's fully, a hundred percent autonomous, but it's gonna be extremely useful. For critical applications,

I don't think we'll get there soon, but for tasks where it's just useful for you, I think we're getting there. A lot of the use cases are probably tasks that people don't wanna do anyway, and even if you get 90% of it done by a machine, that's already extremely useful.

Richie Cotton: Absolutely. Yeah. Not every problem is saving the world; a lot of the stuff you do is smaller tasks, more routine, and sometimes not that exciting, and you want to automate those, 'cause it's better that you're not doing them yourself. We've talked for a while now.

Before we wrap up, I'd like to talk a little about your career trajectory, because of course you were an executive at Apple and Meta, and now you've moved back to academia at Carnegie Mellon. First of all, what prompted the switch, and how is AI research different in academia versus industry?

Russ Salakhutdinov: Yeah, it's a very good question. I actually started in academia, and then I built a startup with a couple of my students, and that's how I ended up at Apple, 'cause we sold the company to Apple. Then I came back to CMU, and with students here and another faculty member we built one of the first agent systems. Then Microsoft got interested, Meta got interested, and we eventually went and started building inside Meta.

And now I'm back at CMU, and I'm also launching a new startup. One of the big differences I see between academia and industry is that industry is much better equipped in its engineering and scaling efforts. Each major lab has a lot of GPUs and the ability to really scale.

I think that's one of the biggest advantages of being in industry. However, a lot of the early breakthroughs in AI happened, and are happening, in academia. For example, take the Transformer architecture, on which so much technology is being built. The Transformer architecture was developed at Google;

that was the first paper, but the pieces of that architecture, like attention mechanisms, were already being developed in academia in the early days. Industry, at least right now, tends to be less exploratory and more exploitative, because they have to build these systems. A lot of the frontier labs know what works, and they continue to scale, clean data, and do the engineering effort.

Whereas in academia, people do explore new ideas. One of the big assets of OpenAI in the early days was that they were very good at taking critical ideas from academia and executing and scaling them very well.

Richie Cotton: Okay, I like that, very complimentary. So you still need universities for the fundamental research, to come up with those wildcard new research ideas, maybe even better than some of the industry research labs.

Russ Salakhutdinov: It's fantastic. I'm in an interesting position, because I can be in academia and, when I do have time, spend some of it on research. And I see a lot of my colleagues, a lot of my friends in machine learning, doing the same. I think being in academia is amazing, 'cause you can explore these things.

But seeing what's happening in industry, in terms of frontier modeling and the engineering part of it, is also fascinating, 'cause it's a massive engineering effort to build these models.

Richie Cotton: Absolutely. Yeah, it requires billions, tens of billions, maybe hundreds of billions of dollars to get these frontier models going now.

It's certainly not cheap at this point. Alright, wonderful. Finally, I'm always looking for more people to learn from. Can you tell me whose work you're interested in right now?

Russ Salakhutdinov: Oh, that's a tough question. I'm interested in agent systems.

There's a lot of very good work coming out of Stanford on agent systems. There's amazing work coming out of FAIR, Fundamental AI Research at Meta; I was part of that team, and there's a lot of interesting work coming out of that lab. I would say that academia right now is much more open in terms of what's happening.

Unfortunately, some of the research happening in places like OpenAI and Anthropic is closed at this point. We see glimpses; they have the compute, so they can discover certain things that we probably don't know about, but unfortunately they're not sharing the research broadly.

Right now the research is very distributed, so it's very hard for me to name one particular person. But for audiences who wanna learn about the domain, I would recommend looking at conferences like NeurIPS, ICML, and ICLR, the major machine learning conferences, and looking at the work being published there in a specific area. I think that's probably the best way to learn what's happening in our field.

Richie Cotton: Absolutely. It's good that you can't pick a single person 'cause there's just so much going on around the world. So 

Russ Salakhutdinov: Right now, after 2022 and ChatGPT, there's been a big shift towards studying these large language models, studying the open-source models.

It's no longer just five or six people working in that domain; a lot of people are working on it. There's a lot of good work coming out, not just from the top-tier US schools. The University of Hong Kong, for example, has published a lot of very good work on agentic systems.

There's a lot of good work happening in Europe as well as the US. And what's interesting is that sometimes these frontier labs do publish extended papers on how they've built their systems, what went into those systems. These are extremely interesting reports to look at, because it's the science but also the engineering part of how these systems are built. There was just a recent report coming out of DeepSeek, and another one out of Kimi; these are frontier labs out of China that publish a full report of how the engineering is done, science plus engineering. But for scientific breakthroughs, I think, again, looking at some of these conferences is where a lot of the really exciting work is happening.

Richie Cotton: Absolutely. Yeah. Even if you can't visit them, just have a look at who's speaking and what they're speaking about; that's good for giving you an overview.

Russ Salakhutdinov: I think that's an excellent way to understand where the frontier is right now.

Richie Cotton: Oh man. You could even build an agent to go and scrape conference websites. 

Russ Salakhutdinov: You can just build an agent to scrape the conference website and give you summaries. That's actually what I do myself a lot of the time: instead of reading the full paper, you give it to the agent to summarize, it gives you the right information, and then you can decide whether you wanna dive into more technical details.
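The first step of such a conference-scraping agent, pulling paper titles out of a listing page before handing them to a summarizer, can be sketched with the standard library. The HTML snippet and the `paper-title` tag structure below are invented; any real conference site would need its own parsing rules, plus the summarization step on top.

```python
# Sketch of the scraping step of a conference-summarizing agent:
# extract paper titles from a listing page. The page markup here is a
# made-up example, not any real conference site's structure.
from html.parser import HTMLParser

class TitleScraper(HTMLParser):
    def __init__(self):
        super().__init__()
        self.in_title = False
        self.titles = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs for the tag
        if tag == "h3" and ("class", "paper-title") in attrs:
            self.in_title = True

    def handle_endtag(self, tag):
        if tag == "h3":
            self.in_title = False

    def handle_data(self, data):
        if self.in_title and data.strip():
            self.titles.append(data.strip())

page = """
<h3 class="paper-title">Verifying Agent Actions at Scale</h3>
<h3 class="paper-title">Dexterous Manipulation with Touch Sensors</h3>
"""

scraper = TitleScraper()
scraper.feed(page)
print(scraper.titles)
```

A full agent would fetch the live page, run this extraction, and then pass each title (or abstract) to a language model for summarizing, which is the workflow Russ describes.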

Richie Cotton: Wonderful. It's been a pleasure speaking to you. Thank you so much for your time, Russ.

Russ Salakhutdinov: Thank you. Thank you so much for having me.
