
Can we Create an AI Doctor? with Aldo Faisal, Professor in AI & Neuroscience at Imperial College

Richie and Aldo explore the advancements in AI for healthcare, diagnostics and operational improvements, the Nightingale AI project, handling diverse medical data, privacy concerns, AI-assisted medical decision-making, and much more.
28 Jul 2025

Guest
Aldo Faisal

Professor Aldo Faisal is Chair in AI & Neuroscience at Imperial College London, with joint appointments in Bioengineering and Computing, and also holds the Chair in Digital Health at the University of Bayreuth. He is the Founding Director of the UKRI Centre for Doctoral Training in AI for Healthcare and leads the Brain & Behaviour Lab and Behaviour Analytics Lab at Imperial’s Data Science Institute. His research integrates machine learning, neuroscience, and human behaviour to develop AI technologies for healthcare. He is among the few engineers globally leading their own clinical trials, with work focused on digital biomarkers and AI-based medical interventions. Aldo serves as Associate Editor for Nature Scientific Data and PLOS Computational Biology, and has chaired major conferences like KDD, NIPS, and IEEE BSN. His work has earned multiple awards, including the $50,000 Toyota Mobility Foundation Prize, and is regularly featured in global media outlets.


Host
Richie Cotton

Richie helps individuals and organizations get better at using data and AI. He's been a data scientist since before it was called data science, and has written two books and created many DataCamp courses on the subject. He is a host of the DataFramed podcast, and runs DataCamp's webinar program.

Key Quotes

The real big wave that's going to wash over the whole field is now what we call ambient intelligence and the whole space of AI for operational improvements. So that's not medical activity per se, but for example, transcribing the conversation between a doctor and the patient.

We already have a basic multimodal model that combines different forms of data and different timescales. It can predict outcomes for certain patients much more accurately than reference language-model-derived reasoning or other forms of conventional machine-learning-based predictions, for example about the mortality of a patient.

Key Takeaways

1

Utilize federated machine learning to train AI models on sensitive healthcare data while preserving patient privacy, enabling the development of robust AI systems without compromising data security.

2

Investigate the potential of ambient intelligence in healthcare, which automates operational tasks like transcribing doctor-patient conversations and updating patient records, to reduce administrative burdens and enhance clinical decision-making.

3

Consider the development of large health models, akin to large language models, that can process multimodal data from electronic health records, medical literature, and research data to provide comprehensive insights into patient care and treatment outcomes.

Links From The Show

Aldo’s Publications

Transcript

Richie Cotton: Hi Aldo. Welcome to the show. So in the last year or so, we've seen a lot of quite ambitious efforts to replicate employees from customer service people through to software engineers, data scientists even.

So how close are we to being able to have AI doctors? 

Aldo Faisal: I think it depends on what you mean by an AI doctor, right? On the one side, if you think about the Doctor from Star Trek: Voyager, who autonomously treats people and has, let's say, improvable bedside manners, I think we're quite a bit away from that.

And there's also a question of whether we want that at all. But I think what we're going to see is AI tech that helps doctors, nurses, carers, and clinicians in general treat patients better, faster, and more accurately. There's a number of technologies we can talk about that are already there, and a whole range of things coming up on the horizon.

Richie Cotton: Okay. Yeah, I suppose to keep to the Star Trek timeline, we've still got a few hundred years to recreate the Voyager Doctor. Can you give me some examples of real things that we can do with AI for healthcare now? Where are we actually up to?

Aldo Faisal: What's fairly mature now is AI in radiology. So this is: you're looking at a brain scan to detect a tumor, or you look at an X-ray for shadows that may suggest you have an infection or something else going on. These things have been developed, they've already been partially licensed, and some leading institutions and hospitals have adopted them in their workflows.

This is AI already supporting diagnostics. I think the real big wave that's going to wash over the whole field is what we call ambient intelligence and the whole space of AI for operational improvements. So that's not medical activity per se, but, for example, transcribing the conversation between a doctor and the patient, and then not just transcribing it, which you can already do with voice-capture software, but also

fitting information automatically into the patient's record so that doctors don't need to spend ages writing things up afterwards, maybe already making suggestions based on treatment guidelines about what drugs should be prescribed to this patient or what referrals should be made. So I think we're going to see a lot of this operational improvement coming.

The technology is there. It's already been licensed, and the first institutions are already using it. I think we're going to see this spread widely, not just in top hospitals but all the way to the local general practitioner you visit after you've had a cold or something like that.

Richie Cotton: Okay. Yeah, certainly those operational aspects, I feel like it's not the cool stuff, but it is incredibly important. Although I'm wondering, because doctors famously have terrible handwriting, does the technology work there? Can it read a doctor's handwriting and accurately transcribe it?

Aldo Faisal: There's a fun study there, actually, that people did on doctors keeping the record of a patient. There's evidence that when doctors went from handwritten notes on a paper card to the electronic record, they had more difficulty recalling a specific patient if they weren't looking at the picture of the handwriting they did on the day.

So there seems to be something with memory coming into play there that's quite interesting on the neuroscience side, so to speak. But I think in general, what the AI can do is provide you with the relevant context when you're checking on a patient, and it helps you remember people much better. It also helps, of course, with handover between different doctors, so when you're treated by a team of clinicians and experts, it links things up better.

Richie Cotton: Okay. That's absolutely fascinating. And I think, yeah, in general, no one really likes writing notes about a conversation they've had. Outsourcing this to AI seems like a very useful idea. I'd love to know about your Nightingale AI project. What does it involve?

Aldo Faisal: Nightingale AI is meant to become a very large health model. Just like we have large language models that are trained on literally all the English text you can find on the web, we want to do the same for electronic healthcare records, biomedical research data, medical information in general, and the medical literature.

And the challenge here is, of course, the medical data. How do you get hold of large amounts of medical data? You need to cooperate with healthcare providers, and you need to work with patients. Because we're based in the UK with Nightingale AI, we have, of course, the National Health Service. That's a one-stop shop that provides all healthcare in the country.

Literally all doctors and nurses in the country work for the NHS, and it's a service that supports you from cradle to grave, free at the point of care. Literally everyone is in it. And this system generates immense amounts of data on a daily basis. So there's of course a question: what are we doing with this data?

And of course, patients are concerned about their privacy, and citizens don't want their data to be abused. But the idea is that we want to create an AI that learns about medicine in a different way. You may have heard about large language models applied to medical data or medical information. To me, that's a bit like an English literature major, who knows about text and language, reading about medicine and then answering questions about medicine. We instead want to build models that are trained on multimodal data: the current healthcare record, an X-ray, a brain scan, genetic information, some lab test results, some pathology that was done on tissue slices, a blood test or something like that.

A model that can ingest all of this information, understand and combine it, and then reason out of the biology. So reason, the way a medical doctor in some way should, out of the science and about the science. That's the goal of Nightingale, and we're making some major steps towards it. We're bringing in data donations from partners who are giving us data.

We are working actively with the NHS on how we can train and build such a system in a way that preserves the privacy of the data. Here we're talking about what we call federated machine learning. And there's a whole range of development you need to do if you want to use large foundation model technology in a highly federated, private setting.

This is something the mainstream is not interested in, because they're working on freely available text data, but we are basically now developing a lot of the underpinning foundational technology for privacy and security, to be able to train this model in secure data environments within healthcare settings, where we can absorb this data but not take it with us; we just learn the knowledge from it.
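The federated setup Aldo describes can be sketched in miniature. This is a hypothetical illustration of federated averaging on synthetic data, not Nightingale's actual pipeline: each "hospital" fits a model on its own private records, and only the learned weights, weighted by site size, are pooled; the raw data never leaves the site.

```python
import numpy as np

def local_fit(X, y):
    """Ordinary least-squares fit on one site's private data."""
    w, *_ = np.linalg.lstsq(X, y, rcond=None)
    return w

def federated_average(site_weights, site_sizes):
    """Combine per-site weights, weighted by how much data each site holds."""
    total = sum(site_sizes)
    return sum(w * (n / total) for w, n in zip(site_weights, site_sizes))

rng = np.random.default_rng(0)
true_w = np.array([2.0, -1.0])  # the relationship all sites share

# Three "hospitals", each with its own private synthetic dataset.
sites = []
for n in (50, 80, 120):
    X = rng.normal(size=(n, 2))
    y = X @ true_w + rng.normal(scale=0.1, size=n)
    sites.append((X, y))

# Each site trains locally; only weights are shared and averaged.
weights = [local_fit(X, y) for X, y in sites]
sizes = [len(y) for _, y in sites]
global_w = federated_average(weights, sizes)

print(global_w)  # close to [2.0, -1.0], learned without pooling raw data
```

Real federated training iterates this exchange many times and adds secure aggregation, but the core idea, weights travel while data stays put, is the same.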

What Nightingale becomes is a platform. We're developing this as a non-commercial platform, so it's going to be run and developed by academics, universities, and institutions that are not-for-profit in that sense. That gives us a different kind of access and a different trustworthiness with the patient population.

We've done surveys with patients thinking about this, and it's a mode they actually find acceptable for rendering the data useful. And just like in other big projects, Linux being an example, you can have corporates that support or sponsor work in the project without owning or controlling the project.

And that's how we see Nightingale AI developing. It becomes a platform. And once you have a platform, you don't need to access the medical data behind it. You don't need to access the patient information. You just need to access the knowledge that is condensed in Nightingale.

And so then, as a company, as a corporate, as another researcher, anyone can build something effective for them on top of the platform, i.e., their own product using our platform below.

Richie Cotton: Okay. This is very cool, and it also sounds incredibly ambitious. I'd like to get into some of the technical details shortly, but it sounds like the big goal of this is to be able to answer questions in the way that a doctor would answer them.

Can you give me an example of the sort of questions you're gonna answer, and how is this gonna have an impact on public health?

Aldo Faisal: Okay, so I think it starts with simple questions you could answer. For example, you can upload your X-ray and your blood work and ask the system: what do you predict my next X-ray is going to look like? Say you have a chest infection, and I'm prescribing you one type of penicillin or another type of antibiotic. How will this evolve? The system will predict what your X-ray is going to look like, what your next blood work may look like, and how you may feel in terms of breathlessness or other things.

It will reason about that out of the biological understanding it has, and then it can give you explanations in textual form and so forth. So that's one example of how, in a practical, so to speak, patient-doctor type of interaction, you may ask questions. It could also be used by medical doctors who are basically more interested in understanding.

What are the side effects? Say you have a problem in your liver, and I give you a drug to cure something in your kidney. What are the side effects, given the liver problem, and how do all these drugs interact with each other, positively or negatively?

We call that the polypharmacy problem, and the system can reason about that. You could also be, for example, a pharma researcher wanting to know: where's the best population to run my drug trial for this specific drug, so that I have a good chance of seeing the best results possible come out of the trial?

And the system would reason about where, what, and how to recruit and select patients into the best trial. This is an area of active AI research in pharma: how do you recruit the ideal patients? This system should be able to do that internally. And then public health, really, is when we're talking about the healthcare setting where, so to speak, the government or the state or your city tries to manage health.

Just think about the pandemic, and how that was managed. But you can also simply ask: say you want to do something about obesity, or you want to limit the spread of not just infectious diseases but other habits that are unhealthy for you.

So here you can now start using the AI system to reason about things. And crucially, because the system effectively reasons about how a patient is and how the patient evolves, it has, so to speak, a digital twin of you, a healthcare digital twin: a description of how your body and your physiology are doing. And you can do that for literally every patient. If you can do that, then you can run forward predictions and see how different people evolve. So, for example, if you're an asthmatic patient and you know it's going to be very hot, with high levels of heat and dry air, then maybe you should not go out on streets where there's a certain amount of pollution. The system can predict that and tell you: don't do that. Every year, thousands of elderly people die in heat waves, or they die because they're asthmatic and get breathing problems, for example. So these are ways you can use these technologies to do what we tried to do before conventionally, but with AI.

But what we are really interested in is thinking about how we can use AI to do things that we couldn't imagine doing before, and that's where it becomes interesting.

Richie Cotton: That's very cool. It seems like there are so many use cases for this once you've got the model in place. Actually, do you envisage it mostly being used by doctors or public health officials, or is it gonna be something that could be used by anyone to ask questions about their own health situation?

Aldo Faisal: Yeah, that's a great question. I think, at the start, it's going to be a research tool. So it's going to be used by AI researchers, digital health researchers, and clinical researchers who will interact with it, because the focus is less on the user experience or user interface and more on getting the intelligence in. As the system evolves, I think you're going to get a proper chat and a proper prompt, and then you can start conversing with the system.

Now again, I would say the first steps would be to have this for professionals, because doing a proper alignment on the system, so that it doesn't say wrong or crazy things that have nothing to do with medicine but simply come from the language model part of it, is a non-trivial problem.

You want to have some control over what the system says in general if you roll it out to the general public, right? Because if it's seen as an authority, you need to be very mindful about what type of authority. So I think, at the start, we can show that it works and transforms things differently from what you would do with conventional AI.

I think it's gonna be for scientists and researchers first, then for trained professionals, and hopefully by then we will have the scale and interest in the project to allow us to also develop a proper interface for it. But maybe that's a product somebody wants to develop on top of our system.

Richie Cotton: Okay. For professionals first of all, and once you've ironed out the kinks, that's when you can gradually roll it out to everyone else. I'd love to know a bit more about this healthcare foundation model. Could you talk me through how it differs from a large language model?

Aldo Faisal: A large language model is typically trained on sequences of text. So, "the quick brown fox jumped over the lazy dog", right? Those are the words. What a language model learns is: given "the quick brown fox", it tries to predict the next word, "jumped" or "ran" or whatever other word there is in the English language.

Just by reading a lot of text, this model can then very accurately predict what the next word is and fill it in. And this next-word prediction is what basically drives the technology. The amazing thing about the English language, or language in general, is that this is enough for the system to complete sentences

and actually write answers. So you start with a question, and the system starts writing the text that is the answer. We do this through what we call transformer models: models that can capture very long sequences and reason about them in an efficient way. So that's a language model.
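The next-word idea can be shown with something far simpler than a transformer. This toy uses bigram counts over a made-up corpus; a real language model replaces the counting with a learned neural network, but the prediction target, the next word given the words so far, is the same.

```python
from collections import Counter, defaultdict

# A made-up corpus riffing on Aldo's example sentence.
corpus = (
    "the quick brown fox jumped over the lazy dog "
    "the quick brown fox ran over the lazy dog "
    "the quick brown fox jumped over the fence"
).split()

# Count how often each word follows each word (a bigram model).
following = defaultdict(Counter)
for prev, nxt in zip(corpus, corpus[1:]):
    following[prev][nxt] += 1

def predict_next(word):
    """Return the most frequently observed next word in the corpus."""
    return following[word].most_common(1)[0][0]

print(predict_next("fox"))   # 'jumped' (seen twice, vs 'ran' once)
print(predict_next("lazy"))  # 'dog'
```

Chaining `predict_next` word after word is, in miniature, how a language model "writes" an answer: it repeatedly emits the most plausible continuation of what it has produced so far.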

Now, a healthcare model is based not on text or language in general. It's based, for example, on your physiological information: what's your blood pressure, what's your heart rate in this second, minute, or hour, how much oxygen is in your blood, and so forth. And all this information is collected.

Say you're in the hospital; we know this from the movies, when people are plugged in and there's this beep. All of that is data flowing out of you that describes how you're doing. But it could also be other information. Say you went to your general practitioner, and he took your weight, did a blood test, and measured your blood glucose level, and so forth. This is all information that describes you, but it describes you at different moments in time. So now, instead of reasoning over a poem or an essay you find on the internet, you have information about you that appears at different moments in time.

So it's again a sequence of information, but it's clumped together. Say you're in the hospital: you had an X-ray and other tests, and all that information comes together. Then maybe it takes six months until you see a doctor again, and he takes your weight and blood pressure and adds a bit more information.

It's different in format, and it's not a continuous stream of data. So we need to build and think through the foundation model aspects, the core technology, and do the computer science around it, to be able to deal with these vast mismatches in the timescale of the information: some information is updated every second,

say your heart rate, and some information is updated every few months, because you've done a blood test every few months. And so this we need to bring together. What you then get is a model that predicts not the next word but your next medical information. So I can ask: what is your next X-ray going to look like?

What is your next blood pressure going to look like? And so forth. And the critical thing is that this system can also reason about what happens if I do something to you, because that information is also on the record. So I give you a drug: how does that drug affect you? I can see that from other people, so I can predict how that drug will affect you, and how things should or shouldn't improve, and so forth.
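One common way to handle these mismatched timescales, shown here as a hypothetical sketch rather than Nightingale's actual data format, is to treat every measurement as a timestamped event and merge all modalities into a single time-ordered sequence, with the gap since the previous event made explicit so a model can reason over irregular sampling.

```python
from datetime import datetime

# Invented events: bedside vitals every few minutes, then a GP visit months later.
events = [
    ("2025-01-01 09:00", "heart_rate", 72),
    ("2025-01-01 09:00", "blood_pressure", 118),
    ("2025-01-01 09:05", "heart_rate", 75),
    ("2025-06-30 14:00", "blood_glucose", 5.4),
    ("2025-06-30 14:00", "weight_kg", 81),
]

def to_sequence(events):
    """Sort events by time and attach the gap (in hours) since the previous one."""
    parsed = sorted(
        (datetime.fromisoformat(t), modality, value) for t, modality, value in events
    )
    seq, prev = [], None
    for t, modality, value in parsed:
        gap_hours = 0.0 if prev is None else (t - prev).total_seconds() / 3600
        seq.append({"modality": modality, "value": value, "gap_hours": round(gap_hours, 2)})
        prev = t
    return seq

seq = to_sequence(events)
print(seq[3])  # the glucose reading arrives ~4300 hours after the last vital sign
```

A sequence model fed records like these sees both the value and how stale the surrounding context is, which is exactly the second-versus-months mismatch Aldo describes.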

Richie Cotton: My sort of understanding of large language models, and apologies to any AI engineers listening, is that basically you're scraping the whole internet, cleaning up the text a bit, throwing it into a big neural network, and then magic happens. Maybe you tune it a bit at the end. In this case, though, you've got very heterogeneous data types.

You mentioned x-rays, numeric data about people like their body weight, time series data; it's a lot more complex. So is it the same kind of strategy? Can you just bundle it all into a big neural network and magic happens, or is there something more complex there?

Aldo Faisal: I think you've got exactly right what the challenges are. And that's actually where we're now going from something useful to some basic questions about how we develop the machine learning that enables that. These are open, active questions. How do we deal with massively multimodal data, say images, text, genetics, and blood work?

How do you combine them? The challenge is that in naive approaches, you build a model for each modality; at the end, each one gives you an answer, and then you just combine the answers. We call that late fusion: fusion because you're bringing things together, and late because you're doing it at the end. And that's not very smart, because a doctor will look at your X-ray and hear you coughing and deduce that you may have a chest infection in that moment, not six months after he's treated you with an antibiotic.

So we need to build AI systems that can ingest this information at the same time. We're looking here at a number of architectures: literally, how to design the neural network communication, when and what gets combined, so that the system can reason across these modalities. There are a number of competing approaches, and it's not entirely clear to us what the best way forward is, because

people have worked on vision-to-text transformers, for example. So you show a video and it tells you what happens, or you write text and it generates a video. We know how to do that. But what happens when this information is not in one chunk, and changes over a year's time? How do we deal with the sparsity of the data?

These are the two big science questions, basically computer science questions, that we're answering by developing the AI.
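The gap between late fusion and joint (early) fusion can be seen on a contrived example where the diagnosis depends on the interaction of two signals. The data and the "models" below are stand-ins, not real clinical logic: the point is that per-modality models combined at the end cannot capture a pattern that only exists across modalities.

```python
import numpy as np

rng = np.random.default_rng(1)

# Synthetic task: "infection" is only detectable from the *interaction*
# of the two modalities (an XOR-like pattern).
xray = rng.integers(0, 2, size=200)   # shadow on X-ray: yes/no
cough = rng.integers(0, 2, size=200)  # audible cough: yes/no
label = xray ^ cough                  # infection iff exactly one sign present

def accuracy(pred, y):
    return float(np.mean(pred == y))

# Late fusion: each single-modality "model" predicts alone, then we vote.
# A shadow-only or cough-only model can do no better than echo its signal.
late = ((xray + cough) >= 1).astype(int)

# Early (joint) fusion: one model sees both signals at once, so it can
# represent the interaction. The XOR here stands in for a jointly trained model.
early = (xray ^ cough).astype(int)

print(accuracy(late, label), accuracy(early, label))
```

Late fusion gets every "both signs present" case wrong (it votes infection, the label says no), while the joint model is exact; that is the cross-modal reasoning that early fusion architectures are after.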

Richie Cotton: Okay. Yeah. So it sounds like there are still very much a lot of open research questions in order to get a model that performs well. It sounds like a lot of it is about capturing the knowledge that's out there.

Can you talk about techniques like retrieval-augmented generation? It's very popular for retrieving facts. Is that a component within your model, or not at all?

Aldo Faisal: Yeah. So here's a funky thing, right? As an AI researcher, you are at the moment completely data-driven. You ignore all of human history, and you just take the data and learn from scratch. But of course, medical research is a bit older than computer science, and there's knowledge we have accumulated in books, articles, and papers. So the idea is that we are going to use a standard language model to read all the scientific literature ever published that we can get hold of,

and basically learn something we call a knowledge graph: a representation that links together different entities. An organ has a failure because some cells are not functioning, because some chemical process in the cell is not working, because some environmental agent has acted on it. And so you can build entire ontologies, like a graph, a network of how everything is linked up with everything, by reading the literature.

You can then use that with RAG to combine the two: what the transformer of health says, together with the information from human knowledge. But the trick is, of course, that human knowledge in academic papers is not free from contradiction. You may find two papers that say two opposite things.

That's why I'm very careful about calling this a graph, because there may be two links going in different directions linking two entities together. Is the gene responsible for this disease, or is the gene not responsible for that disease? So we need to think about how to model these relationships: human knowledge is not precise, it's not logical at every single step, but it's many millions of people coming together,

millions of studies being brought together to define knowledge. In cancer, for example, there are areas of cancer research publishing 1,000 research papers a week, and it's accelerating, so it may soon be 1,000 a day. No single cancer researcher or cancer doctor can keep track of that, right?

Just reading all this information. And so the idea is that with this knowledge graph, we can extract information. The idea in science is, of course, that if something is systematically reproducible, you will see it come out over and over again. And with that, we can get some certainty in this knowledge graph.
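The contradiction problem Aldo raises can be modeled by letting each edge in the graph carry evidence counts rather than being simply true or false, so that "systematically reproducible" findings accumulate confidence. The gene, disease, and claims below are invented for illustration; a real system would extract them with a language model.

```python
from collections import defaultdict

# Each (subject, relation, object) edge tracks supporting and refuting papers.
edges = defaultdict(lambda: {"supports": 0, "refutes": 0})

def add_claim(subject, relation, obj, stance):
    """Record one paper's claim about a (subject, relation, object) triple."""
    edges[(subject, relation, obj)][stance] += 1

# Imagine an LLM extracting claims from the literature (names are hypothetical):
add_claim("GENE_X", "causes", "DISEASE_Y", "supports")
add_claim("GENE_X", "causes", "DISEASE_Y", "supports")
add_claim("GENE_X", "causes", "DISEASE_Y", "refutes")  # a contradicting paper

def confidence(subject, relation, obj):
    """Fraction of claims supporting the edge. Reproducible findings
    drift toward 1.0 as more papers agree."""
    e = edges[(subject, relation, obj)]
    total = e["supports"] + e["refutes"]
    return e["supports"] / total if total else 0.0

print(confidence("GENE_X", "causes", "DISEASE_Y"))  # 2 of 3 papers agree
```

Keeping both directions of evidence on the edge, instead of collapsing to one link, is exactly why Aldo hesitates to call the structure a plain graph.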

Richie Cotton: Okay. Yeah, certainly having access to the entire corpus of knowledge does seem very useful; no one can read hundreds of papers a week. Okay, from there, how do you deal with these contradictions? And have you made any progress in working out what happens when we don't really know the answer?

It seems like in a lot of cases these generative models will just tell you, oh, I believe this, and they're essentially hallucinating. So what happens when you don't know?

Aldo Faisal: So that's a great question. To be honest, we started training Nightingale on the National AI Factory of the United Kingdom, which is called Isambard-AI, eight weeks ago. So we already have a basic model that is multimodal: it combines different forms of data and different timescales already, and it shows that it can predict outcomes for certain patients much more accurately than reference language-model-derived reasoning or other forms of conventional machine-learning-based predictions, for example about the mortality of a patient.

That's the good news. The challenge we're seeing is that once you start doing the RAG, you really need to be very systematic about it. So what we're looking at now is finding a suitable language model to read all the medical papers. We tried that with a simple model that we can literally run on our local computer cluster,

and it's not very good at extracting medical relationships out of standard text. Now, if you use a very sophisticated model, it can do that, but it becomes very expensive, and we're just an academic endeavor. So we're actively looking for support and sponsorship here, simply to provide the compute so we can run

a large language model that's sophisticated enough to reason about medical papers and extract the knowledge. The first steps were okay, but it's clear that it's not so easy, and that's ultimately because scientific papers are written in a specific way, and you need to know how to read them. This is something you really learn over a number of years when you do a PhD: how to pick out the tricky bits that were carefully left out, or the bits hidden behind the curtain.

And that is not easy to convey to a language model per se. It's not a simple problem. So we're actively working on that. 

Richie Cotton: Okay. Yeah, certainly a lot of academic papers are not easy to read for humans, even the target audience. I can see how it's difficult to pull out the right information at scale.

If you make predictions about people's health and you get it wrong, that's got some pretty serious consequences. So have you thought about how you're gonna handle it when the AI makes mistakes?

Aldo Faisal: So that's a huge problem, right? And I think, to be fair here, we need to ask: what is the human chance of a mistake?

If you look at a whole range of diagnostic decisions, do you have the disease, yes or no, it's a fairly easy thing to score. You can look at a whole range of fields in medical diagnostics. For example, for sepsis, which is an infectious disease and actually the biggest killer in hospitals, doctors' decisions are right about two out of three times.

So every third patient they get wrong on the first attempt, so to speak. For dementia, it sits around nine out of ten, so one diagnosis in ten you get wrong. We have AI systems that reach or beat these values. In dermatology, the diagnosis of skin cancers by human doctors is around 98.5% accurate, and the best AI system now achieves around 99.5% in simple accuracy.

So machine systems can be better, but they cannot be perfect. And I think the realization is this: if you're talking, say, about self-driving cars, we expect a self-driving car to drive a thousand times more miles before having an accident than a human driver would. But human drivers don't have an accident every couple of miles; they drive a lifetime and maybe have one or two accidents. So that's a very high bar to reach. Now, in medical decision-making, if you're getting two out of three right and one out of three wrong, already going from one out of three wrong to one out of five wrong, or one out of eight wrong, would have a huge impact on patients. So if the AI system can be better, and we need to evaluate these systems very systematically, they can be a huge bonus in picking things up.

And it's not just that doctors make mistakes, that humans make mistakes; it's also that as humans, we are focused on specific things. Say you're going in to check whether you have a broken rib. You have an X-ray, and they look at the rib and say, no, the rib is fine. But actually, if they had paid attention to the shadow in the top right of your lung, they might have spotted the tumor.

And that's a personal case I know about, where they didn't see it because they didn't pay attention to it; they looked at the bones, not at the shadows. Much later, the person was diagnosed with lung cancer. A colleague who had access to an AI system put the scan in, and the AI said yes, there's a tumor there, but the human doctor hadn't seen it,

because they were focused on some things and not others. It's here that the AI system can really boost the accuracy of human decision-making. So AI can get things wrong, and it can be very dangerous when it does, especially if it's built badly, if it's trained on bad data, if it's trained on too little data, or if it's trained on biased data.

And so we need to be very careful when we're designing these systems. We know that these problems exist; it's not a new discovery. And we need to make that extra effort to develop what we call patient-ready AI: AI that you can really let out of the corral and use on patients. And that also requires that you understand how doctors work with AI.

Richie Cotton: Okay. Is the intention then that the AI provides information to the doctor and then the doctor makes the decision based on that, rather than the AI just saying this is the answer and you don't need to bother with the doctor in the first place?

Aldo Faisal: So I think it depends. For Nightingale, we will definitely start with a decision support system.

It's an information system. It's a what-if: what happens if I do this or that with the patient, how will it affect them? But we already have the world's first fully autonomous medical device in skin cancer detection, which can fully autonomously make a skin cancer diagnosis, and effectively you can refer people onward without a doctor having to see the patient.

That's the 99.5% accuracy that I mentioned. This has already been accepted by the regulators in Europe; interestingly, the US has not yet allowed the system to be licensed, but that's coming soon. So in very specialized applications, where people have a very narrow focus and can therefore monitor quality in a very easy way,

I think we will see more and more of these autonomous decision making steps because they're just faster and often cheaper and maybe you can do them even at home. And so we are going to see more of that. 

Richie Cotton: Okay. That's fascinating. And it's very cool that you've got a completely autonomous system just for the skin cancer case.

I'm curious: it seems like the performance from one disease to another is very different. So do you need different AI systems for individual health conditions, or is it better to have a broad platform like Nightingale, which sounds pretty broad in this case?

Aldo Faisal: So that was part of the reasoning behind Nightingale. You can spend a whole career on this; I've been doing medical AI for 12, 15 years now, so over a decade. And that's effectively what you do: you build a specific system for each disease, and you get faster and you get better. But ultimately, there seem to be some common patterns. And of course, two diseases that appear unrelated to each other can actually be related.

I'll give you an example: stroke. It's a cardiovascular disease, because blood isn't getting to parts of your brain, but it's also a neurological disease, because your brain is damaged. You can look at it from the perspective of a neurologist or from the perspective of a cardiologist, but actually these are just silos of human thinking, where you're schooled in a specific way of looking at a problem.

And once you bring in an AI system, it's not encumbered by these human limitations; it can just look at the patient as a whole. So in some ways we can really talk about holistic healthcare, by using AI systems that can think across boundaries. And I'm not just talking about fields of medicine.

We can think about environmental impacts on people, social impacts on people. And so of course we can really think about treating and helping people, but most importantly, about preventing people from getting sick in the first place.

Richie Cotton: Absolutely. It certainly has some pretty incredible potential.

And it sounds like in this case having something broad means, at least in theory, the whole is greater than the sum of the parts, because it can reason about multiple different things from different angles. I'd love to talk a bit about the data. We've had a few guests on the show talking about trying to work with electronic healthcare records, and about having to do some really incredible data engineering and use AI to clean things up. But those guests were all working on US healthcare records, where I think the situation is a lot more fragmented. You talked about NHS patient data being the main source of data.

Can you just talk me through how you go about collecting all that, and how you go about processing it in order to make it ready for AI?

Aldo Faisal: So the NHS, being a national healthcare system, its data is as much a tool for medical benefit as for performance optimization of the system. And so these systems are fairly up to date. The way this information is collected, it's literally the routine operational data that you have in your IT systems, which gets logged: basically messages that are passed around or stored in a database.

What's then done is that you typically anonymize this database. Building on what we've done in Britain for a while now in different parts of the country, including London, where my institution is based, we have built what we call secure data environments. These follow very rigorous standards for data privacy, and they take this patient data, be it from the hospitals or from the GP practices,

or from the pharmacies, because everything is an integrated system. You can link this data up first, then you anonymize it, and then you make it available in this research setting. And that's a very exciting thing, because it means I can see, after, say, you had surgery, what's happening with your disease.

What's happening with you 10 years afterwards, something a typical surgeon in a US hospital would not see, because the patient comes and they go, and that's it, goodbye; there's no follow-up. But you can also look at what happened to the patient before they came to the hospital. So we just ran a big project on a model that is trained on 1.1 million patients and learns to predict from this data whether a patient needs to go to hospital unexpectedly in the next three months. It turns out we can do that with over 80% accuracy. And what surprised us even more: we said we want to make the problem harder. If I get your blood work every week, you can probably be quite accurate about what's happening with you.

But that's very expensive. So what we said was, we want to just look at administrative data: what happens if I have admin data about you, but not actual medical data that needs to be collected by a medical professional? It turns out that with just this administrative data we achieve 81% accuracy, four out of five predictions correct.

And so that was surprising. This shows you, once you have this data, what you can do. And a challenge that we faced: when I spoke to colleagues in the US (this was the chief medical officer of a major US healthcare system), they said, we have patients on the world's best cancer treatment.

They get so sick they can't work anymore, they lose their health insurance, and we don't know if they survived or not. So it's very tricky to think about building an AI for cancer treatment if you don't know whether the patient survived. And aside from the human drama behind that, this is one of the advantages of the integrated care systems that we have in Europe generally, and in the UK specifically: they give us this cradle-to-grave data trajectory for patients.

And so with that, you can do very systematic studies of people, not just of one specific thing.

Richie Cotton: Okay. That seems like a pretty incredible data set, and it's cool that you got 80% accuracy predicting whether someone is going to unexpectedly go to hospital in the next three months. I feel like I would like to know that about myself.

Whether I'm about to need a trip to the doctor's. I think data privacy is going to be a big issue here; healthcare data is certainly considered incredibly private, and there are lots of laws around it. How does it work, making sure that if people start passing their healthcare data to the AI, it remains private?

Aldo Faisal: Super important. So I think there are two things that I want to say. First, I've been working with patients for almost two decades now. In no case was a patient who was suffering from a disease I was working on, helping with machine learning and AI, ever concerned about the data.

They wanted to give us the data. The people who are very concerned about data are often healthy people. And so, in a mean way, I could say there's an egoism of the healthy in not wanting to give their data, but a huge generosity of the sick, because they want to get better. So we need to talk about that, but we also need to talk about how healthcare data can be abused, by insurers, for example, to push up your premiums, or by employers who may not employ you

if you look likely to get sick, for example. And that's of course a no-go. So different countries have different laws protecting not just the privacy of your data, but how it can be used, say by insurers, or by employers in recruitment. When we're talking about the protection of information, we need to look at how we can build systems for that.

So the typical model now, where a lot of the European data sits, is what we call a secure data environment. This is literally a remote desktop environment: you can log in remotely, you can work remotely, but you cannot download anything. You can run your code there, and you can see plotted results on a screen with limited resolution.

And if you then want to get the actual plot back, it needs to be exported, and there are human reviewers of the data who look at it and say, okay, you can take this out, you cannot take that out, because you could re-identify patients from it, and so forth. So that's a typical mode of work.

Now we have privacy-preserving AI technologies, like federated machine learning. That's a bit like a model going on an apprenticeship journey: you send it to a secure data environment, it trains locally, it learns things, and then it leaves and comes back, and then you send it somewhere else, where it learns more.
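That apprenticeship journey is, in spirit, federated averaging: each site trains the model on its own private data, and only the parameters travel back to be combined. Here is a minimal sketch with a toy linear model and synthetic "sites" (all names and numbers are illustrative, not Nightingale's actual setup):

```python
import numpy as np

# Three hypothetical "sites", each holding private data that never leaves.
rng = np.random.default_rng(0)
true_w = np.array([2.0, -1.0])

def make_site(n):
    X = rng.normal(size=(n, 2))
    y = X @ true_w + rng.normal(scale=0.1, size=n)
    return X, y

sites = [make_site(200) for _ in range(3)]

def local_update(w, X, y, lr=0.1, steps=20):
    # The model "apprentices" at one site: plain gradient descent locally.
    w = w.copy()
    for _ in range(steps):
        grad = 2 * X.T @ (X @ w - y) / len(y)
        w -= lr * grad
    return w

w = np.zeros(2)  # the global model
for _ in range(5):  # five apprenticeship rounds
    # Each site trains locally; only parameters travel back, never raw data.
    local_ws = [local_update(w, X, y) for X, y in sites]
    w = np.mean(local_ws, axis=0)  # aggregate the accumulated knowledge

print(w)  # ends up close to true_w without centralizing any data
```

The key property is visible in the loop: the site data is only ever read inside `local_update`, and the only thing aggregated across sites is the parameter vector.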

So all that you're integrating is effectively the knowledge accumulated in the parameters of the neural network. Now, of course, you can set up your network to effectively just memorize the data and regurgitate it again. So there are a number of data leakage tests that you can run, so to speak, from the computer science side, to evaluate how much data leakage exists. A fantastic colleague of mine just ran a study looking for which books had been read by certain language models, and he used that leakage methodology to identify which books should not have been read by the AI, based on copyright. So you can do these assessments. Now, it depends very much on the policy and the governance of the secure data environment whether these algorithmic safeguards are enough, and quite a few of them are not satisfied with that.
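A toy version of such a leakage test can be built around exactly this memorization intuition. A 1-nearest-neighbour model memorizes its training set by construction, so membership shows up as a clean score gap; this only illustrates the idea, it is not the methodology of the study mentioned:

```python
import numpy as np

rng = np.random.default_rng(1)
train = rng.normal(size=(50, 4))   # data the model was trained on ("members")
fresh = rng.normal(size=(50, 4))   # data the model never saw ("non-members")

def distance_to_model(x, stored=train):
    # A 1-NN model's "score": distance to the nearest memorized example.
    return np.min(np.linalg.norm(stored - x, axis=1))

member_d = np.array([distance_to_model(x) for x in train])
fresh_d = np.array([distance_to_model(x) for x in fresh])

# Members sit at distance exactly 0; fresh points are strictly positive.
# That gap is the leakage signal a membership-inference test exploits.
print(member_d.max(), fresh_d.min())
```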

Right, so they will not allow the parameters of the neural network to be exported, and that's a challenge. So one thing you can do is build a model that is a bit superficial, one that doesn't learn a lot about the disease, just a bit. And you build loads of these models, and each one learns superficially about something, without giving out too much information, without learning too much about any patient, so to speak.

The idea is then that you have, so to speak, the wisdom of the crowds: you bring these many thousand little models together, and the hope is that this is sufficient to build a smarter model. So that's one strategy by which you can overcome this limitation of federation, that you need to export the parameters of the neural networks.
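The "many superficial models" strategy can be sketched with deliberately weak learners: depth-1 decision stumps, each fitted to a small random slice of the data and then averaged, similar in spirit to bagging and random forests. This is purely illustrative of the strategy described, not Nightingale's implementation:

```python
import numpy as np

rng = np.random.default_rng(2)
X = rng.uniform(-3, 3, size=(600, 1))
y = np.sin(X[:, 0]) + rng.normal(scale=0.2, size=600)

def fit_stump(X, y):
    # A depth-1 decision stump: about as "superficial" as a model gets.
    best = None
    for t in np.linspace(-3, 3, 25):
        left, right = y[X[:, 0] <= t], y[X[:, 0] > t]
        if len(left) == 0 or len(right) == 0:
            continue
        err = left.var() * len(left) + right.var() * len(right)
        if best is None or err < best[0]:
            best = (err, t, left.mean(), right.mean())
    _, t, lo, hi = best
    return lambda Z: np.where(Z[:, 0] <= t, lo, hi)

# Each little model sees only a 5% slice of the data.
models = []
for _ in range(200):
    idx = rng.choice(len(X), size=30, replace=False)
    models.append(fit_stump(X[idx], y[idx]))

def ensemble(Z):
    # Wisdom of the crowds: average the 200 superficial models.
    return np.mean([m(Z) for m in models], axis=0)

# The averaged crowd is smoother and more stable than any single stump,
# even though no individual model saw more than 30 data points.
print(ensemble(np.array([[-2.0], [0.0], [2.0]])))
```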

Another approach, and that's what we're pursuing for Nightingale, is what we call building a meta secure data environment: a secure data environment of secure data environments. That basically allows these environments to connect and communicate without human review, because you have some mathematical guarantees on what information can and cannot be exported.

And so that's another approach. It's not easy, and it's okay that it's not easy, but I think the important thing is: if you care about it, you can find a way to make it work.

Richie Cotton: Okay, that was good. You certainly touched on a lot of the ethical issues there. I'm happy for my medical data to be used for research, but I don't want anyone to be able to type into the AI, what's Richie's health situation, and have it spit out all my past history. And certainly I wouldn't want insurers or employers or just strangers to be able to know things about my health situation. That idea of grouping lots of models together to have your wisdom-of-crowds situation.

It sounds a little bit like the idea behind random forests, where you've got lots of weaker models joining together into something stronger. Yeah, it's swarm intelligence, basically. Okay, cool. Alright, I know it's early stages for your project, and you're just training the model at the moment.

There are still a lot of research questions. But have you had any success stories yet? What are the big wins?

Aldo Faisal: I think the big win is that, well, we've literally been working at it for three months since we launched it, and we've been training it for two months now.

But I think the real success stories are that we've inspired others to contribute. So, and I think this has already been officially announced, the Children's Hospital of Orange County and Grady Health, Southern Californian healthcare providers, have provided us with data to support our AI activities at Imperial.

And they want to actively contribute their insight and medical data to build Nightingale. So as we're going along, we are recruiting more and more, and we're ramping up the funding. Again, we're not a corporate, and we don't want to be, for the purpose of getting patient buy-in.

So we've joined a 29-million research grant from the European Union that now allows us to work together with partners in Switzerland, Spain, Germany, and France towards multimodal AI, and Nightingale is part of that. We have also launched in the UK something that we call the generative AI hub.

It's a multi-institution center for working on generative AI, and as part of that hub we have now integrated Nightingale AI and its activities. So we're basically in the ramp-up and build-up phase. I think the results are going to come; the first results look good, but they're not published yet, so I'm holding a bit back there.

But we are getting results that we don't see with the benchmark models, because the reasoning is simply better, and that's, I think, very comforting. But the key thing is that we're opening up avenues for ingesting more data, and more people want to be part of this endeavor, to support what we can do for people.

Richie Cotton: Absolutely. That's very cool that even three months into the project you're seeing some results and beating some of these benchmark models. So, since you mentioned money: I know a lot of the big AI foundation model companies are building multi-billion-dollar data centers, and they're incredibly well funded. So how do you go about competing with that on tens of millions of pounds?

Aldo Faisal: So we can't, right? Let's put it that way. But I don't think these companies will get access to the data in the way we can get access to the data; that's our advantage. I think we also have a slightly different motivation behind it.

I think the key thing is that we're going to see a whole range of academic research AI factories coming up. Isambard-AI in Bristol is the one big one: a half-billion investment, with close to 6,000 GPUs already deployed there. There are new investments announced in the UK for building larger AI factories, and in the EU too: Germany, France, and so forth.

We're seeing a whole range of AI factories coming online specifically to support academic research. I have to say, as excited as I am about the technology, and I have spin-outs myself, so I'm very much for corporate activity in that space, I think there's a neutral-ground advantage to doing things in academia and with academia that, especially on very sensitive data, gives very different buy-in with patients and doctors, for example.

Richie Cotton: Absolutely. I think the huge compute resource requirements have meant there's been a big shift from research happening in universities to research happening in companies; it's been a tremendous change over the last decade or so. So it's nice to see that there are still some big academic projects going on in AI.

What are the timelines for things being released? When can we expect to see Nightingale AI going live?

Aldo Faisal: So for Nightingale AI, we definitely want a first proper paper, with model comparisons and so forth, coming out in the next 12 months. I think we also want to have a version that expert users can use in around 18 months.

There will probably be intermediate versions before then that we make available slowly to the community. Again, because it's an AI system and it can be used well or badly, we want to release it carefully. So you will not be able to just download it as such, but we want to make it accessible to clinical users.

It's a bit like when you develop a new drug: you can't just buy the drug over the counter. You need to make it available to the right people, so they can evaluate it, play with it, improve it, or of course build something on top.

Richie Cotton: Okay, 18 months for the initial release. Looking forward to that.

Alright, so finally, just to wrap up: I always ask for recommendations of people to follow or new work to look into. Whose work are you most excited about at the moment?

Aldo Faisal: I think there's a whole range of amazing people, and if I name-check one, the others will be unhappy that I didn't name-check them.

I think there have been a number of very interesting developments. One: there's been a fantastic paper by Apple AI (so Apple publishes as well; it's not all secretive) on the limited reasoning capabilities of language models. Another great place to look for work in that space is the AI Safety Institute that was set up by the UK government, where there are fantastic colleagues looking at exactly these limitations of AI systems.

I think where the exciting things are happening is around multimodality, and here I would just recommend searching on arXiv, where all the papers end up, for what's happening there. But what I'm really excited about are efforts around thinking about how to assess AI. I think the trend is going in the direction of AI systems being assessed more like humans than like a screwdriver, for safety and operation.

And so there's a whole slew of work there. We want to build AI that helps people, not replaces them, and so we need to think about how it can safely work with us.

Richie Cotton: Absolutely. All very important topics there: AI safety, and certainly multimodal AI, which is especially important if you want to look at X-rays and other medical data.

Yeah, a text description of an X-ray is probably not as good. And I certainly agree that the interaction between humans and AI is going to be a hot topic for the foreseeable future. Alright, wonderful. Thank you so much for your time, Aldo.

Aldo Faisal: Richie, all the best. Thank you very much.
