
AI in Healthcare: What the Slope of Enlightenment Will Look Like (Transcript)

Dr. Hugo Bowne-Anderson speaks with Arnaub Chatterjee, Senior Vice President at Medidata Solutions, about how AI has really impacted the healthcare space, and what the future looks like.


A few weeks ago, Dr. Hugo Bowne-Anderson, data scientist and educator at DataCamp, discussed AI in the healthcare industry with Arnaub Chatterjee, Senior Vice President at Medidata Solutions, previously an Associate Partner in the pharma analytics practice at McKinsey & Company and Director of Data Science at Merck. Read on for the transcript of their discussion, or view the webinar on-demand.

Introduction

Hugo: Hi everyone. I'm Hugo, a data scientist and data educator at DataCamp. I'd like to introduce you to Arnaub Chatterjee, who's the Senior Vice President of Product at Medidata Solutions. Medidata is a global provider of cloud-based software and analytics solutions for life sciences. In addition, Arnaub is on the faculties of Harvard Medical School and Cornell University, where he teaches courses on healthcare policy and big data, respectively.

Arnaub is going to tell us a bit more about his background in a minute. But, I just wanted to welcome you all to this webinar, which is about AI in healthcare and essentially what the slope of enlightenment will look like. And, I just wanted to motivate this by showing you the Gartner Hype Cycle.


[Figure: The Gartner Hype Cycle. Source: Wikipedia]

The Gartner Hype Cycle

Hugo: This is the Gartner Hype Cycle, where we have time on the X axis, and expectations are on the Y axis. So, as it pertains to AI, you have some innovation trigger, and then you get a lot of expectations building up around it. You get some peak, where the expectations are inflated, and we don't see as much value being delivered as we would have initially thought. Then, we move into the trough of disillusionment. And only after that do we get to the slope of enlightenment, and the plateau of productivity.

We've seen a lot of buzz around AI across many verticals, including in healthcare. I've been chatting on and off with Arnaub about this type of stuff for quite a while now, and he has a huge amount of experience thinking about this. So, that's why I invited Arnaub today to come and talk with us: about what the slope of enlightenment will look like, and to think about where we currently sit on the Hype Cycle.

Without further ado, let's get into the conversation. Arnaub, before we get into the Gartner Hype Cycle and where AI in healthcare sits, I thought maybe you could start by just telling us a bit how you got into data science originally, because everyone has a different genesis story or origin story, right?

Arnaub’s career in healthcare AI

Arnaub: Yeah, definitely. And I think I could plot my involvement in data science on that hype cycle. Actually, now that I think about it, it's serendipity in a lot of ways. I'm trained with a background in healthcare policy and the healthcare business. And most of my experience has been in the consulting space—I've spent a long time working in pharma on the data science side, in the health economics and outcomes research space.

And really, the genesis of where I even started to notice data science was, you know, working in the federal government, of all places. Being able to interact with people I consider to be the Yodas of health tech—Todd Park, Bryan Sivak, and others—who really started to build a lens around the good datasets that were emerging out of the federal government, datasets that could be leveraged by folks like the startup community, and by other companies that really wanted to figure out how to begin to build good models.

So, that's really the inception of the career. And then, I went over to Merck, and we built out some really great teams that were focused on leveraging different types of healthcare data to inform traditional models. So, the way you think about the clinical efficacy of a drug—how do you leverage something like a medical record? Things that were pretty nascent at the time, and starting to really get a lot of uptake within the pharmaceutical community. Over time, I ended up at McKinsey, led a lot of work in their pharma data analytics group, and spent about 50% of my time working with pharma and biotech on things like: what's my AI strategy, how do I even build an AI team, and what do I do with all the data we're sitting on?

And I conversely spent the other 50% of my time working with the three big tech companies out West, who wanted to understand what their AI strategy in healthcare should be. And, you know, they're now nested deeply within a lot of the traditional healthcare players. We'll talk about that in this conversation today. And that actually brought me over to Medidata—I connected with one of the partners I used to work with at McKinsey, who built out a whole team around leveraging what we think are sort of the crown jewels of data: the clinical trial assets that we sit on within Medidata.

I think it won't be a shock to anybody on this call, but having a curated, robust data source is really the foundation for any sort of data science activity, and anything called AI, if we're going to be audacious enough to use that term. Right now I oversee our data science products within Medidata, and I can talk more about that.

But for me, the intersection of data science, tech, healthcare, and pharma is really the product of what DJ Patil said: that data science would be the sexiest job of the 21st century. You know, I'm not a true data scientist, but I am now immersed within these fields. And it's been a really fun ride. That's kind of how I ended up here.

Medidata and Acorn AI

Hugo: Awesome. So, can you just tell us a bit more about Medidata and Acorn AI?

Arnaub: Medidata is a 20-year-old company, and the founders, Tarek and Glen, thought about cloud-based solutions for clinical trial data at a time when this was just picking up, in the 90s. And what they did was basically find a better way to think about electronic data capture for the clinical trial. So, in many ways, we were kind of the first company to create cloud-based, SaaS-based solutions for capturing clinical trial data. And the company's been public for about 10 years.

And then, probably about six or eight months ago, we got acquired by Dassault Systèmes, which is one of the world's largest companies in modeling and simulation, based out of Paris. And they think a lot about how to virtualize the entire experience of a variety of industries, now moving increasingly into life sciences. So, Acorn AI as an entity is part of Medidata, and something that was spun off about 18 months ago. We've been making investments in the data science world for about five years now. Medidata bought an imaging analytics company about five years ago, and we bought what's called a real-world data company, thinking about drugs post-launch, about two years ago.

We're starting to think about the pieces that go into the pharma lifecycle journey, and how you really build data science products around that. So, this entity was created out of Medidata. Sastry Chilukuri, who's the president of Acorn, was brought in from the McKinsey world to really build an entire company out of it. And the founders of Medidata thought that this was the right time for us to build an entire entity and organization focused on leveraging the fact that Medidata actually has probably the world's largest clinical trial database.

We are pulling information directly from the clinical trial, and all of the clinical operational variables that go behind that. So, that's really what we're working off of. It's a very curated and standardized dataset in many ways, since it's coming straight from the platform that we serve for about 1,300 or 1,400 customers worldwide.

The big picture: Understanding AI in healthcare

Hugo: Awesome. So, I just want to remind everyone out there, if any questions arise, please ask them as soon as they come to mind, in the questions panel, and we'll have a Q&A at the end. Arnaub, thanks for that great introduction. We're here to talk about where AI in healthcare is with respect to the Gartner Hype Cycle. But of course, I think what's apparent is that AI in healthcare is kind of different, with sub-verticals and sub-industries as well, spanning from drug discovery to clinical trials to all of these things. I thought maybe you could start by breaking down the whole space into these subspaces for us, which will inform our discussion of the impact of AI in health.

Arnaub: Yeah, sure. So, the way to think about this—and this is a pretty complicated system if you think about the data—is vertical by vertical. As an example, let's tackle pharma for a second, and we'll get into some of the other verticals.

AI in Pharma

Arnaub: Within pharma, the real problem that everyone's trying to solve for is: how do you improve R&D and how do you improve drug discovery? The real numbers behind the issue are pretty challenging, because 50% of late-stage clinical trials just don't work—they fail because of ineffective drug targets. And that means only about 15% of drugs actually end up going from phase two to approval.

That is a massive problem to solve. You can segment it a few different ways, but really the key use cases for AI within drug discovery and R&D in pharma fall within a few different buckets. One is streamlining the entire clinical trial operations process. How do you think about helping researchers recruit or retain patients? How do you think about managing studies? How do you think about predicting which patients might even enroll at certain sites? That's a really tough and complicated process for different types of diseases.

It's clinical trial operations, right? Improving the entire process for retention, recruitment, and patient involvement, and making that process more streamlined. And then you have the more scientific process behind identifying novel drug targets. This is where you're seeing a ton of activity: can you start to think about the optimal combinations of drug compounds? Can you reduce the time and the capital that's required in drug discovery? Can you optimize drug design to increase efficacy? These are all heavily invested areas now within pharma.

As we start to think about the time and cost elements behind traditional wet-lab approaches, can we start to isolate molecules and drug targets more effectively by taking an entire library of compounds, thinking about AI-driven ways of looking at the different sequencing, and coming up with better ways to focus on one target? You're basically just trying to improve your chances of success. So, that's really one bucket: pharma. And if you think about the clinical hospital operations world, that's an area that's in some ways low-hanging fruit.

Improvements to healthcare policy and processes

Arnaub: So, from a policy perspective, this concept of administrative waste in the healthcare system, specifically within the US, you just have people who are inefficiently processing insurance claims right now. And, you're starting to think through what are the reasons that this happens. Just a really practical example is, a place like Duke University medical center has 900 hospital beds, but they have 1400 claims administrators, right? And, what you're trying to improve for is the better utilization of staff, and resourcing, and billing and revenue cycle management? And, it's a complicated process, there's a lot of manual review and claims billing, and doctors do things in different ways.

So, there's a real human element of how you think about, you know, the processing and the workflow management of all these claims. AI, at its lowest-hanging fruit, could check claim statuses, manage accounts receivable, and streamline entire workflows. So, this whole notion of better claims administration is a huge area of investment as well. And then, I think broadly—and this applies to a bunch of different verticals—there's population health, disease modeling, and stratifying risk, right? What you're doing here is leveraging machine learning and NLP, Natural Language Processing, to take healthcare data from disparate sources—medical records, medical imaging, genomics—and trying to create better risk stratification pools.

Imaging and diagnostics

Arnaub: So, this can serve a variety of things; you know, the easiest one is imaging and thinking about diagnostics. And the most documented and well-traveled field for AI in healthcare so far has been AI-driven diagnostic tools that are helping physicians in clinical decision-making and then helping with disease diagnosis. So, this is happening in spaces like pathology and radiology. You know, the Googles of the world have worked in ophthalmology. It's a pretty well-explored space now, and the real race is over who's got the most imaging data.

Risk screening

Arnaub: There's also stuff around hospital decision support. Can you build a better predictive model that will tell you which patients are at the highest risk for a certain disease? And can you then start to think through implementing an algorithm that is physician-facing, that allows them to make a decision differently? This is another area where building good models off of good medical records or good claims records can help you prospectively identify which patients you need to target and better treat.

If you even think about risk screening from the social media world, taking information from social media posts—Facebook has done a really interesting job at suicide screening, being able to identify two weeks in advance who's more likely to have a suicidal episode, and coming up with very, very accurate risk models based off of that data. It's just another angle on where we're thinking about disease modeling and risk screening. There's a huge number of use cases and buckets, but those are the three big ones that we're seeing.
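
To make the risk-stratification idea concrete, here's a minimal sketch in Python using synthetic data and scikit-learn. Everything in it—the feature names, the toy outcome, the top-decile outreach cutoff—is a hypothetical illustration, not any production healthcare model.

```python
# A minimal sketch of the risk-stratification idea described above, using
# synthetic data and scikit-learn. The feature names, the toy outcome model,
# and the top-decile outreach cutoff are hypothetical illustrations only.
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n = 5_000

# Hypothetical claims/EMR-derived features for each patient
X = pd.DataFrame({
    "age": rng.integers(18, 90, n),
    "num_prior_admissions": rng.poisson(1.2, n),
    "num_chronic_conditions": rng.poisson(2.0, n),
    "days_since_last_visit": rng.integers(0, 720, n),
})

# Synthetic outcome: 1 = adverse event within a year (a toy generative model)
logits = (-5 + 0.03 * X["age"]
          + 0.6 * X["num_prior_admissions"]
          + 0.4 * X["num_chronic_conditions"])
y = rng.random(n) < 1 / (1 + np.exp(-logits))

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
model = LogisticRegression(max_iter=1000).fit(X_train, y_train)

# Score the held-out patients and flag the top 10% highest-risk for outreach
risk = model.predict_proba(X_test)[:, 1]
print("AUC:", round(roc_auc_score(y_test, risk), 3))
cutoff = np.quantile(risk, 0.9)
print("Patients flagged for outreach:", int((risk >= cutoff).sum()))
```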

What’s moving the needle

Hugo: Awesome. That was a wonderful tour of all the interesting work that's happened, and is still happening, in pharma. We've got R&D for drug discovery, clinical operations, and identifying novel drug targets. Then we have population health, disease diagnosis, and stratifying risk, with a big example being diagnostic imaging. You hinted earlier that the big tech companies are starting to become major players in this space, requiring a lot of clinical data for their models: claims records, risk identification, these types of things. Another interesting aspect is risk screening from the social media world. So, these form the general framework in which we're going to discuss AI in healthcare now.

Arnaub: Yeah. And there are smaller buckets, like virtual assistants. There's work being done in the lab space. So, we can go further and further down the funnel. And I think what I described was just where you're seeing the most volume and progress.

The biggest wins for AI in healthcare

Hugo: Yeah. So, that really leads into my next question: I want to talk about what's being over-hyped in the space in a minute. But before that, I'd like to know what you think the biggest wins in AI in healthcare have been so far.

Arnaub: It's a good question. And defining wins is something we should come back to, because in the early days of AI—which could mean going back many years, or just five or 10—we could talk about the accuracy of a prediction algorithm and start to think about what that means: how are we diagnosing a disease? Are we doing a better job at screening patients? We'll get to that in a second, but maybe I'll start with the imaging world, since you referenced it, and thinking about diagnostics in general.

Imaging and diagnostics wins

Arnaub: So, by all accounts, using AI-driven algorithms to extract meaning from medical images has proven to be successful. And the reason for that is that there's just a large volume of data. You have MRIs, you have CTs, you have PET scans, and now you're starting to see the linkage and connectivity to the electronic medical record, so that you can derive an outcome from an image and understand what the image is actually showing you. And hospitals are producing a staggering amount of imaging data—petabytes per year. So, it's definitely where you're seeing the biggest growth. Just another point of reference: 90% of healthcare data comes from medical imaging.

So, this has sort of been the bedrock for where things have really started. And I think the use cases are pretty clinical now. You know, you mentioned the Googles of the world. They in some ways built what the academic community would consider a clinically validated way of looking at diabetic retinopathy. These are not just studies being done in isolation and published in open-access journals; they're actually being published in the JAMAs and the New England Journals of the world. And this is one way of really understanding the receptivity of whether the model is validated or not.

The most recent example, which I would consider a really important statement in the industry, is what they did around breast cancer screening. So, Google Health as an entity worked on a deep learning AI model. They picked breast screening mammograms from two different countries: they looked at the US and they looked at the UK. And they built test sets of data covering about 25,000 women across a few different screening centers. And what they were able to show is a pretty high level of accuracy in identifying where you would see a potential tumor, or prospectively identifying a woman as likely to have breast cancer.

And I think the real win here was that you were seeing fewer false positives and fewer false negatives than radiologists themselves. You know, the big takeaway from this is that, I think it was about 5% fewer false positives, and almost 10% fewer false negatives. And the system that they built was actually outperforming six board-certified radiologists in the US. So, what we're basically now getting to—and Google's done this with ophthalmology and a few other areas—is, one, the ability to discern in the dataset whether something is cancerous or not, or whether it might grow into a tumor or not. [And] two, what do you do with that information? That's the next step of the evolution of where these types of companies are going to go.
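
A quick aside for readers: "fewer false positives and fewer false negatives" correspond to the two off-diagonal cells of a confusion matrix. The sketch below, with entirely made-up numbers rather than Google's actual results, shows how those two rates are computed.

```python
# Toy illustration (made-up numbers, not Google's actual results) of the two
# error rates at stake: false positives (healthy patients flagged) and false
# negatives (cancers missed).
import numpy as np
from sklearn.metrics import confusion_matrix

rng = np.random.default_rng(1)
y_true = rng.integers(0, 2, 1000)      # 1 = cancer actually present
agree = rng.random(1000) < 0.9         # simulate a reader right 90% of the time
y_pred = np.where(agree, y_true, 1 - y_true)

tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
fpr = fp / (fp + tn)   # share of healthy patients incorrectly flagged
fnr = fn / (fn + tp)   # share of true cancers the reader missed
print(f"False positive rate: {fpr:.1%}, false negative rate: {fnr:.1%}")
```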

Building better AI algorithms

Hugo: I had a question around that. The question really is, are these technical successes or practical successes? In the sense that you can get a model that's incredibly accurate, with great true positive and true negative rates, right? But have they been deployed and used on the ground, essentially?

Arnaub: Yeah. And that does lead me to something that we should talk about later, which is the generalizability of algorithms, and the application of algorithms. There's quite a bit in the literature around physician burnout, and how physicians use medical records, and whether they want pings, or some kind of interruptions in the clinical workflow, to have these types of algorithms at hand. You mentioned we're going to talk about the hype in the space. We should talk about the doctor-AI concept, and what that even means, and how this work that Google has done so diligently feeds into something that will ultimately impact patient care, and whether that's happened yet or not.

But I do want to go back to these disease prediction models, because it's getting better. I mentioned suicidal intent already, and Vanderbilt had a really cool study where they looked at 5,000 different cases, and they said that they could get to a prediction of whether somebody will attempt suicide within the next two years. And they were 92% accurate in predicting whether somebody would attempt suicide within the next week. Right? And this is an incredible training set; it captures a longitudinality of behaviors and outcomes. And they're building it off of the medical center's database, you know.

So, these types of models are incredibly powerful if they're deployed properly. I think these examples in imaging are just increasingly becoming commonplace. You know, machine learning algorithms can now successfully differentiate between different types of cancers. And, you know, you can leverage the data to make very informed decisions that in some cases outperform pathologists, radiologists, and ophthalmologists. So, there's got to be a leap here from the fact that we're now very successful at understanding what's going to happen to people, to how that translates into improving clinical care at some point.

How AI can improve clinical care

Hugo: That's very interesting. So what does the intervention look like? You can have a great prediction, but how do you then act on it in this case?

Arnaub: Yeah, absolutely. And there are a number of factors at play here. One is around the privacy of the information, and what you do in terms of interventions, right? Like, given the fact that you're able to create these prediction models—what is the right ethical way to intervene if you need to, and how do you do that? Is it providing a helpline? Is it working directly with counseling? Like, how do you start to then work yourself into the process of improving what that person's trajectory could look like?

So, I think that is to be determined in terms of how we're thinking about clinical care. You know, I can give you an example: I've worked on leveraging EMR data to build better risk screening models for hypoglycemia in diabetes—looking at low blood sugar levels. And the challenge is, one, how do you take that algorithm that did a really good job of identifying which patients would be at higher risk, and then, two, feed it in in a rational way, where the clinician doesn't feel like this is just another thing that they have to look at?

And if you think about the EMR, and the UI behind the EMR, and how you start to better contextualize information—there's just a tremendous amount of stuff on there right now. And clinicians are making these multifaceted decisions already. So, does this add more complexity, or does this help in a drastically different way that ultimately improves the outcome?

Cultural challenges in adopting AI

Hugo: I love it. And you're speaking to a cultural challenge here as well. Actually, something that came to mind: I ran a webinar last week about data strategy, and I had a poll question in it. I asked the audience what percentage of the data science and data analytics work in their organization they think is actually used to inform decision-making. And over 50% of attendees said they thought less than 50% was actually used, which I found incredibly interesting. And I think it's relevant here, because we're talking about really serious wins and big results for AI in healthcare. But if they're not built into a practice on the ground and utilized, it's a wasted effort, right?

Arnaub: Yeah. And that does get back to your Gartner Hype Cycle question around where we are. We can get to that later, but I think there is a lot around how much we need to think about practicality and application, if we're going to move anything forward and get to the other side of the curve.

Hugo: For sure. So, the big wins we've talked about so far are in diagnostic imaging, and such things as suicide prevention. Before we move into what's being over-hyped, are there any other big wins or successes that you'd like to highlight?

Pharmaceutical breakthroughs

Arnaub: Yeah. I mean, thinking through this now—and this is so germane just based on what's happening with coronavirus right now—the journal Cell put out a really interesting study recently looking at drug-resistant bacteria. And this is an interesting space, because pharma companies are sort of disincentivized to focus on it; there's a lack of economic incentives behind it. But drug-resistant bacteria are a pretty serious problem: 25,000 people die from these infections in the US each year. It's a huge source of infections.

So, what this organization did [was to] basically [create] the first validated model for discovering a novel antibiotic. And what they said is they trained a deep neural network that could predict molecules with antibacterial activity. This was cool because they looked at a bunch of different chemical libraries, and they discovered a novel molecule called halicin. And it came out of what's called the Drug Repurposing Hub, and was structurally different from conventional antibiotics.

And what this means is you have an isolated target that could now be expedited into development, to treat a real problem, right? So, this is a big win for an understudied problem. Going back to the whole drug discovery thing: AI is now taking responsibility for discovering brand new targets. You know, there's a British company called Exscientia that developed the first drug using AI that's going to be clinically tested on humans. So, this is a trial-ready way to look at obsessive-compulsive disorder. And what they're saying is they've saved years off of the pharma drug development lifecycle by isolating this target, and now it's going into actual human trials.

Unlocking big wins from strong data foundations

Arnaub: So, these are important developments in the space, and we have to talk about the hype, but also the reality, and get more into what's going to happen out of this. The fact that we're moving forward is really important. The last thing I'll say—and this is probably the unsexy stuff—is that training sets have frankly just gotten better. One of my favorite articles is a JAMA article that looked at NeurIPS, the conference. And what they found was that the clinical submissions for NeurIPS this year were built upon just a handful of different datasets.

So, as an example, everyone used the Alzheimer's Disease Neuroimaging Initiative dataset—there were 10 submissions that were just based on that. The UK Biobank, which is becoming a really good source of data, had a bunch of submissions based off that dataset. And why this is important is because you're now seeing people hover around a really good, well-curated dataset and building good models off of it, as opposed to chasing whatever's out there in order to build a model that may or may not be all that effective.

Hugo: That was great. And as you've said, data science has been called the sexiest job of the 21st century. But I personally think a lot of the biggest wins are going to be from the unsexy stuff—and not necessarily only in healthcare. You know, call center routing, and a lot of the scut work around the types of things we're talking about, is where you see a lot of the massive wins. I'm excited now to move on to what you think has been over-hyped in the space.

What’s overhyped in healthcare AI

Milestones vs. incremental change

Arnaub: Yeah. The wet blanket perspective. So, I'm going to try and take the examples that I just discussed, and then paint the converse picture. You know, the big discussion in drug discovery last year was: did AI discover a drug? Right? And you could find a whole range of opinions on this. But the real thing that drove a lot of chaos into the conversation was this one paper that came out saying they had leveraged AI techniques to discover a drug in 46 days—this rapid isolation of a target using better screening methods. You know, everyone was trying to figure out what that meant.

And what made the paper really interesting was that there was a crowd that said, "Holy cow, this is pharma's AlphaGo moment." Like, we've just discovered something so many decades in the making that we think this is a gigantic step forward—I'm glad it feels like it's moving. And then there were people who were like, wait a second, this is something that we've been doing for a long time. Virtual screening has been done for decades now; what these guys did was just a slightly different twist on it. It did move the virtual screening field forward, but to call it a gigantic milestone in drug discovery is a gigantic leap of faith, and probably exaggerates what actually happened, right?

So, there are these two schools of thought. And it's really important to understand what we're calling success, and what we're pointing to and saying: this is moving the field forward. So, that's just one thing. The other thing is, if you think about disease tracking and adding some humility behind it, I think the whole rise and fall of Google Flu Trends reminds us that forecasting annual events like the flu on the basis of, like, one year of data is really hard. It runs into time series problems, it runs into issues with predictive efficacy, and disease prevalence is a really difficult thing to monitor.

The value in testing and retesting models

Arnaub: So, there's a positive to this, in that John Brownstein's lab at Boston Children's Hospital did end up working with Google. They ended up showing that they could improve on the model. And John's lab even created an AI model to track coronavirus. What this goes to show is that we just have to be tempered about what we're calling a successful model, and then we have to test it and validate it and test it again. I do think things change, and I do think people need to understand that we have to test these same models on different datasets to better understand what we're calling a success.

I guess the two things I want to think about are—and I'd love the community's perspective on this—how do we think about the value of prediction models? The practice of medicine is constantly evolving. We have new technologies, we have new disease epidemiology, and there are new social and behavioral factors that are impacting how patients are handling a disease. So, the whole value of prediction changes.

And what's interesting is that a disease prognosis model for breast cancer had to change, because we now have a new biomarker, HER2-negative status, that basically changed the whole face of targeted therapy and of prediction models. Right? So, is there value in these "n of 1", moment-in-time predictions, or how do we continuously account for the fact that the disease is evolving, the treatments are evolving, and the patients are evolving? That's one thing I've been thinking a lot about: the notion of the value of prediction.
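
One way to make this measurable is temporal validation: train on an earlier cohort and score successive later cohorts, watching for the erosion Arnaub describes. Here is a minimal sketch on synthetic data; the shrinking biomarker effect is an invented stand-in for, say, a new therapy changing the disease's course.

```python
# A minimal sketch of temporal validation on synthetic data: train a model on
# an earlier cohort, then score later cohorts as the underlying biology (here,
# an invented biomarker effect) shifts. The drift is injected deliberately.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(2)

def make_cohort(n, effect):
    """Simulate one cohort; `effect` is the biomarker's true weight, which we
    let shrink over time (e.g., a new targeted therapy blunting its signal)."""
    X = rng.normal(size=(n, 3))
    p = 1 / (1 + np.exp(-(0.5 * X[:, 0] + effect * X[:, 1])))
    return X, rng.random(n) < p

X0, y0 = make_cohort(4_000, effect=1.5)            # earlier training cohort
model = LogisticRegression(max_iter=1000).fit(X0, y0)

for year, effect in [(2017, 1.5), (2018, 0.8), (2019, 0.1)]:
    Xt, yt = make_cohort(2_000, effect)
    auc = roc_auc_score(yt, model.predict_proba(Xt)[:, 1])
    print(year, "AUC:", round(auc, 3))             # discrimination erodes
```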

Hugo: One of the big techno-philosophical questions in machine learning is: how often is the past a good predictor of the future, right? We're going to talk about generalizability in a certain sense, but when new techniques are arising and the world's changing, how good are these models trained on the past, right? In health, it's unclear—particularly since, as you said, a lot of these have to do with risk stratification, and understanding what your uncertainty is.

AI does not understand direct causation

Arnaub: Yeah. This actually goes to my other thought behind this: does AI in health understand causation? You can basically say that AI's ability to understand correlations is good—that's sort of the most basic level of causal reasoning. But if you look at patient disease modeling, you can leverage AI, at its most generous extension of what we're calling it, and say that some events might be associated with other events, but we don't really know directly what's making them happen. And for something like AlphaGo, it's a trial-and-error way of understanding which moves will cause someone to win.

And in healthcare, that doesn't work, because we need a more general understanding of all the dimensions and all the problems. So, to really make it real: if a patient is dying in a clinical trial, is it the fault of the medicine, or is it the fault of something else? What we're really great at right now is well-defined tasks within healthcare, which is where diagnostic screening and understanding a cause-and-effect model make sense. But we're not really understanding the rationale behind it.

Hugo: Yeah. And this speaks to—I think it was 2008, right?—Chris Anderson's highly provocative article, which was wonderful and infuriating, called "The End of Theory": the idea that, because of big data, we don't need to actually understand the world anymore. But of course, we do want to make sense of things. It's a very human endeavor to make sense of things and develop understanding, particularly when it impacts so many lives on the ground. So, I think this is a nice point.

The relationship between interpretability and accuracy

Hugo: We've actually got a question from Perea—and I hope I pronounced your name correctly. Perea asks, what really is the relationship between the interpretability of algorithms and accuracy? Perea writes that deep learning algorithms are really powerful diagnostics, but at the same time, medical doctors will question their explainability. So, I wonder if you could say something about the relationship between model performance and interpretability, and how that relates to causation and human understanding.

Arnaub: Yeah. Yeah, that's a really great question. So, you know, one of the questions I get all the time is: build us a prediction algorithm, but then prove that it can be deployed in a variety of settings and geographies. And there's now more literature around this, like: how do we look at the replication of findings and implement reproducible research? Claims data, as an example, is a good foundational dataset. The challenge with claims data is that it's not medically deep, right? You're capturing a point in time, and you're also just looking at, basically, a medical billing process.

But at the same time, it's epidemiologically representative, and it's useful, and we can build good models off of it because it's well structured. So, a lot of disease prediction models have been built off of claims data, because it captures large portions of the population, and with that we're able to create reproducible models. Now, the question around practical implementation is tough, because what we're saying now is: we've built these really good screening models, but can we describe the symptoms and the conditions of these patients consistently, over and over and over again, off of that same model?

And I think this is where we're seeing a real struggle with healthcare systems, which are wondering: what is a deployable algorithm? We believe in its accuracy. We believe in the fact that it is causal. But we're not sure the generalizability is there. And, you know, this is where I don't think there's a good answer. Any epidemiologist or biostatistician will tell you that there is no perfect dataset, right? And the risk that physicians run is in how they actually use this. What I am seeing in radiology practices, for example, is that imaging data is a great use case, because those images look the same over and over and over again.

And if you can tie it to good EMR and good outcomes data, you can predict how to treat a patient who is consistently going to have a certain procedure done, or who looks like a set of patients who will have that same procedure done. You know, if you look at the top eight most common procedures in the US—something like a knee replacement, or a hip replacement—you can track the trajectory of those patients pretty accurately. And you can start to create better disease models that will treat those patients, get them out of the hospital, and then follow up on care more effectively. So, I think we have to isolate our use cases, and we have to start to think through where people are willing to take a shot on what they want to deploy and implement.

Cataloguing and creating curatable datasets

Hugo: Absolutely. So, we're about to get to what the slope of enlightenment will look like. I have a related question from Max, which I think is very interesting. Max is interested in image-based diagnostics, and asks: isn't it too hard to scale because of the variability in hospitals and equipment? Now, I don't know if you've read it, but in the inaugural issue of the Harvard Data Science Review last year, Michael Jordan had a very interesting article in which he started off with an anecdote—I'm probably going to get this horribly wrong—in which the specialist looking at the imaging made a diagnosis based on some white spots, which were supposedly a signal with respect to a condition that Michael Jordan's child might have.

And Michael Jordan, being a machine learning guru and statistical guru—one of my Yodas, in fact—went back through the research and realized that the imaging equipment used for his child was newer than the equipment used in the studies the diagnosis was based on, and what the clinic was actually seeing was higher resolution. They were literally just seeing white. It was literally white noise, and they were making a diagnosis based on what was in that signal, due to the increased resolution of the equipment. So, I think that gives voice to Max's question.

Arnaub: Yeah. I love this question, because the concept of cataloging and understanding what is a curatable dataset is super critical, right? There are wide variations in pixel quality, in dimensionality, in what contextual features you can actually extract from imaging data. So, what ends up happening is we could say we have a million images and, you know, maybe 10,000 of those are even usable. And that's what we're seeing in some of these use cases, you know, with the Googles of the world, where they are working with some of the world's largest imaging companies—whether it's Nikon, or eye institutes that specialize in capturing retinal fundus images. What you end up finding is that the funnel gets very small very quickly if you look at what's an actually usable image, because of what you just outlined and the problem that you described.

So, you know, we're working with a very small population at the end of the day. This is a problem that we're trying to tackle in my company right now. Because, if you think about what we do in clinical trials, we are capturing the MRIs, and the PETs, and the CT scans directly as part of the clinical trial. And arguably we have probably one of the world's largest clinical imaging databases, where we have all of that information. But to know how much of it is useful, we need to understand the source, the machinery, the modality, and the body parts. And there is a large process that goes into segmenting and cataloging the data before we even start to attack the question that we're trying to solve for. So, I can very much empathize with that question.

Hugo: And I actually wonder—and this is tangential; you and I have discussed The Checklist Manifesto before—whether people on the ground using the results of models and that type of stuff could have short checklists, so they could just make sure that, for instance, the equipment being used was the same as what was used in the study, and that type of thing.

Arnaub: Yeah. And there are cool companies like, you know, Saliency AI out of Stanford, which is just two really sharp MD-PhDs who are trying to create usable datasets. So, they're doing the scout work, right? They are understanding what's a usable dataset, and they have the equivalent of what you described, which is a step-by-step way of evaluating data. So, this is something we just need to do. Imaging is just very hard, and even if you have the volume, it isn't as big as it sounds at the end of the day.

What the slope of enlightenment looks like

Hugo: Yeah. Great. So, we've seen a bunch of wins. We've seen some of the highest areas of impact. What's the slope of enlightenment going to look like, Arnaub? If you could chat about this for five or so minutes, and then we'll get to some questions, that would be awesome.

We’re heading into the trough of disillusionment

Arnaub: Yeah. So, you know, the hype on this has been well documented, I think. If you look at total private and public sector investment in healthcare AI, it's close to $7 billion, I think, as Rock Health reported. So, for me, what the hype curve now shows is that we're well into the part of the curve where expectations are high—well into the peak of inflated expectations. And I would argue that we're actually getting to a place where we're not at the trough of disillusionment yet, but we're getting close.

The need to temper expectations with AI

Arnaub: You know, there are still a few major challenges in terms of why we're setting ourselves up for a letdown. And the rationale behind this is something I alluded to earlier: the AI doctor. It's just important for people to realize that we're not replacing physicians anytime soon—we're improving and augmenting and assisting. The expectation that a company or a startup will replace the process by which a doctor evaluates a patient is just wrong. And if the venture capital community, or others, believe that this is what's going to happen with AI, there's a need to temper expectations.

It's the same thing with what I mentioned for drug discovery. There are these hyperventilating articles that go on saying that this is the AlphaGo moment, and that we've discovered something new. Drug discovery takes a long time, and isolating a target takes a long time. And my biggest fear is that there's an expectation of ROI on capital investment within three to five years. You may start to see exits for some of these companies, or they may IPO, and the hype cycle is feeding into that. But the actual process of making and developing a drug, seeing whether it's clinically efficacious, and getting the regulatory bodies to approve it—that can take five to 10 years.

So, we have to build in a window of time that allows us to say that these companies are working hard, and we just have to be measured about what we're calling success. And this is now impacting a lot of other startups in the space, where, if you put AI in your description in any way—which includes my company—the expectation now is that you're doing something revolutionary: you're augmenting the process, or you're discovering something new. We just have to be careful about what we're calling it, before we pour more oil on the fire.

Hugo: That's one of the reasons I like the Gartner Hype Cycle a lot—not that it's an accurate model of how all technology evolves everywhere; I mean, it clearly isn't. But the fact that we do get inflated expectations around buzzy things is really important to recognize. Because what happens then is you can get some sort of trough, right?

But there have been many AI breakthroughs

Arnaub: Yeah. Exactly. I mean, the one thing I will say, though, is that we have to appreciate what's been done. So, on the point of meeting some of the expectations: something that you guys should all do is go to the American College of Radiology website, where there's a section called FDA-cleared and approved algorithms. It is really long, and you can see just how much the field has grown. These are actually regulatory-approved AI-based algorithms, or machine learning-based algorithms, or even regression models. What I'm getting at is that these are substantive steps forward, if one of the most traditional, guarded institutions in the US is saying that this is an FDA-cleared AI algorithm.

Hugo: That's really interesting. And I think a lot about approval systems for algorithms that are in production, algorithmic audits, these types of things. I think we do have a future where we will need to classify, audit, and essentially do a sensitivity analysis around a lot of the algorithms in production, to see how stable they are and how they're impacting people on the ground.

Arnaub: Yeah, absolutely.

Hugo: What else do you think the slope of enlightenment will have in store for us?

Connecting the dots

Arnaub: AI can't continue to just search for the known—maybe it'll start searching for the novel. And this is kind of what I was alluding to with drug discovery, or changing a process. So, as an example, if we can get to a place where ML-driven or AI-driven algorithms are automatically processing two- or three-dimensional image scans, automatically identifying clinical signs like a tumor or a lesion, and then tying that to a clinical outcome—that's really important.

And I think there are clinical examples of us now getting to a place where we're saying: we're doing a great job at assisting the diagnosis of something—can we do the entire patient journey, and better understand what's going to happen, you know, soup to nuts? I think that's a real big step ahead. But that's one area. The other area is bias. We haven't really talked about this, but it's sort of embedded in everything that we talk about.

Adjusting for inherent bias

Arnaub: One of the biggest things I've been thinking a lot about is this: there was a great paper that came out using a UnitedHealthcare database, which is one of the most heavily utilized databases in pharma. And what they were saying is that black patients were assigned the same level of risk by the algorithm even though they were actually sicker than white patients. The bias occurred because the algorithm used healthcare costs as the measure of healthcare needs. So it was predicting healthcare costs rather than illness—and less money is spent caring for black patients than for white patients. That's a really dangerous precedent. If we're building algorithms off of this type of bias, we're in a really difficult spot. So, how do we get rid of the inherent bias that's already present in some of these datasets?
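
The audit behind that finding can be expressed compactly: at the same algorithmic risk score, is one group actually sicker? Here is a minimal sketch in that spirit, on synthetic data with the unequal-access effect injected deliberately; none of the numbers reflect the real study.

```python
# A minimal sketch of the audit idea behind that study, on synthetic data:
# score patients on cost, then check whether, at the same score, one group is
# measurably sicker. The unequal-access effect is injected deliberately.
import numpy as np
import pandas as pd

rng = np.random.default_rng(3)
n = 20_000
group = rng.choice(["A", "B"], n)            # two hypothetical patient groups
illness = rng.poisson(3, n)                  # true chronic-condition count

# Group B accrues lower cost for the same illness (unequal access), so a
# cost-based score under-ranks them -- the proxy-label failure mode.
access = np.where(group == "B", 0.6, 1.0)
cost = illness * access * rng.lognormal(0.0, 0.3, n)
score_decile = pd.qcut(cost, 10, labels=False)   # stand-in "risk score"

df = pd.DataFrame({"group": group, "illness": illness, "score": score_decile})
audit = df.groupby(["score", "group"])["illness"].mean().unstack()
print(audit.round(2))  # at each score decile, group B is sicker on average
```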

Hugo: Yeah, I agree completely. As we saw in this particular case, the danger is blindly using a proxy—and most of the time we are using proxies, right? It's very difficult to measure the really interesting things a lot of the time. So, two of the biggest challenges are biases in training data, and using proxies for the metrics we actually care about.

Ethics in healthcare AI

Hugo: This is a wonderful segue into a set of questions we have. So, we have three questions that are all around ethics in AI with respect to healthcare. Pereissa asks a general question: are there ethical guidelines for the practice of AI in healthcare? And I'm going to cite the other questions as well, and maybe we can talk about all of them holistically. Daniel asks, what's your perspective on trends in patient privacy, and the impact on the development of AI-based algorithms?

I'll also add that if we're talking about patient privacy, we also need to be talking about the fact that a lot of these algorithms are now being built by big tech companies. As we know, Google probably has room for improvement on privacy—that's an understatement—and Facebook also. For Apple, I mean, one of their differentiating factors is privacy. So maybe that's something we can speak to. Holly has a question around bias in healthcare, which is present from the research stage—with respect to adequate or inadequate representation of minorities in clinical trials—all the way through to actual care. So, how do you think bias when you're doing these things at scale compares to bias in traditional methods in healthcare? To summarize: a general question around ethical guidelines, one around bias, and one around privacy.

Tech companies are leveraging patient datasets

Arnaub: Yeah. There's definitely a lot to unpack there. We could have a whole session on this, and we probably should. So, you know, I think the interesting challenge behind this is that we're certainly at a point now where there's just an increased amount of scrutiny in terms of how we're using healthcare data—in some ways rightfully so, in some ways a little bit overinflated. Just today, Stat News did a really interesting exposé on the contract that Google signed with the University of California, San Francisco to mine patient data. And the takeaway wasn't that Google was mining patient data; the takeaway was that there were a ton of patient privacy protections built into that agreement.

And the agreement is public. You know, I don't know how Stat got their hands on it, but it's a really compelling read, where you can actually look at how they created the language: there were IRB requirements, and there were ways to ensure that certain datasets maintained very, very strict privacy protections. What I'm getting at is that there's a little bit of shock and awe behind the fact that technology companies are leveraging claims datasets.

This has been done for a long time. I think what's changed is just the public now being aware of it, and the perception around it. That doesn't necessarily make it better or easier. As an example, UnitedHealthcare has a whole dataset that leverages EMR and claims, and it has been sold and resold to different companies for about 20 years now. It's something that's been done for some time; we're just more acutely aware now of what's happening. I think the Cambridge Analytica examples of the world are certainly not helping in terms of [demonstrating] how data can be nefariously used. There are just different optics for healthcare data that have to be acknowledged and recognized.

How Medidata handles patient data

Arnaub: And, you know, just to give you some sense of how we handle it at our company—by repurposing and reusing clinical trial data—a lot of people might ask how we think about patient consent, and how we think through how that data is being used. We have a bunch of rules that we built in and validated with regulatory bodies and with external experts: we have to have a minimum of two pharma companies, as an example, contributing to a dataset. It can never be identifiable. We can never reveal the identity of the sponsor, or of the patient.

We built tools that are customer-facing, but customers will never have access to a licensed dataset, you know, which is sort of our worldview of how this works. Those are just things we built in—and we're GDPR compliant, just as an example. Going back to that question around what we should do: there are a number of governing bodies around this now. The European Commission has a whole expert group called the HLEG—a high-level expert group focused on trustworthy AI and what you should do. And they've published a lot of key requirements on what AI systems should be, and [how they can] be deemed ethical.

Companies are adopting their own ethical AI guidelines

Arnaub: Even the Googles and the Microsofts of the world have written out their own ethical AI principles and guidelines. This is important when you think about how you [can] leverage their technology solutions if you ever want to work with them. There's also a lot to think through on whether we're regulating technology too closely. If we do get into a world where we are over-officiating, I think we might stop progress. And that could be a very different type of concern, if we go to the other end of the spectrum.

Hugo: Right. So, how should we then think about companies like Google and Facebook, whose business models essentially revolve around extractive principles—taking data from users who are using products for free? And the added thing is that they're very good at acquiring different companies to get different types of data. So I think the Google acquisition of Fitbit late last year is a really interesting one, whereby they can then produce whatever joins they need in order to build better predictive models to market to us better.

Arnaub: Yeah. So, that's another example where, in many ways, it's a hardware acquisition, and it's also probably a data acquisition, right? And if you start to think through how companies are longitudinally modeling your moves, and your activity, and your behavior—in many ways companies are already doing that, and there's just increased scrutiny when it comes to your healthcare data. So, that's just something to think about with the tech companies and what they're going to do. A lot of companies are now open-sourcing, or open-accessing, their algorithms, and if that is opened up to the community, I think that's a great thing to do.

Patient data can be used for good

Arnaub: So, as an example, I think there's a hospital in London that was working with DeepMind, which is one of the Google companies. And there were some challenging optics around that project—there were some serious privacy concerns, and I think tech companies are learning the hard way. But the fact that they created ophthalmologic algorithms that are now open access means this could someday be used by other researchers, or by other startups, to diagnose eye diseases more effectively.

So, we need to be able to release the code in some capacity, and we need to understand how to augment the work that a lot of experts have already done. Because there are large communities that I think could be part of the experience here, in creating open-access AI. It's something that we're still in the early days of, but the tech companies are talking about it. We just need to be much more transparent about how we're doing it.

Bias in healthcare exists

Hugo: Yeah, absolutely. So, I just want to move back to Holly's question, thinking about bias in healthcare. There has always been bias in healthcare—there is bias in everything. What has changed with respect to bringing AI into the healthcare space, compared to traditional methods?

Arnaub: Yeah, this is not a problem that we're going to solve anytime soon. You know, I think of this public example of what happened with the Optum dataset, showing that the prediction of which patients would need extra medical care actually preferred white patients over black patients. This is a pretty serious ethical dilemma, right? And what was really damning about it, I think, if you dig deeper into that dataset, is that that level of bias—that flaw in that algorithm—was apparently present in other widely used algorithms from this company, right?

It's the largest database in the country. It affects 150 million people in the US. So, you know, race itself wasn't a factor in the algorithmic decision-making. But using medical histories, the algorithm was really looking at cost, which is not explicitly a race issue, right? Yet there are socioeconomic reasons that black patients have historically had lower healthcare access, and incurred lower costs as a result—so the algorithm preferred white patients. And the researchers then worked with United to help correct the issue: they changed the model, and they addressed the disparity between the patient populations.

You can correct algorithmic bias if you flag it

Arnaub: You can correct algorithmic bias. But you have to flag it first, right? And that ends up being the biggest challenge. So, if you understand that there are models involved in hiring, or in risk scoring, or in predictions, and they're inherently building off of something that is not explicit or obvious—flagging that is probably, I don't want to say the best you can do, but it affects everything. It affects the legal system, it affects hiring patterns. We're in an unfortunate place where we're relying on the algorithm to make these decisions, but we need to go back retrospectively and see if they're actually making the right decisions. I don't have a good answer for the person who asked that question, but that's just the world we're living in right now.

Hugo: Exactly. And I think flagging it and noticing it, as you say, is the first step. And figuring out what our blind spots are as well, particularly with respect to underrepresented groups. I mean, this is about how power is distributed too—the people who are building the algorithms might not represent the people who are impacted on the ground. So, consult with as many stakeholders as necessary.

This has been so great. We're going to have to wrap up, but I think we'll have to continue this conversation very soon. We received a lot of questions, and we couldn't get to them all, but do feel free to reach out to myself or Arnaub on LinkedIn. The recording will be sent via email within 24 hours, and we'll see you at the next webinar. One more time, I'd just love to thank Arnaub so much for taking the time to have this wonderful conversation today. Thanks, Arnaub.

Arnaub: No, thank you guys. I always love talking to you and DataCamp, so really appreciate the opportunity.

Hugo: Fantastic. All right. Thanks.

Please note this conversation has been lightly edited for brevity and clarity.