Adel Nehme, the host of DataFramed, the DataCamp podcast, recently interviewed Shameek Kundu, Chief Strategy Officer and Head of Financial Services at TruEra Inc., a California-based technology company dedicated to helping make AI trustworthy.
Adel Nehme: Hello. This is Adel Nehme from DataCamp and welcome to DataFramed, a podcast covering all things data and its impact on organizations across the world. Arguably the most data rich industry out there is financial services. Whether in banking, insurance, investment banking, and more, data science and machine learning have a variety of use cases. However, that doesn't mean that machine learning is being adopted to its fullest potential, as there are a lot of different obstacles in the way.
Adel Nehme: This is why I'm so excited to have Shameek Kundu, chief strategy officer and head of financial services at TruEra and former group CDO at Standard Chartered. Shameek has spent most of his career driving responsible adoption of data analytics and AI in the financial services industry. He is a member of the Bank of England's AI Public-Private Forum and the OECD Global Partnership on AI and was part of the Monetary Authority of Singapore steering committee on fairness, ethics, accountability, and transparency in AI. Most recently, Shameek was group chief data officer at Standard Chartered Bank where he helped the bank explore and adopt AI in multiple areas and shaped the bank's internal approach to responsible AI.
Adel Nehme: Throughout the episode, Shameek discusses his background, the state of data transformation in financial services, the depth versus breadth of machine learning operationalization in financial services today, the challenges standing in the way of scalable AI adoption in the industry, the importance of data literacy, the trust and responsibility challenge of AI, the future of data science in financial services and more.
Adel Nehme: Shameek, it's great to have you on the show. I'm excited to discuss with you the state of data science in financial services, your experience leading data science at major organizations and your current role at TruEra. Before we begin though, can you give us a brief introduction about your background and how you got into the data space?
Shameek Kundu: First of all, thank you so much for the opportunity. It's great to be on your podcast. To your question, I'm an engineer by training. I followed it up with an MBA in finance and systems. First job was to help build an online retail brokerage, believe it or not, two decades ago back in India. Then joined McKinsey, spent eight years advising financial services clients in Europe on technology and operations topics. And then in 2009, right in the middle of the financial crisis, joined Standard Chartered Bank, which is an Asia, Africa, Middle East focused international bank. And I was there for the last 11 years focusing on data and technology roles before joining TruEra this year.
Adel Nehme: That's great. And you're someone who has a breadth of experience at the intersection of financial services and data science and AI. As you said, in your previous role at Standard Chartered, you've had multiple CIO roles and were the group's CDO for the organization for about six years. Given your breadth of experience, I'd love it if you could describe how you view the state of data transformation in finance today and how it has evolved throughout your time as a data leader in the industry?
Shameek Kundu: So financial services has always been a data driven business, right? Everything you do in ... There is nothing, in some ways, nothing real in financial services. It's all movement of data from one place to the other. Your bank balance is a piece of data. Short of physical cash and the few checks and demand drafts that are there, everything in financial services is data. So it's always been a business that is all about processing, storing, protecting, and moving data around.
Shameek Kundu: What has changed is two things over the last six years. First, I have seen the questions around data move from being a pure defensive play to something that has also got a strong offensive leg or a business facing leg. So defensive, meaning I need to take care of this data in order to run my business, such as, well, I need to know the balance of the customer. I need to protect their information. I need to send the data safely. All of this is I need to, because I am required to, and that is core to my business. From that to a more offensive play, which is: actually, using all this information, I am able to do a lot more for my clients, for my business, for my partners and so on. So that's been one shift, which is from pure defensive to, let's call it, a defense and offense play.
Shameek Kundu: Just before we move on from this, I do want to stress the offense and defense bit. It's not that the defense part has gone away. In fact, arguably it's become more sophisticated as regulators and citizens have started looking beyond data quality, which most people were worried about earlier, and data retention, to privacy, data ethics, algorithmic transparency, fairness, all the bigger concerns about big tech and so on. All of that has made the defensive angle much more interesting, I suspect, to CDOs, but it has also been in addition to a strong enhancement on the offensive side. So that's one big shift I've seen.
Shameek Kundu: The second thing is who talks about data. When I started being chief data officer, everybody in my bank at least was happy to leave me in my corner and deal with data all on my own. By the time I left, everybody from the group CEO down ... Actually, I lied. From the group chairman down, understood and wanted to be involved with the data journey, right? So there's a very big broadening of the set of people in a bank or a financial institution who care about data and who think about their job being primarily about data well beyond the people who have data in their job titles.
Shameek Kundu: So of course these are the two big shifts I have seen. Of course, there've also been a whole host of enabling changes on the side, everything from strides in data analytics and machine learning related technology to adjacent technology spaces like APIs that have made data interchange so much easier. There's a lot of broadening and deepening of data related talent in the organization. Regulators have understood the data and analytics related challenges and opportunities more. So all of this has happened, but ultimately these have made these two shifts possible: from defense only to defense and offense, and from being a specialist geeky thing left on its own to something that everybody from the CEO down understands and wants to be involved with.
Data Science, Analytics, and Machine Learning in Finance
Adel Nehme: I find the shift from defensive to defensive and offensive to be very fascinating. You mentioned that in the past few years, the offensive part has been about enabling value through data science, analytics, and machine learning. Do you mind describing where these areas of value in financial services are today?
Shameek Kundu: So that's the first one. So that's effectiveness in risk management. The second, which is arguably not new at all, is incremental revenue increases. So again, you can argue, this is not new because the first analytics use cases in banking were around customer retention and cross sell. But these have got massively turbocharged today on the back of advances in analytics and data, right? So that's the second block of use cases, if you will. The third, of course, are efficiency improvements in the middle and back office processes. It's not very glamorous, but actually can save quite a bit of dollars when you automate stubbornly manual processes, such as those in trade finance and insurance claims assessments.
Shameek Kundu: The fourth closely related to the above, and one that is not always visible, but a lot of times when you and I, as customers, we think about the end to end digital experience whether it's the time and effort it takes for us to onboard as a client or to complete a transaction or to get information about where my payment is, all of that is actually underpinned by a lot of data and analytics, particularly by good data and data interchange if not advanced analytics always. So there's a big role that data and analytics is playing in shaping the customer experience on your app or new website, whatever, right?
Shameek Kundu: And finally, and most importantly, there are transformational business model changes that data and analytics is enabling. So this is where you're not making the existing business less risky or more efficient or slightly more revenue producing, or where you're not just improving the customer experience, but you're actually building brand new businesses. So what do I mean by that? Well, there is this whole holy grail around improving financial inclusion, not just in Africa and Asia and Latin America where not everybody might have a bank account, but even in supposedly more developed economies where the access to financial services is uneven. And by being able to use data and analytics effectively, you're going to be able to expand that access and thereby create a brand new business for yourself.
Shameek Kundu: So that's a huge area of opportunity. There's opportunities around building brand new businesses like banks that are tying up with e-commerce platforms to be the bank underpinning them. They're calling it the platform as a service kind of strategy where you are the bank underneath many commerce [inaudible]. That whole business model is based on that. But also if you're not a bank or insurer, if you're an insurtech or a fintech that is either trying to steal some of the business from a bank or insurer or trying to support the bank or insurer, again, you're able to do things in dramatically different ways. For example, you're able to take ... In insurance, for example, there are entirely new ways of underwriting insurance, based on IoT data and analytics, that were just not possible before. So it's almost like a continuum from more effective risk management to incremental revenue, to efficiency improvements, to better customer experience, to finally large-scale changes, transformational changes in business models. There's this whole spectrum, and of course, adoption levels are different in different areas.
Adel Nehme: And how would you then describe the state of adoption for many of these use cases? It seems that there is a breadth of use cases being operationalized, which is extremely exciting in terms of scaling value. Where do you think the industry stands today?
Shameek Kundu: So I think it's important to distinguish between what I would call just data and analytics, which include some predictive models, but not machine learning, and on the other side, the machine learning end of it. If you just refer to traditional data and analytics, meaning the extensive use of data, descriptive analytics, visual analytics, predictive analytics, not using machine learning, I would say four out of these five categories are pretty advanced. The only one that is not that advanced is building transformational new business models. But whether it's better risk management or being more efficient or transforming the customer experience, all of these things are being done quite a lot at scale. I'm talking about tens, hundreds of millions of dollars that banks and insurers are getting in terms of incremental value today, not tomorrow, today, right? And this has changed significantly in the last four to five years.
Shameek Kundu: But when it comes to machine learning in particular, the story is different. I would call it the broad but shallow story, right? And I'll quote three different sources for you. One is a Bank of England survey from 2020 and '19, which looked at the level of adoption by banks and insurers which are in the UK. Another is a study by Temasek, which is a Singapore sovereign wealth fund, which was recently released on the adoption of AI in financial services. And the third is my own rather unscientific assessment based on 35 interviews in the first half of this year, right? All three of these suggest two things. One, that between 50 and 65%, so between half and two thirds of even the traditional financial institutions, have started using machine learning in a non-trivial way, meaning beyond a pilot or a proof of concept. They're actually using it. There's some value coming out of it.
Shameek Kundu: But between 10 and 20% of them, so only one out of 10 or two out of 10, depending on what source you choose, are actually enabling AI at a level where it makes a real difference to the bottom line. So it's not just pilots and proofs of concept. People are adopting AI. But if you ask the CEO of a bank or insurer or even most fintechs, will a failing ML model become a big source of worry for you, in the vast majority of cases, it's not reached there yet. Right? And that's the story. It's broad but shallow. The only exceptions I would say are in areas like marketing and fraud analytics, where of course machine learning adoption has been there for a while. But generally speaking, traditional data and analytics adoption is quite high, quite impactful in many organizations. Machine learning adoption is broad, but still mainly shallow in all but the one in 10 or two in 10 organizations.
Adel Nehme: What do you think are the main barriers to that deep adoption of machine learning and AI in the industry?
Shameek Kundu: So as you say, there's groups of these barriers. There's technological barriers, there's organizational barriers, there's talent barriers, the business that you guys are in. There's barriers related to data itself and then there's regulatory barriers or things that can be construed as barriers. And finally, of course, there's barriers around trust. So let me go slightly deeper into each of them. I mean, technological barriers, banks and insurers have built up their systems over decades, sometimes in the case of my previous employer, over a century, and they will have the proverbial silos in spades, right? I mean, it might sound like a cliche, but it is really a very big challenge to get all your data together about a particular customer in a way that can both make it meaningful to do analytics on that person, so I really understand Shameek Kundu as a customer because I've covered everything from their wealth management holdings to their last interaction and complaint with us, to their external data. All of it I've brought together in one place and I've done it sufficiently in real time to be able to influence a frontline business process.
Shameek Kundu: I mean, you can do one or the other. You can either give real time interactions with the customer, which use some limited data, or you can paint a very nice picture of the customer, but it's not possible in real time. I mean, people will tell you it's possible to bring everything together, but I'm yet to see that, right? So that's a very big technological barrier. And by the way, it's not just a problem with legacy banks. A common mistake is to say, this is why fintechs or neobanks will win. Well, only if they're playing in one area. The moment a fintech or a neobank tries to build out its profile, you have no option. You will need to get deeper and deeper into specialized areas. And of course, it won't be as bad as a traditional bank or insurer. But this challenge of saying actually I've got systems for different parts of my relationship with the customer. I need to bring that together. It's quite a big one. So that's probably the biggest of the visible barriers.
Shameek Kundu: There's organizational barriers, partly because of lack of trust, which we'll come to later [inaudible], but more broadly about lack of trust about how other people will treat my data. There is a certain degree of silos within organizations where different teams might be concerned about sharing their data more broadly in the fear that it might not be handled properly or even not knowing whether this data that I've collected for XYZ can be used. If you're coming from a tech first organization, frankly, these barriers are the reverse. I mean, they start by assuming everybody can access everything.
Shameek Kundu: Then you've got talent related barriers, I think an area that you guys will be well aware of. The only thing I would say is it might be useful, and we can talk more about this, to talk both about the core data related skills, but also about how to increase the data quotient of the rest of the organization. It's not enough to have a pool of data specialists or analytics specialists. You need the entire organization's talent, AI or data quotient to increase, otherwise you can't get the full value. I mentioned there are barriers around data itself. I mean, the fact of the matter is we sometimes overestimate how much data financial institutions have or how much they're able to use. And that means in several areas, there may simply not be enough data to build an effective machine learning model.
Shameek Kundu: One example is money laundering, right? It's one of the best areas, anti-money laundering is one of the best areas where you'd want to use predictive analytics to dramatically improve your ability to identify money laundering activity. But here's the problem. When a bank, after a lot of pain, after a lot of false alerts, I'm talking 99% false alerts. When a bank takes those 1% where it believes it's got real money laundering situations and reports it to a law enforcement agency, they don't actually get back positive confirmation often because it won't be known for years. So what that means is one of the most important factors in any supervised machine learning, which is the ground truth, is missing. You don't have positive instances of money laundering highlighted. So how are you going to train? Of course you can do unsupervised learning, but all you generate with that is unusual behavior. So it's a really difficult ... So this is just one example.
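To make the point about unsupervised learning concrete, here is a minimal Python sketch; it is not something described in the interview. It scores hypothetical transfer amounts with a robust z-score (distance from the median, scaled by the median absolute deviation) and flags the outliers. The function names, the sample amounts, and the 3.5 threshold are all illustrative assumptions.

```python
from statistics import median


def robust_z_scores(values):
    """Score each value by its distance from the median, scaled by the MAD."""
    med = median(values)
    mad = median(abs(v - med) for v in values)
    if mad == 0:
        # No spread at all: nothing can be called unusual on this measure.
        return [0.0 for _ in values]
    # 1.4826 rescales the MAD so it is comparable to a standard deviation
    # when the data are roughly normal.
    return [abs(v - med) / (1.4826 * mad) for v in values]


def flag_unusual(values, threshold=3.5):
    """Return the indices whose robust z-score exceeds the threshold."""
    scores = robust_z_scores(values)
    return [i for i, score in enumerate(scores) if score > threshold]


# Hypothetical transfer amounts for one account (made-up numbers).
amounts = [120, 95, 130, 110, 105, 9800, 115, 100, 125, 90]
print(flag_unusual(amounts))
```

What comes back is only a list of unusual transactions. Without confirmed labels, nothing in this approach can say whether a flagged transfer is actually money laundering, which is the gap the ground truth problem above describes.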
Shameek Kundu: But there are many areas like this where it's not that easy to say I have positive behaviors that I can use. I mean, look outside banking and insurance, for example. One area where this became very obvious was ... I mean, look at how poorly machine learning performed in most areas of dealing with the pandemic, whether it's predicting who's going to be hit by it or predicting which medicine might work or predicting how the vaccines ... I mean, I think some models got the numbers right, how many people will have it. But in terms of helping authorities with either what medicines to apply or who is more likely to get it, it's just not done well. I mean, there's been a very good review in MIT Tech Review on that, right?
Shameek Kundu: So these are quite serious concerns. And then there's, of course, regulatory concerns. This is increasing all the time around data sovereignty, concerns about the power of big tech and big data, concerns about unfair behavior with algorithms. These are also impacting. And then lastly, there is trust. Now what I mean by this, this is particularly around the trustworthiness of algorithms. In an industry that has a lot of specialization and there are subject matter experts who've been ... You can argue actuaries are the original data scientists. They've been there for a very long time in insurance, for example. You don't suddenly come in and say, "Move aside. My algorithm will do a better job than you because it's been trained on historical data." Well, first of all, it might not be true. Indeed, experiences like COVID show it's a valid concern that past behavior is not always a good indicator of the future.
Shameek Kundu: But secondly, even if it was right, you are entering a business for the vast majority of banks and insurers where there is an existing way of doing things. You cannot just turn everything off one sudden day, at least if you're a traditional bank and insurer. You do need to do the work as a data science team, or as someone who's wanting to use data science. You have to do the work to carry along the rest of the organization. And of course you have to carry along your clients, your regulators and so on. And I think this has been a major barrier to the adoption of machine learning in regulated financial services.
Adel Nehme: I'm very excited to unpack trust and explainability with you. But before we do that, you mentioned here talent transformation, which is something that DataCamp focuses on. What are your thoughts on the talent transformation challenge required in financial services to have scalable value from data science, machine learning and AI? You mentioned a lot of technical skills in data science and machine learning. But going beyond that, what do you imagine a data literate organization in the industry would look like?
Shameek Kundu: Yeah. I mean, to be honest, you guys are probably the specialists, but I've been thinking about this and I think it's an area where I'll be fascinated by how you guys continue to contribute in this space as well. But as you said, there are these two blocks of work. In the first block, I think, of course, there's data scientists, data engineers, but some of the other areas are somewhat neglected in my view, like data risk management, which people broadly call data governance. But that's a very big term. I think you need people who are data librarians. I mean, if you go into ... I think the original Google piece was this. I suspect they hired many people who literally did information management, right? Which is how to structure data properly, et cetera. So I think there's more specialization even within data, beyond people who build predictive models and people who run the pipes for data.
Shameek Kundu: There's a lot more, whether it's data risk management, data governance, data visualization. I think there are people who can do data visualization and there are people who cannot. And I, by the way, am firmly in the second category. And if you can't, you just cannot tell the story in a way that works. So here the focus is both on quantity, I think we don't have enough of these skills, as well as quality. But it's also very importantly about making sure that these specialists understand the broader context of the industry when they're joining. So to take a non-financial services example, just going back to that COVID report in the MIT Tech Review, one of the examples quoted there was how a particular hospital had very high cases of COVID perhaps because of its [inaudible] area whatever. And what did the model learn from that data? That hospitals whose printed reports used a particular English font or Roman font were the ones most likely to be having high levels of COVID.
Shameek Kundu: Now, a basic subject matter expert would say, of course, this is a problem. But if you don't combine data science expertise with some very basic understanding of the domain you're talking about, then it doesn't work. So one area that I think matters, even for the specialists who are working in this field, is actually to increase the financial services quotient. So just as I would talk about the rest of the financial services organization increasing their data quotient, I think the data specialists need to increase their financial services quotient, or that of any industry that they're working in. So that's one block of work where I think that will be quite helpful for data scientists, particularly those coming from outside the industry.
Shameek Kundu: But the second, as we both discussed, is how do you take the rest of the organization, [inaudible], to the facilities person? And I really like the facilities or real estate example, because you don't think of that person as the most data driven. But actually before COVID came in, a lot of the optimization of the facilities in a bank or insurer, right? Which rooms, how much energy are they consuming? It was very data driven. And I was super impressed because this particular area was part of my CIO functions thing. And I was super impressed by the fact that at Standard Chartered, at least, the real estate folks were one of the most advanced in thinking about it. And it's a great example because you don't think about the real estate person as the first person when you think of data literate. But guess what? They were extremely so, because their entire value proposition was around how to make our business's footprint more eco-friendly, how to reduce the space in a sustainable way.
Shameek Kundu: All of that was so fundamentally dependent on data. And I just state that as an example. Every single role from public relations to HR, to of course core banking and insurance roles, to even regulatory interaction, every single role needs to become much more data literate. You need to have people who can understand the opportunities and risks arising from data and algorithms. You need to know how to best use it to your advantage in your job, not in somebody else's job. You need to be able to ask the right questions when you're promised the moon by an internal or external sales person. Right? And look, it's a tough one, but my sense is this is a tougher challenge. If I take an organization like Standard Chartered Bank, I would expect, I don't know, data specialists will be probably in the low thousands of staff, right? And any organization, any bank of that scale, right? Let's say, out of 100,000 people, you probably will have 5,000 people who are data specialists at best, right?
Shameek Kundu: But it's the other 95,000 people, if you do have 100,000, who need to become data literate. So arguably that's an even bigger challenge. And to be honest, I don't know if there's a well-proven template for it. I think it's a mix of some basic education with hands-on ability to play with data and being able to make it relevant for them. I think too often all of us data folks have gone into this attitude of don't worry, leave it to the experts. The wrong attitude, right? Yes, the experts will deal with it, but we also need to get the rest of the organization engaged so that ultimately even if it's not the best, 100% best way of dealing with the problem, maybe it's a 90% best way but it has much more chance of success because we've got more of them.
Trust in AI
Adel Nehme: And to your point, data literacy, it's one of the few ways that you can increase the adoption of data products that data teams are working on because ultimately data teams are creating data products for the functional business experts. And there needs to be a common data language to enable that conversation between both of them. I think there's an interesting connection as well between the trust element and the data literacy element, which requires the ability to understand and have intelligent conversations with the experts building these data systems. So I'd love to deep dive more into your current role at TruEra and the importance of trusting AI in financial services. You've mentioned trust as a major barrier to AI adoption in financial services. Do you mind walking us through why building trust in AI is such a difficult task for financial services organizations?
Shameek Kundu: Sure. I think maybe I'm going to ... I want to pick on your words and say, actually, it is not building trust which is a difficult task for financial services firms. It is the fact that they have a huge amount of existing trust to defend, right? And losing that trust is both very easy and catastrophic to them. I mean, you could see how hard it could be when the previous financial crisis struck. If people lose trust in financial services, then that's the end, right? I mean, other than your doctors, your bank or insurer probably knows more about you than almost everyone else. Right? We've had instances in the past of people complaining about why their credit card statements went to their home address because unfortunately they contained certain expenses with another partner that the wife or the spouse was not meant to know about, as an example.
Shameek Kundu: So sometimes your bank or insurer might know more about you than almost anybody else, other than maybe your doctor. If you can't trust your financial service provider to do the right thing by you, to treat you fairly, to protect your data from misuse, where does that leave your relationship with them? Right? So it's not that building trust is a new act. It is actually that there is a huge amount of trust to defend. And if you don't work at it, it can fall on your head very quickly. And particularly with large organizations, the cost ... This is almost their one advantage compared to more agile, newer firms that are coming in. It is that, well, it's a well-established bank or insurer, and if I lose that trust it's just ... Particularly if you're using it for something high stakes, like deciding my health insurance premium or deciding whether I can get the loan or not, or being advised whether I should make this significant investment or not. In those areas, if I can't trust it, it just becomes extremely complex.
Shameek Kundu: And so it's at two levels. One, as an organization, I want my customers and indeed my regulators to trust me. But actually there's a DNA of this kind of trust culture inside traditional banks and insurers at least, which means that even inside the organization, there are multiple layers of people who are obsessed about this, maybe sometimes too much, but you have to convince all of them that, yes, this thing can be trusted. And that is why this is such a big challenge.
Adel Nehme: And the task of creating responsible and trustworthy AI is even more exacerbated by the evolving regulatory landscape in financial services. Do you mind walking us through how the regulators are responding to the emergence of AI in financial services and what are the major concerns that need to be addressed?
Shameek Kundu: So the interesting thing about this question is how much it is evolving. So over the weekend I've had two new LinkedIn posts on this topic, just because two regulatory things have happened over the last week, frankly, one in Europe and one in China. But yeah, I think it is a fast evolving space. Now, financial regulators, as against broader technology, data or competition regulators, they're actually being quite reasonable. They've been cognizant of all the risks involved and the need to continue to encourage innovation in this space. In fact, we at TruEra have been closely engaged with regulators in the UK and Singapore and many other countries on this topic. Several regulators started off with early guidance, mind you, guidance, not regulation. Regulatory guidance as against regulation, right? Regulation means it's more mandated. So regulators in Singapore, back in November 2018, Hong Kong, one year later, Netherlands, Canada, Bank of England. So Bank of England hasn't published, but these others that I mentioned, they've all published specific guidelines on the use of AI in financial services.
Shameek Kundu: The Bank of England and Financial Conduct Authority in the UK have formed a consultative forum of which we are a part. US banking regulators have sought major industry comment on a wide ranging set of questions around AI risks. That was in March. And then, of course, over the weekend or just before the weekend, the Securities and Exchange Commission asked a specific question about online retail brokerages and that kind of investment, online retail investments, and the use of algorithms to encourage certain behavior there. That's been another area of interest.
Shameek Kundu: So far largely the focus has been on guidance and on concept rather than new binding regulation. But I do think in the next six to 18 months, depending on the geography, we're probably going to see regulators becoming more explicit in their expectation of industry players. Now this need not be a bad thing because, frankly, leaving a lot of guidelines and lots of things to discuss is in some ways uncertain. Getting regulatory certainty need not be a bad thing, right? Now in terms of what they care about though, it's actually quite clear now. Explainability, both internally, meaning if the organization wants to understand what's happening, as well as customer facing or external facing transparency. So that's clearly a big area. Fairness, so preventing unjust bias or unfair outcomes. Stability of the model, so making sure that the model won't break the first time data changes dramatically. And of course overfitting, which is a particular issue with machine learning models.
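One common way to watch for the stability concern mentioned above is the population stability index (PSI), which compares how a model input or score was distributed at training time with how it is distributed now. The sketch below is a minimal illustration under assumptions: the bin edges, the sample scores, and the function names are hypothetical, and nothing here is attributed to any regulator or to TruEra.

```python
import math


def bin_shares(values, edges):
    """Fraction of values in each bin defined by sorted edges."""
    counts = [0] * (len(edges) + 1)
    for v in values:
        # A value's bin index is the number of edges it has reached.
        idx = sum(1 for e in edges if v >= e)
        counts[idx] += 1
    return [c / len(values) for c in counts]


def psi(expected, actual, edges, eps=1e-6):
    """Population stability index between a baseline and a current sample."""
    total = 0.0
    for e, a in zip(bin_shares(expected, edges), bin_shares(actual, edges)):
        # Floor empty bins at eps so the logarithm stays defined.
        e, a = max(e, eps), max(a, eps)
        total += (a - e) * math.log(a / e)
    return total


# Hypothetical model scores: at training time, and after the data shifted.
edges = [0.25, 0.5, 0.75]
training = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 0.95]
shifted = [0.6, 0.7, 0.8, 0.85, 0.9, 0.9, 0.95, 0.95, 0.97, 0.99]

print(psi(training, training, edges))  # identical samples: no shift
print(psi(training, shifted, edges))   # shifted sample: clearly positive
```

Identical distributions give a PSI of zero, and larger values mean a bigger shift. Where to set an alert threshold is a judgment call for each model rather than something this sketch can settle.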
Shameek Kundu: So these are quite clearly key areas of focus. There are also some broader aspects, which are worth spending a little time on. Actually, often financial regulators are not just worried about whether the model and the data have been managed well. They're worried about, back to that point about the rest of the organization's AI quotient improving, whether the rest of the organization has fully understood the risk and reflected it in the actual business. So for example, in the most recent piece around the use of digital engagement in online retail investments, the question is not whether the use of AI to encourage someone to buy a product is illegal. It's not. As long as the product itself is not illegal, you can use any kind of algorithm to encourage them. What is potentially bad is to sell someone a stock or bond that was wrong for them given their risk profile.
Shameek Kundu: Now, if you're a human advisor, you know this. This is your day one piece. You know if Shameek is a medium risk person, you do not sell that guy a complex structured hedging product. But the algorithm, if you don't tell the algorithm, the algorithm doesn't know it. So there's a difference between governing a model, a machine learning model, for fairness, for explainability, et cetera. That is actually relatively well understood. What is more complex is how you make sure that in every element of what a bank or insurer or any fintech is doing, they are thinking about the indirect implications of using AI in that space. That is the more complex one. I think that will evolve even further over time.
Adel Nehme: And how do you see it evolving? What's the charter you can propose for a bank, for example, to evaluate its risk correctly?
Shameek Kundu: This is a really good point because actually many banks have struggled with this, banks and insurers have struggled with this. But I think the easiest way of thinking about it is you don't actually need a large number of data or AI experts. You need some data experts, but most banks and insurers have them. What you need is, frankly, at least my personal experience has been, if you spend a lot of effort educating the rest of the risk folks, whether it's a credit risk or market risk or compliance risk or reputational risk person, if you make them aware of how the machine learning models work, what could be the risks, et cetera, actually, they're much better at working out what it might mean.
Shameek Kundu: So the first time I spoke to somebody who was managing financial markets risk, this guy immediately picked up on what I was saying and said, "Oh, okay, hold on. There might be an impact of this on competition policy." I said, "What does that mean? How's that possible?" So imagine if my model and another model in another bank are engaging in rigging [inaudible] prices, that could be a problem. I hadn't thought about it as a data and AI person, but he immediately thought about it because he's thinking about the intrinsic risk. So I would say that's probably the approach. Don't try and get one central team to try and predict everything that will happen. Disseminate the knowledge of both the opportunities and the risks from AI across the broader community who are well-placed to deal with each of these risks and then let them internalize it and figure out how AI will impact them. And I think that is a better, more federated approach than trying to say one person or one team in the bank or insurer would somehow become the master of all AI risk in the bank.
Model Interpretability and Explainability
Adel Nehme: And this is where data literacy comes into play as well. Because if the business expert does not understand the limitations of machine learning or how a data system in general operates, they won't be able to make those assessments. So something TruEra specializes in is model interpretability and explainability. Can you give us an overview of the state of, or what is possible today in, model interpretability and explainability, especially given how niche an aspect it is of machine learning research?
Shameek Kundu: Yeah. Talking about how niche an aspect it is, one of the reasons I joined TruEra, I mean, initially I became a client and then joined them, was I heard that the founder, Professor Anupam, first started researching this in 2012. I don't know about you. I had not heard of model explainability or interpretability back in 2012. So I was thinking the guy has been researching it this long, he must know what he's doing. So anyway, to your question, where is it? I think, first of all, it's worth thinking about the difference between what some people call inherently explainable models versus the post hoc ones, the ones where you can only explain after the fact. There is a school of thought which says maybe in certain very high risk areas, you should only use inherently explainable models.
Shameek Kundu: However, there are two concerns that I personally have with them. One is there are several areas, including image and text related pieces with non-structured data, where inherently explainable models just have not got that far yet in performance. It might change over time. But I think the bigger problem is it might sometimes create a false sense of comfort because you might get a very well-explained but extremely complex inherently explainable model. But if you haven't synthesized that to a level where it actually makes sense to human beings, it's actually giving you a false sense of comfort. So therefore I think there's a role for inherently explainable models to play and there's a role for what is crudely called post hoc explainability.
Shameek Kundu: And I think there's been a lot of progress with reasonably accurate levels. So at the level of saying this explanation will be X percent accurate, Y percent of the time, it's certainly something that is possible now for most types of machine learning models, particularly in the phase that we are in, not just TruEra's own technique, QII, but that whole game theory based approach to explanations. They're reasonably powerful, right? And they've been seen to work, particularly if you ... The science is not behind the explanation of ... Sorry, the implementation of that science is trickier. How you implement those explanation techniques, which can be resource intensive, et cetera, is more complex than the science itself. The science is reasonably mature now.
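As an illustration of the game theory based approach to explanations mentioned above, the following is a minimal sketch of exact Shapley value attribution, one widely used technique in that family. It is not TruEra's QII implementation, which the interview does not describe. The scoring function `toy_credit_score`, its coefficients, and the all-zero baseline are hypothetical, chosen only to keep the arithmetic easy to follow.

```python
from itertools import combinations
from math import factorial


def shapley_values(predict, instance, baseline):
    """Exact Shapley attribution of predict(instance) - predict(baseline).

    Features outside a coalition are set to their baseline value.
    The cost is exponential in the number of features, which is one reason
    practical implementations rely on sampling or approximations.
    """
    n = len(instance)
    features = range(n)

    def value(coalition):
        # Features in the coalition take the instance's value; others the baseline's.
        x = [instance[i] if i in coalition else baseline[i] for i in features]
        return predict(x)

    phi = [0.0] * n
    for i in features:
        others = [j for j in features if j != i]
        for size in range(n):
            # Standard Shapley weight for a coalition of this size.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                phi[i] += weight * (value(s | {i}) - value(s))
    return phi


def toy_credit_score(x):
    # Hypothetical model: income, debt, and years as a customer (illustrative only).
    income, debt, years = x
    return 0.5 * income - 0.8 * debt + 0.1 * income * years


applicant = [4.0, 2.0, 3.0]
baseline = [0.0, 0.0, 0.0]
phi = shapley_values(toy_credit_score, applicant, baseline)
print(dict(zip(["income", "debt", "years"], phi)))
```

The attributions sum to the difference between the applicant's score and the baseline score, and the interaction between income and years is split evenly between those two features. The nested loop over coalitions also shows why, as noted above, implementing these techniques can be resource intensive even when the underlying science is mature.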
Shameek Kundu: Now there is still some work needed when it comes to deep neural networks. And we at TruEra have actually recognized this and we've created an open source initiative called TruLens, it's kind of trulens.org. It's not part of the TruEra product at all. It takes what is there in open source today and adds our own. We just kicked off. It's got some fantastic reviews in the last 10 days. Precisely because people recognize that in deep neural networks there's more work to be done. So we're opening it up, not as a commercial product.
Shameek Kundu: But I would argue that the area where we need the most attention, because we did some review of this on behalf of a MAS led consortium here in Singapore, is actually the form of the explanation that is presented to the end user. So whether it was you or me as a customer, or whether it is the vast majority of people in a bank or insurer who are not deep data science folks, just giving them the raw explanations is useless, right? You have to find a way in which it resonates for them. And when we looked at the research in that, which is, let's call it the human machine interface for want of a better word, there is certainly more work to be done, right? Both in terms of how you explain a decision to an end customer that makes a difference, but something as simple as, oh, there's a human in the loop for a decision so we shouldn't worry about it because every AI decision, a human being is reviewing it and deciding whether to accept it or not.
Shameek Kundu: Well, has somebody looked at the psychological effect of someone continuously pressing yes, yes, yes, yes to the algorithm's recommendations? Then the one time they should say no, they probably overlook it because they've seen the last 99 cases where the algorithm was right. The human got it wrong the hundredth time when they should have intervened. Now, this is not a theoretical point. As you know, some of the issues with the Tesla self-driving car are exactly that. The human has kind of assumed that the ... So this area, it's not so much directly the science of the underlying explainability, but it's the form that that explanation takes and how that human machine interface works. I would say that needs more work than the underlying science, except in deep neural networks where I think there's more work to be done.
Adel Nehme: This is a great overview, and especially when you mentioned the packaging of explainability. Do you find that there are categories of AI or machine learning use cases where trustworthiness and interpretability are more important than others within financial services? If so, what are the different measures organizations need to apply from a trust or risk perspective based on these different levels of risk?
Shameek Kundu: I think your first question is easier to answer than the second one, because one of the things around the current state of regulation and regulatory initiatives is people have not yet gotten around to creating standards for I will use X for Y category yet. That's one of the areas where I think financial institutions need to do more work over time. But on your first question, yes, there are very clearly areas. Not every area is equally risky when it comes to trustworthiness and interpretability, right? So the ones that tend to have the maximum attention are those that result in somebody being denied a service of any kind, and particularly in this instance, of course, a financial service, right? So what could this be? Well, you're not getting an insurance package cover or getting an unacceptably high rate because of AI not behaving well or because of some question about the AI, or similarly not getting a loan or even not getting accepted as a customer because your KYC risk seemed to be too high.
Shameek Kundu: So anywhere where you are in a position of being denied access to a financial service is probably the highest, right? Next highest would be where there's room for you to have a poorer experience or a poorer outcome than some other group. That is discrimination. But the first one is not discrimination. It is absolute lack of access. So if an algorithm could be used to, for example, deny someone the right to come and collect money from their own account at an ATM, i.e. facial recognition, then you had better have very good safeguards there where the person has an alternative, right? Because otherwise they're just not getting access, never mind discrimination. I'm just not getting access.
Shameek Kundu: So I would say right at the top is risk of denial of valid access. Next would be risk of discrimination, unfair discrimination between parties. I think another one that is very high is where do you have to prove to a regulator that you have done the right thing. Now, when you have to prove, most famously in financial crime and anti-money laundering sanctions related stuff, as well as let's say, insider trade behavior, et cetera, this is not about customers, but you have to prove, the onus is on you to show that you've done the right thing in terms of detecting and catching and investigating. If you can't prove it to the regulator in this case, not to the staff member or the customer, but to the regulator or to your internal risk management teams, that becomes difficult as well. So these, I would say, are quite easily things that you have to care about because you're impacting a customer or it's impacting your direct regulatory obligations.
Shameek Kundu: Now, the tricky ones are the ones which don't fall in these areas, right? So I was really interested in how the Chinese regulator, not a financial services regulator, came up over the weekend or on Friday, I think, with a bunch of regulations on what they call algorithmic recommendation systems. And it's not specific to financial services. But normally we don't think of ... Sorry, there was this and there was the SEC thing in the US about Robinhood style online investment tools. And both of them struck an important thing in my head, which is normally I would treat marketing use cases as, well, what's the harm it can do? Worst case, it will sell you something you don't want to buy. Fine, go away. Right? It's fine. It's not a big problem. But actually, you can, of course, mis-sell. You can inappropriately target a financial product at somebody who should not have been sold that product in the first instance.
Shameek Kundu: Imagine selling a very high interest loan to somebody who the machine has worked out is desperate for money. Is that fair? Is that ethical, right? It's nothing to do with denial of service or discrimination, but it's just wrong to offer that person that loan. You might want to offer them some other kinds of help, but not high interest loan to address that.
Adel Nehme: It's predatory. Yeah.
Shameek Kundu: It's predatory. Exactly. So these are the ones that I mentioned at the start, denial of service, discrimination, explicit regulatory. I think there are others where it's kind of, well, I'm offering a service, but actually where is the line between offering a service and offering something inappropriate or predatory? That is a trickier one. But I think most banks are working that out. So one of the first things most banks and insurers are doing is coming up with a materiality framework with a bunch of conditions. Now, to your second part of the question, have we got to a stage where people have said for this level, we need this explainability, for this level, we ... I haven't seen it yet. I think that's part of work in progress.
About TruEra Inc.
Adel Nehme: I think for a lot of these use cases, having a human in the loop will also be very important for this decision making. It's all about packaging that information as well. And this is a great segue to discuss TruEra, which packages explainability and interpretability very, very well. Do you mind sharing how TruEra solves some of these fundamental issues you're discussing?
Shameek Kundu: Yeah, sure. And thanks again for the opportunity. So first of all, we just do two things. We provide software that does, one, it allows somebody who's just built a model to assess the quality of that machine learning model. Now, when I say quality, traditionally, as you know, data scientists and their business stakeholders will often think of quality as accuracy against test data. But we're talking about that, but also a whole host of other things, such as the potential for unjust bias, such as overfitting, instability, and the comparison with other models to show that. So it's not just about how accurate is it against the train and test population, but also what are the known weaknesses of machine learning models around overfitting, instability, unjust bias, et cetera. And have we checked for that? So that is like a quality diagnostic tool.
Shameek Kundu: Now, its primary purpose has, indeed, in financial services very much been about not doing the wrong thing. But actually when you use it, it also helps you do the right thing, if you see what I mean. I mean, no one goes out and says, I want to build an overfitted model or I want to build an unstable model. So actually even if you improve the quality for defensive purposes, in reality, you are improving it just for the sake of it. You're actually improving the quality of the model. So that's one part, it's like a model diagnostic kind of tool. The other is monitoring. So once you've gone live with the model, we allow you to monitor the output as well as the input into the model in a way that there's a meaningful connection. Meaning when you see a model drift or data drift, you're able to quickly go back to that diagnostic model and see whether this is a material drift, and if so, what is causing it. Right?
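To make the idea of checking whether data drift is material more concrete, here is a minimal sketch using the Population Stability Index (PSI), a drift measure commonly used in credit risk modelling. The interview does not say which drift measures TruEra uses, so this is an assumption for illustration. The sample values, the bin edges, and the 0.25 alert level (a conventional rule of thumb, not a regulatory standard) are all hypothetical.

```python
import math


def population_stability_index(expected, actual, bin_edges, eps=1e-6):
    """PSI between a reference sample (e.g. training data) and a live sample.

    bin_edges are ascending cut points; a value lands in bin k when it is
    greater than exactly k of the edges. Empty bins are floored at eps so
    the logarithm stays defined.
    """
    def bin_shares(values):
        counts = [0] * (len(bin_edges) + 1)
        for v in values:
            counts[sum(v > edge for edge in bin_edges)] += 1
        return [max(c / len(values), eps) for c in counts]

    ref = bin_shares(expected)
    live = bin_shares(actual)
    return sum((a - e) * math.log(a / e) for e, a in zip(ref, live))


# Hypothetical values of one model input (say, monthly income in hundreds).
training_income = [10, 20, 30, 40, 50, 60, 70, 80]
live_income = [60, 70, 80, 90, 100, 110, 120, 130]
edges = [25, 50, 75]

no_drift = population_stability_index(training_income, training_income, edges)
drift = population_stability_index(training_income, live_income, edges)

ALERT_LEVEL = 0.25  # assumed rule-of-thumb threshold for a material shift
print(f"same data: {no_drift:.3f}, shifted data: {drift:.3f}, alert: {drift > ALERT_LEVEL}")
```

A number like this only says that an input has shifted. Deciding whether the shift matters, and what is causing it, is the step of going back to the diagnostics that is described above.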
Shameek Kundu: So that connection between monitoring back to the diagnostics is quite important. In terms of how we are deployed, we're not an end-to-end MLOps platform. So you don't develop models on TruEra. You don't use TruEra to train models, you don't use TruEra to deploy models. And that is intentional because our whole proposition is, it doesn't matter whether you used platform X or Y or Z, or you just did it out of a notebook, or you bought a model from somebody. It doesn't matter how that model came about. We just need access or rather you just need access to the model's output in a pickle file or something like that, a serializable output, and the training and test data. And with that, TruEra will be able to give you all these AI quality diagnostics and monitoring for that matter, as long as you have access to that model.
Shameek Kundu: So this is quite important because it means that you are able to, for example, compare an in-house model with something you bought from outside or a model that one team has built using one platform and another has built using another platform. So if you want to build consistency and standardization across an organization, short of forcing everybody to build models on the same platform, this is a good way of kind of ensuring some degree of consistency, right? So that's how we work. And we work on the client's premises largely at this point, at least certainly for large enterprise clients, meaning either on premise or on the client's own cloud environment. It's not as a service. And that is intentional because for the kind of clients we're talking about right now at enterprise level, nobody's willing to share confidential things like models and data with us. So it's entirely on the client side. Nobody from TruEra will ever see what you're doing with the software.
Adel Nehme: It's very exciting to see the software stack around explainability evolve. I'd love to pivot to discuss the future of data science in financial services with you and how it intersects with explainability, right? How do you view the industry, for example, accommodating large language models such as GPT-3, especially given its black box nature and the fact that from a packaging perspective, it's a ready-to-use, plug-and-play API rather than a traditional machine learning model that follows the fit-predict paradigm, for example?
Shameek Kundu: So I haven't come across a financial services client already using GPT-3, obviously, but there are examples similar to it. For example, facial recognition models are used not for emotion recognition, but certainly for authentication of identity, et cetera. I would say there's a whole range of techniques. At one extreme, the lowest bar extreme would be, well, I've got the testing results of the person who's built the model and created it. And I can see that they had X percent accuracy, X percent precision, Y percent recall. And I'm happy with those kinds of words, right? So I don't need explainability because I'm happy with the guaranteed performance. And why am I happy with that? Because particularly for facial recognition, as you know, there are regular tests done with NIST and so on, where you are able to get that kind of information.
Shameek Kundu: So that's one level of explainability where you understand how the model works and you have a very good understanding of the training and test data, as well as the most recent performance of that model out in the wild sort of thing. So you could accept that. And in some use cases it is accepted, often with significant human intervention. So in this case you might say, well, it's okay, but we want 100% of the refused identifications to go to a human being straightaway so that no customer is unfairly, even if it's 0.1%, I don't want any customer to be unfairly rejected. So whenever you're rejected, it goes to a human being. That's one way of addressing it. That's back to the human intervention piece.
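The rule described above, where every refused identification goes to a human rather than ending in an automatic rejection, can be sketched in a few lines. This is a hypothetical illustration only: the `AuthDecision` type, the `decide` function, the outcome labels, and the 0.9 acceptance threshold are assumptions and do not come from any particular bank's or vendor's system.

```python
from dataclasses import dataclass


@dataclass
class AuthDecision:
    customer_id: str
    match_score: float
    outcome: str  # "accepted" or "human_review"; never an automatic rejection


def decide(customer_id: str, match_score: float, accept_threshold: float = 0.9) -> AuthDecision:
    """Accept confident matches; send every other case to a human reviewer."""
    if match_score >= accept_threshold:
        return AuthDecision(customer_id, match_score, "accepted")
    # The model alone never refuses a customer: 100% of non-matches are escalated.
    return AuthDecision(customer_id, match_score, "human_review")


# Hypothetical match scores from a facial recognition model.
decisions = [decide("c-001", 0.97), decide("c-002", 0.42), decide("c-003", 0.90)]
review_queue = [d for d in decisions if d.outcome == "human_review"]
print([d.customer_id for d in review_queue])
```

The design choice is that the model can only say yes on its own. Note that this guards against unfair rejections but not against wrongly accepted matches, which would need separate checks.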
Shameek Kundu: The next level would be to say, actually, no, this is not enough. I want the provider to at least provide me a level of generic explanations on how the model is ... Sorry, specific explanation of how the model is working. Maybe not for my client's data, but for the data you have trained on. Right? Now, this is where some of the deep neural network kind of explanation comes in where you're not saying, ah, okay, for X, Y, Z bank, this is how the model is working. You are saying as a whole, this is how GPT-3 or this particular facial recognition model is working. And I do think this will become important over time because, while I was testing this at one of the banks or while somebody was testing this at one of the banks when I became aware of it, the model providers explicitly said, if you wear a mask, the model will not work, which in the middle of COVID was a relief because if it started working with a mask and without proper authentication, it would be a problem. And lo and behold, somebody found a picture of somebody being authenticated with a mask.
Shameek Kundu: The model provider had no idea how that happened. This points to the fact that simply depending on past tests and training data will not be enough. People might be forcing you to say, okay, you don't have to explain to me how GPT-3 or this facial recognition model will work in the context of my bank or my insurer, but you do need to give me a reasonable level of explanation more broadly, right? And that brings me to the third level. I think the only way this kind of model can be governed safely is, frankly, and there are precedents for this in banking for sure, if you think about a FICO model in the US, a credit model, or even some of these big ones, Moody's and so on. A bank or insurer or investor doesn't go and challenge every single Moody's rating. What does happen is that a regulator is checking whether Moody's is doing its job.
Shameek Kundu: So I do think that might be a sustainable way where you say, if a model is used very broadly across the industry, right? Whatever industry that is, then that industry's regulator might regulate that model. So in other words, the model provider will privately explain their model to the industry's regulator and continuously convince the regulator, saying, yep, you can depend on me, right? That way every bank and insurer and fintech doesn't have to individually check, if you see what I mean. So I can see these three kinds of ways of handling it. Most of the current usage is certainly in the first category I mentioned.
Future of Machine Learning and AI in Finance
Adel Nehme: Given emerging explainability techniques and methods and innovation in this space, what do you think are machine learning or AI use cases in financial services that will be operationalized tomorrow that we cannot operationalize today?
Shameek Kundu: I think it will be a matter of scale rather than completely new things, right? I mean, there are people who are making credit decisions using machine learning, maybe in a challenger mode, maybe in future, they'll not use the challenger mode, right? Or facial recognition becomes more standard, not for emotion recognition, but for authentication. So I think you will see, if you take one or two years, I think the difference will not be brand new use cases, it'll be much more of how deeply it is used without a lot of manual intervention. That I think is going to be seen. So some of the so-called higher risk areas might become acceptable risk, whether it's credit or some aspects of pricing and underwriting support in insurance, et cetera.
Shameek Kundu: I think we'll have two or three other interesting topics, which I do see evolving over time, if you want me to do a bit of crystal gazing. I think one is, there are some very hard problems to crack that are coming up in financial services. And one of them is, frankly, actually just dealing with data and data rights when you have so many different parties collaborating, right? Different third parties: telecom, retailer, search engine, and e-commerce. Being able to get some assurance that the data is not being misused, that itself is a big challenge. And I think that is a bigger problem to solve than algorithmic problems. And to that extent I think there will be progress there.
Shameek Kundu: And then the other big nut to crack is, of course, ESG and in particular, the environmental obligations of banks and insurers. It is a frightfully difficult thing to do right now. And I do think whether it's AI, but more, it's like the role of data and big data in the truest sense of the word to really ascertain that these things we're financing, we are doing, are green, right? Or ascertain the non-greenness of existing investments, for example, right? In a way that is realistic and is verifiable. That is probably going to be one of the biggest opportunities. This is not one to two years, but I would say if I were to put money into something, I would put it into that, companies that are making the use of data and AI quite fundamental to how the financial services obligations around environment are met and around individuals' data rights are met. Those would probably be it. But in the use cases I talked about, I would say it's more a question of higher scale rather than brand new use cases.
Adel Nehme: And from a skills perspective, what do you think will be the most essential skills for financial services professionals in a future where AI and analytics are operationalized and are part of the daily workflow, and what would be your advice for up-and-coming data scientists fielding choices between, for example, joining tech or finance here?
Shameek Kundu: So let me do the latter, because it's certainly an area close to my heart. Look, there are many industries where certainly if you're starting with a tech first company, yeah, you can do cool stuff with search engines and e-commerce recommendation engines, et cetera. But, A, you're getting to play with a very different kind of data and a lot of data, which is more substantive in some ways than your day-to-day behavior on social media, because you do lots of things on social media based on emotion, and it might be something transient. But your financial history over 10 years is over 10 years. It tells you a story of what you've been. So if you say a data scientist is really interested in how to use data for better decisions, the richness of the data in financial services is far more. But more importantly, I think, and you can argue that for financial services as well as for some other sectors like healthcare and maybe environment and so on, the value of what you will do with that machine learning model is arguably much higher, right?
Shameek Kundu: I mean, it's fantastic to have a great translation tool. It will solve lots of pain areas for lots of people. And I think in the case of one, I think it was a Belarusian athlete who used Google translate to find a way out of Tokyo airport, it can even save lives perhaps. But for the most part, a lot of that is about making life easier versus cracking fundamental problems like financial inclusion, et cetera. So you'll also be solving bigger problems in financial services. That's why I would say it's fascinating. You should give it a go. However, now coming to another question on what skillsets, so I'll answer the question in two parts. What would I suggest from a skillset perspective to data specialists? Bluntly, learn about your sector, whether it's healthcare or whether it is financial services or transport or whatever it is. Because anyone who believes that data alone will rule without context is missing it and I would strongly recommend them reading the COVID-19 experience, for example.
Shameek Kundu: So being able to be someone who understands the broader context and knows how to work with, let's say an actuary in an insurance or a marketing person or a credit person in a bank, et cetera, that's super helpful. So that would be my advice to the data specialist. To the rest of the bank or insurer or fintech people, I only have one piece of advice. Every aspect of what you do in a bank, insurer or fintech is going to be impacted, right? Some of you might be young enough or curious enough or smart enough to rebuild your careers as a data specialist, but most of you will not. And you might not even want to. You might just like being a marketing person or a learning person or whatever you want to do. You don't need to become a data scientist yourself, but you need to absolutely become sufficiently smart about the topic sufficiently [inaudible] to ask the right skeptical questions. If you can't ask the right questions, then you're dead, right?
Shameek Kundu: I would say build up to a stage where you can ask the smart questions. Don't try and "I know how to code." It's good. If you know how to code, it's great. It's probably more important to understand how that code works and how data is used to train models than to actually be able to code yourself. So get to the place where you can ask the right questions would be my suggestion.
Call to Action
Adel Nehme: That is awesome. Finally, Shameek, given that we're ending on such an inspirational note, do you have any final call to action before we wrap up today?
Shameek Kundu: Yeah. This is for the financial services sector itself. I think, look, as we discussed, there are some real interesting opportunities on the horizon for financial services when it comes to using analytics and AI and data and machine learning more broadly, including on the environmental side that we discussed. There's also a very real risk of an AI winter, again, coming up, certainly in financial services. I mean, people have put in billions of dollars, certainly hundreds of millions of dollars in many major banks and insurers. And I think if we don't watch it and if we just keep focusing on how many new marketing slogans can we generate and not actually seeing whether it's making a difference, there's a real risk that this will become yet another AI winter. So I would say let's be cognizant of the opportunity and let's focus on making real difference with machine learning and with data and analytics rather than going for the hype.
Adel Nehme: That's great. Thank you so much, Shameek, for the insight. We really appreciate it.
Shameek Kundu: Thank you.
Adel Nehme: That's it for today's episode of Data Framed. Thanks for being with us. I really enjoyed Shameek's insights on the state of AI adoption in the financial services industry. If you enjoyed this podcast, make sure to leave a review on iTunes. Our next episode will be with Brian Campbell, head of engineering at Lucidchart, on managing data science projects effectively. I hope it will be useful for you and we'll catch you next time on Data Framed.