Operationalizing Machine Learning with MLOps
Adel Nehme, the host of DataFramed, the DataCamp podcast, recently interviewed Alessya Visnjic, CEO and co-founder of WhyLabs.
Adel Nehme: Hello, this is Adel Nehme from DataCamp and welcome to DataFramed, a podcast covering all things data, and its impact on organizations across the world. You know, one thing we've definitely seen rise over the past few years is the pressing importance of operationalizing machine learning models in production.
Adel Nehme: COVID-19 exposed us to concept or data drift with many models breaking in production due to the evolving nature of data being generated after forced lockdowns and completely different habits that emerged because of it. As such, we've seen the explosion of new methodologies and ways to describe operationalizing, monitoring, and extracting value from machine learning in production. Chief amongst them is MLOps.
Adel Nehme: This is why I'm so excited to have Alessya Visnjic on today's episode. Alessya is the CEO and co-founder of WhyLabs, an AI observability company on a mission to build the interface between AI and human operators. Prior to WhyLabs, Alessya was the CTO in residence at the Allen Institute for AI, where she evaluated the commercial potential for the latest advancements in AI research. Earlier in her career, Alessya spent nine years at Amazon leading machine learning adoption and tuning efforts. She was a founding member of Amazon's first machine learning center in Berlin, Germany.
Adel Nehme: Alessya is also the founder of RSquared AI, a global community of 1000 plus AI practitioners who are committed to making AI technology robust and responsible for all. In this episode, Alessya and I talk about the machine learning challenges data teams face that spurred the need for MLOps; how MLOps intersects and differs from other terms such as DataOps, model ops, and AIOps; how and when organizations should get started with their MLOps journey; and what the most important components of MLOps are and more. If you want to check out previous episodes of the podcast and show notes, make sure to go to www.datacamp.com/community/podcast.
Adel Nehme: Alessya, it's great to have you on the show.
Alessya Visnjic: Absolutely. I'm really excited to be on the show as well, Adel. Thanks for having me.
Adel Nehme: I'm really excited to talk to you about MLOps, how organizations can get started with it, your experience launching WhyLabs and more. But before, can you give us a brief introduction about your background and how you got into the data space?
Alessya Visnjic: So my journey began at the University of Washington where I studied applied math. I graduated with a degree in 2008, which was a really funny time; a lot of unknowns, similar to now, maybe. And I was really keen on going into grad school and getting a PhD but decided to check out industry before diving into that because of the uncertain times. So I joined Amazon, which in 2008 was not a very well-known company. In fact, my grandma commented that I went to work at a bookstore that doesn't even have a store.
Alessya Visnjic: But the cool thing was that it was at a very interesting time. So the company was growing rapidly; the technology was growing rapidly. So I was very fortunate to join Amazon at the point where the whole DevOps culture was just exploding in the company. And I worked at a group called Platform Excellence, where I ensured that the website is running fast. It doesn't have anomalies, it doesn't have issues.
Alessya Visnjic: And halfway through kind of my career there, I was always looking for opportunities to get back to my applied math roots. And in 2013, Amazon started kind of their first machine learning R&D effort, creating R&D teams internally in business units. I joined one of them and I moved to Berlin, Germany to kind of help seed the group, help start the group.
Alessya Visnjic: The group was focused on deploying machine learning applications internally in the company. So there I was very fortunate to kind of end up in the right place at the right time. There were a lot of brilliant machine learning scientists that joined the group. And we were tasked with building and deploying machine learning applications into supply chain, into personalization, into warehouses and so on.
Alessya Visnjic: So I had the opportunity to essentially become the bridge between machine learning engineering and the business. And in that role, because of my kind of DevOps experience prior to that, I also was the brave person who would often carry a pager, supporting the machine learning applications that we deployed. So I had the pleasure and the adventure to kind of feel what it's like to run machine learning in production very early on for really big deployments.
Alessya Visnjic: So the biggest deployment that I've ever operated was a demand forecasting pipeline that generated daily forecasts for a quarter of all Amazon global retail, which was mind boggling. We were creating a forecast for hundreds of millions of products on Amazon and supporting that in production was a big endeavor. So that's kind of how I got into the space.
What is MLOps?
Adel Nehme: I'm excited to discuss all the latest on MLOps with you. So while MLOps is a relatively nascent term, it has been a major focus for many organizations today. And we're currently in what can be described as a hype phase for MLOps. I want us to discuss what MLOps is and isn't.
Adel Nehme: But first, what I would like to do is set the stage and articulate the motivation for MLOps and why it's important. So you're someone who's been in the machine learning space for a long time now. Can you walk us through the specific challenges data teams face when deploying machine learning models at scale that really spurred the need for MLOps?
Alessya Visnjic: That's a fantastic gateway into the subject. So first, kind of to set the stage, I want to take a step back and think about where the term came from, and unsurprisingly it mirrors the DevOps culture that we have in traditional software. And really what DevOps is all about is continuous integration, continuous delivery, continuous deployment.
Alessya Visnjic: And when I say these terms, what stands out is the term "continuous." And here, what it really means is that we want software to be a continuous experience to the customers, a continuous and consistent experience, which seems very simple and kind of a basic definition of software. But in the machine learning world that is just becoming established.
Alessya Visnjic: So in the machine learning world, when we look at machine learning models in academia, typically we take a data set and then we iterate on it to come up with better and better models. Whereas in industry, you have a model, it's deployed in production, and it's being hit with newer and newer data every day and it needs to provide a continuous, consistent experience to the users. So really what MLOps is about is the tooling, culture, and processes that help ensure that the machine learning models continuously deliver the experience that they were designed to deliver.
Adel Nehme: Okay. That's great. So over the past few years, there've been quite a few different terms that try to capture the operational side of data and machine learning work, terms such as DataOps, AIOps, and now MLOps. Do you mind walking us through how these terms differ and intersect?
Alessya Visnjic: Yes, our industry is drowning in new terms. And to answer your question, let's first focus on the difference between AIOps and MLOps. That's the easiest one to define. So AIOps is a term that is being used to describe AI and machine learning techniques that are applied to common IT issues. So say automating anomaly detection in clickstream data or automating kind of monitoring and aggregation of alerts in a very traditional sense. So companies like Datadog and Splunk are applying AI and machine learning to the log data, the machine log data that they're collecting, to identify anomalies.
Alessya Visnjic: MLOps is different, even though the only difference in the name is AI versus ML, and I do not know the history of how these terms came to be so similar yet so different. MLOps is a set of techniques, and not necessarily techniques that are related to AI and machine learning, but techniques for operationalizing machine learning and AI applications.
Alessya Visnjic: So these techniques could be data versioning, reproducible data pipelines, logging and monitoring for data and so on. So AIOps and MLOps are very different things. Now, if we're talking about DataOps versus MLOps, the difference is a little bit less straightforward. So DataOps typically focuses on data pipelines, again, data versioning, data warehousing and kind of data robustness, while MLOps focuses on machine learning pipelines and machine learning applications.
Alessya Visnjic: Now, the challenge is that machine learning pipelines are basically data pipelines. So where DataOps begins and ends and where MLOps begins and ends is still a little bit of a gray area, let's say.
Adel Nehme: Okay, awesome. And in summary and please correct me if I'm wrong, MLOps can be considered as a set of tools, practices, techniques that ensure reliable and scalable deployment of machine learning systems, similar to DevOps in the software engineering space. Is that correct?
Alessya Visnjic: Yeah, I think that's a great definition. I would also add culture to that. So tools, practices, techniques, and mindset or culture.
What are key focus areas that make up a successful MLOps practice?
Adel Nehme: Yeah, I'm definitely excited to talk about culture with you. As we all know, that's what really differentiates an organization that makes the most of its data from the ones that don't. So I want to zero in on some of the best practices in MLOps that are still emerging. What do you think are some of the principles, patterns, or key focus areas that make up a successful MLOps practice?
Alessya Visnjic: Yeah, well, so I would say going back to kind of the term or pattern that I highlighted previously from the DevOps practice, so the concept of continuous delivery, continuous deployment, continuous integration, that pattern basically asks for an application to be consistent, to be reproducible, to be robust. So day in and day out, the experiences that the machine learning model is generating for the customers need to be continuous and consistent.
Alessya Visnjic: And that's fairly challenging to do in the machine learning world for a few reasons. I think the biggest reason is that machine learning applications, unlike traditional software, are not deterministic. Traditional software essentially is a set of rules that we produce, and when a customer comes in, they satisfy one of these rules and they get some kind of experience. With machine learning, the rules are generated by the data that the machine learning model has been trained on.
Alessya Visnjic: And so, rather than having rules, the machine learning models are essentially building the rules by looking at the data. So that creates a few interesting challenges. First of all, the data really sways the experience. So the machine learning model behavior would be different, even if no code changes have been made, even if no config changes, no deployments have been made. If the data that comes in as an input to the model is different, if the patterns are different, then the experience would be different.
Alessya Visnjic: One of my favorite examples of that that I've heard recently is speech recognition software that is deployed out in the wild, in the physical world that recognizes certain words or phrases that are spoken. When COVID started and everybody started wearing masks, the software started working much less accurately, much more poorly than before because the mask kind of covers your mouth, which makes your voice come out not as clear, which makes the microphone pick your voice up not as clearly and that essentially breaks the model.
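To make the point about shifting inputs concrete, here is a minimal sketch, not something described in the episode, of one common way to quantify drift: the Population Stability Index (PSI) between a reference sample of a numeric feature and a current sample. The function name `psi`, the bin count, and the small floor used to avoid taking the log of zero are all illustrative assumptions, and the sketch assumes both samples are non-empty.

```python
import math
from typing import List, Sequence


def psi(reference: Sequence[float], current: Sequence[float], bins: int = 10) -> float:
    """Population Stability Index between two non-empty numeric samples.

    Bins are equal-width and derived from the reference sample only;
    current values outside that range are clamped into the edge bins.
    """
    lo, hi = min(reference), max(reference)
    # Fall back to a width of 1.0 if the reference sample is constant.
    width = (hi - lo) / bins or 1.0

    def proportions(values: Sequence[float]) -> List[float]:
        counts = [0] * bins
        for v in values:
            idx = int((v - lo) / width)
            idx = min(max(idx, 0), bins - 1)
            counts[idx] += 1
        # Floor each proportion so the logarithm below is always defined.
        return [max(c / len(values), 1e-6) for c in counts]

    ref_p = proportions(reference)
    cur_p = proportions(current)
    return sum((c - r) * math.log(c / r) for r, c in zip(ref_p, cur_p))


# Hypothetical feature values: training-time data vs. data seen later.
training_sample = [i / 100 for i in range(100)]
production_sample = [0.5 + i / 200 for i in range(100)]
drift_score = psi(training_sample, production_sample)
```

A score of zero means the two binned distributions match. A value above roughly 0.2 is a commonly quoted rule of thumb for a shift worth investigating, though the right threshold depends on the feature and the application.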
Alessya Visnjic: So that's a very concrete behavior that is unique to machine learning applications: the data that comes into the application sways the model. And I think that's something to keep in mind when you're defining those practices for successful machine learning operations. I would say in summary, kind of the high-level patterns that emerge from the desire for continuous behavior start with reproducibility, which is a very common and important concept and pattern in machine learning applications.
Alessya Visnjic: So reproducibility means I ran an experiment on some data on my local machine or in the development environment. I have this really cool model; it's better than my previous model. Now I want to reproduce it in a different environment. How do I do this? Or I have this model running today exhibiting certain behavior, how do I make sure that it's exhibiting similar behavior tomorrow? So that's reproducibility, that's one important pattern.
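One way to picture the reproducibility question is the following minimal sketch, which is an illustration rather than a method from the episode. It pins down the three things a run depends on (data, configuration, random seed), records a fingerprint of them, and checks that rerunning with the same inputs gives the same result. The names `run_fingerprint` and `train_toy_model`, and the toy "model" itself (the mean of a seeded random sample), are hypothetical.

```python
import hashlib
import json
import random
from typing import Dict, List


def run_fingerprint(rows: List[float], config: Dict[str, int], seed: int) -> str:
    """Hash the data, configuration, and seed that define a training run."""
    payload = json.dumps({"rows": rows, "config": config, "seed": seed}, sort_keys=True)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def train_toy_model(rows: List[float], config: Dict[str, int], seed: int) -> float:
    """Stand-in for training: the mean of a seeded random sample of the rows."""
    rng = random.Random(seed)  # local generator, so no global state leaks in
    sample = rng.sample(rows, k=config["sample_size"])
    return sum(sample) / len(sample)


rows = [float(i) for i in range(50)]
config = {"sample_size": 10}

# The same data, config, and seed give the same fingerprint and the same model.
first = train_toy_model(rows, config, seed=7)
second = train_toy_model(rows, config, seed=7)
same_run = run_fingerprint(rows, config, 7) == run_fingerprint(rows, config, 7)
```

Storing the fingerprint next to the trained artifact gives a cheap check, when the model is rebuilt in another environment, that the inputs really were the same. Real pipelines would also need to pin library versions and hardware-dependent behavior, which this sketch ignores.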
Alessya Visnjic: Then robustness comes to mind. So robustness essentially means: if my model is in production and it's generating predictions, but 30% of my users are unhappy with the predictions, does that mean that my model is failing? What does failing mean for my model, what are the failure modes, and what does failure even mean in a probabilistic system? So answering those questions is kind of the realm of the robustness pattern.
Alessya Visnjic: And then finally transparency. So you have a data pipeline, it's crunching gigabytes or terabytes of data, the predictions are coming out. How do you investigate undesirable model behavior? How do you explain the predictions? How do you debug the predictions that are being made? Transparency is a pattern that kind of captures some of these questions, tools, culture, and mechanisms. So reproducibility, robustness, and transparency I would highlight as three key patterns.
Adel Nehme: That's awesome. So we've seen in the past few years, this shift from experimentation to operationalization. What do you think are the main roles that are responsible for operationalization? Is it the same data scientists who develop the model? Is it engineers, hybrid roles like machine learning engineers, or even specialist roles like MLOps engineers?
Alessya Visnjic: Yes, actually there is a very recent development of a brand new role, the MLOps engineer, that I think has been defined in the last three, four, maybe six months. So I like to collectively call everybody involved in building and operating machine learning and AI applications AI builders. And I think among them are many different roles.
Alessya Visnjic: So it starts with, obviously, the researchers that are building the models, but also the people with subject matter expertise that are defining the scope of the problem. So typically, if we're talking about a personalization model, then it's often the marketing team. If we're talking about a supply chain application, AI application, it's probably in-stock managers. So it starts with the data scientists and the subject matter experts. And as the development of the model, the life cycle of the model goes on, typically there is a PM that's involved, there's an engineer or an engineering team that is involved. There's probably a QA team that's involved.
Alessya Visnjic: And eventually, the ownership of the model once it's built and deployed and is in production, I think historically it's been owned by the team that built it, so a mixed team of data scientists and engineers. I think today we see an emergence of an actual MLOps engineer role, and there are also ML engineers that kind of come into the picture that typically deal with data pipelines and actually the productionalization step of the model.
Alessya Visnjic: And today I see a huge emergence of the MLOps engineers. In fact, yesterday, I was doing a little prep for the podcast, and I went on Indeed and I searched for MLOps; it's just a fun thing that I do occasionally to see what's happening in the job market. And I found that there is a huge number of companies who are trying to hire somebody with MLOps experience, companies like Peloton and McDonald's, traditional companies that are not even kind of well-known for AI, but they do deploy AI machine learning internally. So they're looking for somebody of that profile to essentially maintain machine learning models once they are in production. So I think that is the new emerging role that we're starting to see.
Are data scientists geared towards becoming more and more specialized?
Adel Nehme: So do you think in that sense that data scientists are geared towards becoming more and more specialized?
Alessya Visnjic: I don't know if we can say with certainty yet. So I've seen both organizational structures be successful. Eric Colson from Stitch Fix is one of the thought leaders in the discipline of full stack data scientists or full stack ML practitioners, which basically means, just like a full stack engineer, the data scientist is able to build a model, deploy the model, operate the model, and kind of own the entire cycle. I think that is still the case in many organizations.
Alessya Visnjic: But at the same time, we do see this parallel of very specialized roles: machine learning scientists who actually develop novel techniques, data scientists who develop novel techniques with respect to data, and then we see machine learning engineers who typically productionalize or scale out the machine learning application. And now we see MLOps engineers. So there is a sign that both of these organizational structures, I guess, have a place. And I would say we're yet to decide which one is better or more efficient, or maybe both have a place.
Adel Nehme: So in that sense, given that MLOps is still fairly emerging, what do you think is still needed until we have this massive, wide adoption of MLOps as a practice in organizations today?
Alessya Visnjic: Yeah. I think we are still a ways away. One of the biggest barriers to adoption that I see is culture and kind of understanding of the need for MLOps, understanding of the need for tools and practices and processes and culture for operationalizing this technology. Typically, what I still see talking to various data science teams is their thinking kind of ends at production.
Alessya Visnjic: So they're running this marathon and it starts with getting data, capturing the problem, building the model, iterating, experimenting, productionalizing, and it kind of ends when the model is in production. So that is I would say kind of the most common thinking that we have been seeing in organizations, and MLOps is the movement that essentially says, "Look, you have to think about what happens to your model post-production because that is pretty much the most important step."
Alessya Visnjic: Once the model is in production that's when it's delivering value or not delivering value, that's when it's delivering positive customer experience or negative customer experience. So thinking about what happens post production is probably even more important than other previous steps. So I would say kind of realizing that this is important is one of the biggest barriers.
Alessya Visnjic: Creating this culture means helping the businesses understand that there's a need for that, because a lot of organizations are still trying to adopt machine learning, trying to launch their first machine learning models. And they're spending a lot of resources on that, hiring specialized data scientists and machine learning scientists, and they're really excited to get to their first model, but then oftentimes they're surprised that there's still more resources and still more work needed. So I think changing that mindset is very key.
Alessya Visnjic: And then I think as far as MLOps from the technology perspective as a software category goes, I think we still are figuring out a few things on the technology side, we're figuring out what does explainability mean for operating models. How do we take advantage of some of the explainability techniques that are coming from academia, from research to help on the operations side. Then fairness is another, fairness and bias, is another big space that is still a massive area of research.
Alessya Visnjic: And I would say it's not conclusive: there are some metrics that we agree upon and some metrics that we don't agree upon in the fairness and bias space. So we're still figuring out what that means and how fairness and bias kind of manifest themselves in the MLOps tooling and MLOps process. And then, of course, causality. So the majority of machine learning systems are not causal. What does that mean for operations?
Alessya Visnjic: Because when we think about software operations, we always think about cause and effect. With machine learning systems being not causal, root cause analysis becomes a little tricky and so on. So explainability, fairness, and causality, I think are still creating some challenges on the technology side of what these tools look like and how we ensure that these tools can serve us right when it comes to understanding the behavior of machine learning models.
Standardization in the Tooling Stack
Adel Nehme: That's really insightful. Thanks for that. So you mentioned here the tooling stack. On the analysis side, it seems that the data science toolkit has standardized around open source programming languages and tools like Python, Pandas, Scikit-learn, and so on and so forth. When do you think we'll reach a standardization in the tooling stack in the MLOps space?
Alessya Visnjic: I'm not sure if there is an end period to a canonical stack. I think we are starting to move towards kind of groups and organizations that are beginning to formulate their thoughts about what a canonical stack would look like. One such organization, which I am personally part of and I'm a big fan of, is an organization called AI Infrastructure Alliance, or AIIA for short. And that is a group of startups, actually, that are building an infrastructure alliance, or building the canonical stack for machine learning applications.
Alessya Visnjic: So, not a canonical stack necessarily for just MLOps, but a canonical stack for building, developing, deploying, and operating machine learning applications. So that's a fantastic move and I'm really excited to see what the stack is going to be like, and not a spoiler alert because you can go to the AI Infrastructure Alliance website and kind of see how the alliance is approaching the problem.
Alessya Visnjic: I would say, it's not going to be a prescriptive recipe of you take these 10 tools and you add them into your process and off you go. It's very use case specific; it's specific to where the organization is at in their journey of adopting machine learning and AI; it's specific to what scale you are running things at and so on. But I think we're seeing some positive signs that there is an emergence of a set of tools or I would even just say a set of practices that are widely accepted as necessary practices to have in your organization in order to implement machine learning operations well.
When should organizations get started with their MLOps journey?
Adel Nehme: That's exciting. So if you had a blueprint to provide organizations on how and when to get started, when do you think organizations should get started with their MLOps journey? And if you had a blueprint, what would that blueprint look like?
Alessya Visnjic: As early as possible. At the same time as they start adopting and building machine learning organizations and adopting machine learning technology. So I would start with defining a few themes and again, going back to DevOps because there's really no reason to reinvent those wheels. I would say the important themes to think about, as I mentioned, reproducibility, transparency, robustness, I would also add the culture of quality and ownership to that.
Alessya Visnjic: So those are kind of the five things to keep in mind when you are starting to develop machine learning applications. Those are the questions that the organization should start asking themselves as quickly as possible. And then from there, I would say a great process, lightweight process for an organization to figure out their own blueprint, is to start with their DevOps team to figure out what are all of the activities, processes, mechanisms that this organization already implements for traditional software, and then extend their thinking to include data and to include machine learning non-deterministic probabilistic applications.
Alessya Visnjic: So that means that you should be doing everything that you're doing for traditional software for the DevOps side of the traditional software. And then you should have processes and tools that are specific to understanding the data processes and tools that are specific to understanding the model. And that's not a blueprint per se, but if you go through this process that would help you develop the very specialized blueprint to your organization and to your specific use case.
Adel Nehme: Now, circling back to the people element of this, how important do you view the role of data culture and data literacy when fostering an environment for scaling MLOps but also for solving the core issues with deploying machine learning models at scale?
Alessya Visnjic: I think that data culture and data literacy are absolutely key to creating the right environment, for creating the environment that is ready for, let's say, responsible machine learning adoption. And the reason I say that is because it's easy, when we say machine learning, to just think and focus on the algorithms and kind of take data for granted. And that's very dangerous because, as we know and as we're seeing in kind of machine learning failures that make the news, the majority of unfortunate experiences that stem from machine learning come from poorly collected data for training, or poorly instrumented machine learning predictions that kind of allow unfortunate experiences to bubble all the way up to the customer.
Alessya Visnjic: To be more specific, I would say data hides biases, data hides bugs, data hides machine learning failures. And we need to be thinking about the data: how this data was collected, who brought that data in, who is the data provider, who cleaned that data, do we have an understanding of what segment of our customers this data describes, does this data describe all of our customers holistically or not, does this data have personally identifiable information or not, and so on.
Alessya Visnjic: Without having that culture and an understanding of how valuable, but also dangerous, the data that you're handling could be if you handle it incorrectly, I think machine learning adoption wouldn't be successful in a company. And to that note, I'm really excited to hear how Andrew Ng is approaching this whole concept of data culture. He's really outspoken on the subject. A lot of his talks focus on encouraging data scientists to think about their data first and foremost and models being kind of a secondary vehicle for working with the data.
Alessya Visnjic: And then a good friend of mine, Joe Reis from Ternary Data, runs a podcast as well and is a great, outspoken individual who comes from the data engineering side, and kind of merging his experience with the machine learning culture brings on a lot of interesting insights that capture how culture or lack of culture around data can affect machine learning adoption. So I would recommend listeners go and look up, well, I'm sure Andrew Ng is really well-known, so looking up some of his latest talks gives a lot of interesting information and insight on data culture, and then Joe Reis from Ternary Data has a lot of interesting content on data culture and data literacy.
Adel Nehme: 100%. And we'll definitely include those in the show notes. Now, I want to pivot to talk about WhyLabs and how it fits into the MLOps tooling stack, but before, how would you describe the current tooling space in MLOps?
Alessya Visnjic: Well, I would say the tooling space is emerging rapidly. So I can see two categories of tools. One category is cloud providers who offer kind of machine learning building blocks. So everything from data streaming to data warehousing to scaling out Jupyter notebooks, because that's still something that scientists use fairly often to build models, and so on.
Alessya Visnjic: So cloud providers are starting to also add tools that help you with MLOps. So tools for testing, versioning, monitoring your machine learning applications and data pipelines. So that's one category. And the second category is a bunch of startups that are emerging with purpose built tools for operations of machine learning applications. So no matter where you are in your journey as an organization, I guess there are two big options to look at: both the cloud providers and the startups. And there's definitely no shortage of options to try and to consider when you are kind of seeding the tooling of your organization for MLOps.
Adel Nehme: Can you walk us through WhyLabs and its mission and how it solves some of the pain points of deploying ML models at scale?
Alessya Visnjic: Sure. So WhyLabs is an AI observability company, and our mission is to build an interface between AI applications and human operators. So what we help AI builders with is to ensure that their models are making an impact that they were built to make. And what I mean by that specifically is we have an AI observability platform that helps ensure that the models do not fail unexpectedly, that helps ensure that the models are fair to the maximum extent we can calculate that, to help models realize ROI, and to ensure continuous and automated delivery.
Adel Nehme: That's great. Do you mind walking us through some of the specific features that data teams can leverage today?
Alessya Visnjic: Absolutely. And actually what I would do is I would focus on the open source library that we built that has been really popular in the community. The library's called whylogs, and what whylogs is, is a standard for data logging, which is very foundational to the entire MLOps practice. So in traditional software we capture a lot of information about how the software is running in the logs.
Alessya Visnjic: When it comes to machine learning, you can capture a lot of information in the traditional logs. However, when it comes to data that flows through the pipelines and gets transformed along each step, there is traditionally no way of capturing what does that data look like and how does it change from step to step. So whylogs helps you do precisely that.
Alessya Visnjic: It's a purpose-built machine learning logging library that is open sourced by our team at WhyLabs, and it's built to provide a very lightweight, portable, and configurable way of logging statistical properties of data and model predictions in both batch and streaming data workloads. So what whylogs enables anybody to do, since it's an open source library, is to capture statistical properties of data and then build testing, monitoring, and alerting on top of that to identify things like data drift, data bias, data outliers, and then identify degradations in model performance.
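The idea of logging statistical properties of data, rather than the data itself, can be sketched in a few lines of standard-library Python. To be clear, this is not the whylogs API or its internals; the `ColumnProfile` class and its fields are hypothetical, and the sketch only illustrates why such profiles suit both batch and streaming workloads: each value is folded in one at a time (Welford's online algorithm), and two profiles can be merged without revisiting the raw values.

```python
import math
from dataclasses import dataclass
from typing import Optional


@dataclass
class ColumnProfile:
    """Lightweight running summary of one numeric column (illustrative only)."""
    count: int = 0
    missing: int = 0
    mean: float = 0.0
    m2: float = 0.0  # running sum of squared deviations from the mean
    minimum: float = math.inf
    maximum: float = -math.inf

    def track(self, value: Optional[float]) -> None:
        """Fold one value into the summary; None counts as missing."""
        if value is None:
            self.missing += 1
            return
        self.count += 1
        delta = value - self.mean
        self.mean += delta / self.count
        self.m2 += delta * (value - self.mean)
        self.minimum = min(self.minimum, value)
        self.maximum = max(self.maximum, value)

    def merge(self, other: "ColumnProfile") -> "ColumnProfile":
        """Combine two summaries, e.g. from two batches or two workers."""
        total = self.count + other.count
        missing = self.missing + other.missing
        if total == 0:
            return ColumnProfile(missing=missing)
        delta = other.mean - self.mean
        mean = self.mean + delta * other.count / total
        m2 = self.m2 + other.m2 + delta * delta * self.count * other.count / total
        return ColumnProfile(total, missing, mean, m2,
                             min(self.minimum, other.minimum),
                             max(self.maximum, other.maximum))

    @property
    def stddev(self) -> float:
        """Sample standard deviation; 0.0 when fewer than two values were seen."""
        return math.sqrt(self.m2 / (self.count - 1)) if self.count > 1 else 0.0


# Hypothetical usage: profile two batches separately, then merge them.
morning, evening = ColumnProfile(), ColumnProfile()
for v in [1.0, 2.0, None]:
    morning.track(v)
for v in [3.0, 4.0]:
    evening.track(v)
daily = morning.merge(evening)
```

Because only a handful of numbers are kept per column, the summary stays small however much data flows through, and tests or alerts (for example, on a jump in `missing` or a shift in `mean`) can be built on top of it.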
Call to Action
Adel Nehme: That is super exciting. Finally, Alessya, is there any final call to action you'd like to make before we wrap up?
Alessya Visnjic: Yes. So first I'm really excited to be part of the MLOps conversation, and I think the space is rapidly emerging. So you, listener, if your organization is thinking about going and establishing an MLOps culture, I would love to talk to you. And to every AI practitioner who is tuning in: as we at WhyLabs are building the whylogs standard for data logging, we would love your feedback and your contributions.
Alessya Visnjic: It takes a village and it takes everybody to define the standard for something so canonical as data logging. So I will be sharing the link to GitHub and would love feedback, contributions, issues, feature requests, GitHub stars and so on. Join the movement that we started for building the standard for data logging and MLOps.
Adel Nehme: That's awesome. Alessya, thank you so much for coming on today's podcast.
Alessya Visnjic: Thank you, Adel. Really great to be here.
Adel Nehme: That's it for today's episode of DataFramed. Thanks for being with us. I really enjoyed Alessya's insights on MLOps and how data science is evolving to meet the needs of organizations. If you enjoyed today's episode, make sure to leave a review on iTunes. Our next episode will be with Maria Luciana Axente, head of responsible AI at PwC. I hope it will be useful for you, and we hope to catch you next time on DataFramed.