Too many machine learning models never reach production and remain in data labs. As in any emerging field, the set of best practices, tools, techniques, and roles that modern data teams need to adopt has yet to mature and standardize. This creates frustration both for businesses looking to extract value from machine learning at scale and for data scientists looking to move beyond experimenting in a notebook. MLOps (Machine Learning Operations) has emerged over the past few years with the aim of solving the deployment challenges data teams face.
The clue is in the name: just as DevOps shaped rapid Agile application development around business needs, MLOps aims to do the same by bridging the gap between complexity and the deployment of machine learning models.
How exactly does MLOps address these challenges? In this article, we aim to demystify some of the concepts around MLOps as an emerging discipline and answer key questions such as:
- What is MLOps and what are the key components of a successful MLOps practice?
- How is MLOps different from DevOps?
- How does MLOps reimagine the machine learning workflow?
- How can you get started with MLOps today?
What is MLOps?
Alessya Visnjic, CEO of WhyLabs, an MLOps startup focused on model monitoring, aptly described MLOps as a “set of tools, practices, techniques, culture, and mindset that ensure reliable and scalable deployment of machine learning systems”.
Putting this definition in perspective, MLOps builds on the existing discipline of DevOps, the modern practice of efficiently writing, deploying, and running enterprise applications. It is a cross-functional, collaborative, and iterative process that operationalizes data science. MLOps treats machine learning as an engineering discipline, where models are treated as reusable software artifacts which can then be deployed via a repeatable process.
MLOps also involves continuous monitoring and retraining of models in production to ensure that they perform optimally as data changes over time, a phenomenon also known as data drift.
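As a rough illustration of what monitoring for data drift can look like, the sketch below compares the distribution of one numeric feature at training time with the same feature in production, using the Population Stability Index (PSI). The function name, the bin count, and the 0.2 threshold are assumptions chosen for this example, not something prescribed by MLOps or by any particular tool.

```python
import math
from typing import Sequence


def population_stability_index(
    reference: Sequence[float], current: Sequence[float], bins: int = 10
) -> float:
    """Compare two samples of one numeric feature; larger values mean more drift."""
    lo, hi = min(reference), max(reference)
    width = (hi - lo) / bins or 1.0  # avoid a zero width for constant features

    def bin_shares(values: Sequence[float]) -> list[float]:
        counts = [0] * bins
        for v in values:
            # Clamp values outside the reference range into the edge bins.
            idx = min(max(int((v - lo) / width), 0), bins - 1)
            counts[idx] += 1
        # A small floor keeps the logarithm defined for empty bins.
        return [max(c / len(values), 1e-6) for c in counts]

    ref, cur = bin_shares(reference), bin_shares(current)
    return sum((c - r) * math.log(c / r) for r, c in zip(ref, cur))


# Hypothetical feature values: what the model saw in training vs. production.
reference = [i / 100 for i in range(1000)]
unchanged = [i / 100 for i in range(1000)]
shifted = [v + 5 for v in reference]

DRIFT_THRESHOLD = 0.2  # assumed cut-off; tune it for your own data
psi_unchanged = population_stability_index(reference, unchanged)
psi_shifted = population_stability_index(reference, shifted)
needs_retraining = psi_shifted > DRIFT_THRESHOLD
```

In a real setup, a check like this would typically run on a schedule against fresh production data, and a value above the threshold would raise an alert or trigger retraining.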
In a nutshell, MLOps allows data teams to scale the value they provide by enabling them to:
- Deploy more models faster through automated processes
- Optimize productivity through collaboration and model reuse
- Reduce the risk and cost of models that never make it to production
- Continuously monitor and update models as data drift occurs
Connecting back to Visnjic’s definition of MLOps, the above requires both tooling innovation for modern data teams, as well as a shift in mindset for many data scientists today.
Why is MLOps important?
We’ve already covered how MLOps aims to solve the deployment challenges for many data teams today. However, what makes machine learning so unique that it requires new specialized operations functions like MLOps?
Machine learning applications are fundamentally different from traditional software. The following are some of the significant challenges that organizations face when deploying machine learning systems into production.
- Ownership and collaboration: Traditionally, data scientists create and develop machine learning models, and IT teams take on the task of deploying and managing them. However, the collaboration between these two teams is not as efficient as it could be and tends to generate friction. For example, data science teams are increasingly solicited by different parts of the business to solve a growing number of problems. Given the complex nature of modern IT systems, data scientists pay little attention to the production environment and the IT systems in place, because that is seen as the responsibility of the IT team. Similarly, IT teams tend not to focus on the inner workings of the solutions data scientists produce, which leads to false expectations around the deployability of many of these models. This dynamic leads to friction and to an anti-pattern that should be avoided, where data teams settle for the thought process of "let's build a model, send it to IT, and they'll take over".
- Data is a defining aspect of machine learning-powered software: Data is the lifeblood of machine learning systems. Unlike traditional software, where software engineers design a well-crafted process that takes a set of inputs and delivers a set of outputs, machine learning systems rely on statistical methods that take messy real-world data as inputs and deliver predictions as outputs. This means that the behavior of machine learning systems is subject to change due to changes in data. Moreover, this means that evaluating the performance of machine learning models requires observation and analysis. Finally, this means that the machine learning workflow is messy, experimental in nature, and naturally lends itself to a different type of skill set than traditional software engineering.
- Deployment complexity: Machine learning systems are becoming increasingly complex. As opposed to traditional software, deploying machine learning models involves orchestrating a variety of interconnected steps across disparate tools. This includes data collection, storage, transformation, feature engineering, and more. Moreover, reproducibility and version control represent major challenges for data teams. Given the experimental nature of machine learning, data science teams build many versions of a model using different versions of the same dataset. Thus, version control needs to cover datasets and models as well as code.
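To make the versioning challenge above more concrete, here is a minimal sketch of one way to identify "which data and which settings produced this model" using content hashes. The `fingerprint` helper and the sample rows are hypothetical and exist only for illustration; dedicated tools handle large files, remote storage, and lineage far more thoroughly.

```python
import hashlib
import json


def fingerprint(rows: list[dict], params: dict) -> dict:
    """Return short content hashes that identify a dataset and a training run."""
    # Serialising with sorted keys makes the hash independent of dict key order.
    data_blob = json.dumps(rows, sort_keys=True).encode("utf-8")
    data_version = hashlib.sha256(data_blob).hexdigest()[:12]
    # The run hash depends on both the data version and the training parameters.
    run_blob = json.dumps({"data": data_version, "params": params}, sort_keys=True)
    run_version = hashlib.sha256(run_blob.encode("utf-8")).hexdigest()[:12]
    return {"data_version": data_version, "run_version": run_version}


rows_v1 = [{"age": 34, "churned": 0}, {"age": 51, "churned": 1}]
rows_v2 = rows_v1 + [{"age": 27, "churned": 0}]

run_a = fingerprint(rows_v1, {"max_depth": 3})
run_b = fingerprint(rows_v1, {"max_depth": 5})  # same data, different parameters
run_c = fingerprint(rows_v2, {"max_depth": 3})  # new data, same parameters
```

Here `run_a` and `run_b` share a data version but differ in run version, while `run_c` differs in both, so any trained model can be traced back to the exact inputs that produced it.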
MLOps vs DevOps—what is the difference?
In the previous section, we broke down why machine learning needs a specialized operations function. However, how does MLOps differ from DevOps in practice?
DevOps is a contraction of Development (Dev) and Operations (Ops). It combines two essential functions of the IT department: application development and systems engineering. DevOps tries to shorten development cycles—and accelerate the velocity of output for software engineering teams. It does so by introducing automation, updated processes, and working methods for development teams. More broadly, DevOps brings in two principles to the software development process:
- Continuous Integration (CI): the practice of frequently merging small changes into a shared version control repository, where each change is built and tested automatically. This reduces deployment issues, because code reaches production frequently and in small increments.
- Continuous Delivery (CD): the process of automating the steps needed to deliver applications and software to production environments.
Given the unique nature of machine learning, here are some practical ways MLOps differs from DevOps:
- Continuous Integration is extended from testing and validating code, to also testing and validating models and data
- Continuous Delivery is extended from automating the steps that deliver applications into production to also automatically delivering data pipelines that trigger a machine learning prediction
- Continuous Training is introduced, which is unique to machine learning: models are automatically retrained for deployment
- Continuous Monitoring is introduced, which tracks production data quality, model performance, and the business metrics tied to the machine learning model
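The extended form of Continuous Integration described above can be sketched as a pair of automated checks, one on the data and one on the model, that must both pass before delivery. All names here (`CheckResult`, `validate_data`, `validate_model`, `gate`) and the 0.01 tolerance are assumptions made for this illustration rather than part of any specific CI system.

```python
from dataclasses import dataclass


@dataclass
class CheckResult:
    name: str
    passed: bool
    detail: str


def validate_data(rows: list[dict], required: dict[str, type]) -> CheckResult:
    """Data check: every row has the required columns with the expected types."""
    for i, row in enumerate(rows):
        for column, expected in required.items():
            if not isinstance(row.get(column), expected):
                return CheckResult("data_schema", False, f"row {i}: bad '{column}'")
    return CheckResult("data_schema", True, f"{len(rows)} rows ok")


def validate_model(candidate_accuracy: float, production_accuracy: float,
                   tolerance: float = 0.01) -> CheckResult:
    """Model check: the candidate may trail production by at most `tolerance`."""
    passed = candidate_accuracy >= production_accuracy - tolerance
    detail = f"candidate={candidate_accuracy:.3f} production={production_accuracy:.3f}"
    return CheckResult("model_quality", passed, detail)


def gate(results: list[CheckResult]) -> bool:
    """The model is delivered only if every check passed."""
    return all(r.passed for r in results)


schema = {"age": int, "income": float}
good_rows = [{"age": 34, "income": 52000.0}, {"age": 51, "income": 61000.5}]
bad_rows = [{"age": 34, "income": 52000.0}, {"age": "51", "income": 61000.5}]

ok_to_ship = gate([validate_data(good_rows, schema), validate_model(0.91, 0.90)])
blocked = gate([validate_data(bad_rows, schema), validate_model(0.91, 0.90)])
```

In this sketch the second batch is blocked because one row carries `age` as a string, even though the candidate model itself scores well. Code-only tests would not catch that kind of failure.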
The MLOps Workflow
Reimagining the data science workflow for MLOps
Given the additional complexity of deploying machine learning models into production—how can data teams start adopting MLOps into their data science workflows? In this section, we introduce a simplified step-by-step approach in an MLOps process:
- Building: Once models are created, they are typically placed in an auditable repository under version control to support reuse across the enterprise.
- Evaluation: At this stage, the quality of the model's predictions is quantified by measuring the newly trained model's performance on a new, independent dataset.
- Productionizing: Export, deploy, and integrate the model or pipeline into production systems and applications.
- Testing: Continuous testing is important for ML-based applications; this stage is concerned with automatically retraining and serving the models.
- Deployment: Continuous monitoring is required to ensure optimal performance. The model can be retrained or replaced with a new model as data changes.
- Monitoring and observability: Many companies face challenges when moving machine learning models into production environments, which is why production data quality and model performance are tracked once a model is live.
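The first few steps above (building, evaluation, and productionizing) can be sketched end to end with a toy model. Everything in this example is a simplified stand-in: `ThresholdModel` replaces real training, and `Registry` is a hypothetical in-memory substitute for a versioned model repository.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class ThresholdModel:
    """Toy classifier: predicts 1 when the feature exceeds a learned threshold."""
    threshold: float = 0.0

    def fit(self, xs: list[float], ys: list[int]) -> "ThresholdModel":
        # Midpoint between the class means; a stand-in for real training.
        pos = [x for x, y in zip(xs, ys) if y == 1]
        neg = [x for x, y in zip(xs, ys) if y == 0]
        self.threshold = (sum(pos) / len(pos) + sum(neg) / len(neg)) / 2
        return self

    def predict(self, xs: list[float]) -> list[int]:
        return [int(x > self.threshold) for x in xs]


def accuracy(model: ThresholdModel, xs: list[float], ys: list[int]) -> float:
    preds = model.predict(xs)
    return sum(p == y for p, y in zip(preds, ys)) / len(ys)


@dataclass
class Registry:
    """In-memory stand-in for a versioned model repository."""
    versions: list[tuple[ThresholdModel, float]] = field(default_factory=list)
    production: Optional[int] = None  # index of the version currently served

    def register(self, model: ThresholdModel, score: float) -> int:
        self.versions.append((model, score))
        return len(self.versions) - 1

    def promote_if_better(self, version: int) -> bool:
        current = -1.0 if self.production is None else self.versions[self.production][1]
        if self.versions[version][1] > current:
            self.production = version
            return True
        return False


train_x, train_y = [1.0, 2.0, 3.0, 7.0, 8.0, 9.0], [0, 0, 0, 1, 1, 1]
holdout_x, holdout_y = [1.5, 2.5, 6.0, 8.5], [0, 0, 1, 1]  # not used in training

registry = Registry()
model = ThresholdModel().fit(train_x, train_y)   # building
score = accuracy(model, holdout_x, holdout_y)    # evaluation on independent data
version = registry.register(model, score)        # stored under a version number
promoted = registry.promote_if_better(version)   # productionizing
```

The design choice worth noting is that promotion depends on the held-out score stored with each version, so a retrained model replaces the production one only when it measurably improves on it.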
Reimagining data roles for MLOps
In smaller data science teams, one person can hold more than one role and wear many hats. However, in bigger, more process-driven data teams, you can find a variety of roles and skills that own different elements of the MLOps workflow. These roles are as follows:
- Data Scientist: Often seen as the major player in any MLOps team, data scientists are experts who use the company’s data to generate value. Their role is to understand, structure, and interpret this data and to provide insights from it in the form of predictive models. They create, test, and evaluate machine learning models. In some companies, they also deploy the models and monitor their performance once they are in production.
- Data Engineer: Data Engineers are responsible for creating and maintaining the environment that allows almost every other function in the data team to operate. They are responsible for developing, building, maintaining, and testing architectures, such as databases and processing systems. In simple terms, they enable the flow of data from extraction, to transformation, to delivery.
- Software Engineer: In an MLOps process, software engineers are in charge of integrating machine learning models into the company’s applications and systems. They also ensure that machine learning models work seamlessly with any non-machine learning-based applications within the company.
- Machine Learning Engineer: The machine learning engineer is at the crossroads of data science and data engineering. The role of the machine learning engineer is to optimize and put into production the models developed by the data scientist within the infrastructure prepared by the data engineer.
Getting started with MLOps
As discussed throughout this article, MLOps is still a nascent field with many of the tools, best practices, and methodologies still emerging today. This section is dedicated to different ways you can get started with MLOps, with tools you can start experimenting with, and learning resources you can take today.
Tools to consider for MLOps
Kubeflow: Kubeflow is a suite of tools for running machine learning workflows on Kubernetes clusters. Its goal is to enable the best open-source machine learning solutions to run on a Kubernetes cluster in a simple, portable, and scalable way. Kubeflow originated as an open-source implementation of TensorFlow Extended (TFX), an end-to-end platform for deploying machine learning pipelines in production, and thus simplified the execution of TensorFlow jobs on Kubernetes.
MLflow: MLflow is a tool for industrializing the end-to-end development process of machine learning projects. Its ambition is to simplify the development of machine learning projects in companies by facilitating the monitoring, reproduction, management, and deployment of models.
Data Version Control (DVC): DVC is a Python package that makes managing your data science projects easier. This tool is an extension of Git for machine learning, as stated by its main contributor Dmitry Petrov in a presentation. DVC is both comparable and complementary to Git.
Pachyderm: Like DVC, Pachyderm is a version control tool for machine learning and data science. On top of that, it is based on Docker and Kubernetes, which helps it run and deploy machine learning projects on any cloud platform. In addition, Pachyderm ensures that all data ingested into a machine learning model is versioned and traceable.
Learning resources for MLOps
Machine Learning Fundamentals
- Machine Learning For Absolute Beginners: A Plain English Introduction (Second Edition) by Oliver Theobald. As the title indicates, this book offers beginners a complete introduction to machine learning. And when we talk about beginners, we mean true novices. There is no need for any basic knowledge of mathematics or any coding experience. This is a basic introduction to machine learning for anyone interested in this topic. The language used is quite simple so as not to drown the readers in incomprehensible jargon. The different algorithms are accompanied by clear and easy-to-follow explanations and visual examples. This book also presents some simple programming notions to contextualize machine learning better.
- Machine Learning For Dummies by John Paul Mueller and Luca Massaron. For novices, the "For Dummies" series of books is also a good starting point. This book introduces the basic concepts and theories of machine learning and explains how to apply them to the real world. It introduces the essential programming languages and tools and explains how to turn a relatively esoteric concept into a practical tool. It also discusses the programming languages Python and R, which are used to teach machines to spot patterns and analyze results.
- Fundamentals of Machine Learning for Predictive Data Analytics: Algorithms, Worked Examples, and Case Studies by John D. Kelleher, Brian Mac Namee, and Aoife D'Arcy. This book covers all the fundamental notions of machine learning, presenting both the theoretical aspect and the practical applications. It offers concrete examples and case studies to better convey the knowledge. It is recommended to have a basic knowledge of analytics to better understand these fundamental notions. This book presents the different approaches to machine learning and illustrates each learning concept with algorithms and models, as well as concrete examples to put these concepts into practice.
- Machine Learning for Hackers by Drew Conway and John Myles White. The term "hackers" here refers to programmers who create code for specific purposes and practical projects. This book is intended for readers who do not have a background in mathematics but know how to code and use programming languages. Machine learning relies on mathematical concepts since it uses algorithms to analyze data, but many experienced coders do not have highly developed mathematical skills. Instead of dwelling on mathematical theories, this book presents real-world applications based on practical case studies. It addresses classic machine learning problems and explains how to solve them using the R programming language. The examples include comparing senators based on their voting records, creating a recommendation system for people to follow on Twitter, and spotting spam based on its content.
- DataCamp’s Machine Learning Scientist with Python or R Data Scientist Track. Whether you’re an R or Python user, these two tracks cover the ins and outs of machine learning. Each track contains dozens of interactive courses covering everything from the basics of machine learning to more advanced topics like deep learning and feature engineering.
Data Engineering Fundamentals
- Andreas Kretz's Data Engineering Cookbook. There is a lot of confusion about how to become a data engineer. This eBook by Andreas Kretz contains elaborate case studies, code, podcasts, interviews, and more. We consider it a complete package for anyone who wants to become a data engineer. And the icing on the cake? The eBook is free, so you can start using it right away.
- DW 2.0 - The Architecture for the Next Generation of Data Warehousing by W. H. Inmon, the father of data warehousing. This book describes a future of data warehousing that is architecturally and technologically possible today. It is carefully structured and covers most of the topics related to data architecture and its underlying challenges. It also explains how you can use an existing system and build a data warehouse around it, along with best practices for justifying the expense in a very practical way.
- Agile Data Warehouse Design: Collaborative Dimensional Modeling, from Whiteboard to Star Schema by Lawrence Corr. This is a great book. Lawrence Corr provides a comprehensive step-by-step guide to capturing business intelligence and data warehousing requirements and turning them into high-performance models using a technique called modelstorming. In addition, you will find a concept called BEAM, an agile approach to dimensional modeling that improves communication between data warehouse designers and business intelligence stakeholders.
- DataCamp’s Data Engineering with Python Career Track. This track provides dozens of courses covering the ins and outs of building effective data architectures, streamlining data ingestion, building pipelines, and more.
Dive into MLOps
- MLOps Fundamentals Skill Track. This skill track covers the complete life cycle of a machine learning application, ranging from the gathering of business requirements to the design, development, deployment, operation, and maintenance stages.
- MLOps: Operationalizing Data Science, by David Sweenor, Dev Kannabiran, Thomas Hill, Steven Hillion, Dan Rope, and Michael O'Connell. These 6 experts in data analytics provide a four-step approach to creating machine-learning-based applications that make it into production.
- Building Machine Learning Powered Applications by Emmanuel Ameisen. In this book, author Emmanuel Ameisen will teach you how to build a machine learning-driven application from initial idea to deployed product.
- Building Machine Learning Pipelines by Hannes Hapke, Catherine Nelson. Throughout this book, the authors Hannes Hapke and Catherine Nelson showcase the steps of automating a machine learning pipeline using the TensorFlow ecosystem.
- Practical MLOps by Noah Gift and Alfredo Deza. Highlighting the difference between DevOps and MLOps, this book explains what MLOps is about and how it helps you operationalize your machine learning models. It presents tools and methods that enable you to implement MLOps projects in AWS, Microsoft Azure, and Google Cloud. Also, make sure to catch Noah’s live training on Practical MLOps on DataCamp.
- Introducing MLOps by Mark Treveil & Dataiku Team. The authors of this book provide a deep understanding of the key concepts of MLOps so that data science teams are able to operationalize machine learning models to drive business change and improve models over time.
- Google Cloud provides articles, blogs, and papers that walk you through the best practices and processes to use to build efficient machine learning models. In the selected article, you will learn MLOps processes and how to shift from manual to automated processes.
- Nvidia’s blog features articles that walk you through the MLOps Lifecycle and showcase some of the success stories within the field.
- Ml-ops.org was created by Dr. Larysa Visengeriyeva, Anja Kammer, Isabel Bär, Alexander Kniesz, and Michael Plöd. This website aims to gather all the necessary information on MLOps and showcase each step of the end-to-end process.
We hope this set of resources will get you started on your MLOps learning journey.