How Smooth Are Your Data Team's Operations? Discover Your Organization's MLOps Maturity Level

This article provides a primer on MLOps maturity models for data teams to follow and consider when evolving their machine learning capabilities.
Jun 2022  · 12 min read

MLOps, or Machine Learning Operations, is an essential aspect of the machine learning workflow and is solidifying into a field in itself. It combines machine learning with software engineering best practices to ensure models make it through the pipeline from experimentation into production while also reducing technical debt in data teams.

To be successful with machine learning at scale, it is essential to consider your organization’s MLOps maturity level. Often, there are barriers to deploying models, such as inadequate infrastructure, organizational culture, or an inability to retrain a model when it drifts. The result is far less value derived from machine learning than expected (or worse, models that harm the stakeholders interacting with them), which in turn breeds mistrust in the data team’s ability to deliver value with machine learning. To fully realize the potential of machine learning and AI, a well-thought-out approach to MLOps is required.

MLOps also allows data scientists to focus on what really matters: collecting and cleaning data, developing models, and utilizing the right techniques. This is because MLOps significantly increases automation at every stage of the machine learning lifecycle.

This article assumes you have a working knowledge of MLOps and understand its importance; for more information about MLOps, take a look at our articles Getting Started with MLOps and MLOps Best Practices.

Why Have Maturity Levels?

MLOps isn’t just about adopting tools or changing ways of working. Implementation requires a holistic approach to using tools and technology to break down silos within technical teams and increase automation. This may mean restructuring an organization or moving people around, and every company's solution will be unique.

Considering MLOps through the lens of maturity (a measurement of how far along your company is in the MLOps implementation process) helps to establish where an organization stands, what it can do to increase maturity, and what that increase will provide. It also means MLOps maturity can scale with an organization. Small companies with few models will not need to be at the same maturity level as large, complex organizations. However, even small organizations whose main value proposition requires machine learning will need a relatively mature MLOps practice.

It is also worth noting that moving up the maturity model is not linear. Your organization may show high maturity in some elements of its MLOps practice but low maturity in others. In the section below, we’ll outline Microsoft’s MLOps maturity model, which provides a holistic overview of how MLOps evolves within an organization.

Microsoft’s MLOps Maturity Model

The Microsoft MLOps maturity model defines three broad dimensions by which to evaluate MLOps maturity:

  • People: The different roles within a data team, and how they interact with each other
  • Machine Learning Lifecycle: How the machine learning lifecycle is managed, from data collection to model creation and release.
  • Application: How machine learning models are tested, implemented, deployed, and retrained.

Moreover, the Microsoft MLOps maturity model defines five maturity stages for an MLOps practice, as follows:

  1. No MLOps: Disparate, black-box systems with siloed data teams and manual training, deployment, and testing of models.
  2. DevOps but no MLOps: Data team trains models while a separate team deploys them. The feedback loop on model performance is opaque with limited reproducibility. 
  3. Automated Training: Models are reproducible and releases are less manual. Training is automatic and pipelines are widely used.
  4. Automated Deployment: Deployments are automated and can be traced back to the original data. Models can be A/B tested after deployment and tests are automated.
  5. Full MLOps: The system is fully automated from data ingestion to model deployment and testing, with monitored, centralized analytics on model performance. These systems are often built bespoke.

The sections below cover each maturity level, each with a summary table of the key points.

No MLOps

At this stage, there is very little automation, and the data team is siloed both from the wider organization and within itself. Adoption of tools is low, and some functions such as data engineering may not exist or may be carried out by data scientists. At this maturity level, deploying even a single model is difficult and time-consuming, and retraining models requires completely rerunning analysis and training jobs. There is no tracking of the model after deployment, so exploring its effect may not be possible.

| Dimension | Area | Summary |
| --- | --- | --- |
| People | Data Scientists | Not in communication with the wider team, working independently. |
| People | Data Engineers | May not exist. |
| People | Software Engineers | Receive models from data scientists and are siloed from the data team. |
| ML Lifecycle | Data Preparation | While databases may exist, data is gathered manually for model training. |
| ML Lifecycle | Model Training | Experiments are not tracked, and there are no data or machine learning pipelines in place. |
| ML Lifecycle | Model Deployment | Models are usually delivered manually with inputs and outputs. There is no version control, the scoring script is manually created, and deployment is usually handled by data scientists. |
| Application | Integration | Fully manual testing and release each time a model is ready for deployment, heavily reliant on data scientists. |

DevOps but no MLOps

With the integration of DevOps best practices, the data team is still siloed, although there may be dedicated data engineers. Cloud computing technologies may be adopted, but are not used to their full potential. At this stage, collecting data from databases and preparing it for machine learning will be automated in reusable pipelines. There may also be integration tests when models are deployed. Data scientists will be wearing many hats and are heavily involved in testing models after deployment.

| Dimension | Area | Summary |
| --- | --- | --- |
| People | Data Scientists | Not in communication with the wider team, working independently. |
| People | Data Engineers | Not in communication with the wider team, working independently. |
| People | Software Engineers | Receive models from data scientists and are siloed from the data team. |
| ML Lifecycle | Data Preparation | Automated data pipelines are available and may run on managed cloud resources. |
| ML Lifecycle | Model Training | Experiments are not reliably tracked, and results are not reproducible. |
| ML Lifecycle | Model Deployment | Models are usually delivered manually with inputs and outputs. There is version control, but the scoring script is still manually created, and deployment is usually handled by data scientists or engineers. |
| Application | Integration | Releases are automated and basic integration tests are in place, but are heavily dependent on the expertise of data scientists. |
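The reusable data pipelines that appear at this stage can be sketched as a sequence of composable preparation steps. The function and column names below are hypothetical, a minimal stdlib-only illustration of the idea rather than any particular tool:

```python
from typing import Callable, Dict, List

Row = Dict[str, float]

def drop_incomplete(rows: List[Row]) -> List[Row]:
    """Remove rows containing any missing (None) values."""
    return [r for r in rows if all(v is not None for v in r.values())]

def scale_features(rows: List[Row]) -> List[Row]:
    """Min-max scale each numeric column to [0, 1]."""
    if not rows:
        return rows
    cols = rows[0].keys()
    lo = {c: min(r[c] for r in rows) for c in cols}
    hi = {c: max(r[c] for r in rows) for c in cols}
    return [
        {c: 0.0 if hi[c] == lo[c] else (r[c] - lo[c]) / (hi[c] - lo[c]) for c in cols}
        for r in rows
    ]

def build_pipeline(*steps: Callable[[List[Row]], List[Row]]) -> Callable[[List[Row]], List[Row]]:
    """Compose cleaning steps into one reusable, ordered pipeline."""
    def run(rows: List[Row]) -> List[Row]:
        for step in steps:
            rows = step(rows)
        return rows
    return run

# The same pipeline can now be rerun on every data pull instead of being rebuilt by hand.
prepare = build_pipeline(drop_incomplete, scale_features)
```

In practice these steps would live in an orchestration tool and run on managed cloud resources, but the key property is the same: the preparation logic is defined once and reused, not repeated manually per experiment.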

Automated Training

At this stage, collaboration within the data team increases significantly. Data scientists will now work with engineers to turn their model training code into repeatable scripts which utilize automated data pipelines. Experiments will be tracked and version control will be much more widespread. Deployment is more automated with model files still being passed over to software engineers.

| Dimension | Area | Summary |
| --- | --- | --- |
| People | Data Scientists | Work with data engineers to turn experiment code into repeatable scripts. |
| People | Data Engineers | Work with data scientists to turn experiment code into repeatable scripts. |
| People | Software Engineers | Receive models from data scientists and are siloed from the data team. |
| ML Lifecycle | Data Preparation | Automated data pipelines that run on managed cloud resources. |
| ML Lifecycle | Model Training | Experiments are tracked, and training code and models are version controlled. |
| ML Lifecycle | Model Deployment | Models are still deployed manually; however, the scoring script is version controlled and the release is now managed by software engineering teams. |
| Application | Integration | Releases are automated and basic integration tests are in place, but are heavily dependent on the expertise of data scientists. |
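Experiment tracking, the defining capability of this stage, is usually handled by dedicated tools such as MLflow or Weights & Biases. The toy tracker below is only a sketch of what those tools record for each run; the class and field names are our own invention:

```python
import json
import time
from typing import Any, Dict, List

class ExperimentTracker:
    """Record hyperparameters and metrics for every training run."""

    def __init__(self) -> None:
        self.runs: List[Dict[str, Any]] = []

    def log_run(self, params: Dict[str, Any], metrics: Dict[str, float]) -> Dict[str, Any]:
        """Store one run with an id and timestamp so it can be compared later."""
        run = {
            "run_id": len(self.runs),
            "timestamp": time.time(),
            "params": params,
            "metrics": metrics,
        }
        self.runs.append(run)
        return run

    def best_run(self, metric: str) -> Dict[str, Any]:
        """Return the run with the highest value of the given metric."""
        return max(self.runs, key=lambda r: r["metrics"][metric])

    def export(self) -> str:
        """Serialize the run history, e.g. for a shared dashboard."""
        return json.dumps(self.runs, indent=2)

tracker = ExperimentTracker()
tracker.log_run({"lr": 0.1, "depth": 3}, {"accuracy": 0.81})
tracker.log_run({"lr": 0.01, "depth": 5}, {"accuracy": 0.87})
```

Because every run's parameters and metrics are logged, any result can be traced back to the configuration that produced it, which is what makes the experiments at this level reproducible.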

Automated Model Deployment

Software engineers are now collaborating much more closely with data engineers to deploy the models, as well as their training and data collection pipelines. Experiments are fully tracked and there are automated unit and integration tests for each model release and the application itself. There may also be CI/CD in some areas.

| Dimension | Area | Summary |
| --- | --- | --- |
| People | Data Scientists | Work with data engineers to turn experiment code into repeatable scripts. |
| People | Data Engineers | Work with software engineers to automate model integration into applications. |
| People | Software Engineers | Work with data engineers to automate model integration into applications. |
| ML Lifecycle | Data Preparation | Automated data pipelines that run on managed cloud resources. |
| ML Lifecycle | Model Training | Experiments are tracked, and training code and models are version controlled. |
| ML Lifecycle | Model Deployment | Models are deployed automatically; the scoring script is version controlled, and releases are managed by a continuous integration/continuous delivery (CI/CD) pipeline. |
| Application | Integration | Integration into application code is less reliant on data scientists, and unit and integration tests are in place for each model release. |
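The automated tests that gate each model release at this stage can be as simple as assertions run by the CI/CD pipeline against the scoring script. The toy scoring function below is hypothetical; in a real project the tests would target your actual model interface and typically run under a framework such as pytest:

```python
from typing import Dict

def score(features: Dict[str, float], weights: Dict[str, float], bias: float = 0.0) -> float:
    """A toy scoring function: weighted sum of features, clipped to [0, 1]."""
    raw = bias + sum(weights.get(name, 0.0) * value for name, value in features.items())
    return min(1.0, max(0.0, raw))

def test_score_known_input() -> None:
    # Unit test: a known input must produce the expected score,
    # so a bad model artifact fails the release instead of reaching users.
    assert score({"x": 2.0}, {"x": 0.25}) == 0.5

def test_score_is_bounded() -> None:
    # Contract test: scores must stay in [0, 1] even for extreme inputs.
    for value in (-1e6, 0.0, 1e6):
        assert 0.0 <= score({"x": value}, {"x": 1.0}) <= 1.0

# In CI these would be collected automatically; here we run them directly.
test_score_known_input()
test_score_is_bounded()
```

Wiring tests like these into the deployment pipeline is what removes the dependence on a data scientist manually sanity-checking every release.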

Full MLOps: Automated Retraining

In the final maturity stage, data engineers, data scientists, and software engineers work hand in hand to automate as much of the development and deployment process as possible. Model training and retraining are automated based on production metrics, models are released automatically and managed by a CI/CD pipeline, and models are tested within application code.

| Dimension | Area | Summary |
| --- | --- | --- |
| People | Data Scientists | Work with data engineers to convert experiments into repeatable scripts, and with software engineers to automate processes. |
| People | Data Engineers | Work with software engineers to automate model integration into the application and collect post-deployment performance metrics. |
| People | Software Engineers | Work with data engineers to automate model integration into the application and collect post-deployment performance metrics. |
| ML Lifecycle | Data Preparation | Automated data pipelines that run on managed cloud resources. |
| ML Lifecycle | Model Training | Experiments are tracked, and training code and models are version controlled; models are retrained automatically after deployment based on performance metrics. |
| ML Lifecycle | Model Deployment | Models are deployed automatically; the scoring script is version controlled, and releases are managed by a CI/CD pipeline. |
| Application | Integration | Integration into application code is less reliant on data scientists, and unit and integration tests are in place for each model release. |
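The automatic retraining that defines this stage hinges on monitoring production metrics and triggering the training pipeline when performance degrades. A minimal sketch of such a trigger, with hypothetical class names and a deliberately simple rolling-accuracy rule:

```python
from collections import deque
from typing import Deque

class RetrainingMonitor:
    """Trigger retraining when rolling production accuracy drops below a threshold."""

    def __init__(self, threshold: float, window: int) -> None:
        self.threshold = threshold
        self.scores: Deque[float] = deque(maxlen=window)
        self.retrain_count = 0

    def record(self, correct: bool) -> None:
        """Log whether a production prediction turned out to be correct."""
        self.scores.append(1.0 if correct else 0.0)
        if len(self.scores) == self.scores.maxlen and self._rolling_accuracy() < self.threshold:
            self._trigger_retraining()

    def _rolling_accuracy(self) -> float:
        return sum(self.scores) / len(self.scores)

    def _trigger_retraining(self) -> None:
        # In a real system this would kick off the training pipeline via CI/CD.
        self.retrain_count += 1
        self.scores.clear()  # reset the window so the fresh model is judged anew

monitor = RetrainingMonitor(threshold=0.8, window=5)
for outcome in [True, True, False, False, False]:
    monitor.record(outcome)
```

Production systems would use richer drift signals than raw accuracy (for example, input distribution shifts), but the structure is the same: monitored metrics feed a rule that kicks off the automated training pipeline without human intervention.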

A notable difference between Microsoft’s MLOps maturity model and Google’s is the placement of CI/CD: Google positions CI/CD at the final level of maturity, while Microsoft reserves the final stage for automated retraining.

In reality, retraining is one of the key drivers for adopting MLOps and may be integrated earlier than the final stage. However, there are prerequisites for model retraining, such as strong automation and infrastructure, as well as high-quality data and model training pipelines. CI/CD concepts will continue to be improved into the final stages of MLOps maturity. 

Highly mature organizations such as Google and Microsoft will typically have bespoke software that runs on their own cloud computing solutions and utilizes specific tools to their highest potential. Software engineers often state that technology at these companies ‘just works’, with frictionless model deployments and efficient access to data. This is required because these companies have hundreds of models in production for the wide range of products they offer.

How to Grow Your MLOps Maturity

MLOps is still one of the most critical areas of data science, as many organizations still struggle with getting models into production. MLOps maturity models provide an excellent framework for understanding where data teams currently sit in their machine learning abilities and how they can evolve moving forward. 

To learn more about MLOps, take a look at these courses:

  • Machine Learning with scikit-learn (Beginner, 4 hours): Learn how to build and tune predictive models and evaluate how well they'll perform on unseen data.
  • Unsupervised Learning in Python (Beginner, 4 hours): Learn how to cluster, transform, visualize, and extract insights from unlabeled datasets using scikit-learn and scipy.
  • Advanced Deep Learning with Keras (Beginner, 4 hours): Build multiple-input and multiple-output deep learning models using Keras.