Designing Machine Learning Workflows in Python

Learn to build pipelines that stand the test of time.

4 hours, 16 videos, 51 exercises

10,123 learners. Statement of Accomplishment

Course Description

Deploying machine learning models in production seems easy with modern tools, but it often ends in disappointment when the model performs worse in production than it did in development. This course will give you four superpowers that will make you stand out from the data science crowd and help you build pipelines that stand the test of time: how to exhaustively tune every aspect of your model in development; how to make the best possible use of available domain expertise; how to monitor your model's performance in production and deal with any deterioration; and, finally, how to deal with poorly or scarcely labelled data. Digging deep into the cutting edge of scikit-learn (sklearn), and working with real-life datasets from hot areas like personalized healthcare and cybersecurity, this course offers a view of machine learning from the front line.

1. The Standard Workflow (free chapter)
In this chapter, you will be reminded of the basics of a supervised learning workflow, complete with model fitting, tuning and selection, feature engineering and selection, and data splitting techniques. You will understand how these steps in a workflow depend on each other, and recognize how they can all contribute to, or fight against, overfitting: the data scientist's worst enemy. By the end of the chapter, you will already be fluent in supervised learning and ready to dive into the more advanced material in later chapters.
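To make the interplay between these steps concrete, here is a minimal sketch of a tuned supervised pipeline in scikit-learn. It assumes synthetic data from `make_classification` in place of a real dataset, uses `SelectKBest` for feature selection and a random forest as the model, and picks illustrative grid values; none of these choices come from the course itself. The key point is that feature selection and model complexity (number of trees, tree depth) are tuned together by cross-validation on the training split only, and the untouched test split gives an honest estimate of overfitting.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import Pipeline

# Synthetic data stands in for a real dataset (assumption for illustration).
X, y = make_classification(n_samples=500, n_features=20, n_informative=5, random_state=0)

# Hold out a test set that the tuning procedure never sees.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

# Feature selection and the model live in one pipeline, so selection is
# refit inside every CV fold and cannot leak information from validation data.
pipe = Pipeline([
    ("select", SelectKBest(score_func=f_classif)),
    ("clf", RandomForestClassifier(random_state=0)),
])

# Illustrative grid over feature count and model complexity.
param_grid = {
    "select__k": [5, 20],
    "clf__n_estimators": [10, 50],
    "clf__max_depth": [2, None],
}

search = GridSearchCV(pipe, param_grid, cv=3, scoring="accuracy")
search.fit(X_train, y_train)

# Score the refit best pipeline on the held-out test set.
test_accuracy = search.score(X_test, y_test)
print(search.best_params_)
print(f"held-out accuracy: {test_accuracy:.3f}")
```

A large gap between `search.best_score_` (cross-validated) and `test_accuracy` would be a hint that the tuning process itself has started to overfit.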
Supervised learning pipelines (50 xp)
Feature engineering (100 xp)
Your first pipeline (100 xp)
Model complexity and overfitting (50 xp)
Grid search CV for model complexity (100 xp)
Number of trees and estimators (50 xp)
Feature engineering and overfitting (50 xp)
Categorical encodings (100 xp)
Feature transformations (100 xp)
Bringing it all together (100 xp)
2. The Human in the Loop
In the previous chapter, you perfected your knowledge of standard supervised learning workflows. In this chapter, you will critically examine the ways in which expert knowledge is incorporated into supervised learning. This happens in three places: identifying the appropriate unit of analysis, which might require feature engineering across multiple data sources; labeling examples, a process that is sometimes imperfect; and specifying a loss function that captures the true business cost of the errors made by your machine learning model.
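As an illustration of a loss function that reflects business value, the sketch below assigns a cost to each kind of error and scans decision thresholds for the one with the lowest total cost, instead of using the default 0.5 cutoff. It assumes synthetic, imbalanced data and a logistic regression model; the cost values `COST_FP` and `COST_FN` and the helper `expected_cost` are hypothetical names chosen for this example, and in practice the costs would come from domain experts.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import train_test_split

# Hypothetical per-error costs; real values should come from domain experts.
COST_FP = 1.0   # e.g. cost of investigating a false alarm
COST_FN = 10.0  # e.g. cost of missing a real positive case


def expected_cost(y_true, y_pred, cost_fp=COST_FP, cost_fn=COST_FN):
    """Total cost of a set of binary predictions (hypothetical helper)."""
    # labels=[0, 1] keeps the matrix 2x2 even if only one class is predicted.
    tn, fp, fn, tp = confusion_matrix(y_true, y_pred, labels=[0, 1]).ravel()
    return cost_fp * fp + cost_fn * fn


# Synthetic imbalanced data: roughly 10% positives.
X, y = make_classification(n_samples=2000, weights=[0.9, 0.1], random_state=0)
X_train, X_val, y_train, y_val = train_test_split(
    X, y, test_size=0.3, random_state=0, stratify=y
)

clf = LogisticRegression(max_iter=1000).fit(X_train, y_train)
scores = clf.predict_proba(X_val)[:, 1]  # probability of the positive class

# Scan candidate thresholds on validation data and keep the cheapest one.
thresholds = np.linspace(0.0, 1.0, 101)
costs = [expected_cost(y_val, (scores >= t).astype(int)) for t in thresholds]
best_threshold = thresholds[int(np.argmin(costs))]

default_cost = expected_cost(y_val, (scores >= 0.5).astype(int))
print(f"cost at 0.5: {default_cost:.0f}")
print(f"best threshold {best_threshold:.2f}, cost: {min(costs):.0f}")
```

Because missed positives are assumed to be ten times as expensive as false alarms here, the selected threshold will typically fall below 0.5; with different costs it can move in either direction.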
Data fusion (50 xp)
Is the source or the destination bad? (100 xp)
Feature engineering on grouped data (100 xp)
Imperfect labels (50 xp)
Turning a heuristic into a classifier (100 xp)
Combining heuristics (100 xp)
Dealing with label noise (100 xp)
Loss functions Part I (50 xp)
Reminder of performance metrics (100 xp)
Real-world cost analysis (100 xp)
Confusion matrix calculations (50 xp)
Loss functions Part II (50 xp)
Default thresholding (100 xp)
Optimizing the threshold (100 xp)
Bringing it all together (100 xp)
3. Model Lifecycle Management
In the previous chapter, you incorporated feedback from experts into your workflow and evaluated your model in ways that are aligned with business value. Now it is time to practice the skills needed to productize your model and to keep it performing well afterwards by iteratively improving it. You will also learn to diagnose dataset shift and to mitigate the effect that a changing environment can have on your model's accuracy.
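A minimal sketch of two lifecycle steps follows: comparing a deployed "champion" model against a "challenger" on identical cross-validation folds, then serializing the winning fitted pipeline with Python's `pickle` module and reloading it as a serving process might. The synthetic data, the two model choices and the promotion rule (higher mean ROC AUC wins) are assumptions made for illustration, not the course's exact procedure.

```python
import pickle

from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import KFold, cross_val_score
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

X, y = make_classification(n_samples=600, n_features=10, random_state=1)

# Champion: the model assumed to be in production (illustrative choice).
champion = Pipeline([
    ("scale", StandardScaler()),
    ("clf", LogisticRegression(max_iter=1000)),
])
# Challenger: a candidate replacement (illustrative choice).
challenger = RandomForestClassifier(n_estimators=50, random_state=1)

# Score both models on the same folds so the comparison is like-for-like.
folds = KFold(n_splits=5, shuffle=True, random_state=1)
champ_scores = cross_val_score(champion, X, y, cv=folds, scoring="roc_auc")
chall_scores = cross_val_score(challenger, X, y, cv=folds, scoring="roc_auc")
print(f"champion AUC {champ_scores.mean():.3f} +/- {champ_scores.std():.3f}")
print(f"challenger AUC {chall_scores.mean():.3f} +/- {chall_scores.std():.3f}")

# Simple promotion rule (assumption): the higher mean AUC wins.
winner = challenger if chall_scores.mean() > champ_scores.mean() else champion
winner.fit(X, y)

# Serialize the whole fitted model; in practice this would be written to a file
# opened in "wb" mode. Only unpickle data from trusted sources, because
# unpickling can execute arbitrary code.
blob = pickle.dumps(winner)
served = pickle.loads(blob)

# The reloaded model must reproduce the original predictions exactly.
assert (served.predict(X) == winner.predict(X)).all()
```

Looking at the spread of fold scores, not only their means, helps avoid promoting a challenger whose apparent advantage is within cross-validation noise.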
From workflows to pipelines (50 xp)
Your first pipeline - again! (100 xp)
Custom scorers in pipelines (100 xp)
Model deployment (50 xp)
Pickles (100 xp)
Custom function transformers in pipelines (100 xp)
Iterating without overfitting (50 xp)
Challenge the champion (100 xp)
Cross-validation statistics (100 xp)
Dataset shift (50 xp)
Tuning the window size (100 xp)
Bringing it all together (100 xp)
4. Unsupervised Workflows
In the previous chapters, you established a solid foundation in supervised learning, complete with knowledge of deploying models in production, but you always assumed that a labeled dataset would be available for your analysis. In this chapter, you take on the challenge of modeling data with no labels, or with very few. This takes you on a journey into anomaly detection, a kind of unsupervised modeling, as well as distance-based learning, where beliefs about what constitutes similarity between two examples can be used in place of labels to help you achieve levels of accuracy comparable to a supervised workflow. Upon completing this chapter, you will stand out from the crowd of data scientists, knowing confidently which tools to use to modify your workflow in order to overcome common real-world challenges.
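The distinction between outlier detection and novelty detection can be sketched with scikit-learn's `LocalOutlierFactor`. In outlier mode, the model is fit on data that may already contain anomalies, and the `contamination` parameter encodes a belief about what fraction of that data is anomalous, which sets the decision threshold. In novelty mode (`novelty=True`), it is trained on clean data and then scores previously unseen points. The synthetic 2-D data, `n_neighbors=20` and `contamination=0.05` are illustrative assumptions.

```python
import numpy as np
from sklearn.neighbors import LocalOutlierFactor

rng = np.random.default_rng(0)
# Synthetic data: 100 inliers around the origin plus 5 far-away anomalies.
inliers = rng.normal(0, 1, size=(100, 2))
outliers = rng.uniform(8, 10, size=(5, 2))
X = np.vstack([inliers, outliers])

# Outlier detection: fit and label the same data. contamination is our
# assumed fraction of anomalies and determines where the cutoff falls.
lof = LocalOutlierFactor(n_neighbors=20, contamination=0.05)
labels = lof.fit_predict(X)  # -1 = outlier, 1 = inlier
flagged = np.where(labels == -1)[0]
print("flagged indices:", flagged)

# Novelty detection: train on clean data only, then score unseen points.
# With novelty=True, predict() is available but fit_predict() is not.
novelty_lof = LocalOutlierFactor(n_neighbors=20, novelty=True)
novelty_lof.fit(inliers)
new_points = np.array([[0.1, -0.2], [9.0, 9.0]])
print(novelty_lof.predict(new_points))  # 1 = looks normal, -1 = novelty
```

Setting `contamination` too low or too high shifts the cutoff, so this value is itself a modeling choice that benefits from domain expertise.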
Anomaly detection (50 xp)
A simple outlier (100 xp)
LOF contamination (100 xp)
Novelty detection (50 xp)
A simple novelty (100 xp)
Three novelty detectors (100 xp)
Contamination revisited (100 xp)
Distance-based learning (50 xp)
Find the neighbor (100 xp)
Not all metrics agree (100 xp)
Unstructured data (50 xp)
Restricted Levenshtein (100 xp)
Bringing it all together (100 xp)
Concluding remarks (50 xp)

Datasets

Credit, Flows, Attacks, Hepatitis, Proteins, Arrhythmia

Collaborators

Chester Ismay

Sara Billen

Prerequisites

Python Data Science Toolbox (Part 2)
Unsupervised Learning in Python
Supervised Learning with scikit-learn

Christoforos Anagnostopoulos

Honorary Associate Professor

My career has been motivated by a singular and genuine curiosity about what it means to learn from evidence, and in particular from evidence arising from measurements, i.e., data. I have been fortunate enough to both study and teach at a number of great institutions, including the University of Cambridge, where I was a Fellow, and Imperial College London, where I still hold an honorary Associate Professorship. I have applied data science in a number of fields ranging from cybersecurity to neuroimaging, both as an academic and as a startup founder. In constant search of the most innovative environment I could find, I am now at Improbable, a London-based company building virtual worlds of unprecedented scale for games and real-world planning. I love programming and am the author of a Python project with over 600 GitHub stars and an R package with many thousands of downloads.

Join over 13 million learners and start Designing Machine Learning Workflows in Python today!
