Designing Machine Learning Workflows in Python
Learn to build pipelines that stand the test of time.
4 hours, 16 videos, 51 exercises
Course Description
Deploying machine learning models in production seems easy with modern tools, but it often ends in disappointment when the model performs worse in production than in development. This course will give you four superpowers that will make you stand out from the data science crowd and build pipelines that stand the test of time: how to exhaustively tune every aspect of your model in development; how to make the best possible use of available domain expertise; how to monitor your model in production and deal with any performance deterioration; and, finally, how to deal with poorly or scarcely labelled data. Digging deep into the cutting edge of sklearn, and dealing with real-life datasets from hot areas like personalized healthcare and cybersecurity, this course reveals a view of machine learning from the frontline.
1. The Standard Workflow
Free
In this chapter, you will be reminded of the basics of a supervised learning workflow, complete with model fitting, tuning and selection, feature engineering and selection, and data splitting techniques. You will understand how these steps in a workflow depend on each other, and recognize how they can all contribute to, or fight against, overfitting: the data scientist's worst enemy. By the end of the chapter, you will already be fluent in supervised learning, and ready to take the dive towards more advanced material in later chapters.
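As a rough illustration of how fitting, tuning, feature selection and data splitting depend on each other, the sketch below chains a feature selector and a classifier in a scikit-learn `Pipeline` and tunes both with cross-validated grid search. The synthetic dataset, the step names (`select`, `clf`) and the grid values are assumptions chosen for illustration; they are not taken from the course material.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import Pipeline

# Synthetic data stands in for a real dataset (assumption for illustration).
X, y = make_classification(
    n_samples=300, n_features=20, n_informative=5, random_state=0
)

# Hold out a test set that the tuning procedure never sees.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# Feature selection lives inside the pipeline, so it is refit on each
# cross-validation fold and cannot leak information from held-out folds.
pipe = Pipeline([
    ("select", SelectKBest(f_classif)),
    ("clf", RandomForestClassifier(n_estimators=50, random_state=0)),
])

# Tune the number of selected features and the tree depth together.
param_grid = {"select__k": [5, 10, 20], "clf__max_depth": [2, 5, 10]}
search = GridSearchCV(pipe, param_grid, cv=3)
search.fit(X_train, y_train)

best_params = search.best_params_
test_accuracy = search.score(X_test, y_test)
print(best_params, round(test_accuracy, 3))
```

Keeping the selector inside the pipeline is the design choice that matters here: tuning it outside the cross-validation loop would let the choice of features see the validation folds, which is one of the ways a workflow can quietly overfit.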
Supervised learning pipelines (50 xp)
Feature engineering (100 xp)
Your first pipeline (100 xp)
Model complexity and overfitting (50 xp)
Grid search CV for model complexity (100 xp)
Number of trees and estimators (50 xp)
Feature engineering and overfitting (50 xp)
Categorical encodings (100 xp)
Feature transformations (100 xp)
Bringing it all together (100 xp)
2. The Human in the Loop
In the previous chapter, you perfected your knowledge of the standard supervised learning workflows. In this chapter, you will critically examine the ways in which expert knowledge is incorporated in supervised learning. This is done through the identification of the appropriate unit of analysis which might require feature engineering across multiple data sources, through the sometimes imperfect process of labeling examples, and through the specification of a loss function that captures the true business value of errors made by your machine learning model.
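One way to picture a loss function that reflects business value is to attach a different cost to each type of error and pick the decision threshold that minimizes the total. The sketch below does this in plain Python. The labels, scores and cost values are made-up numbers, and `total_cost` and `best_threshold` are hypothetical helper names, not functions from the course or from any library.

```python
def total_cost(y_true, scores, threshold, cost_fp, cost_fn):
    """Total cost of the errors made when flagging scores >= threshold."""
    cost = 0.0
    for label, score in zip(y_true, scores):
        predicted = 1 if score >= threshold else 0
        if predicted == 1 and label == 0:
            cost += cost_fp  # false positive
        elif predicted == 0 and label == 1:
            cost += cost_fn  # false negative
    return cost


def best_threshold(y_true, scores, cost_fp, cost_fn, candidates):
    """Return the candidate threshold with the lowest total cost."""
    return min(
        candidates,
        key=lambda t: total_cost(y_true, scores, t, cost_fp, cost_fn),
    )


# Illustrative labels and predicted probabilities (assumed values).
y_true = [0, 0, 0, 1, 0, 1, 1, 1]
scores = [0.05, 0.22, 0.35, 0.42, 0.55, 0.62, 0.81, 0.95]

# Assume a missed positive is ten times as costly as a false alarm.
cost_fp, cost_fn = 1, 10
candidates = [i / 10 for i in range(1, 10)]

default_cost = total_cost(y_true, scores, 0.5, cost_fp, cost_fn)
chosen = best_threshold(y_true, scores, cost_fp, cost_fn, candidates)
chosen_cost = total_cost(y_true, scores, chosen, cost_fp, cost_fn)
print(default_cost, chosen, chosen_cost)  # 11.0 0.4 1.0
```

With these numbers the default threshold of 0.5 misses one positive and raises one false alarm, while lowering the threshold to 0.4 trades the expensive miss for nothing worse than the same single false alarm. In practice the threshold would be chosen on validation data rather than on the data used to report the final cost.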
Data fusion (50 xp)
Is the source or the destination bad? (100 xp)
Feature engineering on grouped data (100 xp)
Imperfect labels (50 xp)
Turning a heuristic into a classifier (100 xp)
Combining heuristics (100 xp)
Dealing with label noise (100 xp)
Loss functions Part I (50 xp)
Reminder of performance metrics (100 xp)
Real-world cost analysis (100 xp)
Confusion matrix calculations (50 xp)
Loss functions Part II (50 xp)
Default thresholding (100 xp)
Optimizing the threshold (100 xp)
Bringing it all together (100 xp)
3. Model Lifecycle Management
In the previous chapter, you employed different ways of incorporating feedback from experts in your workflow, and evaluating it in ways that are aligned with business value. Now it is time for you to practice the skills needed to productize your model and ensure it continues to perform well thereafter by iteratively improving it. You will also learn to diagnose dataset shift and mitigate the effect that a changing environment can have on your model's accuracy.
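A common first step towards productizing a model is to serialize the whole fitted pipeline, preprocessing included, so that the deployed artifact behaves exactly like the one evaluated in development. The sketch below round-trips a scikit-learn pipeline through Python's `pickle` module and checks that predictions are unchanged. The dataset and the pipeline steps are assumptions for illustration and may differ from what the course uses.

```python
import pickle

import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic data stands in for a real dataset (assumption for illustration).
X, y = make_classification(n_samples=200, n_features=8, random_state=1)

# Bundle preprocessing and model so they are saved and loaded together.
pipe = Pipeline([
    ("scale", StandardScaler()),
    ("clf", LogisticRegression(max_iter=1000)),
])
pipe.fit(X, y)

# Serialize to bytes and restore; pickle.dump / pickle.load do the same
# with a file object opened in binary mode.
payload = pickle.dumps(pipe)
restored = pickle.loads(payload)

# The restored pipeline should reproduce the original predictions.
same_predictions = bool(np.array_equal(pipe.predict(X), restored.predict(X)))
print(same_predictions)
```

Pickles should only be loaded from trusted sources, and they are generally tied to the library version that produced them, so recording the scikit-learn version alongside the artifact is a sensible habit.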
From workflows to pipelines (50 xp)
Your first pipeline - again! (100 xp)
Custom scorers in pipelines (100 xp)
Model deployment (50 xp)
Pickles (100 xp)
Custom function transformers in pipelines (100 xp)
Iterating without overfitting (50 xp)
Challenge the champion (100 xp)
Cross-validation statistics (100 xp)
Dataset shift (50 xp)
Tuning the window size (100 xp)
Bringing it all together (100 xp)
4. Unsupervised Workflows
In the previous chapters, you established a solid foundation in supervised learning, complete with knowledge of deploying models in production, but you always assumed a labeled dataset would be available for your analysis. In this chapter, you take on the challenge of modeling data without any, or with very few, labels. This takes you on a journey into anomaly detection, a kind of unsupervised modeling, as well as distance-based learning, where beliefs about what constitutes similarity between two examples can be used in place of labels to help you achieve levels of accuracy comparable to a supervised workflow. Upon completing this chapter, you will clearly stand out from the crowd of data scientists in confidently knowing what tools to use to modify your workflow in order to overcome common real-world challenges.
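To make the idea of distance-based learning concrete, the sketch below encodes a belief about similarity between strings as an edit distance and uses it to find the nearest known example, with no labels involved. It implements the standard Levenshtein distance in plain Python, which is not necessarily the restricted variant named in the chapter outline. The function names and the example strings are assumptions for illustration.

```python
def levenshtein(a, b):
    """Standard Levenshtein distance: minimum number of single-character
    insertions, deletions and substitutions needed to turn a into b."""
    prev = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute, or match at no cost
            ))
        prev = curr
    return prev[-1]


def nearest(query, candidates):
    """Return the candidate closest to query under Levenshtein distance."""
    return min(candidates, key=lambda c: levenshtein(query, c))


# Illustrative vocabulary of known strings (assumed values).
known = ["login", "logout", "upload", "download"]
match = nearest("logn", known)
print(levenshtein("kitten", "sitting"), match)  # 3 login
```

The same pattern extends to any domain where a meaningful distance can be written down: once the distance function exists, nearest-neighbor search and distance-based anomaly detectors can work with it directly.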
Anomaly detection (50 xp)
A simple outlier (100 xp)
LoF contamination (100 xp)
Novelty detection (50 xp)
A simple novelty (100 xp)
Three novelty detectors (100 xp)
Contamination revisited (100 xp)
Distance-based learning (50 xp)
Find the neighbor (100 xp)
Not all metrics agree (100 xp)
Unstructured data (50 xp)
Restricted Levenshtein (100 xp)
Bringing it all together (100 xp)
Concluding remarks (50 xp)
Collaborators
Christoforos Anagnostopoulos
Honorary Associate Professor