Course

Designing Machine Learning Workflows in Python

Skill level: Advanced

Updated 11/2024

Learn to build pipelines that stand the test of time.

Python, Machine Learning

4 h

16 videos

51 exercises

4,200 XP

12,574

Statement of Accomplishment

Loved by learners at thousands of companies

Course Description

Deploying machine learning models in production seems easy with modern tools, but it often ends in disappointment when the model performs worse in production than it did in development. This course will give you four superpowers that will make you stand out from the data science crowd and let you build pipelines that stand the test of time: how to exhaustively tune every aspect of your model in development; how to make the best use of the available domain expertise; how to monitor your model's performance and deal with any performance deterioration; and, finally, how to work with poorly labeled or sparsely labeled data. Delving into the cutting edge of sklearn and working with real-world datasets from hot areas such as personalized healthcare and cybersecurity, this course offers a view of machine learning from the front line.

Prerequisites

Python Toolbox, Unsupervised Learning in Python, Supervised Learning with scikit-learn

1. The Standard Workflow

In this chapter, you will be reminded of the basics of a supervised learning workflow, complete with model fitting, tuning and selection, feature engineering and selection, and data splitting techniques. You will understand how these steps in a workflow depend on each other, and recognize how each of them can contribute to, or fight against, overfitting: the data scientist's worst enemy. By the end of the chapter, you will be fluent in supervised learning and ready to dive into the more advanced material in later chapters.
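As a minimal sketch of the workflow described above, assuming scikit-learn is installed, the example below chains feature selection and a random forest in a `Pipeline`, tunes model complexity with `GridSearchCV`, and keeps a held-out test split for a final check. The synthetic data from `make_classification` and the parameter grid are illustrative assumptions, not the course's own exercise.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import Pipeline

# Illustrative synthetic data (an assumption, not a course dataset).
X, y = make_classification(n_samples=500, n_features=20, n_informative=5,
                           random_state=0)

# Hold out a test set that plays no part in tuning, to detect overfitting.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0)

# Feature selection lives inside the pipeline, so it is refit on every CV fold
# and never sees the validation part of a fold.
pipe = Pipeline([
    ("select", SelectKBest(score_func=f_classif)),
    ("clf", RandomForestClassifier(random_state=0)),
])

# Grid over the number of selected features, number of trees and tree depth.
param_grid = {
    "select__k": [5, 10, 20],
    "clf__n_estimators": [10, 50],
    "clf__max_depth": [2, None],
}

search = GridSearchCV(pipe, param_grid, cv=5, scoring="accuracy")
search.fit(X_train, y_train)

cv_accuracy = search.best_score_              # mean CV accuracy of the best setting
test_accuracy = search.score(X_test, y_test)  # accuracy on untouched data
print(search.best_params_, round(cv_accuracy, 3), round(test_accuracy, 3))
```

Comparing `cv_accuracy` with `test_accuracy` is a quick sanity check: a large gap may suggest that the tuning process itself has started to overfit the training data.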

Supervised learning pipelines

Feature engineering

Your first pipeline

Model complexity and overfitting

Grid search CV for model complexity

Number of trees and estimators

Feature engineering and overfitting

Categorical encodings

Feature transformations

Bringing it all together

2. The Human in the Loop

In the previous chapter, you perfected your knowledge of the standard supervised learning workflow. In this chapter, you will critically examine the ways in which expert knowledge is incorporated into supervised learning: through identifying the appropriate unit of analysis, which might require feature engineering across multiple data sources; through the sometimes imperfect process of labeling examples; and through specifying a loss function that captures the true business value of the errors your machine learning model makes.
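The idea of a loss function that reflects the business value of errors can be sketched in plain Python: assign a cost to each false positive and each false negative, then choose the decision threshold on the classifier's scores that minimizes total cost instead of keeping the default 0.5. The labels, scores, helper names and the 1:10 cost ratio below are made-up values for illustration only.

```python
def total_cost(y_true, scores, threshold, cost_fp, cost_fn):
    """Total cost of predicting positive whenever score >= threshold."""
    false_pos = sum(1 for y, s in zip(y_true, scores) if s >= threshold and y == 0)
    false_neg = sum(1 for y, s in zip(y_true, scores) if s < threshold and y == 1)
    return false_pos * cost_fp + false_neg * cost_fn


def best_threshold(y_true, scores, cost_fp, cost_fn):
    """Try every observed score (plus 'never predict positive') as the cutoff."""
    candidates = sorted(set(scores)) + [float("inf")]
    return min(candidates,
               key=lambda t: total_cost(y_true, scores, t, cost_fp, cost_fn))


# Hypothetical validation labels and model scores; a miss costs 10x a false alarm.
y_true = [0, 0, 0, 1, 0, 1, 1, 1]
scores = [0.10, 0.20, 0.35, 0.40, 0.55, 0.60, 0.80, 0.90]
COST_FP, COST_FN = 1, 10

default_cost = total_cost(y_true, scores, 0.5, COST_FP, COST_FN)
tuned = best_threshold(y_true, scores, COST_FP, COST_FN)
tuned_cost = total_cost(y_true, scores, tuned, COST_FP, COST_FN)
print(default_cost, tuned, tuned_cost)  # 11 0.4 1
```

Lowering the threshold to 0.4 trades one extra false alarm for no missed positives, which is cheaper under this cost ratio. In practice the threshold should be chosen on validation data rather than on the data used to fit the model.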

Data fusion

Is the source or the destination bad?

Feature engineering on grouped data

Imperfect labels

Turning a heuristic into a classifier

Combining heuristics

Dealing with label noise

Loss functions Part I

Reminder of performance metrics

Real-world cost analysis

Confusion matrix calculations

Loss functions Part II

Default thresholding

Optimizing the threshold

Bringing it all together

3. Model Lifecycle Management

In the previous chapter, you employed different ways of incorporating expert feedback into your workflow and of evaluating it in ways aligned with business value. Now it is time to practice the skills needed to productize your model and to keep it performing well after deployment by iteratively improving it. You will also learn to diagnose dataset shift and to mitigate the effect that a changing environment can have on your model's accuracy.
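One common way to mitigate dataset shift is to train only on the most recent observations and to tune how far back that window reaches. The sketch below, assuming NumPy and scikit-learn, simulates a stream whose class boundary drifts over time, then compares training windows of different lengths on the latest data. The drift model, the window sizes and the 100-point holdout are all hypothetical choices.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

rng = np.random.default_rng(0)

# Hypothetical drifting stream: the true class boundary moves from -1 to +1.
n = 1000
t = np.arange(n)
X = rng.normal(size=(n, 1))
boundary = 2.0 * t / n - 1.0
y = (X[:, 0] > boundary).astype(int)

# The last 100 points stand in for "future" production data.
X_past, y_past = X[:-100], y[:-100]
X_future, y_future = X[-100:], y[-100:]


def window_accuracy(window):
    """Fit on the most recent `window` past points, score on the future data."""
    model = LogisticRegression().fit(X_past[-window:], y_past[-window:])
    return accuracy_score(y_future, model.predict(X_future))


scores = {w: window_accuracy(w) for w in (50, 100, 200, 400, 900)}
best_window = max(scores, key=scores.get)
print(scores, best_window)
```

Smaller windows react faster to drift but have less data to learn from; sweeping the window size makes that trade-off visible on the most recent data.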

From workflows to pipelines

Your first pipeline - again!

Custom scorers in pipelines

Model deployment

Custom function transformers in pipelines

Iterating without overfitting

Challenge the champion

Cross-validation statistics

Dataset shift

Tuning the window size

Bringing it all together

4. Unsupervised Workflows

In the previous chapters, you established a solid foundation in supervised learning, complete with knowledge of deploying models in production, but you always assumed that a labeled dataset would be available for your analysis. In this chapter, you take on the challenge of modeling data with no labels, or with very few. This takes you on a journey into anomaly detection, a kind of unsupervised modeling, as well as distance-based learning, where beliefs about what constitutes similarity between two examples can be used in place of labels to help you achieve levels of accuracy comparable to a supervised workflow. Upon completing this chapter, you will stand out from the crowd of data scientists, knowing confidently which tools to use to adapt your workflow and overcome common real-world challenges.
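Anomaly detection without labels can be sketched with scikit-learn's `LocalOutlierFactor` (LOF), assuming NumPy and scikit-learn are installed. The synthetic 2-D data with one planted outlier is an illustrative assumption. The example shows both modes: outlier detection, where `contamination` states the assumed share of anomalies in the data being fit, and novelty detection (`novelty=True`), where the model is fit on data assumed clean and then scores unseen points.

```python
import numpy as np
from sklearn.neighbors import LocalOutlierFactor

rng = np.random.default_rng(0)

# Synthetic data: 50 points around the origin plus one planted outlier at index 50.
inliers = rng.normal(loc=0.0, scale=1.0, size=(50, 2))
X = np.vstack([inliers, [[8.0, 8.0]]])

# Outlier detection: contamination is our assumed share of anomalies.
# With 51 points and contamination=0.01, the cutoff falls between the two
# lowest LOF scores, so only the single most isolated point is flagged.
lof = LocalOutlierFactor(n_neighbors=10, contamination=0.01)
labels = lof.fit_predict(X)               # 1 = inlier, -1 = outlier
outlier_idx = np.flatnonzero(labels == -1)

# Novelty detection: fit on (assumed) clean data, then score unseen points.
novelty = LocalOutlierFactor(n_neighbors=10, novelty=True)
novelty.fit(inliers)
new_labels = novelty.predict(np.array([[0.0, 0.0], [8.0, 8.0]]))
print(outlier_idx, new_labels)
```

The contamination value directly sets how many points get flagged, so in practice it is a modeling assumption worth revisiting rather than a fixed constant.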

Anomaly detection

A simple outlier

LOF contamination

Novelty detection

A simple novelty

Three novelty detectors

Contamination revisited

Distance-based learning

Find the neighbor

Not all metrics agree

Unstructured data

Restricted Levenshtein

Bringing it all together

Concluding remarks
