
Cleaning Data in Python

Learn to diagnose and treat dirty data and develop the skills needed to transform your raw data into accurate insights!

4 hours · 13 videos · 44 exercises
105,477 learners · Statement of Accomplishment



Course Description

Discover How to Clean Data in Python

It's commonly said that data scientists spend 80% of their time cleaning and manipulating data and only 20% of their time analyzing it. Data cleaning is an essential step for every data scientist, as analyzing dirty data can lead to inaccurate conclusions.

In this course, you will learn how to identify, diagnose, and treat various data cleaning problems in Python, ranging from simple to advanced. You will deal with improper data types, check that your data is in the correct range, handle missing data, perform record linkage, and more!
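Of the problems listed above, missing data is the easiest to show in a few lines. The sketch below uses plain Python rather than any particular data library so that it runs without dependencies. The `readings` records, the field names, and the choice of median imputation are all illustrative assumptions, not material taken from the course.

```python
from statistics import median

# Hypothetical records; None marks a missing value.
readings = [
    {"city": "Lima", "temp_c": 19.5},
    {"city": "Quito", "temp_c": None},
    {"city": "Bogota", "temp_c": 14.0},
    {"city": "Cusco", "temp_c": 11.0},
]


def missing_report(rows, field):
    """Diagnose: return (count, fraction) of rows where `field` is missing."""
    n_missing = sum(1 for r in rows if r.get(field) is None)
    return n_missing, n_missing / len(rows)


def drop_missing(rows, field):
    """Treatment option 1: discard rows where `field` is missing."""
    return [r for r in rows if r.get(field) is not None]


def fill_with_median(rows, field):
    """Treatment option 2: replace missing values with the observed median."""
    observed = [r[field] for r in rows if r.get(field) is not None]
    fill = median(observed)
    return [
        {**r, field: fill if r.get(field) is None else r[field]}
        for r in rows
    ]


count, fraction = missing_report(readings, "temp_c")
dropped = drop_missing(readings, "temp_c")
filled = fill_with_median(readings, "temp_c")
```

Dropping rows is simple but shrinks the dataset, while imputing keeps every row at the cost of inventing a value. Which one is appropriate depends on why the values are missing.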

Learn How to Clean Different Data Types

The first chapter of the course explores common data problems and how you can fix them. You will first learn the basic data types and how to deal with each of them. Afterward, you'll apply range constraints and remove duplicated data points.
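The three fixes named in this paragraph (type conversion, range constraints, and duplicate removal) can be sketched as one small pipeline. This is a minimal illustration in plain Python. The `raw_rides` data, its column names, and the fixed `today` value are hypothetical, and the course itself may use different tools.

```python
from datetime import date

# Hypothetical raw rows: every value arrives as a string.
raw_rides = [
    {"ride_id": "1", "duration": "12 minutes", "ride_date": "2019-03-04"},
    {"ride_id": "2", "duration": "7 minutes", "ride_date": "2999-01-01"},   # future date
    {"ride_id": "1", "duration": "12 minutes", "ride_date": "2019-03-04"},  # exact duplicate
]


def convert_types(row):
    """Data type constraint: turn strings into int and date values."""
    return {
        "ride_id": int(row["ride_id"]),
        "duration": int(row["duration"].replace("minutes", "").strip()),
        "ride_date": date.fromisoformat(row["ride_date"]),
    }


def within_range(row, today):
    """Range constraint: a ride cannot have happened in the future."""
    return row["ride_date"] <= today


def drop_duplicates(rows):
    """Uniqueness constraint: keep the first copy of each identical row."""
    seen = set()
    unique = []
    for row in rows:
        key = tuple(sorted(row.items()))
        if key not in seen:
            seen.add(key)
            unique.append(row)
    return unique


today = date(2024, 1, 1)  # fixed here so the example is reproducible
typed = [convert_types(r) for r in raw_rides]
in_range = [r for r in typed if within_range(r, today)]
clean = drop_duplicates(in_range)
```

Converting types first matters: the date comparison and the duplicate check both behave differently on raw strings than on parsed values.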

The last chapter explores record linkage, a powerful tool to merge multiple datasets. You'll learn how to link records by calculating the similarity between strings. Finally, you'll use your new skills to join two restaurant review datasets into one clean master dataset.
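To make the idea of linking records by string similarity concrete, here is a small sketch. It uses `difflib.SequenceMatcher` from the Python standard library as a stand-in similarity measure. The restaurant names, the `link_records` helper, and the 0.8 threshold are assumptions chosen for illustration, not the course's own method or data.

```python
from difflib import SequenceMatcher


def similarity(a, b):
    """Return a 0..1 similarity score, ignoring case and outer whitespace."""
    return SequenceMatcher(None, a.lower().strip(), b.lower().strip()).ratio()


def link_records(left, right, threshold=0.8):
    """Pair each left name with its most similar right name, if similar enough."""
    links = []
    for name in left:
        best = max(right, key=lambda candidate: similarity(name, candidate))
        score = similarity(name, best)
        if score >= threshold:
            links.append((name, best, round(score, 2)))
    return links


# Hypothetical restaurant names from two review datasets.
reviews_a = ["Blue Harbor Grill", "Casa Verde", "Noodle Temple"]
reviews_b = ["blue harbour grill", "casa verde ", "Pizza Planet"]

links = link_records(reviews_a, reviews_b)
```

Comparing every pair is quadratic in the number of records, so this brute-force approach only suits small datasets. The threshold is a judgment call: set it too low and unrelated records are merged, too high and genuine matches with typos are missed.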

Gain Confidence in Cleaning Data

By the end of the course, you will have the confidence to clean data of various types and use record linkage to merge multiple datasets. Cleaning data is an essential skill for data scientists. If you want to learn more about cleaning data in Python and its applications, check out the following tracks: Data Scientist with Python and Importing & Cleaning Data with Python.
  1. Common data problems (free)

    In this chapter, you'll learn how to overcome some of the most common dirty data problems. You'll convert data types, apply range constraints to remove future data points, and remove duplicated data points to avoid double-counting.

    Data type constraints (50 xp)
    Common data types (100 xp)
    Numeric data or ... ? (100 xp)
    Summing strings and concatenating numbers (100 xp)
    Data range constraints (50 xp)
    Tire size constraints (100 xp)
    Back to the future (100 xp)
    Uniqueness constraints (50 xp)
    How big is your subset? (50 xp)
    Finding duplicates (100 xp)
    Treating duplicates (100 xp)
  2. Text and categorical data problems

    Categorical and text data can often be some of the messiest parts of a dataset due to their unstructured nature. In this chapter, you’ll learn how to fix whitespace and capitalization inconsistencies in category labels, collapse multiple categories into one, and reformat strings for consistency.

  3. Advanced data problems

    In this chapter, you’ll dive into more advanced data cleaning problems, such as ensuring that weights are all written in kilograms instead of pounds. You’ll also gain invaluable skills that will help you verify that values have been added correctly and that missing values don’t negatively impact your analyses.

  4. Record linkage

    Record linkage is a technique for merging multiple datasets, and it is especially useful when values contain typos or different spellings. In this chapter, you'll learn how to link records by calculating the similarity between strings. You'll then use your new skills to join two restaurant review datasets into one clean master dataset.

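Chapters 2 and 3 above name several concrete fixes: trimming whitespace, normalizing capitalization, collapsing categories, and making sure all weights are in kilograms rather than pounds. The sketch below shows one way these could look in plain Python. The `rows` data, the `COLLAPSE` mapping, and the `"<number> <unit>"` weight format are hypothetical and chosen only to illustrate the idea.

```python
# Hypothetical survey rows: messy category labels and weights in mixed units.
rows = [
    {"status": " Married ", "weight": "70 kg"},
    {"status": "MARRIED", "weight": "154 lb"},
    {"status": "unmarried", "weight": "82.5 kg"},
    {"status": "single ", "weight": "200 lb"},
]

# Assumed mapping used to collapse several labels into one category.
COLLAPSE = {"unmarried": "single"}

# One international pound is defined as exactly 0.45359237 kg.
LB_TO_KG = 0.45359237


def clean_category(label):
    """Fix whitespace and capitalization, then collapse equivalent labels."""
    label = label.strip().lower()
    return COLLAPSE.get(label, label)


def weight_in_kg(text):
    """Parse '<number> <unit>' and return the weight in kilograms."""
    value, unit = text.split()
    value = float(value)
    if unit == "lb":
        value *= LB_TO_KG
    elif unit != "kg":
        raise ValueError(f"unexpected unit: {unit!r}")
    return round(value, 1)


cleaned = [
    {
        "status": clean_category(r["status"]),
        "weight_kg": weight_in_kg(r["weight"]),
    }
    for r in rows
]
```

Raising an error on an unknown unit, instead of passing the value through, keeps a silently mis-scaled number from reaching later analysis.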

In the following tracks

Associate Data Scientist in Python
Data Engineer in Python
Importing & Cleaning Data with Python

Collaborators

Maggie Matsui
Richie Cotton
Amy Peterson
Adel Nehme

VP of Media, DataCamp
