Cleaning Data in PostgreSQL Databases

Learn to tame your raw, messy data stored in a PostgreSQL database to extract accurate insights.
4 Hours, 15 Videos, 49 Exercises
4,050 XP

Course Description

If you surveyed a large number of data scientists and data analysts about which tasks are most common in their workday, cleaning data would likely appear in almost every response, because real-world data is messy. To help you tame it, this course teaches you how to clean data stored in a PostgreSQL database. You’ll learn how to solve common problems such as cleaning messy strings, dealing with empty values, and comparing the similarity between strings. You’ll get hands-on practice with these tasks using interesting (but messy) datasets made available by New York City's Open Data program. Are you ready to whip that messy data into shape?

  1. Data Cleaning Basics (Free)

    In this chapter, you’ll gain an understanding of data cleaning approaches when working with PostgreSQL databases and learn the value of cleaning data as early as possible in the pipeline. You’ll also learn basic string editing approaches such as removing unnecessary spaces as well as more involved topics such as pattern matching and string similarity to identify string values in need of cleaning.
  2. Missing, Duplicate, and Invalid Data

    You’ll learn how to write queries to solve common problems of missing, duplicate, and invalid data in the context of PostgreSQL database tables. Through hands-on exercises, you’ll use the COALESCE() function together with SELECT queries and WHERE clauses to clean messy data.
  3. Converting Data

    Sometimes you need to convert data stored in a PostgreSQL database from one data type to another. In this chapter, you’ll explore the expressions you need to convert text to numeric types and how to format strings for temporal data.
  4. Transforming Data

    In the final chapter, you’ll learn how to transform your data and construct pivot tables. Working with real-world postal data, you’ll discover how to combine address components and how to split addresses into city, state, and ZIP code using functions such as CONCAT(), SUBSTRING(), and REGEXP_SPLIT_TO_TABLE().
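
As a rough illustration of the kind of cleaning the chapters describe (trimming stray spaces, filling empty values with COALESCE(), and converting text to a numeric type), here is a minimal sketch. It is not taken from the course: the table `parking_violation`, its columns, and the sample rows are hypothetical, and Python's built-in `sqlite3` module stands in for PostgreSQL so the example runs without a database server. The SELECT statement uses only functions that also exist in PostgreSQL.

```python
import sqlite3

# Hypothetical table and rows, loosely modeled on a parking-violations dataset.
# An in-memory SQLite database is an assumption made for portability;
# the course itself works with PostgreSQL.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE parking_violation "
    "(plate_id TEXT, vehicle_color TEXT, fine_amount TEXT)"
)
conn.executemany(
    "INSERT INTO parking_violation VALUES (?, ?, ?)",
    [
        ("  abc123 ", "GRAY", "65"),   # stray spaces and lowercase letters
        ("XYZ789", None, "115"),       # missing color
    ],
)

# TRIM and UPPER tidy the string, COALESCE replaces NULL with a default,
# and CAST converts the text amount to an integer.
query = """
    SELECT UPPER(TRIM(plate_id))              AS clean_plate_id,
           COALESCE(vehicle_color, 'UNKNOWN') AS clean_color,
           CAST(fine_amount AS INTEGER)       AS fine_as_integer
    FROM parking_violation
    ORDER BY clean_plate_id
"""
cleaned = conn.execute(query).fetchall()
print(cleaned)
```

PostgreSQL-specific functions named in the chapter list, such as REGEXP_SPLIT_TO_TABLE(), are left out of this sketch because SQLite does not provide them.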
Datasets
Parking violations in NYC
Restaurant inspections in NYC
Film permits in NYC
Collaborators
Maggie Matsui
Amy Peterson
Prerequisites
Intermediate SQL
Darryl Reeves, Ph.D.

Industry Assistant Professor, NYU Tandon School of Engineering
Darryl is a computational scientist with expertise in utilizing data-driven approaches to solve complex problems in both academic and business settings. He worked for a number of years in a variety of technical roles including software development and technology-based client services mostly within start-up organizations in the finance and online advertising industries. He has a love for technology and education and enjoys solving interesting problems across diverse domains.