Cleaning Data in Python
Run the hidden code cell below to import the data used in this course.
Take Notes
Add notes about the concepts you've learned and code cells with code you want to keep.
Add your notes here
# Add your code snippets here
Explore Datasets
Use the DataFrames imported in the first cell to explore the data and practice your skills!
- For each DataFrame, inspect the data types of each column and, where needed, clean and convert columns into the correct data type. You should also rename any columns to have more descriptive titles.
- Identify and remove all the duplicate rows in `ride_sharing`.
- Inspect the unique values of all the columns in `airlines` and clean any inconsistencies.
- For the `airlines` DataFrame, create a new column called `International` from `dest_region`, where values representing US regions map to `False` and all other regions map to `True`.
- The `banking` DataFrame contains out-of-date ages. Update the `Age` column using today's date and the `birth_date` column.
- Clean the `restaurants_new` DataFrame so that it better matches the categories in the `city` and `type` columns of the `restaurants` DataFrame. Afterward, given typos in restaurant names, use record linkage to generate possible pairs of rows between `restaurants` and `restaurants_new` using the criteria you think are best.
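
As a starting point, here is a minimal sketch of three of the tasks above: removing duplicates, deriving `International`, and refreshing `Age`. It runs on small toy DataFrames rather than the course data, so the sample values, the `ride_id` and `duration` columns, the list of US region labels, and the `age_on` helper are all assumptions made for illustration. Adjust them to whatever the real columns contain.

```python
import pandas as pd

# Toy stand-ins for the course DataFrames (assumed values, not the real data).
ride_sharing = pd.DataFrame({
    "ride_id": [1, 2, 2, 3],
    "duration": [12, 7, 7, 30],
})
airlines = pd.DataFrame({
    "dest_region": ["West US", "Europe", "east us ", "Asia"],
})
banking = pd.DataFrame({
    "birth_date": ["1990-06-15", "1985-12-31"],
    "Age": [25, 30],
})

# Drop rows that are exact duplicates across every column.
ride_sharing = ride_sharing.drop_duplicates().reset_index(drop=True)

# Normalize case and whitespace first so inconsistent spellings collapse,
# then flag every region that is not in the (assumed) list of US regions.
airlines["dest_region"] = airlines["dest_region"].str.strip().str.lower()
us_regions = ["west us", "east us", "midwest us"]  # hypothetical labels
airlines["International"] = ~airlines["dest_region"].isin(us_regions)


def age_on(birth_dates, today):
    """Return age in whole years as of `today` (hypothetical helper)."""
    birth = pd.to_datetime(birth_dates)
    # True where this year's birthday has already happened (or is today).
    had_birthday = (birth.dt.month < today.month) | (
        (birth.dt.month == today.month) & (birth.dt.day <= today.day)
    )
    return today.year - birth.dt.year - (~had_birthday).astype(int)


# Recompute ages from birth dates using the current date.
banking["Age"] = age_on(banking["birth_date"], pd.Timestamp.today())
```

Subtracting one year when the birthday has not yet occurred avoids the off-by-one error that comes from subtracting years alone. Passing `today` as a parameter also makes the calculation easy to check against a fixed date.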