
How to Clean Data in Excel: A Beginner's Guide

Learn essential data cleaning techniques in Excel, including removing duplicates, handling missing values, and maintaining consistent formatting.
Jun 2024  · 15 min read

Microsoft Excel remains one of the most widely used and accessible options for data cleaning. It offers a range of powerful formulas and functions to streamline the process, making it easier for users to clean, organize, and prepare their data efficiently.

In this tutorial, you’ll learn how to clean your data in Excel and prepare it for analysis. We'll cover essential techniques such as removing duplicates, handling missing values, and standardizing formatting. By the end, you'll be equipped with practical skills to ensure your dataset is accurate and ready for deeper analysis.

For a more in-depth exploration, consider taking the comprehensive Data Preparation in Excel companion course, which dives into advanced techniques and best practices for optimizing your data cleaning workflow.

What Are the Components of Clean Data?

Ensuring high data quality involves several key components, including accuracy, completeness, consistency, uniformity, and validity. These components are essential for reliable analysis and decision-making. Let's take a look at each.

Accuracy

Accuracy means your data correctly reflects the real-world values it represents. For example, accurate data has the following properties:

  • Error-Free Records: Each record matches its source, with no entry or transcription errors.
  • Numerically Correct Values: The numerical data is correct and reliable.
  • Typo-Free Text Data: All text data is free from typographical errors.
  • Precise Dates: Dates are accurately recorded and reflect the true timeline of events.

To check your data’s accuracy, you can cross-reference data against reliable sources or use validation rules to detect and correct errors.

Completeness

Complete data contains all the necessary information for the analysis. If your data is incomplete and key details are missing, this can skew results. Make sure to fill in gaps or account for missing values in your analysis.

Here are some basic ways to address missing data:

  • Impute Values Based on Other Observations: Use other data points to estimate and fill in missing values.
  • Replace Missing Values with Placeholders: Use placeholders to account for missing information.
  • Remove Incomplete Records: Delete records that lack critical information to ensure data quality.
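
To make the three options concrete, here is a minimal Python sketch (Python is used only to illustrate the logic; it is not part of Excel). The `scores` column and the use of the mean as the imputed value are assumptions made for this example, not a recommendation for every dataset.

```python
from statistics import mean

# Hypothetical column of values; None marks a missing cell
scores = [72, None, 88, 95, None]

# Values that are actually present
observed = [s for s in scores if s is not None]

# Option 1: impute each missing value with the mean of the observed values
imputed = [s if s is not None else mean(observed) for s in scores]

# Option 2: replace each missing value with a placeholder
flagged = [s if s is not None else "N/A" for s in scores]

# Option 3: remove the incomplete records entirely
complete = observed

print(flagged)   # [72, 'N/A', 88, 95, 'N/A']
print(complete)  # [72, 88, 95]
```

Which option fits depends on the analysis: imputing keeps the row count intact, while removing records avoids introducing estimated values.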

Consistency

Consistency means data stays the same across different datasets and time periods. Clean data uses the same formats and units of measurement throughout.

Uniformity

Uniformity, or standardization, means that all data in a field follows one coherent format and structure: a single data type or a single set of categories. This includes using the same units of measure, formatting dates the same way, and labeling categories consistently.

Validity

Validity means the data values fall within the predetermined acceptable range and conform to the expected patterns. For example, if a valid age entry should be between 0 and 120 years, validation rules and checks will be applied to ensure that the data meets this criterion. This prevents outliers and erroneous entries from skewing the results.
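
As a rough illustration of the age rule above, the following Python sketch flags entries that fall outside the 0 to 120 range. The function name `is_valid_age` and the sample list are hypothetical; in Excel itself you would set up the equivalent rule with Data Validation.

```python
def is_valid_age(value, low=0, high=120):
    """Return True when value is a whole number within [low, high]."""
    # bool is a subclass of int in Python, so rule it out explicitly
    if isinstance(value, bool) or not isinstance(value, int):
        return False
    return low <= value <= high

# Hypothetical entries, including out-of-range and non-numeric values
ages = [34, -2, 57, 140, "n/a"]

# Collect every entry that fails the validity check
invalid = [a for a in ages if not is_valid_age(a)]
print(invalid)  # [-2, 140, 'n/a']
```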

What is Data Cleaning in Excel?

Cleaning up data in Excel means refining raw data by correcting errors, removing duplicates, and standardizing formats. Unlike data validation, which is a specific feature on Excel’s Data tab, data cleaning is a more general term that involves a broader range of tools and techniques.

Let’s look at some of the advantages of cleaning data in Excel.

Advantages and benefits of clean data in Excel

Here’s what data cleaning in Excel can help with:

  • Reduces Time Needed for Data Analysis: Clean data streamlines the analysis process, saving valuable time.
  • Provides More Reliable Insights: Accurate and consistent data leads to better decision-making.
  • Easier Visualization: Clean data simplifies the creation of accurate graphs and other visual representations.
  • Minimizes Costs: Reduces expenses associated with data correction and rework.
  • Supports Scalability: Ensures data processes and systems can grow without quality degradation.

Why Data Cleaning is Important

Clean data is essential for correctly analyzing data. Without it, any interpretation or conclusion becomes flawed. This phenomenon is captured by the expression "garbage in, garbage out" (GIGO)—referring to the idea that poor input quality leads to poor output.

Here’s why cleaning data in Excel is important:

  • Eliminates Unwanted Outliers: It gets rid of unwanted outliers, which can lead to incorrect conclusions.
  • Resolves Structural Errors: It resolves mixed or incongruous naming conventions, incorrect data types, and partial data entries.
  • Handles Missing Data: It fills in or excludes incomplete cases, so gaps do not skew the analysis.
  • Validation and QA: It ensures the analysis is conducted using credible information.

Why Excel is a Good Option for Data Cleaning

Excel is a strong choice for data cleaning because it has comprehensive data manipulation features, including sorting, filtering, and text functions.

Its compatibility with a wide range of file formats and its automation capabilities through macros and Power Query improve operational efficiency. This makes Excel a versatile option for individuals and organizations who want to clean and manage their data easily and quickly.

How to Clean Data in Excel

Let’s explore some top techniques and data-cleaning tools in Excel. In the following sections, we will cover each of these ideas.

  • Basic Cleaning: Basic cleaning involves addressing common issues like extra spaces, blank cells, and spelling errors to ensure a clean and consistent dataset.
  • Error Handling and Validation: Error handling and validation focus on identifying and correcting errors and ensuring that your data maintains its integrity by removing duplicates.
  • Text Operations: Text operations involve manipulating and formatting text data to ensure it meets your requirements, including concatenation and case changes.
  • Data Transformation: Data transformation techniques are used to reorganize and reshape your data for better analysis, including parsing text and using tools like Flash Fill.
  • Numerical and Date Fixes: Numerical and date fixes involve correcting and standardizing numerical data and dates to ensure they are accurate and formatted consistently.
  • Advanced Data Management: Advanced data management includes reconciling and combining datasets to create a comprehensive and cohesive dataset for analysis.

Basic Data Cleaning in Excel

Let's take a look at basic cleaning, which involves addressing common issues like removing extra spaces.

Get rid of extra spaces

Extra spaces, whether leading, trailing, or repeated between words, are a problem because they cause inconsistencies in data analysis and formatting issues. For example, "Acme " and "Acme" will not match in a lookup or a duplicate check.

There are two main ways to get rid of extra spaces in Excel. Let’s review both.

Get rid of extra spaces using Find and Replace

Here are the steps to get rid of extra spaces.

  • Select the range of cells where you want to remove extra spaces.

  • Use the Ctrl + H shortcut to open the Find and Replace dialog box.

  • In the Find what field, press the spacebar twice to enter two spaces.

  • In the Replace with field, press the spacebar once to enter a single space.

  • Click on Replace All.

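  • Repeat the process until no more double spaces are found. Note that this method collapses repeated spaces but leaves a single leading or trailing space in place.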

The Find and Replace dialog box. Source: Image by Author
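
To see why the last step says to repeat the replacement, here is a small Python sketch that mimics the procedure. It assumes each Replace All pass swaps non-overlapping pairs of spaces for a single space; the function name `collapse_double_spaces` is made up for this example.

```python
def collapse_double_spaces(text):
    """Replace double spaces with single spaces until none remain.

    Returns the cleaned text and the number of passes that were needed.
    """
    passes = 0
    while "  " in text:
        text = text.replace("  ", " ")  # one "Replace All" pass
        passes += 1
    return text, passes

# A value with five spaces between the two words
messy = "Acme" + " " * 5 + "Corp"

cleaned, passes = collapse_double_spaces(messy)
print(cleaned)  # Acme Corp
print(passes)   # 3
```

A single pass turns five spaces into three, which is why one click on Replace All is often not enough.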

Get rid of extra spaces using TRIM

  • Create a new column for the cleaned data.

  • Enter the formula =TRIM(cell_with_extra_spaces) in the first cell of the new column, for example =TRIM(A2).

  • Double-click the fill handle (the small square at the bottom right corner of the cell) to apply the formula to the rest of the rows.

  • Copy the cleaned data and paste it as values to remove the formulas.
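
If it helps to see what TRIM does to a value, the following Python sketch approximates its behavior: it removes leading and trailing spaces and reduces runs of spaces between words to one. The name `excel_like_trim` is hypothetical, and the sketch assumes only the regular space character is affected, so other whitespace such as non-breaking spaces is left alone.

```python
def excel_like_trim(text):
    """Approximate Excel's TRIM for the regular space character."""
    # Splitting on " " yields empty strings wherever spaces repeat or
    # sit at the edges; dropping those leaves only the actual words.
    words = [w for w in text.split(" ") if w]
    return " ".join(words)

print(repr(excel_like_trim("  Jane   Doe ")))  # 'Jane Doe'
```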

Select and treat all blank cells

Blank cells can cause problems by breaking formulas, leading to errors in calculations and producing inaccurate analysis results. Here’s how to handle this issue.

  • Highlight the range of cells where you want to find and treat blank cells.

  • Press Ctrl + G to open the Go To dialog box.

  • Click on the “Special…” button. This opens the Go To Special dialog box.

  • Select the Blanks option and click OK. Now, you will see that all the blank cells in the chosen range are selected.

  • Type a value or formula, then press Ctrl + Enter to fill all the selected blank cells at once.

Select and treat blank cells. Source: Image by Author
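
The effect of the Go To Special and Ctrl + Enter steps can be sketched in Python as follows. This is only an illustration of the logic: the `fill_blanks` helper and the sample column are hypothetical, and blanks are assumed to show up as empty strings or `None`.

```python
def fill_blanks(column, fill_value):
    """Return a copy of column with blank cells replaced by fill_value."""
    # Only empty strings and None count as blank; a real 0 is kept as-is
    return [fill_value if cell in ("", None) else cell for cell in column]

# Hypothetical column with two blank cells and a genuine zero
regions = ["North", "", None, "South", 0]
print(fill_blanks(regions, "Unknown"))
# ['North', 'Unknown', 'Unknown', 'South', 0]
```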

Spell checking

Misspelled words can make data look unprofessional and difficult to understand, so it's important to correct them.

  • Select the entire worksheet or the specific range you want to check.

  • Go to the Review tab on the Ribbon.

  • Click the Spelling button in the Proofing group.

  • The Spelling dialog box will open, showing the first detected misspelled word and suggestions for correction.

  • Review and correct the misspelled words as needed.

Spell checking. Source: Image by Author

Error Handling and Validation in Excel

Let's take a look at error handling, which involves addressing common issues like highlighting errors or removing duplicates.

Highlight errors

Errors in your data can lead to incorrect results and poor decision-making, so it's crucial to identify and address them. Let’s see how to tackle this problem.

  • Select the range of cells you want to check for errors.

  • Go to the Home tab on the Ribbon.

  • In the Styles group, click on Conditional Formatting.

  • Select New Rule from the dropdown menu.

  • Choose Use a formula to determine which cells to format.

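  • Enter the formula =ISERROR(cell) in the Format values where this formula is true field, where cell is a relative reference to the first cell of your selection (for example, =ISERROR(A1)).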

  • Click the Format… button to choose your desired formatting options.

  • Click OK after selecting your formatting options.

  • Click OK again in the New Formatting Rule dialog box to apply the rule.

Using conditional formatting. Source: Image by Author
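
For data that has already been exported from Excel as text, a similar check can be sketched in Python. This example assumes errors appear as literal strings such as #DIV/0!; the `EXCEL_ERRORS` set lists the classic error values and may not be exhaustive, and the `error_positions` helper is hypothetical.

```python
# Classic Excel error values as they appear in exported text (assumed list)
EXCEL_ERRORS = {"#N/A", "#VALUE!", "#REF!", "#DIV/0!", "#NUM!", "#NAME?", "#NULL!"}

def error_positions(cells):
    """Return the indexes of cells that hold an Excel error string."""
    return [
        i for i, cell in enumerate(cells)
        if isinstance(cell, str) and cell in EXCEL_ERRORS
    ]

# Hypothetical exported column
column = [12.5, "#DIV/0!", 8, "#N/A", "ok"]
print(error_positions(column))  # [1, 3]
```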

Remove duplicates

Duplicate entries can distort analysis and insights, so removing them ensures data accuracy.

  • Select the worksheet or range of cells from which you want to remove duplicates.

  • Go to the Data tab on the Ribbon.

  • Click on Remove Duplicates in the Data Tools group.

  • In the Remove Duplicates dialog box, specify which columns to check for duplicates.

  • Click OK. Excel will display a message showing the number of duplicate values removed.
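
The logic behind these steps can be sketched in Python: rows are compared on the chosen columns only, and the first occurrence of each combination is kept. The `remove_duplicates` function, the column indexes, and the sample rows are all assumptions made for this illustration.

```python
def remove_duplicates(rows, key_columns):
    """Keep the first row for each combination of values in key_columns.

    Returns the unique rows and the number of rows removed.
    """
    seen = set()
    unique = []
    for row in rows:
        # Build the comparison key from the selected columns only
        key = tuple(row[i] for i in key_columns)
        if key not in seen:
            seen.add(key)
            unique.append(row)
    return unique, len(rows) - len(unique)

# Hypothetical rows: (customer, city, amount)
rows = [
    ("Ana", "Lima", 100),
    ("Ben", "Oslo", 250),
    ("Ana", "Lima", 175),  # same customer and city as the first row
]

unique, removed = remove_duplicates(rows, key_columns=[0, 1])
print(unique)   # [('Ana', 'Lima', 100), ('Ben', 'Oslo', 250)]
print(removed)  # 1
```

Checking fewer columns removes more rows, so it is worth deciding up front which columns define a duplicate.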

Remove duplicate rows

Duplicate rows can clutter your data and affect the reliability of your analysis. Here’s how to manage this.

  • Select the range from which you want to remove the duplicate rows.

  • Go to the Data tab on the Ribbon and click on Advanced in the Sort & Filter group.

  • In the Advanced Filter dialog box, select Copy to another location.

  • Set the List range to your selected range.

  • Set the Copy to field to the cell where you want the unique rows to be copied to.

  • Check the Unique records only box and then click OK.

Text Operations in Excel

Let's take a look at text operations like concatenating strings.

Concatenate

Combining multiple text strings into one cell can help create more meaningful and organized data by merging information from different sources into a single, cohesive format.

  • Select the cell where you want the concatenated result to appear.

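  • Type =CONCATENATE( and select the cells you want to join, separated by commas, then close the parenthesis. For example, =CONCATENATE(A2, " ", B2).

  • Alternatively, skip the function and join the cells with the & operator, for example =A2 & " " & B2.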

  • Press Enter to see the result.

Concatenating cells. Source: Image by Author
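
For comparison, here is the same idea in a short Python sketch. The sample names are hypothetical; the point is that concatenation adds no separator on its own, so the space has to be supplied explicitly.

```python
first, last = "Ada", "Lovelace"

# Joining with + mirrors joining cells one by one with a " " in between
full_name = first + " " + last

# join() inserts the same separator between every piece
joined = " ".join([first, last])

print(full_name)  # Ada Lovelace
print(joined)     # Ada Lovelace
```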

Changing the case of text

Uniform text casing can improve readability and maintain consistency in your data. Let’s see how to adjust this.

  • Create a new column for the converted text.

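  • Enter one of these formulas in a cell: =UPPER(cell) to convert text to all capitals, =LOWER(cell) to convert it to all lowercase, or =PROPER(cell) to capitalize the first letter of each word.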

  • Press Enter to apply the formula.
  • Drag the fill handle to apply the formula to other cells if needed.

Changing the case of text. Source: Image by Author
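
Python's string methods offer a rough analogue of these three functions, which can help when picturing the results. Treat `title()` as an approximation of PROPER rather than an exact match; the sample name is hypothetical.

```python
name = "mARY o'brien"

upper_case = name.upper()    # comparable to UPPER
lower_case = name.lower()    # comparable to LOWER
proper_case = name.title()   # roughly comparable to PROPER

print(upper_case)   # MARY O'BRIEN
print(lower_case)   # mary o'brien
print(proper_case)  # Mary O'Brien
```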

Removing nonprinting characters from text

Nonprinting characters, such as tabs, line breaks, and other control characters, can disrupt data processing and make it difficult to analyze or visualize data correctly.

  • Identify the nonprinting characters you need to remove.

  • Select the cell containing the text with nonprinting characters.

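  • In a new cell, use the following function: =CLEAN(text). CLEAN removes the nonprinting characters with ASCII codes 0 through 31; it does not remove characters outside that range, such as the non-breaking space.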

  • To remove extra spaces and nonprinting characters, use: =TRIM(CLEAN(text)).

  • Copy and paste the cleaned text to replace the original.

Using the clean function in Excel. Source: Image by Author
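
The combined =TRIM(CLEAN(text)) step can be approximated in Python as shown below. The sketch assumes CLEAN strips characters with codes 0 through 31 and TRIM handles only the regular space character; the function names `excel_like_clean` and `trim_clean` are hypothetical.

```python
def excel_like_clean(text):
    """Drop control characters with codes 0 through 31."""
    return "".join(ch for ch in text if ord(ch) >= 32)

def trim_clean(text):
    """Remove control characters, then strip and collapse extra spaces."""
    cleaned = excel_like_clean(text)
    return " ".join(word for word in cleaned.split(" ") if word)

print(excel_like_clean("Line1\nLine2\t!"))      # Line1Line2!
print(repr(trim_clean("  Total\x07  sales \n")))  # 'Total sales'
```

Note that removing a line break joins the text on either side of it, so replace line breaks with spaces first if the words need to stay separated.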

Data Transformation in Excel

Data transformation techniques are used to reorganize and reshape your data for better analysis, including parsing text and using tools like Flash Fill.

Data parsing from text to column

Separating text data into individual columns makes it easier to analyze specific components and ensures that each piece of information is distinctly categorized. This is the opposite of concatenation and is useful for breaking down complex data into manageable parts. Here’s how to go about it.

  • Select the data you want to split.

  • Go to the Data tab and click Text to Columns.

  • Choose Delimited or Fixed Width and click Next.

  • For Delimited, check the delimiters your data uses and click Next.

  • For Fixed Width, set column breaks in the Data Preview window and click Next.

  • Choose the data format for each column.

  • Click Finish.
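
To illustrate the difference between the Delimited and Fixed Width options, here is a minimal Python sketch. The helper names, the sample values, and the break positions are assumptions chosen for the example.

```python
def split_delimited(value, delimiter=","):
    """Split a value wherever the delimiter appears."""
    return value.split(delimiter)

def split_fixed_width(value, breaks):
    """Split a value at fixed character positions given in breaks."""
    edges = [0, *breaks, len(value)]
    # Pair each edge with the next one to get (start, end) slices
    return [value[start:end] for start, end in zip(edges, edges[1:])]

print(split_delimited("Doe,Jane,Sales"))
# ['Doe', 'Jane', 'Sales']

print(split_fixed_width("2024NYC0042", breaks=[4, 7]))
# ['2024', 'NYC', '0042']
```

Delimited splitting suits values whose parts vary in length but share a separator, while fixed-width splitting suits codes where each part always occupies the same positions.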