Remove Duplicate Rows of Data in Excel

01
of 02

Remove Duplicate Data Records in Excel

Search for Identical Records by Field Name with Remove Duplicates in Excel
Remove Duplicates - Searching for Identical Records by Field Name. © Ted French

Spreadsheet programs such as Excel are often used as databases for things like parts inventories, sales records, and mailing lists.

Databases in Excel are comprised of tables of data that are normally organized into rows of data called records.

In a record, the data in each cell or field in the row is related - such as a company's name, address and phone number.

A common problem that occurs as a database grows in size is that of duplicate records or rows of data.

This duplication can occur if:

  • Entire records are entered into the database more than once resulting in two or more identical records
  • Multiple records have one or more fields - such as a name and address - containing the same data.

Either way, duplicate records can cause a whole host of problems - such as mailing multiple copies of documents to the same person when the database information is used in a mail merge - so it's a good idea to scan for and remove duplicate records on a regular basis.

And while it is easy to pick out duplicate records in a small sample like the one in the image above, data tables might easily contain hundreds if not thousands of records making it very difficult to pick out duplicate records - especially partially matching records.

To make it easier to accomplish this task, Excel has a built in data tool called, not surprisingly, Remove Duplicates, which can be used to find and remove identical as well as partially matching records.

However, the way the Remove Duplicates tool is designed, identical and partially matching records must be dealt with separately.

This is because the Remove Duplicates dialog box displays the field names for the selected data table and you choose which fields to include in the search for matching records:

  • For identical records, search for all fields - leave check marks next to all column or field names;
  • For partially matching records - leave check marks next to only those fields to be matched.

Field Names vs. Column Letters

As mentioned, the Remove Duplicates tool consist of a dialog box where you choose which matching fields to search for by checking off the desired field or column names.

The information that the dialog box displays - field names or column letters - depends upon whether your data contains a row of headings - or headers - at the top of the data table as seen in the image above.

If it does - make sure the option on the right hand side of the dialog box - My data has headers - is checked off and Excel will display the names in this row as field names in the dialog box.

If your data does not have a header row, the dialog box will display the appropriate column letters in the dialog box for the selected range of data.

Contiguous Range of Data

For the Remove Duplicates tool to work properly, the data table must be a contiguous range of data - that is it must not have any empty rows, columns, and, if at all possible, no empty cells located within the table.

Not having blanks within a data table is a good practice when it comes to data management in general and not just when searching for duplicate data. Excel's other data tools - such as sorting and filtering - work best when the data table is a contiguous range of data.

Remove Duplicate Data Records Example

In the image above, the data table contains two identical records for A. Thompson and two partially matching records for R. Holt - where all fields match except the student number.

The steps listed below detail how to use the Remove Duplicates data tool to:

  1. Remove the second of two identical records for A. Thompson.
  2. Remove the second partially matching record for R. Holt.

Opening the Remove Duplicates Dialog Box

  1. Click on any cell containing data in the sample database.
  2. Click the Data tab on the ribbon.
  3. Click on the Remove Duplicates icon to highlight all data in the data table and to open the Remove Duplicates dialog box.
  4. The Remove Duplicates dialog box displays all the column headings or field names from our data sample
  5. The check marks next to the field names indicate which columns Excel will try to match in searching for duplicate records
  6. By default, when the dialog box opens all of the field names are checked off

Finding Identical Records

  1. Since we are searching for completely identical records in this example we will leave all the column headings checked
  2. Click OK

At this point the following results should be seen:

  • The dialog box should close and be replaced by a message saying: 1 duplicate values found and removed; 7 unique values remain.
  • The row containing the duplicate A. Thompson record will have been removed from the database
  • Even though there are two partially matching records for R. Holt, because not all fields matched - the student number for the two records differ - Excel considers it to be a unique data record
02
of 02

Find and Remove Partially Matching Records with Remove Duplicates

Excel Remove Duplicates - Searching for Partially Matching Records by Field Name
Remove Duplicates - Searching for Partially Matching Records by Field Name. © Ted French

Checking One Field at a Time

Since Excel only removes data records that exactly match for the selected fields of data, the best way to find all partially matching data records is to remove the check mark for only one field at a time, as is done in the steps below.

Subsequent searches for records that match in all fields except for name, age, or program will remove all possible combinations for partially matching records.

Finding Partially Matching Records

  1. Click on any cell containing data in the data table if necessary
  2. Click the Data tab on the ribbon.
  3. Click on the Remove Duplicates icon to highlight all data in the data table and to open the Remove Duplicates dialog box.
  4. All field names or column headings for the data table are selected.
  5. To find and remove records that do not have a match in every field, remove the check mark from besides those field names that Excel is to ignore.
  6. For this example click on the check box beside the Student ID column heading to remove the check mark.
  7. Excel will now only search and remove records that have matching data in the Last Name, Initial, and Program fields.
  8. Click OK
  9. The dialog box should close and be replaced by a message saying: 1 duplicate values found and removed; 6 unique values remain.
  10. The row containing the second record for R. Holt with the Student ID of ST348-252 will have been removed from the database.
  11. Click OK to close the message box

At this point, the example data table should be free of all duplicate data.