Deduplicating Data

After you have specified the passes, go to the main deduplication page of listclean. This is where you will spend most of your time.

Non-hierarchical data can be deduplicated in a single pass. Hierarchical data, such as Accounts with Contacts, needs multiple passes, one for each level of the hierarchy.
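To illustrate why hierarchical data takes more than one pass, here is a minimal Python sketch. It is not listclean's implementation: the function names, the record layout, and the matching keys (normalised account name, then account plus email) are all assumptions made for the example, and keeping the first record simply stands in for the merge decision you make on the page. The point it shows is that contacts under two duplicate accounts can only be compared once those accounts have been merged.

```python
def dedupe_accounts(accounts):
    """Pass 1 (hypothetical): keep the first account per normalised name.

    Returns the surviving accounts and a map from every original
    account id to the id of the account that survives.
    """
    survivors = {}  # normalised name -> surviving account id
    id_map = {}     # original account id -> surviving account id
    kept = []
    for acc in accounts:
        key = acc["name"].strip().lower()
        if key not in survivors:
            survivors[key] = acc["id"]
            kept.append(acc)
        id_map[acc["id"]] = survivors[key]
    return kept, id_map


def dedupe_contacts(contacts, id_map):
    """Pass 2 (hypothetical): re-point contacts, then keep one per (account, email)."""
    seen = set()
    kept = []
    for contact in contacts:
        account_id = id_map[contact["account_id"]]
        key = (account_id, contact["email"].strip().lower())
        if key not in seen:
            seen.add(key)
            kept.append({**contact, "account_id": account_id})
    return kept


# Made-up sample data: accounts 1 and 2 are the same company.
accounts = [
    {"id": 1, "name": "Acme Ltd"},
    {"id": 2, "name": "acme ltd "},
    {"id": 3, "name": "Globex"},
]
contacts = [
    {"account_id": 1, "email": "jo@acme.example"},
    {"account_id": 2, "email": "JO@acme.example"},
    {"account_id": 3, "email": "sam@globex.example"},
]

kept_accounts, id_map = dedupe_accounts(accounts)
kept_contacts = dedupe_contacts(contacts, id_map)
```

In this sample the two Acme contacts belong to different accounts until the first pass merges accounts 1 and 2, so only the second pass can recognise them as duplicates.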

The following screenshot shows the result of running the first pass on Accounts with Contacts data:

listclean-DeduplicateAccounts

listclean has found four possible sets of duplicate accounts. You have three options for each set:

  • manually merge the duplicate records, choosing the fields from each duplicate to form the merged record
  • let listclean merge the set. It assumes that the order of duplicates within the set is important and takes information first from the record with the lowest index. You can change the order of duplicates within the set by dragging and dropping them.
  • ignore this set of duplicates because they are not real duplicates
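The automatic merge in the second option can be pictured with a short Python sketch. This is an interpretation rather than listclean's actual code: it assumes "takes information first from the record with the lowest index" means that, for each field, the first non-empty value in set order wins. The function name and the sample records are hypothetical.

```python
def merge_duplicate_set(records):
    """Merge an ordered list of duplicate records (dicts) into one record.

    Assumption: for each field, the first non-empty value wins, so the
    record with the lowest index has the highest priority.
    Fields that are empty in every record are left out of the result.
    """
    merged = {}
    for record in records:  # lowest index first
        for field, value in record.items():
            if value in (None, ""):
                continue  # nothing to contribute for this field
            merged.setdefault(field, value)  # keep the earlier value if present
    return merged


# Made-up duplicate set, in the order shown on the page.
duplicates = [
    {"name": "Acme Ltd", "phone": "", "city": "Leeds"},
    {"name": "ACME Limited", "phone": "0113 496 0000", "city": ""},
]

merged = merge_duplicate_set(duplicates)
reordered = merge_duplicate_set(list(reversed(duplicates)))
```

Under this assumption, reordering the set changes which values win: `merged` takes its name from the first record, while `reordered` takes it from the record that was originally second. That corresponds to dragging a record to the top of the set before letting listclean merge it.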

The following screenshot shows the page for manually merging duplicate records:

listclean-MergeData

The master column shows the merged data that will be saved. In each row, the field with the dark background is the one chosen for the master. You can stay on the Merge Data page as long as duplicate sets remain: each time you click Merge Data, the page shows the next duplicate set.

The following screenshot shows the result of running the second pass, after the Accounts have been deduplicated:

listclean-DeduplicateContacts