Matching Rules

listclean attempts to guess the matching rule and default parameters for each field in the data, i.e. Column Headers in row 1 of the spreadsheet. You may change the rules and also the parameters in Advanced Mode. Two of the matching rules have been optimised for Marketing Data, Company Name and Personal Name.

Matching Rule	Description
Company Identifier Sounds Like	Two Company Identifiers match if they sound-like each other. If the last token (word) of the Company Identifier is one of a set tokens (Ltd, Limited, Plc etc.) it is ignored before the check.
Company Name Starts With Or Contains	Two Company Identifiers match if they either start with the same sequence of characters or one contains the same sequence of characters. If the last token (word) of the Company Identifier is one of a set tokens (Ltd, Limited, Plc etc.) it is ignored before the check.
Exact	The content of the two fields must match exactly. This can be case-sensitive or insensitive.
First Name	The two First Names match if they are synonyms of each other, e.g. Bob and Robert would match.
Full Name	The Full Names are split into First Name and Last Name and the appropriate rules followed.
Ignore	The fields are ignored in any duplicate matching.
Last Name	The two Last Names match if they sound-like each other.
Sounds Like	The two fields match if they sound-like each other.
Starts With Or Contains	The two fields match if they either start with the same sequence of characters or one contains the same sequence of characters.

A list of the matching rules for deduplicating data

Company Identifiers Sound Like

In Simple Mode listclean carries out the following actions to match two Company Identifiers:

splits the Company Identifier into tokens, i.e. finds words separated by spaces.
determines if the last token matches a set of types of company types (Limited, Ltd, etc.).
checks whether each token, apart from a company type token, matches using a default sounds-like algorithm.
marks the Company Identifiers as a duplicate if all non company type tokens match.

In Advanced Mode you can change the rule parameters:

Company Identifiers Starts With or Contains

In Simple Mode listclean carries out the following actions to match two Company Identifiers:

splits the Company Identifier into tokens, i.e. finds words separated by spaces.
determines if the last token matches a set of types of company types (Limited, Ltd, etc.).
checks whether 8 characters at the start of the Company Identifier, ignoring the company type part, matches, ignoring case.
marks the Company Identifiers as a duplicate if the strings match match.

In Advanced Mode you can change the rule parameters:

Exact

In Simple Mode listclean checks if the two fields match exactly, ignoring case.

In Advanced Mode you can change the rule parameters:

First Name

listclean uses a synonym dictionary to check if two First Names match. The table below shows 3 example rows from the dictionary:

Default Name	Synonyms
Edward	Edward, Eddie, Ed, Ted, Teddy, Eddy, Ned
Robert	Robert, Rob, Bob, Robbie, Bobby
Elizabeth	Elizabeth, Elisabeth, Beth, Betty, Libby

Full Name

In Simple Mode listclean splits the field into a First Name and a Last Name. If a comma is found in the value then the order for the fields is assumed to be [Last Name], [First Name]. If only a space is found then the order is assumed to be [First Name] [Last Name]. Having split the column value into two parts a duplicate is found if the First Names are synonyms and the Last Names sound the same.

In Advanced Mode you can change the rule parameters:

Ignore

The field will be ignored during duplicate checking, i.e. all values are assumed to match, but will be merged when duplicates are detected and merged.

Last Name

Last Names match if they sound the same.

In Advanced Mode you can change the rule parameters:

Sounds Like

Fields match if they sound the same.

In Advanced Mode you can change the rule parameters:

Starts With Or Contains

Fields match if they start with the same 8 characters, ignoring case.

In Advanced Mode you can change the rule parameters: