Matching Rules

listclean attempts to guess the matching rule and default parameters for each field in the data, i.e. for each Column Header in row 1 of the spreadsheet. In Advanced Mode you can change both the rules and their parameters. Two kinds of field have matching rules optimised for marketing data: Company Names and Personal Names.

Matching Rule | Description
Company Identifier Sounds Like | Two Company Identifiers match if they sound like each other. If the last token (word) of a Company Identifier is one of a set of company-type tokens (Ltd, Limited, Plc, etc.), it is ignored before the check.
Company Identifier Starts With Or Contains | Two Company Identifiers match if they start with the same sequence of characters or if one contains the same sequence of characters as the other. If the last token (word) of a Company Identifier is one of a set of company-type tokens (Ltd, Limited, Plc, etc.), it is ignored before the check.
Exact | The contents of the two fields must match exactly. The comparison can be case-sensitive or case-insensitive.
First Name | The two First Names match if they are synonyms of each other, e.g. Bob and Robert would match.
Full Name | Each Full Name is split into a First Name and a Last Name, and the First Name and Last Name rules are applied to the parts.
Ignore | The fields are ignored in any duplicate matching.
Last Name | The two Last Names match if they sound like each other.
Sounds Like | The two fields match if they sound like each other.
Starts With Or Contains | The two fields match if they start with the same sequence of characters or if one contains the same sequence of characters as the other.
A list of the matching rules for deduplicating data

Company Identifier Sounds Like

In Simple Mode listclean carries out the following actions to match two Company Identifiers:

  • splits the Company Identifier into tokens, i.e. words separated by spaces.
  • determines whether the last token is a company type (Limited, Ltd, etc.).
  • checks whether each token, apart from a company-type token, matches using the default sounds-like algorithm.
  • marks the two Company Identifiers as duplicates if all the non-company-type tokens match.
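The steps above can be sketched in Python as follows. This is a minimal illustration, not listclean's implementation: the documentation does not name the default sounds-like algorithm, so American Soundex is used here as a stand-in. The set of company-type tokens, the pairing of tokens by position and the requirement that both identifiers have the same number of tokens are all assumptions.

```python
# Illustrative sketch only: American Soundex stands in for listclean's
# unnamed default sounds-like algorithm.

# Hypothetical company-type tokens (compared case-insensitively, trailing "." removed).
COMPANY_TYPES = {"ltd", "limited", "plc"}

# Soundex digit for each consonant; vowels, h, w and y have no code.
SOUNDEX_CODES = {
    ch: digit
    for letters, digit in (("bfpv", "1"), ("cgjkqsxz", "2"), ("dt", "3"),
                           ("l", "4"), ("mn", "5"), ("r", "6"))
    for ch in letters
}


def soundex(word: str) -> str:
    """Return the four-character American Soundex code for a word."""
    letters = [ch for ch in word.lower() if ch.isalpha()]
    if not letters:
        return ""
    code = letters[0].upper()
    prev = SOUNDEX_CODES.get(letters[0], "")
    for ch in letters[1:]:
        digit = SOUNDEX_CODES.get(ch, "")
        if digit and digit != prev:
            code += digit
            if len(code) == 4:
                break
        if ch not in "hw":  # h and w do not separate letters with the same code
            prev = digit
    return code.ljust(4, "0")


def significant_tokens(identifier: str) -> list[str]:
    """Split on whitespace and drop a trailing company-type token."""
    tokens = identifier.split()
    if tokens and tokens[-1].lower().rstrip(".") in COMPANY_TYPES:
        tokens = tokens[:-1]
    return tokens


def company_identifiers_sound_like(a: str, b: str) -> bool:
    """True if every remaining token of a sounds like the token in the same position of b."""
    ta, tb = significant_tokens(a), significant_tokens(b)
    if not ta or len(ta) != len(tb):
        return False
    return all(soundex(x) == soundex(y) for x, y in zip(ta, tb))


print(company_identifiers_sound_like("Smith Brothers Ltd", "Smyth Brothers Limited"))  # True
print(company_identifiers_sound_like("Acme Plc", "Apex Plc"))  # False
```

Because the company-type token is dropped before comparison, "Ltd" and "Limited" never affect the result.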

In Advanced Mode you can change the rule parameters:
[Image: Advanced Mode parameters for the Company Identifier Sounds Like rule]

Company Identifier Starts With Or Contains

In Simple Mode listclean carries out the following actions to match two Company Identifiers:

  • splits the Company Identifier into tokens, i.e. words separated by spaces.
  • determines whether the last token is a company type (Limited, Ltd, etc.).
  • checks, ignoring case, whether the first 8 characters of the two Company Identifiers match, excluding the company-type token.
  • marks the two Company Identifiers as duplicates if these characters match.
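A minimal Python sketch of the Simple Mode check described above, covering only the "starts with" part. The company-type set and the `length` parameter (mirroring the 8-character default) are hypothetical. The sketch also assumes that a name shorter than 8 characters is compared on whatever characters it has, so "Acme" would not match "Acme Widgets"; listclean may handle short names differently.

```python
# Illustrative sketch of the Simple Mode "starts with" check for Company Identifiers.

# Hypothetical company-type tokens; not listclean's documented list.
COMPANY_TYPES = {"ltd", "limited", "plc"}


def strip_company_type(identifier: str) -> str:
    """Remove a trailing company-type token and normalise whitespace."""
    tokens = identifier.split()
    if tokens and tokens[-1].lower().rstrip(".") in COMPANY_TYPES:
        tokens = tokens[:-1]
    return " ".join(tokens)


def company_starts_with(a: str, b: str, length: int = 8) -> bool:
    """True if the first `length` characters match, ignoring case and company type."""
    prefix_a = strip_company_type(a).casefold()[:length]
    prefix_b = strip_company_type(b).casefold()[:length]
    return bool(prefix_a) and prefix_a == prefix_b


print(company_starts_with("Northwind Traders Ltd", "NORTHWIND TRADING PLC"))  # True
print(company_starts_with("Northwind Ltd", "Northgate Ltd"))  # False
```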

In Advanced Mode you can change the rule parameters:
[Image: Advanced Mode parameters for the Company Identifier Starts With Or Contains rule]

Exact

In Simple Mode listclean checks if the two fields match exactly, ignoring case.
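For illustration, the Simple Mode comparison amounts to the Python function below. The `case_sensitive` flag is a hypothetical stand-in for the Advanced Mode setting, and the sketch assumes values are compared as-is, without trimming surrounding spaces.

```python
def exact_match(a: str, b: str, case_sensitive: bool = False) -> bool:
    """Compare two field values exactly; case-insensitive by default, as in Simple Mode."""
    if case_sensitive:
        return a == b
    # casefold() is a more aggressive lower() that also handles e.g. German "ß".
    return a.casefold() == b.casefold()


print(exact_match("London", "LONDON"))                       # True
print(exact_match("London", "LONDON", case_sensitive=True))  # False
```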

In Advanced Mode you can change the rule parameters:
[Image: Advanced Mode parameters for the Exact rule]

First Name

listclean uses a synonym dictionary to check if two First Names match. The table below shows 3 example rows from the dictionary:

Default Name | Synonyms
Edward | Edward, Eddie, Ed, Ted, Teddy, Eddy, Ned
Robert | Robert, Rob, Bob, Robbie, Bobby
Elizabeth | Elizabeth, Elisabeth, Beth, Betty, Libby
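One way to implement such a lookup is sketched below in Python, using only the three example rows. The case-insensitive lookup, treating identical names as a match even when they are not in the dictionary, and indexing each name against every row it appears in are assumptions, not documented listclean behaviour.

```python
# The three example rows: default name -> names treated as synonyms.
SYNONYM_ROWS = {
    "Edward": ["Edward", "Eddie", "Ed", "Ted", "Teddy", "Eddy", "Ned"],
    "Robert": ["Robert", "Rob", "Bob", "Robbie", "Bobby"],
    "Elizabeth": ["Elizabeth", "Elisabeth", "Beth", "Betty", "Libby"],
}


def build_index(rows: dict[str, list[str]]) -> dict[str, set[str]]:
    """Map each (case-folded) name to the default names of every row it appears in."""
    index: dict[str, set[str]] = {}
    for default_name, synonyms in rows.items():
        for name in synonyms:
            index.setdefault(name.casefold(), set()).add(default_name)
    return index


NAME_INDEX = build_index(SYNONYM_ROWS)


def first_names_match(a: str, b: str) -> bool:
    """True if the names are equal or share at least one dictionary row."""
    key_a, key_b = a.strip().casefold(), b.strip().casefold()
    if key_a == key_b:
        return True
    return bool(NAME_INDEX.get(key_a, set()) & NAME_INDEX.get(key_b, set()))


print(first_names_match("Bob", "Robert"))  # True
print(first_names_match("Bob", "Beth"))    # False
```

Mapping each name to a set of rows, rather than a single default name, allows for names that could be short for more than one full name.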

Full Name

In Simple Mode listclean splits the field into a First Name and a Last Name. If the value contains a comma, the order is assumed to be [Last Name], [First Name]. If it contains only a space, the order is assumed to be [First Name] [Last Name]. Having split the value into two parts, listclean marks the records as duplicates if the First Names are synonyms and the Last Names sound the same.
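The splitting step can be sketched in Python as below. This shows only how the two parts might be separated before the First Name and Last Name rules are applied. Splitting at the first space when there are several words, and treating a single word as a Last Name, are assumptions the documentation does not cover.

```python
def split_full_name(full_name: str) -> tuple[str, str]:
    """Split a Full Name into (first_name, last_name) using the Simple Mode convention."""
    value = full_name.strip()
    if "," in value:
        # "Smith, Robert" -> Last Name before the comma, First Name after it.
        last, first = value.split(",", 1)
        return first.strip(), last.strip()
    parts = value.split(None, 1)
    if len(parts) == 2:
        # "Robert Smith" -> First Name, then everything after the first space (assumption).
        return parts[0], parts[1].strip()
    # A single word: treated as a Last Name only (assumption).
    return "", value


print(split_full_name("Smith, Robert"))  # ('Robert', 'Smith')
print(split_full_name("Bob Smith"))      # ('Bob', 'Smith')
```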

In Advanced Mode you can change the rule parameters:
[Image: Advanced Mode parameters for the Full Name rule]

Ignore

The field is ignored during duplicate checking, i.e. all its values are treated as matching. Its values are still merged when duplicate records are detected and merged.

Last Name

Last Names match if they sound the same.

In Advanced Mode you can change the rule parameters:
[Image: Advanced Mode parameters for the Last Name rule]

Sounds Like

Fields match if they sound the same.
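To illustrate what "sound the same" can mean, here is a Python sketch using American Soundex. listclean's default sounds-like algorithm is not named in this documentation, so Soundex is a stand-in, not necessarily what listclean uses. Soundex reduces a value to its first letter plus three digits, so it suits single words such as Last Names better than long multi-word fields.

```python
# Illustrative sketch: American Soundex as a stand-in sounds-like algorithm.

# Soundex digit for each consonant; vowels, h, w and y have no code.
SOUNDEX_CODES = {
    ch: digit
    for letters, digit in (("bfpv", "1"), ("cgjkqsxz", "2"), ("dt", "3"),
                           ("l", "4"), ("mn", "5"), ("r", "6"))
    for ch in letters
}


def soundex(word: str) -> str:
    """Return the four-character American Soundex code for a word."""
    letters = [ch for ch in word.lower() if ch.isalpha()]
    if not letters:
        return ""
    code = letters[0].upper()
    prev = SOUNDEX_CODES.get(letters[0], "")
    for ch in letters[1:]:
        digit = SOUNDEX_CODES.get(ch, "")
        if digit and digit != prev:
            code += digit
            if len(code) == 4:
                break
        if ch not in "hw":  # h and w do not separate letters with the same code
            prev = digit
    return code.ljust(4, "0")


def sounds_like(a: str, b: str) -> bool:
    """True if both values produce the same non-empty Soundex code."""
    code_a, code_b = soundex(a), soundex(b)
    return bool(code_a) and code_a == code_b


print(sounds_like("Smith", "Smyth"))    # True (both S530)
print(sounds_like("Robert", "Rupert"))  # True (both R163)
print(sounds_like("Smith", "Jones"))    # False
```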

In Advanced Mode you can change the rule parameters:
[Image: Advanced Mode parameters for the Sounds Like rule]

Starts With Or Contains

Fields match if they start with the same 8 characters, ignoring case.

In Advanced Mode you can change the rule parameters:
[Image: Advanced Mode parameters for the Starts With Or Contains rule]