Matching Rules
listclean attempts to guess the matching rule and default parameters for each field in the data, i.e. Column Headers in row 1 of the spreadsheet. You may change the rules and also the parameters in Advanced Mode. Two of the matching rules have been optimised for Marketing Data, Company Name and Personal Name.
Matching Rule | Description |
---|---|
Company Identifier Sounds Like | Two Company Identifiers match if they sound-like each other. If the last token (word) of the Company Identifier is one of a set tokens (Ltd, Limited, Plc etc.) it is ignored before the check. |
Company Name Starts With Or Contains | Two Company Identifiers match if they either start with the same sequence of characters or one contains the same sequence of characters. If the last token (word) of the Company Identifier is one of a set tokens (Ltd, Limited, Plc etc.) it is ignored before the check. |
Exact | The content of the two fields must match exactly. This can be case-sensitive or insensitive. |
First Name | The two First Names match if they are synonyms of each other, e.g. Bob and Robert would match. |
Full Name | The Full Names are split into First Name and Last Name and the appropriate rules followed. |
Ignore | The fields are ignored in any duplicate matching. |
Last Name | The two Last Names match if they sound-like each other. |
Sounds Like | The two fields match if they sound-like each other. |
Starts With Or Contains | The two fields match if they either start with the same sequence of characters or one contains the same sequence of characters. |
Company Identifiers Sound Like
In Simple Mode listclean carries out the following actions to match two Company Identifiers:
In Advanced Mode you can change the rule parameters:
Company Identifiers Starts With or Contains
In Simple Mode listclean carries out the following actions to match two Company Identifiers:
In Advanced Mode you can change the rule parameters:
Exact
In Simple Mode listclean checks if the two fields match exactly, ignoring case.
In Advanced Mode you can change the rule parameters:
First Name
listclean uses a synonym dictionary to check if two First Names match. The table below shows 3 example rows from the dictionary:
Default Name | Synonyms |
---|---|
Edward | Edward, Eddie, Ed, Ted, Teddy, Eddy, Ned |
Robert | Robert, Rob, Bob, Robbie, Bobby |
Elizabeth | Elizabeth, Elisabeth, Beth, Betty, Libby |
Full Name
In Simple Mode listclean splits the field into a First Name and a Last Name. If a comma is found in the value then the order for the fields is assumed to be [Last Name], [First Name]. If only a space is found then the order is assumed to be [First Name] [Last Name]. Having split the column value into two parts a duplicate is found if the First Names are synonyms and the Last Names sound the same.
In Advanced Mode you can change the rule parameters:
Ignore
The field will be ignored during duplicate checking, i.e. all values are assumed to match, but will be merged when duplicates are detected and merged.
Last Name
Last Names match if they sound the same.
In Advanced Mode you can change the rule parameters:
Sounds Like
Fields match if they sound the same.
In Advanced Mode you can change the rule parameters:
Starts With Or Contains
Fields match if they start with the same 8 characters, ignoring case.