Duplicates and Duplicate Matching Rules

The CRM includes several features for handling duplicate contacts.
Some try to prevent a duplicate contact from being created; others help you search for, identify and merge duplicate contacts that already exist in your database.

These features include:

  • Scanning your database for duplicates using a selected dedupe rule (see below) and merging duplicate contact data as needed.
  • Manually selecting two contacts from the search results and then choosing to merge them.
  • Automatically scanning your database and suggesting duplicates when contacts are added or edited via the user interface.
  • Automatically merging a person's details to an existing contact when a person signs up online for items such as Events, Membership, Contributions and Profile pages.
  • Automatically merging a person's details when importing contacts.

 

How does the CRM know if a contact is a duplicate?

The CRM uses predefined dedupe rules that define what constitutes a duplicate contact.

For example, a dedupe rule might require that both the First Name and Email match before the new contact is considered a duplicate.

These rules can be configured by navigating to Contact > Find and Merge Duplicates.

There are three types of rules:

  • General Rule 
    • These rules are used if you want to run a report of all your duplicates and proceed to merge them. 
    • You can have as many of these rules as you want.
       
  • Supervised Rule
    • This rule is automatically used to check for possible duplicates when contacts are added or edited via the user interface.
    • Supervised Rules should be configured with a broader definition of what constitutes a duplicate. 
    • You can only have 1 Supervised Rule per contact type (Household, Organization, Individual)
       
  • Unsupervised Rule
    • This rule is automatically used when new contacts are created through online registrations (Events, Membership, Contributions and Profile forms) and is also selected by default when you import contacts.
    • These rules are generally configured with a narrow definition of what constitutes a duplicate. (Example: First Name, Last Name, Email)
    • You can only have 1 Unsupervised Rule per contact type (Household, Organization, Individual)
    • If you do not have an Unsupervised Rule, your forms will not work. 

Configuring rules

To determine whether two contacts are duplicates, CiviCRM checks up to five fields that you can specify. You can also set a length value which determines how many characters in the field should be compared. For example, if you set a length of 2 on the First Name field, a first name of "Mike" would match "Michael" and they would be recognized as duplicates, because the first 2 characters are the same. However, if you set the length to 3 instead, "Mike" would no longer match "Michael" and they would be accepted as different contacts. If the length value is left blank, the comparison is done on the entire field value.
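The effect of the length value can be illustrated with a short sketch. The function name `fields_match` is hypothetical, and the comparison here is a plain case-sensitive string comparison; the CRM's actual comparison may differ (for example, in how it treats letter case).

```python
from typing import Optional


def fields_match(a: str, b: str, length: Optional[int] = None) -> bool:
    """Compare two field values.

    If `length` is given, only the first `length` characters are compared;
    if it is None (left blank), the entire values are compared.
    """
    if length is not None:
        a, b = a[:length], b[:length]
    return a == b


# The "Mike" / "Michael" example from the paragraph above.
print(fields_match("Mike", "Michael", length=2))  # "Mi" vs "Mi"
print(fields_match("Mike", "Michael", length=3))  # "Mik" vs "Mic"
print(fields_match("Mike", "Michael"))            # whole values
```

With a length of 2 the two names match; with a length of 3, or with no length at all, they do not.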

Each field is also configured with a numeric weight that determines the relative importance of a match on that field. When a match is found on a field, that field's weight is added to the total weight for the rule. Once all fields have been checked, if the total weight is equal to or greater than the numerical threshold set for the rule, the contacts being compared are flagged as suspected duplicates.
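The weight and threshold logic described above might be sketched as follows. This is an illustration only, not the CRM's implementation: the names `RuleField`, `total_weight` and `is_suspected_duplicate` are hypothetical, the example rule is made up, and the sketch assumes that an empty value never counts as a match.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class RuleField:
    name: str                     # field to compare, e.g. "first_name"
    weight: int                   # added to the total when the field matches
    length: Optional[int] = None  # compare only this many characters if set


def total_weight(fields: List[RuleField], a: Dict[str, str], b: Dict[str, str]) -> int:
    """Sum the weights of all rule fields whose values match in both contacts."""
    total = 0
    for f in fields:
        va, vb = a.get(f.name, ""), b.get(f.name, "")
        if not va or not vb:
            continue  # assumption: empty values never count as a match
        if f.length is not None:
            va, vb = va[:f.length], vb[:f.length]
        if va == vb:
            total += f.weight
    return total


def is_suspected_duplicate(fields, threshold, a, b) -> bool:
    """Flag the pair when the total weight reaches the rule's threshold."""
    return total_weight(fields, a, b) >= threshold


# Hypothetical rule: first two letters of the first name (5) plus email (10).
rule = [RuleField("first_name", 5, length=2), RuleField("email", 10)]
mike = {"first_name": "Mike", "email": "mike@example.org"}
michael = {"first_name": "Michael", "email": "mike@example.org"}
other = {"first_name": "Michael", "email": "someone@example.org"}

print(is_suspected_duplicate(rule, 15, mike, michael))  # both fields match: 15
print(is_suspected_duplicate(rule, 15, mike, other))    # only first name: 5
```

In this sketch the first pair reaches the threshold of 15 and is flagged, while the second pair scores only 5 and is not.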

Reserved Rules

A Reserved rule is simply a rule that cannot be deleted. These are the rules that ship with the CRM by default. The weights and thresholds for the Reserved rules are listed below.

Name and Email (Reserved) 

First = 5 
Last = 7
Email = 10
Threshold = 20

Email (Reserved)

Email = 10
Threshold = 10

Name and Address (Reserved)

First = 5
Last = 5
Street address = 5
Middle = 1
Suffix = 1
Threshold = 15
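To see what these numbers mean in practice, the sketch below works out, for each Reserved rule, the smallest sets of matching fields whose combined weight reaches the threshold. The weights and thresholds are copied from the lists above; the function name `minimal_matching_sets` and the data layout are hypothetical and only serve the illustration.

```python
from itertools import combinations

# (field weights, threshold) for each Reserved rule, as listed above.
RESERVED_RULES = {
    "Name and Email (Reserved)": ({"First": 5, "Last": 7, "Email": 10}, 20),
    "Email (Reserved)": ({"Email": 10}, 10),
    "Name and Address (Reserved)": (
        {"First": 5, "Last": 5, "Street address": 5, "Middle": 1, "Suffix": 1},
        15,
    ),
}


def minimal_matching_sets(weights, threshold):
    """Return the smallest field combinations whose weights reach the threshold."""
    found = []
    for size in range(1, len(weights) + 1):
        for combo in combinations(weights, size):
            # Skip combinations that merely extend one already found.
            if any(set(m) <= set(combo) for m in found):
                continue
            if sum(weights[f] for f in combo) >= threshold:
                found.append(combo)
    return found


for name, (weights, threshold) in RESERVED_RULES.items():
    print(name, minimal_matching_sets(weights, threshold))
```

With these weights, Name and Email (Reserved) only reaches its threshold when First, Last and Email all match (5 + 7 + 10 = 22), because no two of them add up to 20. In Name and Address (Reserved), First, Last and Street address must all match (5 + 5 + 5 = 15); Middle and Suffix together add only 2, so they cannot make up for a miss on one of the other three fields.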