Finding records > Finding duplicate values using a self-join relationship
 

Finding duplicate values using a self-join relationship

This procedure uses a self-join relationship and a calculation field referencing the relationship to identify all duplicate records except the first instance, according to the sort order defined in the relationship. Once this system is set up, duplicate records will be identified as they are created.

To find duplicate records except the first instance:

1. If you plan to delete the duplicate records that you find, make a backup copy of the file.

See Saving and copying files.

2. Identify a field that determines a unique entity in your file.

For example, in a Contacts database, use an Employee ID field. You can also create a calculation field with a calculation (returning a text result) that combines data from several fields to make a unique identifier. For example, First Name & Last Name & Phone Number. Don’t use a repeating field.

3. In the relationships graph, drag your chosen identifying field from a table occurrence back onto itself to define a self-join relationship.

Your chosen identifying field becomes the match field in both table occurrences in the self-join relationship. See Creating and changing relationships.

4. In the Add Relationship dialog box, name the new table occurrence Same.

FileMaker Pro creates a second occurrence of the table upon which you’re basing the self-join.

5. Define two fields:

 

Field

Definition

Counter

A text field with an auto-entered serial number (select Serial number and accept the default values for Next and Increment by).

The purpose of this field is to assign a value to each record that is unique. If your database already has such a field—for example, the default PrimaryKey field—you can use that field instead of the Counter field in the calculation for the Check Duplicates field below, and you can skip step 7.

Check Duplicates

A calculation field with a text result, with the formula:

If ( Counter = Same::Counter ; "Unique" ; "Duplicate" )

This calculation uses the self-join relationship to compare the value of Counter in the current table occurrence to the value in the related table occurrence named Same. When your chosen identifying field (the relationship’s match field) is the same as that in the related table occurrence and the Counter field values are the same, then the current and related records are the same record; therefore the current record is marked Unique. If the Counter field values are not the same, then the current and related records are different records whose chosen identifying field has the same value; therefore the current record is marked Duplicate.

6. Click Show All in the status toolbar.

7. Click the new Counter field, choose Records menu > Replace Field Contents, and Replace with serial numbers. Again, accept the default values, Select Update serial number in Entry Options, and click Replace.

This will assign a serial number to all existing records in your database. Serial numbers will automatically be entered in new records.

8. Perform a find for Duplicate in the Check Duplicates field.

The first record in any series of duplicates now holds the value Unique in the Check Duplicates field, and all duplicate records within the same series are marked Duplicate.

Important  Records with no value in the match field will be flagged as duplicates. Once set up as above, this system will identify duplicate records automatically as they are created.