Finding duplicate values using a self-join relationship

This procedure uses a self-join relationship and a calculation field referencing the relationship to identify all duplicate records except the first instance, according to the sort order defined in the relationship. Once this system is set up, duplicate records will be identified as they are created.

To find duplicate records except the first instance:

  1. If you plan to delete the duplicate records that you find, make a backup copy of the file.

    See Saving and copying files.

  2. Choose the field that you want to check for duplicate values.

    If you want to check whether multiple fields taken together have duplicate values, you can create a calculation field with a calculation (returning a text result) that combines data from several fields and check the calculation field for duplicates. For example, First Name & Last Name & Phone Number. Don't use a repeating field.

  3. In the relationships graph, drag your chosen field from a table occurrence back onto itself to define a self-join relationship.

    Your chosen field becomes the match field in both table occurrences in the self-join relationship. See Creating and changing relationships.

  4. In the Add Relationship dialog box, name the new table occurrence Same.

    FileMaker Pro creates a second occurrence of the table upon which you're basing the self-join.

  5. Define two fields:

  6. Field

    Definition

    Counter

    A text field with an auto-entered serial number (select Serial number and accept the default values for Next and Increment by).

    The purpose of this field is to assign a value to each record that is unique. If your database already has such a field—for example, the default PrimaryKey field—you can use that field instead of the Counter field in the calculation for the Check Duplicates field below, and you can skip step 7.

    Check Duplicates

    A calculation field with a text result, with the formula:

    Copy
    If ( Counter = Same::Counter ; "Unique" ; "Duplicate" )

    This calculation uses the self-join relationship to compare the value of Counter in the current table occurrence to the value in the related table occurrence named Same. When your chosen field (the relationship's match field) is the same as that in the related table occurrence and the Counter field values are the same, then the current and related records are the same record; therefore the current record is marked Unique. If the Counter field values are not the same, then the current and related records are different records whose chosen field has the same value; therefore the current record is marked Duplicate.

  7. Click Show All in the status toolbar.

  8. Click the new Counter field, choose Records menu > Replace Field Contents, and Replace with serial numbers. Again, accept the default values, Select Update serial number in Entry Options, and click Replace.

    This will assign a serial number to all existing records in your database. Serial numbers will automatically be entered in new records.

  9. Perform a find for Duplicate in the Check Duplicates field.

    The first record in any series of duplicates now holds the value Unique in the Check Duplicates field, and all duplicate records within the same series are marked Duplicate.

Important  Records with no value in the match field will be flagged as duplicates. Once set up as above, this system will identify duplicate records automatically as they are created.