Finding duplicate values using a self-join relationship
This procedure uses a
self-join relationship and a calculation field referencing the relationship to determine when duplicate records exist.
To find duplicate records except the first instance:
1. If you plan to delete the duplicate records that you find, make a backup copy of the file.
2. Identify a
field that determines a unique entity in your file.
For example, in a Contacts database, the Last Name field is probably not a good choice, because you might have several people with the same last name. Employee ID is a better choice. You can also create a calculation field (returning a text result) that combines data in several fields to make a unique identifier. An example
formula is
First Name & Last Name & Phone Number.
3. Define a self-join relationship.
Use your chosen identifying field as the
match field in both tables in the relationship. See
Creating and changing relationships.
4. Create a new calculation field named Check Duplicate with the formula:
If(Count(<name of the table’s second occurrence>::<match field>) > 1; "Duplicates"; "Unique")
6. Perform a find for Duplicates in the Check Duplicates field.
All records with duplicates are marked Duplicates.
Once set up as above, this system will identify duplicate records automatically as they are created.