Record Linkage & Duplicate Detection
Record linkage is the process of identifying database records that are syntactically different but refer to the same entity. This problem has also been studied as duplicate detection, name matching, identity uncertainty, database hardening, and citation matching. Our work focuses primarily on using machine learning algorithms to train similarity metrics and comparison methods that improve matching accuracy. This work is related to our research on text mining.
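
As a rough illustration of what "training a similarity metric" can mean, here is a minimal sketch, not this group's method. It assumes records are dictionaries of string fields (the field names "name" and "venue" and the sample records are hypothetical). Each field is compared with token-set Jaccard similarity, and a logistic regression, fit by plain batch gradient descent, learns how much each field's similarity should count toward a match decision. Both choices are stand-ins picked for simplicity; learned string edit distances or SVM-based combiners are common alternatives.

```python
import math

def tokens(s: str) -> set:
    """Lowercase whitespace tokenization."""
    return set(s.lower().split())

def jaccard(a: str, b: str) -> float:
    """Token-set Jaccard similarity in [0, 1]; two empty strings count as identical."""
    ta, tb = tokens(a), tokens(b)
    if not ta and not tb:
        return 1.0
    return len(ta & tb) / len(ta | tb)

def pair_features(r1: dict, r2: dict, fields: list) -> list:
    """A constant bias term followed by one similarity score per field."""
    return [1.0] + [jaccard(r1.get(f, ""), r2.get(f, "")) for f in fields]

def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))

def train(pairs, labels, fields, lr=0.5, epochs=500):
    """Fit logistic-regression weights over field similarities by batch gradient descent."""
    X = [pair_features(a, b, fields) for a, b in pairs]
    w = [0.0] * len(X[0])
    n = len(X)
    for _ in range(epochs):
        grad = [0.0] * len(w)
        for x, y in zip(X, labels):
            # Gradient of the log loss w.r.t. the linear score is (prediction - label).
            err = sigmoid(sum(wi * xi for wi, xi in zip(w, x))) - y
            for j, xj in enumerate(x):
                grad[j] += err * xj
        w = [wi - lr * g / n for wi, g in zip(w, grad)]
    return w

def match_probability(w, r1, r2, fields) -> float:
    """Estimated probability that r1 and r2 refer to the same entity."""
    x = pair_features(r1, r2, fields)
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)))

# Hypothetical labeled pairs: 1 = same entity, 0 = different entities.
FIELDS = ["name", "venue"]
smith_a = {"name": "J. Smith", "venue": "KDD 2003"}
smith_b = {"name": "John Smith", "venue": "KDD 2003"}
jones_a = {"name": "Mary Jones", "venue": "ICML"}
jones_b = {"name": "M. Jones", "venue": "ICML"}
lee = {"name": "Ann Lee", "venue": "AAAI"}
stone = {"name": "Bob Stone", "venue": "NIPS"}

pairs = [(smith_a, smith_b), (jones_a, jones_b), (smith_a, jones_a), (lee, stone)]
labels = [1, 1, 0, 0]
weights = train(pairs, labels, FIELDS)

if __name__ == "__main__":
    print("learned weights (bias, name, venue):", weights)
    print("J. Smith vs John Smith:", match_probability(weights, smith_a, smith_b, FIELDS))
    print("J. Smith vs Mary Jones:", match_probability(weights, smith_a, jones_a, FIELDS))
```

The point of learning the weights, rather than fixing them by hand, is that the relative importance of each field (and, in richer versions, of each kind of edit or token) is estimated from labeled duplicates instead of being guessed.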

See the RIDDLE Repository on Identity Uncertainty, Duplicate Detection, and Record Linkage for datasets, bibliography, and more information on this topic.