is the process of identifying database records that
are syntactically different but refer to the same entity. This problem
has also been studied as duplicate detection, name matching, identity
uncertainty, database hardening, and citation matching. Our work focuses
primarily on using machine learning algorithms to train similarity metrics
and comparison methods in order to improve matching accuracy.
It is related to our work on text mining.
See the RIDDLE Repository on Identity Uncertainty,
Duplicate Detection, and Record Linkage for datasets, bibliography, and more information on this topic.
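
As a rough illustration of what training a similarity metric can mean, the sketch below learns per-field weights from labeled record pairs. It is not the method used in our work: it assumes a plain logistic regression over per-field string similarities, uses Python's difflib SequenceMatcher as a stand-in string metric, and the records, field names, and function names (field_similarities, train_weights, match_probability) are all made up for the example.

```python
import math
from difflib import SequenceMatcher

FIELDS = ["name", "city"]  # hypothetical record fields


def field_similarities(a, b, fields):
    """Return one string similarity in [0, 1] per field for two records."""
    return [
        SequenceMatcher(None, a[f].lower(), b[f].lower()).ratio()
        for f in fields
    ]


def _sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train_weights(pairs, labels, fields, epochs=500, lr=0.5):
    """Fit logistic-regression weights (plus a bias) by stochastic gradient steps."""
    weights = [0.0] * (len(fields) + 1)  # last entry is the bias term
    features = [field_similarities(a, b, fields) + [1.0] for a, b in pairs]
    for _ in range(epochs):
        for x, y in zip(features, labels):
            p = _sigmoid(sum(w * xi for w, xi in zip(weights, x)))
            for i, xi in enumerate(x):
                weights[i] += lr * (y - p) * xi
    return weights


def match_probability(a, b, fields, weights):
    """Estimated probability that records a and b refer to the same entity."""
    x = field_similarities(a, b, fields) + [1.0]
    return _sigmoid(sum(w * xi for w, xi in zip(weights, x)))


# Toy labeled pairs (invented data): 1 = same entity, 0 = different entities.
TRAIN_PAIRS = [
    ({"name": "Robert Smith", "city": "Austin"},
     {"name": "Rob Smith", "city": "Austin"}),
    ({"name": "Jon Doe", "city": "Dallas"},
     {"name": "John Doe", "city": "Dallas"}),
    ({"name": "Robert Smith", "city": "Austin"},
     {"name": "Maria Garcia", "city": "Houston"}),
    ({"name": "John Doe", "city": "Dallas"},
     {"name": "Alice Wong", "city": "Boston"}),
]
TRAIN_LABELS = [1, 1, 0, 0]

weights = train_weights(TRAIN_PAIRS, TRAIN_LABELS, FIELDS)

if __name__ == "__main__":
    for (a, b), label in zip(TRAIN_PAIRS, TRAIN_LABELS):
        print(label, round(match_probability(a, b, FIELDS, weights), 3))
```

The learned weights indicate how much each field's similarity contributes to a match decision. A realistic system would use far more training pairs, richer similarity features, and held-out data for evaluation.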