RIDDLE: Repository of Information
on Duplicate Detection, Record Linkage, and Identity Uncertainty

The terms duplicate detection, record linkage, and identity uncertainty all refer to the same problem: identifying syntactically different records that describe the same entity. The problem has been studied by researchers from several communities, including statistics, databases, and machine learning. These pages aim to provide links to researchers, datasets, software, and papers related to this problem.
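For readers new to the problem, the following is a minimal sketch of what duplicate detection can look like in practice. It is not taken from any paper or tool listed here: the records are made up, the function names (`normalize`, `similarity`, `find_duplicates`) and the 0.8 threshold are illustrative assumptions, and the character-level similarity from Python's standard `difflib` module stands in for the far more careful measures studied in the literature.

```python
from difflib import SequenceMatcher
from itertools import combinations


def normalize(record: str) -> str:
    """Lowercase, drop commas and periods, and collapse whitespace."""
    cleaned = record.lower().replace(",", " ").replace(".", " ")
    return " ".join(cleaned.split())


def similarity(a: str, b: str) -> float:
    """Character-level similarity in [0, 1] between two normalized records."""
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio()


def find_duplicates(records, threshold=0.8):
    """Return index pairs (i, j), i < j, whose similarity meets the threshold.

    The threshold is an arbitrary illustrative value, not a recommendation.
    Comparing all pairs is quadratic and only suitable for small inputs.
    """
    return [
        (i, j)
        for (i, a), (j, b) in combinations(enumerate(records), 2)
        if similarity(a, b) >= threshold
    ]


# Hypothetical records: the first two describe the same person.
records = [
    "John Smith, 12 Main Street, Austin",
    "Jon Smith, 12 Main St., Austin",
    "Mary Jones, 4 Oak Avenue, Dallas",
]

duplicate_pairs = find_duplicates(records)
print(duplicate_pairs)
```

Here the first two records differ syntactically (a misspelled first name and an abbreviated street) yet are flagged as a likely match, while the third is not. A fixed threshold on a generic string measure is only a starting point; real data typically calls for field-aware or learned similarity measures.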

We are currently in the process of obtaining permission to post the datasets that were used in several recent papers. If you would like to be notified when this data becomes available, please send email to mbilenko@cs.utexas.edu.

The construction of this repository is an ongoing process. If you are aware of an entry that it should contain, please let us know.

Thank you and please come again!


Suggestions, comments, and questions to: Misha Bilenko mbilenko@cs.utexas.edu

Acknowledgment: This document is based on the excellent home page of Ion Muslea's RISE Information Extraction Repository.