Alignments and String Similarity in Information Integration: A Random Field Approach (2005)

Several problems central to information integration, such as ontology mapping and object matching, can be viewed as alignment tasks where the goal is to find an optimal correspondence between two structured objects and to compute the associated similarity score. The diversity of data sources and domains in the Semantic Web requires solutions to these problems to be highly adaptive, which can be achieved by employing probabilistic machine learning approaches. We present one such approach, Alignment Conditional Random Fields (ACRFs), a new framework for constructing and scoring sequence alignments using undirected graphical models. ACRFs allow incorporating arbitrary features into string edit distance computation, yielding a learnable string similarity function for use in tasks where approximate string matching is needed. We outline possible applications of ACRFs in information integration tasks and describe directions for future work.
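For orientation, the classical string edit distance that the abstract refers to can be sketched as a dynamic program over alignments. This is not the ACRF model: the function name `edit_distance` and the parameters `sub_cost` and `gap_cost` are hypothetical, chosen for illustration, and the costs here are fixed by hand rather than learned from data. The pluggable `sub_cost` only hints at where a learnable, feature-based score could replace a fixed cost.

```python
from typing import Callable


def unit_sub_cost(a: str, b: str) -> float:
    """Default substitution cost: 0 for a match, 1 for a mismatch."""
    return 0.0 if a == b else 1.0


def edit_distance(
    s: str,
    t: str,
    sub_cost: Callable[[str, str], float] = unit_sub_cost,
    gap_cost: float = 1.0,
) -> float:
    """Minimum total cost of aligning s with t (hand-set costs, illustrative only)."""
    rows, cols = len(s) + 1, len(t) + 1
    # dp[i][j] holds the minimum cost of aligning s[:i] with t[:j].
    dp = [[0.0] * cols for _ in range(rows)]
    for i in range(1, rows):
        dp[i][0] = i * gap_cost  # delete the first i characters of s
    for j in range(1, cols):
        dp[0][j] = j * gap_cost  # insert the first j characters of t
    for i in range(1, rows):
        for j in range(1, cols):
            dp[i][j] = min(
                dp[i - 1][j] + gap_cost,                              # deletion
                dp[i][j - 1] + gap_cost,                              # insertion
                dp[i - 1][j - 1] + sub_cost(s[i - 1], t[j - 1]),      # match or substitution
            )
    return dp[rows - 1][cols - 1]
```

Passing a different `sub_cost`, for example one that treats upper- and lower-case letters as equal, changes the similarity function without touching the alignment procedure. That separation is what makes the costs a natural target for learning.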

Citation:

In Proceedings of the 2005 Dagstuhl Seminar on Machine Learning for the Semantic Web, Dagstuhl, Germany, February 2005.

People

Mikhail Bilenko	Ph.D. Alumni	mbilenko [at] microsoft com
Raymond J. Mooney	Faculty	mooney [at] cs utexas edu

Areas of Interest

Machine Learning
Record Linkage & Duplicate Detection
Text Data Mining

Labs

Machine Learning