Alignments and String Similarity in Information Integration: A Random Field Approach (2005)
Several problems central to information integration, such as ontology mapping and object matching, can be viewed as alignment tasks where the goal is to find an optimal correspondence between two structured objects and to compute the associated similarity score. The diversity of data sources and domains in the Semantic Web requires solutions to these problems to be highly adaptive, which can be achieved by employing probabilistic machine learning approaches. We present one such approach, Alignment Conditional Random Fields (ACRFs), a new framework for constructing and scoring sequence alignments using undirected graphical models. ACRFs allow arbitrary features to be incorporated into string edit distance computation, yielding a learnable string similarity function for use in tasks where approximate string matching is needed. We outline possible applications of ACRFs in information integration tasks and describe directions for future work.
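To make "feature-based string edit distance" concrete, here is a minimal sketch that scores edit alignments with a log-linear function of operation features. It is not the ACRF model from the paper. The feature names (`match`, `substitute`, `case_only`, `insert`, `delete`), the hand-set weights in `WEIGHTS`, and the helper `align_scores` are all hypothetical and chosen for illustration. A trained model would learn its weights from labeled pairs and would also normalize scores into probabilities. The same dynamic program yields two quantities: the score of the single best alignment (maximizing over paths) and the log of the summed exponentiated scores over all alignments (log-sum-exp over paths).

```python
import math

# Hypothetical, hand-set feature weights for illustration only; a learned
# model would estimate these from matching / non-matching string pairs.
WEIGHTS = {
    "match": 2.0,        # aligned characters are identical
    "substitute": -1.0,  # aligned characters differ
    "case_only": 0.5,    # ...but differ only in letter case
    "insert": -1.5,      # character present only in the second string
    "delete": -1.5,      # character present only in the first string
}


def features(op, a, b):
    """Binary features describing one edit operation (hypothetical feature set)."""
    feats = {}
    if op == "sub":
        if a == b:
            feats["match"] = 1.0
        else:
            feats["substitute"] = 1.0
            if a.lower() == b.lower():
                feats["case_only"] = 1.0
    elif op == "ins":
        feats["insert"] = 1.0
    elif op == "del":
        feats["delete"] = 1.0
    return feats


def potential(op, a, b, weights):
    """Log-linear score of an operation: dot product of weights and features."""
    return sum(weights.get(f, 0.0) * v for f, v in features(op, a, b).items())


def logsumexp(xs):
    """Numerically stable log(sum(exp(x) for x in xs))."""
    m = max(xs)
    if m == -math.inf:
        return -math.inf
    return m + math.log(sum(math.exp(x - m) for x in xs))


def align_scores(s, t, weights=WEIGHTS):
    """Return (best alignment score, log-sum-exp over all alignments) of s vs. t."""
    n, m = len(s), len(t)
    best = [[-math.inf] * (m + 1) for _ in range(n + 1)]
    total = [[-math.inf] * (m + 1) for _ in range(n + 1)]
    best[0][0] = total[0][0] = 0.0
    for i in range(n + 1):
        for j in range(m + 1):
            if i == 0 and j == 0:
                continue
            # Each candidate: (best score so far, total so far, operation score).
            cands = []
            if i > 0 and j > 0:
                cands.append((best[i - 1][j - 1], total[i - 1][j - 1],
                              potential("sub", s[i - 1], t[j - 1], weights)))
            if i > 0:
                cands.append((best[i - 1][j], total[i - 1][j],
                              potential("del", s[i - 1], None, weights)))
            if j > 0:
                cands.append((best[i][j - 1], total[i][j - 1],
                              potential("ins", None, t[j - 1], weights)))
            best[i][j] = max(b + w for b, _, w in cands)
            total[i][j] = logsumexp([tot + w for _, tot, w in cands])
    return best[n][m], total[n][m]
```

With these illustrative weights, `align_scores("abc", "abc")` gives a best score of 6.0 (three matches), and the all-alignments total is strictly larger because every other alignment also contributes. Because of the `case_only` feature, `"A"` vs. `"a"` scores higher than `"B"` vs. `"a"`. This shows how adding a feature, rather than changing a fixed cost table, changes the similarity function.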
In Proceedings of the 2005 Dagstuhl Seminar on Machine Learning for the Semantic Web, Dagstuhl, Germany, February 2005.

Mikhail Bilenko Ph.D. Alumni mbilenko [at] microsoft com
Raymond J. Mooney Faculty mooney [at] cs utexas edu