Comparative Experiments on Learning Information Extractors for Proteins and their Interactions (2005)
Razvan Bunescu, Ruifang Ge, Rohit J. Kate, Edward M. Marcotte, Raymond J. Mooney, Arun Kumar Ramani, and Yuk Wah Wong
Automatically extracting information from biomedical text holds the promise of easily consolidating large amounts of biological knowledge in computer-accessible form. This strategy is particularly attractive for extracting data relevant to genes of the human genome from the 11 million abstracts in Medline. However, extraction efforts have been frustrated by the lack of conventions for describing human genes and proteins. We have developed and evaluated a variety of learned information extraction systems for identifying human protein names in Medline abstracts and subsequently extracting information on interactions between the proteins. We demonstrate that machine learning approaches using support vector machines and hidden Markov models are able to identify human proteins with higher accuracy than several previous approaches. We also demonstrate that various rule induction methods are able to identify protein interactions more accurately than manually developed rules.
Artificial Intelligence in Medicine (special issue on Summarization and Information Extraction from Medical Documents), 2 (2005), pp. 139-155.

Razvan Bunescu, Ph.D. Alumni, bunescu [at] ohio edu
Ruifang Ge, Ph.D. Alumni, grf [at] cs utexas edu
Rohit Kate, Postdoctoral Alumni, katerj [at] uwm edu
Raymond J. Mooney, Faculty, mooney [at] cs utexas edu
Yuk Wah Wong, Ph.D. Alumni, ywwong [at] cs utexas edu