The development of natural language interfaces (NLI's) for databases has been a challenging problem in natural language processing (NLP) since the 1970's. The need for NLI's has become more pronounced due to the widespread access to complex databases now available through the Internet. A challenging problem for empirical NLP is the automated acquisition of NLI's from training examples. We present a method for integrating statistical and relational learning techniques for this task which exploits the strength of both approaches. Experimental results from three different domains suggest that such an approach is more robust than a previous purely logic-based approach.
ML ID: 102
Text mining and Information Extraction(IE) are both topics of significant recent interest. Text mining concerns applying data mining, a.k.a. knowledge discovery from databases (KDD) techniques to unstructured text. Information extraction (IE) is a form of shallow text understanding that locates specific pieces of data in natural language documents, transforming unstructured text into a structured database. This paper describes a system called DiscoTEX, that combines IE and KDD methods to perform a text mining task, discovering prediction rules from natural-language corpora. An initial version of DiscoTEX is constructed by integrating an IE module based on Rapier and a rule-learning module, Ripper. We present encouraging results on applying these techniques to a corpus of computer job postings from an Internet newsgroup.
ML ID: 101
Text mining concerns applying data mining techniques to unstructured text. Information extraction (IE) is a form of shallow text understanding that locates specific pieces of data in natural language documents, transforming unstructured text into a structured database. This paper describes a system called DiscoTEX, that combines IE and data mining methodologies to perform text mining as well as improve the performance of the underlying extraction system. Rules mined from a database extracted from a corpus of texts are used to predict additional information to extract from future documents, thereby improving the recall of IE. Encouraging results are presented on applying these techniques to a corpus of computer job postings from an Internet newsgroup.
ML ID: 100
The development of natural language interfaces (NLIs) for databases has been an interesting problem in natural language processing since the 70's. The need for NLIs has become more pronounced given the widespread access to complex databases now available through the Internet. However, such systems are difficult to build and must be tailored to each application. A current research topic involves using machine learning methods to automate the development of NLI's. This proposal presents a method for learning semantic parsers (systems for mapping natural language to logical form) that integrates logic-based and probabilistic methods in order to exploit the complementary strengths of these competing approaches. More precisely, an inductive logic programming (ILP) method, TABULATE, is developed for learning multiple models that are integrated via linear weighted combination to produce probabilistic models for statistical semantic parsing. Initial experimental results from three different domains suggest that an integration of statistical and logical approaches to semantic parsing can outperform a purely logical approach. Future research will further develop this integrated approach and demonstrate its ability to improve the automated development of NLI's.
ML ID: 99
Recommender systems improve access to relevant products and information by making personalized suggestions based on previous examples of a user's likes and dislikes. Most existing recommender systems use collaborative filtering methods that base recommendations on other users' preferences. By contrast, content-based methods use information about an item itself to make suggestions. This approach has the advantage of being able to recommend previously unrated items to users with unique interests and to provide explanations for its recommendations. We describe a content-based book recommending system that utilizes information extraction and a machine-learning algorithm for text categorization. Initial experimental results demonstrate that this approach can produce accurate recommendations.
ML ID: 98
This article discusses the integration of traditional abductive and inductive reasoning methods in the development of machine learning systems. In particular, it reviews our recent work in two areas: 1) The use of traditional abductive methods to propose revisions during theory refinement, where an existing knowledge base is modified to make it consistent with a set of empirical data; and 2) The use of inductive learning methods to automatically acquire from examples a diagnostic knowledge base used for abductive reasoning. Experimental results on real-world problems are presented to illustrate the capabilities of both of these approaches to integrating the two forms of reasoning.
ML ID: 97
Most recent research in learning approaches to natural language have studied fairly ``low-level'' tasks such as morphology, part-of-speech tagging, and syntactic parsing. However, I believe that logical approaches may have the most relevance and impact at the level of semantic interpretation, where a logical representation of sentence meaning is important and useful. We have explored the use of inductive logic programming for learning parsers that map natural-language database queries into executable logical form. This work goes against the growing trend in computational linguistics of focusing on shallow but broad-coverage natural language tasks (``scaling up by dumbing down'') and instead concerns using logic-based learning to develop narrower, domain-specific systems that perform relatively deep processing. I first present a historical view of the shifting emphasis of research on various tasks in natural language processing and then briefly review our own work on learning for semantic interpretation. I will then attempt to encourage others to study such problems and explain why I believe logical approaches have the most to offer at the level of producing semantic interpretations of complete sentences.
ML ID: 93