Recommender systems improve access to relevant products and information by making personalized suggestions based on previous examples of a user's likes and dislikes. Most existing recommender systems use social filtering methods that base recommendations on other users' preferences. By contrast, content-based methods use information about an item itself to make suggestions. This approach has the advantage of being able to recommended previously unrated items to users with unique interests and to provide explanations for its recommendations. We describe a content-based book recommending system that utilizes information extraction and a machine-learning algorithm for text categorization. Initial experimental results demonstrate that this approach can produce accurate recommendations. These experiments are based on ratings from random samplings of items and we discuss problems with previous experiments that employ skewed samples of user-selected examples to evaluate performance.
ML ID: 96
This paper describes a system, Wolfie (WOrd Learning From Interpreted Examples), that acquires a semantic lexicon from a corpus of sentences paired with semantic representations. The lexicon learned consists of words paired with meaning representations. Wolfie is part of an integrated system that learns to parse novel sentences into semantic representations, such as logical database queries. Experimental results are presented demonstrating Wolfie's ability to learn useful lexicons for a database interface in four different natural languages. The lexicons learned by Wolfie are compared to those acquired by a competing system developed by Siskind.
ML ID: 95
Information extraction is a form of shallow text processing that locates a specified set of relevant items in a natural-language document. Systems for this task require significant domain-specific knowledge and are time-consuming and difficult to build by hand, making them a good application for machine learning. This paper presents a system, Rapier, that takes pairs of sample documents and filled templates and induces pattern-match rules that directly extract fillers for the slots in the template. Rapier employs a bottom-up learning algorithm which incorporates techniques from several inductive logic programming systems and acquires unbounded patterns that include constraints on the words, part-of-speech tags, and semantic classes present in the filler and the surrounding text. We present encouraging experimental results on two domains.
ML ID: 94
In natural language acquisition, it is difficult to gather the annotated data needed for supervised learning; however, unannotated data is fairly plentiful. Active learning methods attempt to select for annotation and training only the most informative examples, and therefore are potentially very useful in natural language applications. However, existing results for active learning have only considered standard classification tasks. To reduce annotation effort while maintaining accuracy, we apply active learning to two non-classification tasks in natural language processing: semantic parsing and information extraction. We show that active learning can significantly reduce the number of annotated examples required to achieve a given level of performance for these complex tasks.
ML ID: 92
Classifying web pages is an important task in automating the organization of information on the WWW, and learning for text categorization can help automate the development of such systems. This project explores using two aspects of HTML to improve learning for text categorization: 1) Using HTML tags such as titles, links, and headings to partition the text on a page and 2) Using the pages linked from a given page to augment its description. Initial experimental results on 26 categories from the Yahoo hierarchy demonstrate the promise of these two methods for improving the accuracy of a bag-of-words text classifier using a simple Bayesian learning algorithm.
ML ID: 91