Integrating Top-down and Bottom-up Approaches in Inductive Logic Programming: Applications in Natural Language Processing and Relational Data Mining (2003)
Inductive Logic Programming (ILP) is the intersection of Machine Learning and Logic Programming in which the learner's hypothesis space is the set of logic programs. There are two major ILP approaches: top-down and bottom-up. The former searches the hypothesis space from general to specific while the latter the other way round. Integrating both approaches has been demonstrated to be more effective. Integrated ILP systems were previously developed for two tasks: learning semantic parsers (Chillin), and mining relational data (Progol). Two new integrated ILP systems for these tasks that overcome limitations of existing methods will be presented.
Cocktail is a new ILP algorithm for inducing semantic parsers. For this task, two features of a parse state, functional structure and context, provide important information for disambiguation. A bottom-up approach is more suitable for learning the former, while top-down is better for the latter. By allowing both approaches to induce program clauses and choosing the best combination of their results, Cocktail learns more effective parsers. Experimental results on learning natural-language interfaces for two databases demonstrate that it learns more accurate parsers than Chillin, the previous best method for this task.
Beth is a new integrated ILP algorithm for relational data mining. The Inverse Entailment approach to ILP, implemented in the Progol and Aleph systems, starts with the construction of a bottom clause, the most specific hypothesis covering a seed example. When mining relational data with a large number of background facts, the bottom clause becomes intractably large, making learning very inefficient. A top-down approach heuristically guides the construction of clauses without building a bottom clause; however, it wastes time exploring clauses that cover no positive examples. By using a top-down approach to heuristically guide the construction of generalizations of a bottom clause, Beth combines the strength of both approaches. Learning patterns for detecting potential terrorist activity is a current challenge problem for relational data mining. Experimental results on artificial data for this task with over half a million facts show that Beth is significantly more efficient at discovering such patterns than Aleph and m-Foil, two leading ILP systems.
PhD Thesis, Department of Computer Sciences, University of Texas.

Lappoon R. Tang Ph.D. Alumni ltang [at] utb edu