Department of Computer Science

Machine Learning Research Group

University of Texas at Austin Artificial Intelligence Lab

Publications: Inductive Learning

Inductive learning methods are those that systematically derive general descriptions or knowledge from the specific knowledge provided by domain examples.
  1. Learning to Extract Relations from the Web using Minimal Supervision
    [Details] [PDF]
    Razvan C. Bunescu and Raymond J. Mooney
    In Proceedings of the 45th Annual Meeting of the Association for Computational Linguistics (ACL'07), Prague, Czech Republic, June 2007.
    We present a new approach to relation extraction that requires only a handful of training examples. Given a few pairs of named entities known to exhibit or not exhibit a particular relation, bags of sentences containing the pairs are extracted from the web. We extend an existing relation extraction method to handle this weaker form of supervision, and present experimental results demonstrating that our approach can reliably extract relations from web documents.
    ML ID: 204
  2. Multiple Instance Learning for Sparse Positive Bags
    [Details] [PDF]
    Razvan C. Bunescu and Raymond J. Mooney
    In Proceedings of the 24th Annual International Conference on Machine Learning (ICML-2007), Corvallis, OR, June 2007.
    We present a new approach to multiple instance learning (MIL) that is particularly effective when the positive bags are sparse (i.e. contain few positive instances). Unlike other SVM-based MIL methods, our approach more directly enforces the desired constraint that at least one of the instances in a positive bag is positive. Using both artificial and real-world data, we experimentally demonstrate that our approach achieves greater accuracy than state-of-the-art MIL methods when positive bags are sparse, and performs competitively when they are not. In particular, our approach is the best performing method for image region classification.
    ML ID: 201
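The core multiple-instance assumption described in the abstract above (a bag is positive if and only if at least one of its instances is positive) can be sketched as follows. This is a minimal illustration of the constraint, not the authors' SVM formulation; `bag_label` and the score threshold are hypothetical:

```python
def bag_label(instance_scores, threshold=0.0):
    """A bag is positive iff at least one instance's score exceeds the
    threshold: the constraint the paper's method enforces directly."""
    return any(score > threshold for score in instance_scores)

# A sparse positive bag: a single positive instance among negatives.
sparse_positive_bag = [-1.2, -0.8, 0.5, -1.0, -0.7]
negative_bag = [-1.1, -0.4, -0.9]

print(bag_label(sparse_positive_bag))  # True
print(bag_label(negative_bag))         # False
```

Sparsity makes this constraint hard for methods that average over a bag, since the single positive instance is swamped by negatives, which is why enforcing the at-least-one constraint directly helps.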
  3. Creating Diverse Ensemble Classifiers to Reduce Supervision
    [Details] [PDF]
    Prem Melville
    PhD Thesis, Department of Computer Sciences, University of Texas at Austin, November 2005. 141 pages. Technical Report TR-05-49.
    Ensemble methods like Bagging and Boosting which combine the decisions of multiple hypotheses are some of the strongest existing machine learning methods. The diversity of the members of an ensemble is known to be an important factor in determining its generalization error. In this thesis, we present a new method for generating ensembles, DECORATE (Diverse Ensemble Creation by Oppositional Relabeling of Artificial Training Examples), that directly constructs diverse hypotheses using additional artificially-generated training examples. The technique is a simple, general meta-learner that can use any strong learner as a base classifier to build diverse committees. The diverse ensembles produced by DECORATE are very effective for reducing the amount of supervision required for building accurate models. The first task we demonstrate this on is classification given a fixed training set. Experimental results using decision-tree induction as a base learner demonstrate that our approach consistently achieves higher predictive accuracy than the base classifier, Bagging and Random Forests. Also, DECORATE attains higher accuracy than Boosting on small training sets, and achieves comparable performance on larger training sets. Additional experiments demonstrate DECORATE's resilience to imperfections in data, in the form of missing features, classification noise, and feature noise.
    DECORATE ensembles can also be used to reduce supervision through active learning, in which the learner selects the most informative examples from a pool of unlabeled examples, such that acquiring their labels will increase the accuracy of the classifier. Query by Committee is one effective approach to active learning in which disagreement within the ensemble of hypotheses is used to select examples for labeling. Query by Bagging and Query by Boosting are two practical implementations of this approach that use Bagging and Boosting respectively, to build the committees. For efficient active learning it is critical that the committee be made up of consistent hypotheses that are very different from each other. Since DECORATE explicitly builds such committees, it is well-suited for this task. We introduce a new algorithm, Active-DECORATE, which uses DECORATE committees to select good training examples. Experimental results demonstrate that Active-DECORATE typically requires labeling fewer examples to achieve the same accuracy as Query by Bagging and Query by Boosting. Apart from optimizing classification accuracy, in many applications, producing good class probability estimates is also important, e.g., in fraud detection, which has unequal misclassification costs. This thesis introduces a novel approach to active learning based on Active-DECORATE which uses Jensen-Shannon divergence (a similarity measure for probability distributions) to improve the selection of training examples for optimizing probability estimation. Comprehensive experimental results demonstrate the benefits of our approach.
    Unlike the active learning setting, in many learning problems the class labels for all instances are known, but feature values may be missing and can be acquired at a cost. For building accurate predictive models, acquiring complete information for all instances is often quite expensive, while acquiring information for a random subset of instances may not be optimal. We formalize the task of active feature-value acquisition, which tries to reduce the cost of achieving a desired model accuracy by identifying instances for which obtaining complete information is most informative. We present an approach, based on DECORATE, in which instances are selected for acquisition based on the current model's accuracy and its confidence in the prediction. Experimental results demonstrate that our approach can induce accurate models using substantially fewer feature-value acquisitions than random sampling.
    ML ID: 182
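The Jensen-Shannon divergence mentioned above, used to measure disagreement among the committee members' class-probability estimates, can be sketched as follows (a minimal stand-alone illustration, not the thesis implementation):

```python
import math

def js_divergence(distributions, weights=None):
    """Jensen-Shannon divergence of a set of discrete distributions:
    the entropy of the (weighted) mean distribution minus the mean of
    the individual entropies. Zero iff all distributions agree."""
    n = len(distributions)
    weights = weights or [1.0 / n] * n

    def entropy(p):
        return -sum(pi * math.log2(pi) for pi in p if pi > 0)

    mean = [sum(w * p[i] for w, p in zip(weights, distributions))
            for i in range(len(distributions[0]))]
    return entropy(mean) - sum(w * entropy(p)
                               for w, p in zip(weights, distributions))

# Committee members that agree contribute no divergence...
print(js_divergence([[0.9, 0.1], [0.9, 0.1]]))  # 0.0
# ...while maximal disagreement over two classes gives 1 bit.
print(js_divergence([[1.0, 0.0], [0.0, 1.0]]))  # 1.0
```

Examples with high divergence across the committee's probability estimates are the ones whose labels are requested first.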
  4. Active Feature-Value Acquisition for Classifier Induction
    [Details] [PDF]
    Prem Melville, Maytal Saar-Tsechansky, Foster Provost, and Raymond J. Mooney
    Technical Report UT-AI-TR-04-311, Artificial Intelligence Lab, University of Texas at Austin, February 2004.
    Many induction problems, such as on-line customer profiling, include missing data that can be acquired at a cost, such as incomplete customer information that can be filled in by an intermediary. For building accurate predictive models, acquiring complete information for all instances is often prohibitively expensive or unnecessary. Randomly selecting instances for feature acquisition allows a representative sampling, but does not incorporate other value estimations of acquisition. Active feature-value acquisition aims at reducing the cost of achieving a desired model accuracy by identifying instances for which complete information is most informative to obtain. We present approaches in which instances are selected for feature acquisition based on the current model's ability to predict accurately and the model's confidence in its prediction. Experimental results on several real-world data sets demonstrate that our approach can induce accurate models using substantially fewer feature-value acquisitions as compared to a baseline policy and a previously-published approach.
    ML ID: 158
  5. Active Feature-Value Acquisition for Classifier Induction
    [Details] [PDF]
    Prem Melville, Maytal Saar-Tsechansky, Foster Provost, and Raymond J. Mooney
    In Proceedings of the Fourth IEEE International Conference on Data Mining (ICDM-2004), 483-486, Brighton, UK, November 2004.
    Many induction problems, such as on-line customer profiling, include missing data that can be acquired at a cost, such as incomplete customer information that can be filled in by an intermediary. For building accurate predictive models, acquiring complete information for all instances is often prohibitively expensive or unnecessary. Randomly selecting instances for feature acquisition allows a representative sampling, but does not incorporate other value estimations of acquisition. Active feature-value acquisition aims at reducing the cost of achieving a desired model accuracy by identifying instances for which complete information is most informative to obtain. We present approaches in which instances are selected for feature acquisition based on the current model's ability to predict accurately and the model's confidence in its prediction. Experimental results on several real-world data sets demonstrate that our approach can induce accurate models using substantially fewer feature-value acquisitions as compared to a baseline policy and a previously-published approach.
    ML ID: 157
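The selection policy sketched in the abstract, preferring instances that the current model predicts incorrectly or with low confidence, might look roughly like this. It is a simplified scoring scheme under assumed interfaces; `predict_proba` stands in for any probabilistic classifier and is not the paper's exact estimator:

```python
def acquisition_scores(instances, labels, predict_proba):
    """Score each incomplete instance by how poorly or uncertainly the
    current model predicts its (known) label; higher score means its
    missing feature values should be acquired sooner."""
    scores = []
    for x, y in zip(instances, labels):
        proba = predict_proba(x)        # mapping: class -> probability
        p_true = proba.get(y, 0.0)
        # Misclassified instances score highest; low-confidence correct
        # predictions score higher than confident ones.
        scores.append(1.0 - p_true)
    return scores

# Toy model: always predicts class "a" with confidence 0.7.
model = lambda x: {"a": 0.7, "b": 0.3}
xs = [[1, None], [2, None], [3, None]]
ys = ["a", "b", "a"]
scores = acquisition_scores(xs, ys, model)
print(max(range(len(scores)), key=scores.__getitem__))  # 1
```

Instance 1 is the one the current model gets wrong, so it is the first candidate for completing its feature values, matching the intuition that acquisitions should go where the model is weakest.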
  6. Diverse Ensembles for Active Learning
    [Details] [PDF]
    Prem Melville and Raymond J. Mooney
    In Proceedings of 21st International Conference on Machine Learning (ICML-2004), 584-591, Banff, Canada, July 2004.
    Query by Committee is an effective approach to selective sampling in which disagreement amongst an ensemble of hypotheses is used to select data for labeling. Query by Bagging and Query by Boosting are two practical implementations of this approach that use Bagging and Boosting, respectively, to build the committees. For effective active learning, it is critical that the committee be made up of consistent hypotheses that are very different from each other. DECORATE is a recently developed method that directly constructs such diverse committees using artificial training data. This paper introduces Active-DECORATE, which uses DECORATE committees to select good training examples. Extensive experimental results demonstrate that, in general, Active-DECORATE outperforms both Query by Bagging and Query by Boosting.
    ML ID: 146
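Committee-based selective sampling of the kind described above can be sketched with vote entropy as the disagreement measure (one common choice; treat this as illustrative rather than the paper's exact criterion):

```python
import math
from collections import Counter

def vote_entropy(votes):
    """Entropy of the committee's label votes on one example; higher
    entropy means more disagreement, hence a more informative query."""
    counts = Counter(votes)
    total = len(votes)
    return -sum((c / total) * math.log2(c / total)
                for c in counts.values())

def select_query(unlabeled, committee):
    """Pick the unlabeled example the committee disagrees on most."""
    return max(unlabeled,
               key=lambda x: vote_entropy([h(x) for h in committee]))

# Toy committee of threshold classifiers that disagree near x = 0.5.
committee = [lambda x, t=t: int(x > t) for t in (0.3, 0.5, 0.7)]
print(select_query([0.1, 0.4, 0.9], committee))  # 0.4
```

The committee agrees unanimously at 0.1 and 0.9 (entropy 0) but splits 1-2 at 0.4, so 0.4 is the example selected for labeling. DECORATE's explicitly diverse committees make such disagreement more informative.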
  7. Experiments on Ensembles with Missing and Noisy Data
    [Details] [PDF]
    Prem Melville, Nishit Shah, Lilyana Mihalkova, and Raymond J. Mooney
    In F. Roli, J. Kittler, and T. Windeatt, editors, Lecture Notes in Computer Science: Proceedings of the Fifth International Workshop on Multiple Classifier Systems (MCS-2004), 293-302, Cagliari, Italy, June 2004. Springer-Verlag.
    One of the potential advantages of multiple classifier systems is an increased robustness to noise and other imperfections in data. Previous experiments on classification noise have shown that bagging is fairly robust but that boosting is quite sensitive. DECORATE is a recently introduced ensemble method that constructs diverse committees using artificial data. It has been shown to generally outperform both boosting and bagging when training data is limited. This paper compares the sensitivity of bagging, boosting, and DECORATE to three types of imperfect data: missing features, classification noise, and feature noise. For missing data, DECORATE is the most robust. For classification noise, bagging and DECORATE are both robust, with bagging being slightly better than DECORATE, while boosting is quite sensitive. For feature noise, all of the ensemble methods increase the resilience of the base classifier.
    ML ID: 143
  8. Creating Diversity in Ensembles Using Artificial Data
    [Details] [PDF]
    Prem Melville and Raymond J. Mooney
    Journal of Information Fusion: Special Issue on Diversity in Multi Classifier Systems, 6(1):99-111, 2004.
    The diversity of an ensemble of classifiers is known to be an important factor in determining its generalization error. We present a new method for generating ensembles, DECORATE (Diverse Ensemble Creation by Oppositional Relabeling of Artificial Training Examples), that directly constructs diverse hypotheses using additional artificially-constructed training examples. The technique is a simple, general meta-learner that can use any strong learner as a base classifier to build diverse committees. Experimental results using decision-tree induction as a base learner demonstrate that this approach consistently achieves higher predictive accuracy than the base classifier, Bagging and Random Forests. DECORATE also obtains higher accuracy than Boosting on small training sets, and achieves comparable performance on larger training sets.
    ML ID: 139
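The DECORATE procedure summarized above can be sketched as follows. This is a simplified reading of the method: a toy 1-nearest-neighbour learner stands in for the decision-tree inducer, and the artificial-data generation and candidate-acceptance test are reduced to their bare essentials.

```python
import random

def majority(ensemble, x):
    """Simple-majority vote of the ensemble members on input x."""
    votes = [h(x) for h in ensemble]
    return max(set(votes), key=votes.count)

def train_1nn(data):
    """Toy base learner: 1-nearest-neighbour on 1-D inputs (stands in
    for the stronger base classifiers used in the paper)."""
    def h(x):
        return min(data, key=lambda d: abs(d[0] - x))[1]
    return h

def decorate(data, n_members=5, n_artificial=5, max_tries=20, seed=0):
    """Minimal DECORATE loop: each candidate member is trained on the
    real data plus artificial examples labelled *against* the current
    ensemble's prediction, and is kept only if ensemble training error
    does not increase."""
    rng = random.Random(seed)
    ensemble = [train_1nn(data)]
    labels = list({y for _, y in data})

    def error(ens):
        return sum(majority(ens, x) != y for x, y in data) / len(data)

    tries = 0
    while len(ensemble) < n_members and tries < max_tries:
        tries += 1
        # Artificial examples drawn from the input range, labelled with
        # a class the current ensemble does NOT predict there.
        artificial = []
        for _ in range(n_artificial):
            x = rng.uniform(min(d[0] for d in data),
                            max(d[0] for d in data))
            opposing = [c for c in labels if c != majority(ensemble, x)]
            artificial.append((x, rng.choice(opposing)))
        candidate = train_1nn(data + artificial)
        if error(ensemble + [candidate]) <= error(ensemble):
            ensemble.append(candidate)
    return ensemble

data = [(0.0, "a"), (0.2, "a"), (0.8, "b"), (1.0, "b")]
ens = decorate(data)
print(len(ens), majority(ens, 0.1), majority(ens, 0.9))
```

Labelling the artificial examples in opposition to the current ensemble is what forces each new member to differ from the committee, while the acceptance test keeps the committee consistent with the real training data.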
  9. Creating Diverse Ensemble Classifiers
    [Details] [PDF]
    Prem Melville
    Technical Report UT-AI-TR-03-306, Department of Computer Sciences, University of Texas at Austin, December 2003. Ph.D. proposal.
    Ensemble methods like Bagging and Boosting which combine the decisions of multiple hypotheses are some of the strongest existing machine learning methods. The diversity of the members of an ensemble is known to be an important factor in determining its generalization error. We present a new method for generating ensembles, DECORATE (Diverse Ensemble Creation by Oppositional Relabeling of Artificial Training Examples), that directly constructs diverse hypotheses using additional artificially-constructed training examples. The technique is a simple, general meta-learner that can use any strong learner as a base classifier to build diverse committees. Experimental results using decision-tree induction as a base learner demonstrate that this approach consistently achieves higher predictive accuracy than both the base classifier and Bagging. DECORATE also obtains higher accuracy than Boosting early in the learning curve when training data is limited.
    We propose to show that DECORATE can also be effectively used for (1) active learning, to reduce the number of training examples required to achieve high accuracy; (2) exploiting unlabeled data to improve accuracy in a semi-supervised learning setting; (3) combining active learning with semi-supervision for improved results; (4) obtaining better class membership probability estimates; (5) reducing the error of regressors; and (6) improving the accuracy of relational learners.
    ML ID: 132
  10. Constructing Diverse Classifier Ensembles Using Artificial Training Examples
    [Details] [PDF]
    Prem Melville and Raymond J. Mooney
    In Proceedings of the Eighteenth International Joint Conference on Artificial Intelligence (IJCAI-2003), 505-510, Acapulco, Mexico, August 2003.
    Ensemble methods like bagging and boosting that combine the decisions of multiple hypotheses are some of the strongest existing machine learning methods. The diversity of the members of an ensemble is known to be an important factor in determining its generalization error. This paper presents a new method for generating ensembles that directly constructs diverse hypotheses using additional artificially-constructed training examples. The technique is a simple, general meta-learner that can use any strong learner as a base classifier to build diverse committees. Experimental results using decision-tree induction as a base learner demonstrate that this approach consistently achieves higher predictive accuracy than both the base classifier and bagging (whereas boosting can occasionally decrease accuracy), and also obtains higher accuracy than boosting early in the learning curve when training data is limited.
    ML ID: 122
  11. Property-Based Feature Engineering and Selection
    [Details] [PDF]
    Noppadon Kamolvilassatian
    Masters Thesis, Department of Computer Sciences, University of Texas at Austin, Austin, TX, December 2002. 85 pages.
    Example representation is a fundamental problem in machine learning. In particular, the decision on what features are extracted and selected to be included in the learning process significantly affects learning performance.
    This work proposes a novel framework for feature representation based on feature properties and applies it to the domain of textual information extraction. Our framework enables knowledge on feature engineering and selection to be explicitly learned and applied. The application of this knowledge can improve learning performance within the domain from which it is learned and in other domains with similar representational bias.
    We conducted several experiments comparing the performance of feature engineering and selection methods based on our framework with other approaches in the Information Extraction task. Results suggested that our approach performs either competitively or better than the best heuristic-based feature selection approach used. Moreover, our general framework can potentially be combined with other feature selection approaches to yield even better results.
    ML ID: 118
  12. Encouraging Experimental Results on Learning CNF
    [Details] [PDF]
    Raymond J. Mooney
    Machine Learning, 19(1):79-92, 1995.
    This paper presents results comparing three inductive learning systems using different representations for concepts, namely: CNF formulae, DNF formulae, and decision trees. The CNF learner performs surprisingly well. Results on five natural data sets show that it frequently trains faster and produces more accurate and simpler concepts. The probable explanation for its superior performance is that the other systems are more susceptible to the replication problem.
    ML ID: 42
  13. Growing Layers of Perceptrons: Introducing the Extentron Algorithm
    [Details] [PDF]
    Paul T. Baffes and John M. Zelle
    In Proceedings of the 1992 International Joint Conference on Neural Networks, 392--397, Baltimore, MD, June 1992.
    The ideas presented here are based on two observations of perceptrons: (1) when the perceptron learning algorithm cycles among hyperplanes, the hyperplanes may be compared to select one that gives a best SPLIT of the examples, and (2) it is always possible for the perceptron to build a hyperplane that separates at least one example from all the rest. We describe the Extentron, which grows multi-layer networks capable of distinguishing non-linearly-separable data using the simple perceptron rule for linear threshold units. The resulting algorithm is simple, very fast, scales well to large problems, retains the convergence properties of the perceptron, and can be completely specified using only two parameters. Results are presented comparing the Extentron to other neural network paradigms and to symbolic learning systems.
    ML ID: 12
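The simple perceptron rule for linear threshold units on which Extentron is built can be sketched as follows (the layer-growing and SPLIT-selection logic of the full algorithm is omitted):

```python
def train_perceptron(examples, epochs=100, lr=1.0):
    """Classic perceptron rule for a single linear threshold unit: on
    each mistake, move the weights toward (or away from) the
    misclassified example."""
    n = len(examples[0][0])
    w, b = [0.0] * n, 0.0
    for _ in range(epochs):
        mistakes = 0
        for x, y in examples:                  # y in {0, 1}
            pred = int(sum(wi * xi for wi, xi in zip(w, x)) + b > 0)
            if pred != y:
                mistakes += 1
                sign = 1 if y == 1 else -1
                w = [wi + lr * sign * xi for wi, xi in zip(w, x)]
                b += lr * sign
        if mistakes == 0:                      # converged (separable data)
            break
    return w, b

# Linearly separable data: the AND function.
data = [([0, 0], 0), ([0, 1], 0), ([1, 0], 0), ([1, 1], 1)]
w, b = train_perceptron(data)
print(all(int(sum(wi * xi for wi, xi in zip(w, x)) + b > 0) == y
          for x, y in data))  # True
```

Extentron's contribution is what happens when the data is *not* linearly separable: it compares the hyperplanes the rule cycles among, keeps the one giving the best split, and grows a new layer, so only this simple update rule is ever needed.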
  14. Using Explanation-Based and Empirical Methods in Theory Revision
    [Details] [PDF]
    Dirk Ourston
    PhD Thesis, Department of Computer Sciences, University of Texas at Austin, 1991.

    The knowledge acquisition problem is a continuing problem in expert system development. The knowledge base (domain theory) initially formulated by the expert is usually only an approximation to the correct theory for the application domain. This initial knowledge base must be refined (usually manually) as problems are discovered. This research addresses the knowledge base refinement problem for classification tasks. The research provides an automatic method for correcting a domain theory in the light of incorrect performance on a set of training examples. The method uses attempted explanations to focus the correction on the failing part of the knowledge base. It then uses induction to supply a correction to the knowledge base that will render it consistent with the training examples.

    Using this technique, it is possible to correct overly general and overly specific theories, theories with multiple faults at various levels in the theory hierarchy, and theories involving multiple concepts. Methods have been developed for making corrections even in the presence of noisy data. Theoretical justification for the method is given in the form of convergence results that predict that the method will eventually converge to a hypothesis that is within a small error of the correct hypothesis, given sufficient examples. Because the technique currently relies on theorem proving for much of the analysis, it is computationally quite expensive, and heuristic methods for reducing the computational burden have been implemented.

    The system developed as part of the research is called EITHER (Explanation-based Inductive THeory Extension and Revision). EITHER uses propositional Horn clause logic as its knowledge representation, with examples expressed as attribute-value lists. The system has been tested in a variety of domains including revising a theory for the identification of promoters in DNA sequences and a theory for soybean disease diagnosis, where it has been shown to outperform a purely inductive approach.

    ML ID: 277
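The first step of the revision process described above, using the theory's (attempted) derivations to locate the examples it misclassifies, can be sketched for a propositional Horn-clause theory. The toy theory and predicate names here are hypothetical, and the inductive correction step that EITHER then applies is omitted:

```python
def classify(theory, facts):
    """Forward-chain a propositional Horn-clause theory, given as a
    mapping from a head proposition to a list of alternative bodies
    (each body a set of antecedent propositions)."""
    known = set(facts)
    changed = True
    while changed:
        changed = False
        for head, bodies in theory.items():
            if head not in known and any(b <= known for b in bodies):
                known.add(head)
                changed = True
    return known

def failing_examples(theory, examples, target):
    """Examples the current theory misclassifies: the cases on which
    a revision method like EITHER would focus its corrections."""
    return [(facts, label) for facts, label in examples
            if (target in classify(theory, facts)) != label]

# Hypothetical, overly general toy theory for the concept "fly".
theory = {"fly": [{"bird"}]}
examples = [({"bird", "wings"}, True), ({"bird", "penguin"}, False)]
# The theory wrongly proves "fly" for the penguin example.
print(failing_examples(theory, examples, "fly"))
```

An overly general theory proves the target for negative examples (as here), while an overly specific one fails to prove it for positives; the failing examples tell the revision system which kind of fault to repair.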
  15. Symbolic and Neural Learning Algorithms: An Experimental Comparison
    [Details] [PDF]
    J.W. Shavlik, Raymond J. Mooney and G. Towell
    Machine Learning, 6:111-143, 1991. Reprinted in Readings in Knowledge Acquisition and Learning, Bruce G. Buchanan and David C. Wilkins (eds.), Morgan Kaufmann, San Mateo, CA, 1993.
    Despite the fact that many symbolic and neural network (connectionist) learning algorithms address the same problem of learning from classified examples, very little is known regarding their comparative strengths and weaknesses. Experiments comparing the ID3 symbolic learning algorithm with the perceptron and backpropagation neural learning algorithms have been performed using five large, real-world data sets. Overall, backpropagation performs slightly better than the other two algorithms in terms of classification accuracy on new examples, but takes much longer to train. Experimental results suggest that backpropagation can work significantly better on data sets containing numerical data. Also analyzed empirically are the effects of (1) the amount of training data, (2) imperfect training examples, and (3) the encoding of the desired outputs. Backpropagation occasionally outperforms the other two systems when given relatively small amounts of training data. It is slightly more accurate than ID3 when examples are noisy or incompletely specified. Finally, backpropagation more effectively utilizes a distributed output encoding.
    ML ID: 8
  16. An Experimental Comparison of Symbolic and Connectionist Learning Algorithms
    [Details] [PDF]
    Raymond J. Mooney, J.W. Shavlik, G. Towell and A. Gove
    In Proceedings of the Eleventh International Joint Conference on Artificial Intelligence (IJCAI-89), 775-780, Detroit, MI, August 1989. Reprinted in Readings in Machine Learning, Jude W. Shavlik and T. G. Dietterich (eds.), Morgan Kaufmann, San Mateo, CA, 1990.
    Despite the fact that many symbolic and connectionist (neural net) learning algorithms are addressing the same problem of learning from classified examples, very little is known regarding their comparative strengths and weaknesses. This paper presents the results of experiments comparing the ID3 symbolic learning algorithm with the perceptron and back-propagation connectionist learning algorithms on several large real-world data sets. The results show that ID3 and perceptron run significantly faster than does back-propagation, both during learning and during classification of novel examples. However, the probability of correctly classifying new examples is about the same for the three systems. On noisy data sets there is some indication that back-propagation classifies more accurately.
    ML ID: 211