- Experiments on Ensembles with Missing and Noisy Data
Prem Melville, Nishit Shah, Lilyana Mihalkova, and Raymond J. Mooney
Proceedings of the Fifth International Workshop on Multiple Classifier Systems (MCS-2004), F. Roli, J. Kittler, and T. Windeatt (Eds.), Lecture Notes in Computer Science, Vol. 3077, pp. 293-302, Cagliari, Italy, Springer-Verlag, June 2004.
Paper ID: 143
Category: General Inductive Learning, Ensemble Learning
One of the potential advantages of multiple classifier systems is an increased robustness to noise and other imperfections in data. Previous experiments on classification noise have shown that bagging is fairly robust but that boosting is quite sensitive. DECORATE is a recently introduced ensemble method that constructs diverse committees using artificial data. It has been shown to generally outperform both boosting and bagging when training data is limited. This paper compares the sensitivity of bagging, boosting, and DECORATE to three types of imperfect data: missing features, classification noise, and feature noise. For missing data, DECORATE is the most robust. For classification noise, bagging and DECORATE are both robust, with bagging being slightly better than DECORATE, while boosting is quite sensitive. For feature noise, all of the ensemble methods increase the resilience of the base classifier.
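The abstract names three kinds of imperfect data but does not say how each was introduced. As a rough illustration only, the sketch below assumes each corruption is applied independently and uniformly at random at a given rate. The function names, the use of None for a missing value, and the per-feature `domains` list are hypothetical choices for this example and are not taken from the paper.

```python
import random


def add_missing_features(rows, rate, rng):
    """Assumed scheme: blank out each feature value (None) with probability `rate`."""
    return [[None if rng.random() < rate else v for v in row] for row in rows]


def add_classification_noise(labels, rate, classes, rng):
    """Assumed scheme: with probability `rate`, swap a label for a different class."""
    noisy = []
    for y in labels:
        if rng.random() < rate:
            # Pick uniformly among the other classes so the label really changes.
            noisy.append(rng.choice([c for c in classes if c != y]))
        else:
            noisy.append(y)
    return noisy


def add_feature_noise(rows, rate, domains, rng):
    """Assumed scheme: with probability `rate`, redraw a value from its feature's domain."""
    return [
        [rng.choice(domains[j]) if rng.random() < rate else v
         for j, v in enumerate(row)]
        for row in rows
    ]


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so a run is repeatable
    rows = [["sunny", "hot"], ["rainy", "mild"], ["sunny", "cool"]]
    labels = ["no", "yes", "yes"]
    domains = [["sunny", "rainy", "overcast"], ["hot", "mild", "cool"]]

    print(add_missing_features(rows, 0.2, rng))
    print(add_classification_noise(labels, 0.2, ["yes", "no"], rng))
    print(add_feature_noise(rows, 0.2, domains, rng))
```

Under these assumptions, a robustness comparison would train each ensemble method on the corrupted training set at several rates and measure accuracy on clean or corrupted test data, depending on the noise type being studied.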

mooney@cs.utexas.edu