Experiments on Ensembles with Missing and Noisy Data (2004)
One of the potential advantages of multiple classifier systems is an increased robustness to noise and other imperfections in data. Previous experiments on classification noise have shown that bagging is fairly robust but that boosting is quite sensitive. DECORATE is a recently introduced ensemble method that constructs diverse committees using artificial data. It has been shown to generally outperform both boosting and bagging when training data is limited. This paper compares the sensitivity of bagging, boosting, and DECORATE to three types of imperfect data: missing features, classification noise, and feature noise. For missing data, DECORATE is the most robust. For classification noise, bagging and DECORATE are both robust, with bagging being slightly better than DECORATE, while boosting is quite sensitive. For feature noise, all of the ensemble methods increase the resilience of the base classifier.
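As a rough illustration of the three kinds of imperfect data compared above, the following sketch corrupts a small nominal dataset in each of the three ways. It is not the paper's procedure: the function names, the uniform-random corruption scheme, and the use of None to mark a missing value are all assumptions made for this example.

```python
import random

def inject_missing(rows, rate, rng):
    """Missing features: blank out each feature value (as None) with probability `rate`."""
    return [[None if rng.random() < rate else v for v in row] for row in rows]

def inject_label_noise(labels, classes, rate, rng):
    """Classification noise: with probability `rate`, replace a label
    with a different class chosen uniformly at random."""
    noisy = []
    for y in labels:
        if rng.random() < rate:
            noisy.append(rng.choice([c for c in classes if c != y]))
        else:
            noisy.append(y)
    return noisy

def inject_feature_noise(rows, domains, rate, rng):
    """Feature noise: with probability `rate`, replace a feature value with a
    uniform draw from that feature's domain (the draw may equal the original)."""
    return [
        [rng.choice(domains[j]) if rng.random() < rate else v
         for j, v in enumerate(row)]
        for row in rows
    ]

# Hypothetical toy data: two nominal features and a binary class label.
rows = [["sunny", "hot"], ["rainy", "mild"], ["sunny", "mild"]]
labels = ["no", "yes", "yes"]
domains = [["sunny", "rainy"], ["hot", "mild"]]
rng = random.Random(0)  # fixed seed so a run is repeatable

rows_missing = inject_missing(rows, 0.2, rng)
labels_noisy = inject_label_noise(labels, ["yes", "no"], 0.2, rng)
rows_noisy = inject_feature_noise(rows, domains, 0.2, rng)
```

In an experiment of this kind, each ensemble method would then be trained on the corrupted data at several corruption rates and its accuracy compared with that of the base classifier.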
In Lecture Notes in Computer Science: Proceedings of the Fifth International Workshop on Multiple Classifier Systems (MCS-2004), F. Roli, J. Kittler, and T. Windeatt (Eds.), Vol. 3077, pp. 293-302, Cagliari, Italy, June 2004. Springer-Verlag.

Prem Melville Ph.D. Alumni pmelvi [at] us ibm com
Lilyana Mihalkova Ph.D. Alumni lilymihal [at] gmail com
Raymond J. Mooney Faculty mooney [at] cs utexas edu