Active Feature-Value Acquisition for Classifier Induction (2004)
Prem Melville, Maytal Saar-Tsechansky, Foster Provost, and Raymond J. Mooney
Many induction problems, such as on-line customer profiling, include missing data that can be acquired at a cost, such as incomplete customer information that can be filled in by an intermediary. For building accurate predictive models, acquiring complete information for all instances is often prohibitively expensive or unnecessary. Randomly selecting instances for feature acquisition yields a representative sample, but does not take into account how valuable each acquisition is likely to be. Active feature-value acquisition aims at reducing the cost of achieving a desired model accuracy by identifying instances for which complete information is most informative to obtain. We present approaches in which instances are selected for feature acquisition based on the current model's ability to predict accurately and the model's confidence in its prediction. Experimental results on several real-world data sets demonstrate that our approach can induce accurate models using substantially fewer feature-value acquisitions than a baseline policy and a previously published approach.
In Proceedings of the Fourth IEEE International Conference on Data Mining (ICDM-2004), pp. 483-486, Brighton, UK, November 2004.
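
The selection idea described in the abstract can be illustrated with a minimal sketch. It is not taken from the paper: it assumes that class labels are already known for the incomplete instances, that the current model supplies class-probability estimates for each of them, and that instances are ranked with misclassified ones first and, within each group, by the smallest margin between the two most probable classes. The names `margin` and `select_for_acquisition` are hypothetical.

```python
from typing import List, Sequence


def margin(probs: Sequence[float]) -> float:
    """Gap between the two highest class probabilities (small = low confidence)."""
    top = sorted(probs, reverse=True)
    return top[0] - top[1] if len(top) > 1 else top[0]


def select_for_acquisition(
    class_probs: Sequence[Sequence[float]],
    true_labels: Sequence[int],
    batch_size: int,
) -> List[int]:
    """Return indices of incomplete instances whose features to acquire next.

    Assumed ranking rule (an illustration, not the published algorithm):
    instances the current model misclassifies come first; ties and the
    remaining instances are ordered by increasing prediction margin.
    """
    scored = []
    for idx, (probs, label) in enumerate(zip(class_probs, true_labels)):
        # Predicted class is the index of the largest probability.
        predicted = max(range(len(probs)), key=probs.__getitem__)
        misclassified = predicted != label
        # Sort key: misclassified (0) before correct (1), then low margin first.
        scored.append((0 if misclassified else 1, margin(probs), idx))
    scored.sort()
    return [idx for _, _, idx in scored[:batch_size]]
```

In an acquisition loop, one would call `select_for_acquisition` on the currently incomplete instances, acquire the missing feature values for the returned indices, retrain the model, and repeat until the acquisition budget is spent.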

Prem Melville Ph.D. Alumni pmelvi [at] us ibm com
Raymond J. Mooney Faculty mooney [at] cs utexas edu