Much work has studied the effect of different treatments of missing values on model induction, but little work has analyzed treatments for the common case of missing values at prediction time. This paper first compares three methods for applying classification trees to instances with missing values: predictive value imputation, the distribution-based imputation used by C4.5, and reduced models. It also presents evidence that the results generalize to bagged trees and to logistic regression. The results show that the two most popular treatments, the two forms of imputation, are each preferable under different conditions. Strikingly, the reduced-models approach, seldom mentioned or used, consistently outperforms the other two methods, sometimes by a large margin. The lack of attention to reduced modeling may be due in part to its (perceived) expense in terms of computation or storage. We therefore introduce and evaluate alternative, hybrid approaches that let users trade off between reduced modeling, which is more accurate but computationally expensive, and the other treatments, which are less accurate but cheaper. The results show that the hybrid methods scale gracefully with the amount of investment in computation and storage, and that they outperform imputation even for small investments.
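To make the contrast between reduced modeling and imputation concrete, the following is a minimal sketch and not the paper's implementation. It assumes a toy nearest-centroid classifier in place of the classification trees studied in the paper, marks missing values with `None`, and uses simple mean imputation as a stand-in for the imputation treatments. All names (`fit_centroids`, `ReducedModels`, `predict_with_mean_imputation`) are hypothetical and chosen for illustration only.

```python
from statistics import mean


def fit_centroids(X, y, features):
    """Fit a nearest-centroid classifier using only the given feature indices."""
    centroids = {}
    for label in set(y):
        rows = [x for x, t in zip(X, y) if t == label]
        centroids[label] = [mean(r[j] for r in rows) for j in features]
    return centroids


def predict_centroid(centroids, values):
    """Return the label whose centroid is closest (squared Euclidean distance)."""
    def dist(c):
        return sum((a - b) ** 2 for a, b in zip(c, values))
    return min(centroids, key=lambda label: dist(centroids[label]))


class ReducedModels:
    """Trains (lazily) and caches one model per pattern of observed features.

    Assumption: the training data is complete; only test instances have gaps.
    """

    def __init__(self, X, y):
        self.X, self.y = X, y
        self.cache = {}  # observed-feature pattern -> fitted model

    def predict(self, x):
        observed = tuple(j for j, v in enumerate(x) if v is not None)
        if observed not in self.cache:
            # The storage/computation cost grows with the number of patterns seen.
            self.cache[observed] = fit_centroids(self.X, self.y, observed)
        return predict_centroid(self.cache[observed], [x[j] for j in observed])


def predict_with_mean_imputation(X, y, x):
    """Baseline: fill each missing value with the training mean, use the full model."""
    n = len(X[0])
    full_model = fit_centroids(X, y, range(n))
    col_means = [mean(r[j] for r in X) for j in range(n)]
    filled = [col_means[j] if v is None else v for j, v in enumerate(x)]
    return predict_centroid(full_model, filled)


# Toy, made-up training data with two features and two classes.
X_train = [[1.0, 1.0], [1.2, 0.8], [3.0, 3.0], [2.8, 3.2]]
y_train = ["a", "a", "b", "b"]

model = ReducedModels(X_train, y_train)
test_instance = [None, 1.0]  # feature 0 is missing at prediction time
reduced_prediction = model.predict(test_instance)
imputed_prediction = predict_with_mean_imputation(X_train, y_train, test_instance)
```

The cache in `ReducedModels` illustrates where the perceived expense comes from: each distinct pattern of missing features needs its own model. A hybrid in the spirit of the abstract could cap the number of cached models and fall back to imputation for the remaining patterns, though the paper's actual hybrid methods are not specified here.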