Real-world data is noisy and often contains corrupted or incomplete values that can degrade the models built from it. To build accurate predictive models, data acquisition is usually adopted to prepare the data and complete missing values. However, because acquisition is costly and the data set contains inherent correlations, acquiring correct information for all instances is both prohibitive and unnecessary. An interesting and important problem is therefore which instances to complete so that the model built from the processed data receives the "maximum" performance improvement. The problem is complicated by the fact that attributes carry different costs: fixing the missing values of some attributes is inherently more expensive than fixing others. The question thus becomes: given a fixed budget, which instances should be selected for preparation so that the learner built from the processed data set maximizes its performance? In this paper, we propose a solution to this problem. The essential idea is to combine attribute costs with the relevance of each attribute to the target concept, so that data acquisition pays more attention to attributes that are cheap but informative for classification. To this end, we first introduce an Economical Factor (EF) that integrates the cost and the importance (in terms of classification) of each attribute. We then propose a cost-constrained data acquisition model in which active learning, missing value prediction, and impact-sensitive instance ranking are combined for effective data acquisition. Experimental results and comparative studies on real-world data sets demonstrate the effectiveness of our method.
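
To make the idea of weighing attribute cost against attribute relevance concrete, here is a minimal sketch. It is not the paper's method: the abstract does not define the Economical Factor, so the score below (information gain divided by cost) is an assumed stand-in, and the greedy selection of missing cells ignores the active learning, missing value prediction, and impact-sensitive ranking components. All function names, the data layout (`columns` as a dict of attribute value lists with `None` for missing), and the cost values are hypothetical.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def information_gain(values, labels):
    """Information gain of one attribute, using only instances where it is observed."""
    observed = [(v, y) for v, y in zip(values, labels) if v is not None]
    if not observed:
        return 0.0
    groups = {}
    for v, y in observed:
        groups.setdefault(v, []).append(y)
    remainder = sum(len(g) / len(observed) * entropy(g) for g in groups.values())
    return entropy([y for _, y in observed]) - remainder


def cost_weighted_scores(columns, labels, costs):
    """ASSUMED stand-in for a cost/relevance score: information gain per unit cost."""
    return {a: information_gain(columns[a], labels) / costs[a] for a in columns}


def select_under_budget(columns, labels, costs, budget):
    """Greedily pick missing (instance, attribute) cells by score until the budget runs out."""
    scores = cost_weighted_scores(columns, labels, costs)
    candidates = [
        (scores[a], a, i)
        for a, vals in columns.items()
        for i, v in enumerate(vals)
        if v is None
    ]
    # Highest score first; the sort is stable, so ties keep their original order.
    candidates.sort(key=lambda t: -t[0])
    chosen, spent = [], 0.0
    for _, a, i in candidates:
        if spent + costs[a] <= budget:
            chosen.append((i, a))
            spent += costs[a]
    return chosen, spent
```

Under this assumed score, a cheap attribute that separates the classes well is acquired before an expensive, weakly informative one, which matches the preference for "cheap but informative" attributes described above. A fuller treatment would rank whole instances rather than individual cells.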