This paper presents a new approach to identifying and eliminating mislabeled training instances. The goal of this technique is to improve the classification accuracy produced by learning algorithms by improving the quality of the training data. The approach employs an ensemble of classifiers that serves as a filter for the training data. Using n-fold cross-validation, the training data is passed through the filter, and only instances that the filter classifies correctly are passed to the final learning algorithm. We present an empirical evaluation of the approach on the task of automated land cover mapping from remotely sensed data. Labeling error arises in these data from a multitude of sources, including inconsistency in the vegetation classification used, variable measurement techniques, and variation in the spatial sampling resolution. Our evaluation shows that for noise levels below 40%, filtering yields higher predictive accuracy than not filtering, and for class noise levels of 20% or less, filtering allows the baseline accuracy to be retained. Our empirical results suggest that the ensemble filter approach is an effective method for identifying labeling errors, and further, that the approach will significantly benefit ongoing research to develop accurate and robust remote sensing-based methods to map land cover at global scales.
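The filtering scheme described above can be sketched in a few lines. The sketch below is an illustrative reconstruction, not the authors' code: it assumes scikit-learn, a three-member ensemble (decision tree, naive Bayes, k-NN chosen here for variety), and a majority-vote criterion (an instance is discarded when more than half of the ensemble misclassifies it on the fold where it was held out).

```python
import numpy as np
from sklearn.model_selection import KFold
from sklearn.tree import DecisionTreeClassifier
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier

def ensemble_filter(X, y, n_folds=5):
    """Majority-vote ensemble filter: flag an instance as mislabeled
    when most ensemble members misclassify it under cross-validation.
    Returns (X_filtered, y_filtered, removed_mask)."""
    X, y = np.asarray(X), np.asarray(y)
    # Hypothetical ensemble choice; any diverse set of learners would do.
    models = [DecisionTreeClassifier(random_state=0),
              GaussianNB(),
              KNeighborsClassifier(n_neighbors=3)]
    errors = np.zeros(len(y))  # per-instance count of misclassifying members
    kf = KFold(n_splits=n_folds, shuffle=True, random_state=0)
    for train_idx, test_idx in kf.split(X):
        for model in models:
            model.fit(X[train_idx], y[train_idx])
            pred = model.predict(X[test_idx])
            errors[test_idx] += (pred != y[test_idx])
    removed = errors > len(models) / 2  # majority of the ensemble disagrees
    keep = ~removed
    return X[keep], y[keep], removed
```

Only the instances the filter agrees with are then handed to the final learning algorithm, which is trained on `X_filtered, y_filtered` as usual. A stricter consensus variant (discard only when *all* members misclassify) trades fewer false removals for more retained noise.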