This paper presents a technique that improves the accuracy of classification models by enhancing the quality of the training data. The idea is to eliminate instances that are likely to be noisy and to train classification models on the resulting "clean" data. Our approach combines 25 different classification techniques into an ensemble classifier that filters noise. Using a relatively large number of base-level classifiers allows several levels of filtering, so the conservativeness of noise removal can be set as desired. It also provides a high degree of confidence in the noise elimination procedure: with 25 base-level classifiers, the results are less likely to be influenced by the (possibly) inappropriate learning bias of a few algorithms than they would be with a smaller ensemble. An empirical case study using software measurement data from a high-assurance software project demonstrates the effectiveness of our noise elimination approach in improving classification accuracy. The similarities among the predictions of the 25 classifiers are also investigated, and preliminary results suggest that the ensemble may be effectively reduced to 13 classifiers.
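The filtering idea can be illustrated with a minimal sketch. The abstract does not spell out the exact filtering rule, so this sketch assumes a common one: an instance is treated as noisy when at least `min_votes` base-level classifiers misclassify it, and lowering `min_votes` makes the filter less conservative. The function names, the `min_votes` parameter, and the toy labels (`fp`, `nfp`) are all hypothetical, and three stand-in classifiers replace the paper's 25.

```python
from typing import Hashable, List, Sequence

Predictions = Sequence[Sequence[Hashable]]


def misclassification_counts(predictions: Predictions,
                             labels: Sequence[Hashable]) -> List[int]:
    """Count, per instance, how many classifiers disagree with its recorded label.

    predictions[c][i] is classifier c's predicted label for instance i.
    Assumption: predictions were obtained out-of-sample (e.g. cross-validation).
    """
    for row in predictions:
        if len(row) != len(labels):
            raise ValueError("each classifier must predict every instance")
    return [
        sum(1 for row in predictions if row[i] != label)
        for i, label in enumerate(labels)
    ]


def ensemble_filter(predictions: Predictions,
                    labels: Sequence[Hashable],
                    min_votes: int) -> List[int]:
    """Return indices of instances kept as "clean".

    Assumed rule: an instance is dropped when at least `min_votes`
    classifiers misclassify it. A smaller `min_votes` removes more data.
    """
    counts = misclassification_counts(predictions, labels)
    return [i for i, count in enumerate(counts) if count < min_votes]


def agreement(preds_a: Sequence[Hashable], preds_b: Sequence[Hashable]) -> float:
    """Fraction of instances on which two classifiers predict the same label."""
    if len(preds_a) != len(preds_b) or len(preds_a) == 0:
        raise ValueError("prediction lists must be non-empty and equal in length")
    return sum(a == b for a, b in zip(preds_a, preds_b)) / len(preds_a)


# Made-up toy data: 5 instances, 3 stand-in classifiers.
labels = ["fp", "nfp", "nfp", "fp", "nfp"]
predictions = [
    ["fp", "nfp", "fp", "fp", "nfp"],
    ["fp", "nfp", "fp", "nfp", "nfp"],
    ["fp", "fp", "fp", "fp", "nfp"],
]

counts = misclassification_counts(predictions, labels)
conservative = ensemble_filter(predictions, labels, min_votes=3)  # all must err
aggressive = ensemble_filter(predictions, labels, min_votes=1)    # any error drops
similarity = agreement(predictions[0], predictions[1])
```

With 25 base-level classifiers, `min_votes` could range from 1 to 25, which is one way to read the "several possible levels of filtering" described above. A pairwise measure such as `agreement` is only one possible way to compare classifier predictions; the abstract does not say which similarity measure the study used.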