Evaluating noise elimination techniques for software quality estimation

Authors:
Taghi M. Khoshgoftaar;Pierre Rebours
Affiliations:
Empirical Software Engineering Laboratory, Department of Computer Science and Engineering, Florida Atlantic University, Boca Raton, FL 33431, USA;Empirical Software Engineering Laboratory, Department of Computer Science and Engineering, Florida Atlantic University, Boca Raton, FL 33431, USA
Venue:
Intelligent Data Analysis
Year:
2005



Abstract

Poor-quality training data can degrade software quality estimation models: noise in software measurement data may reduce the prediction accuracy of a given learner. A filter improves the quality of a training dataset by removing instances that are likely to be noisy. We evaluate the Ensemble Filter against the Partitioning Filter and the Classification Filter. These filtering techniques combine the predictions of base classifiers such that an instance is identified as noisy if it is misclassified by a given number of these learners. The Partitioning Filter first splits the training dataset into subsets, and different base learners are induced on each subset. Two implementations of the Partitioning Filter are presented: the Multiple-Partitioning Filter and the Iterative-Partitioning Filter. In contrast, the Ensemble Filter uses base classifiers induced on the entire training dataset. The filtering level and/or the number of iterations determine how conservative a filter is: a conservative filter is less likely to remove good data, at the expense of retaining more noisy instances. A unique measure for comparing the relative efficiencies of two filters is also presented. Empirical studies on a high-assurance software project evaluate the relative performance of the Ensemble Filter, Multiple-Partitioning Filter, Iterative-Partitioning Filter, and Classification Filter. Our study demonstrates that, with a conservative filtering approach, using several different base learners can improve the efficiency of the filtering schemes.
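
The voting rule described in the abstract (flag an instance as noisy when at least a given number of base learners misclassify it) can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration and is not taken from the paper: the three toy base learners (`knn(1)`, `knn(3)`, `nearest_centroid`), the fold-based held-out predictions, the names `ensemble_filter` and `filtering_level`, and the tiny two-cluster dataset in which the last point is deliberately mislabeled.

```python
from collections import Counter

def sq_dist(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((p - q) ** 2 for p, q in zip(a, b))

def knn(k):
    """Hypothetical base learner: k-nearest-neighbour majority vote."""
    def fit(train_X, train_y):
        def predict(x):
            ranked = sorted(range(len(train_X)), key=lambda i: sq_dist(x, train_X[i]))
            votes = Counter(train_y[i] for i in ranked[:k])
            return votes.most_common(1)[0][0]
        return predict
    return fit

def nearest_centroid(train_X, train_y):
    """Hypothetical base learner: predict the class with the closest mean vector."""
    centroids = {}
    for label in set(train_y):
        rows = [x for x, y in zip(train_X, train_y) if y == label]
        centroids[label] = [sum(col) / len(rows) for col in zip(*rows)]
    def predict(x):
        return min(centroids, key=lambda c: sq_dist(x, centroids[c]))
    return predict

def ensemble_filter(X, y, learners, filtering_level, n_folds=3):
    """Return indices of instances misclassified by >= filtering_level learners.

    Assumption: each learner predicts an instance only when that instance is
    held out of its training fold, so a learner never votes on data it has seen.
    """
    n = len(X)
    miscounts = [0] * n
    for fit in learners:
        for fold in range(n_folds):
            train_idx = [i for i in range(n) if i % n_folds != fold]
            test_idx = [i for i in range(n) if i % n_folds == fold]
            predict = fit([X[i] for i in train_idx], [y[i] for i in train_idx])
            for i in test_idx:
                if predict(X[i]) != y[i]:
                    miscounts[i] += 1
    return [i for i in range(n) if miscounts[i] >= filtering_level]

# Toy data: two well-separated clusters; the last point sits in cluster 0
# but carries label 1 (injected noise).
X = [(0, 0), (10, 10), (0, 1), (10, 11), (1, 0), (11, 10),
     (1, 1), (11, 11), (0.5, 0.5), (10.5, 10.5), (0.2, 0.8)]
y = [0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 1]
learners = [knn(1), knn(3), nearest_centroid]

print(ensemble_filter(X, y, learners, filtering_level=3))  # [10]
print(ensemble_filter(X, y, learners, filtering_level=1))  # [2, 8, 10]
```

On this toy data, requiring all three learners to agree removes only the mislabeled point, while a filtering level of 1 also discards two correctly labeled neighbours that the 1-nearest-neighbour learner gets wrong. That is the trade-off the abstract calls conservativeness: a stricter level keeps more good data, but in general risks retaining more noise.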