The accuracy of machine learners is affected by the quality of the data they are induced on. In this paper, the quality of the training dataset is improved by removing instances that the Partitioning Filter detects as noisy. The fit dataset is first split into subsets, and a different base learner is induced on each split. Their predictions are then combined so that an instance is flagged as noisy if it is misclassified by a certain number of base learners. Two versions of the Partitioning Filter are used: the Multiple-Partitioning Filter and the Iterative-Partitioning Filter. The number of instances removed by the filters is tuned through the filter's voting scheme and the number of iterations. The primary aim of this study is to compare the predictive performance of the final models built on the filtered and unfiltered training datasets. A case study of software measurement data from a high-assurance software project is performed. It is shown that models built on the filtered fit datasets and evaluated on a noisy test dataset generally outperform those built on the noisy (unfiltered) fit dataset. However, the predictive performance of models based on certain aggressive filters is degraded by the presence of noise in the evaluation dataset.
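The filtering scheme described above can be sketched in code. The following is a minimal, hypothetical illustration of a majority-vote Partitioning Filter, not the paper's exact procedure: the choice of scikit-learn decision trees as base learners, and the `n_partitions` and `min_votes` parameters, are assumptions made for the example.

```python
import numpy as np
from sklearn.model_selection import KFold
from sklearn.tree import DecisionTreeClassifier

def partitioning_filter(X, y, n_partitions=5, min_votes=3, seed=0):
    """Flag an instance as noisy when at least `min_votes` of the
    base learners (each induced on one partition of the fit data)
    misclassify it; return the filtered dataset and the noise mask."""
    X, y = np.asarray(X), np.asarray(y)
    misclassify_counts = np.zeros(len(y), dtype=int)
    # Each KFold "test" fold serves as one disjoint partition.
    kf = KFold(n_splits=n_partitions, shuffle=True, random_state=seed)
    for _, part_idx in kf.split(X):
        learner = DecisionTreeClassifier(random_state=seed)
        learner.fit(X[part_idx], y[part_idx])
        # Every base learner votes on every instance in the fit dataset.
        misclassify_counts += (learner.predict(X) != y).astype(int)
    noisy = misclassify_counts >= min_votes
    return X[~noisy], y[~noisy], noisy
```

Lowering `min_votes` (towards a consensus of few learners) makes the filter more aggressive, removing more instances; the Iterative-Partitioning Filter would additionally re-run this procedure on the filtered output until few or no instances are flagged.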