We present two new noise filtering techniques which improve the quality of training datasets by removing data points that are likely to be noisy. In addition, a new measure called 'efficiency paired comparison' is introduced to simplify the comparison between two filters. The filtering techniques are based on a partitioning approach: the training dataset is first split into subsets, and base learners are induced on each of these subsets. The predictions are then combined so that an instance in the training data is identified as noisy if it is misclassified by a certain number of base learners. The first technique, the multiple-partitioning filter, combines several classifiers induced on each subset. The second technique, the iterative-partitioning filter, uses only one base learner but goes through multiple filtering iterations. The amount of noise removed by the techniques is varied by tuning either the filtering level or the number of iterations. Empirical studies using software measurement data from a high-assurance software project assess the efficiency of the two noise filtering approaches. The empirical results suggest that using several base classifiers, as well as performing several iterations with a conservative filtering scheme, can improve the efficiency of the filtering technique.
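The sketch below illustrates the general partitioning idea described in the abstract: split the training data into subsets, induce a base learner on each subset, and flag an instance as noisy if it is misclassified by at least a chosen number of base learners, optionally repeating the process over several iterations. It is a minimal illustration only; the choice of scikit-learn decision trees as base learners, the majority-style default filtering level, and the stopping rule are assumptions, not the authors' exact procedure.

```python
# Minimal sketch of a partitioning-based noise filter (illustrative assumptions,
# not the paper's exact algorithm or parameter settings).
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def partitioning_filter(X, y, n_subsets=5, filtering_level=None, random_state=0):
    """Flag an instance as noisy if at least `filtering_level` base learners
    (one induced per data subset) misclassify it. Defaults to a majority vote."""
    rng = np.random.default_rng(random_state)
    subsets = np.array_split(rng.permutation(len(y)), n_subsets)
    if filtering_level is None:
        filtering_level = n_subsets // 2 + 1  # conservative majority scheme (assumed)

    # Induce one base learner on each subset of the training data.
    learners = [DecisionTreeClassifier(random_state=random_state).fit(X[s], y[s])
                for s in subsets]

    # Count how many base learners misclassify each training instance.
    misclassified = np.zeros(len(y), dtype=int)
    for clf in learners:
        misclassified += (clf.predict(X) != y).astype(int)
    return misclassified >= filtering_level  # boolean mask of likely noisy instances

def iterative_partitioning_filter(X, y, n_iterations=3, **kwargs):
    """Apply the filter repeatedly, removing flagged instances each round."""
    keep = np.arange(len(y))
    for _ in range(n_iterations):
        noisy = partitioning_filter(X[keep], y[keep], **kwargs)
        if not noisy.any():
            break
        keep = keep[~noisy]
    return keep  # indices of instances retained as likely clean
```

In this reading, the filtering level plays the role described in the abstract: raising it makes the scheme more conservative (fewer instances removed per pass), while additional iterations of the iterative variant remove noise more gradually.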