Enhancing software quality estimation using ensemble-classifier based noise filtering

Authors:
Taghi M. Khoshgoftaar; Shi Zhong; Vedang Joshi
Affiliations:
Department of Computer Science and Engineering, Florida Atlantic University, Boca Raton, FL 33431, USA. E-mail: {taghi, zhong, vjoshi}@cse.fau.edu
Venue:
Intelligent Data Analysis
Year:
2005

Abstract

This paper presents a technique that improves the accuracy of classification models by enhancing the quality of their training data. The idea is to eliminate instances that are likely to be noisy and to train classification models on the resulting "clean" data. Our approach builds an ensemble filter from 25 different classification techniques. Using a relatively large number of base-level classifiers allows several filtering levels, so the degree of conservativeness in noise removal can be set as desired. It also increases confidence in the noise elimination procedure: with 25 base-level classifiers, the outcome is less likely to be swayed by a possibly inappropriate learning bias in a few algorithms than it would be with a smaller ensemble. An empirical case study with software measurement data from a high-assurance software project demonstrates that our noise elimination approach improves classification accuracy. We also investigate the similarities among the predictions of the 25 classifiers; preliminary results suggest that the ensemble may be effectively reduced to 13 classifiers.
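The idea of an ensemble filter with adjustable filtering levels can be illustrated with a minimal Python sketch. It assumes that each base classifier's predictions were already produced out-of-sample (for example by cross-validation, which the abstract does not specify) and stored as integer class labels. The function names, the 5-classifier toy data, and the simple agreement measure are hypothetical illustrations, not the authors' implementation. An instance is flagged as noisy when at least `threshold` classifiers misclassify it, so raising the threshold toward the ensemble size makes the filter more conservative.

```python
from __future__ import annotations

from typing import Sequence


def misclassification_counts(
    predictions: Sequence[Sequence[int]], labels: Sequence[int]
) -> list[int]:
    """Count, for each instance, how many base classifiers mislabel it.

    predictions[c][i] is classifier c's prediction for instance i
    (assumed to be out-of-sample); labels[i] is the recorded class.
    """
    n = len(labels)
    counts = [0] * n
    for row in predictions:
        if len(row) != n:
            raise ValueError("every classifier must predict every instance")
        for i, (pred, true) in enumerate(zip(row, labels)):
            if pred != true:
                counts[i] += 1
    return counts


def ensemble_filter(
    predictions: Sequence[Sequence[int]], labels: Sequence[int], threshold: int
) -> tuple[list[int], list[int]]:
    """Split instance indices into (kept, removed).

    An instance is treated as noisy when at least `threshold` classifiers
    misclassify it. threshold == len(predictions) is a consensus filter
    (most conservative); a threshold just above half is a majority filter.
    """
    if not 1 <= threshold <= len(predictions):
        raise ValueError("threshold must be between 1 and the ensemble size")
    counts = misclassification_counts(predictions, labels)
    kept = [i for i, c in enumerate(counts) if c < threshold]
    removed = [i for i, c in enumerate(counts) if c >= threshold]
    return kept, removed


def pairwise_agreement(predictions: Sequence[Sequence[int]]) -> list[list[float]]:
    """Fraction of instances on which each pair of classifiers agrees.

    A simple similarity measure assumed for illustration; highly similar
    classifiers are candidates for removal from the ensemble.
    """
    k = len(predictions)
    n = len(predictions[0])
    return [
        [sum(a == b for a, b in zip(predictions[p], predictions[q])) / n for q in range(k)]
        for p in range(k)
    ]


# Toy example (hypothetical data): 5 classifiers, 6 instances,
# 0 = not fault-prone, 1 = fault-prone.
labels = [0, 0, 1, 1, 0, 1]
predictions = [
    [0, 0, 1, 0, 1, 1],
    [0, 1, 1, 0, 1, 1],
    [0, 0, 1, 0, 1, 0],
    [0, 0, 1, 1, 1, 1],
    [0, 0, 1, 0, 1, 1],
]

print(misclassification_counts(predictions, labels))  # [0, 1, 0, 4, 5, 1]
print(ensemble_filter(predictions, labels, threshold=5))  # ([0, 1, 2, 3, 5], [4])
print(ensemble_filter(predictions, labels, threshold=3))  # ([0, 1, 2, 5], [3, 4])
```

In this sketch's terms, a 25-classifier ensemble admits majority-style thresholds from 13 up to 25 (consensus), each giving a different level of conservativeness. Classifiers whose rows in the `pairwise_agreement` matrix are nearly identical (here classifiers 0 and 4 agree on every instance) contribute little extra information, which is one plausible way to look for the kind of redundancy the abstract reports.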