Thresholds based outlier detection approach for mining class outliers: An empirical case study on software measurement datasets

Authors:
Oral Alan;Cagatay Catal
Affiliations:
The Scientific and Technological Research Council of Turkey (TUBITAK), The National Research Institute of Electronics and Cryptology (UEKAE), Information Technologies Institute, 41470 Kocaeli, Tur ...;The Scientific and Technological Research Council of Turkey (TUBITAK), The National Research Institute of Electronics and Cryptology (UEKAE), Information Technologies Institute, 41470 Kocaeli, Tur ...
Venue:
Expert Systems with Applications: An International Journal
Year:
2011

Abstract

Predicting the fault-proneness labels of software modules is an emerging software quality assurance activity, and the quality of the datasets collected from previous software versions affects the performance of fault prediction models. In this paper, we propose an outlier detection approach that uses metrics thresholds and class labels to identify class outliers. We evaluate our approach on public NASA datasets from the PROMISE repository. Experiments reveal that this novel outlier detection method improves the performance of robust software fault prediction models based on the Naive Bayes and Random Forests machine learning algorithms.
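
The abstract does not spell out the detection rule, so the following is only a minimal sketch of one plausible reading of "metrics thresholds and class labels". It assumes a module is a class outlier when its label contradicts its metrics: labeled faulty although no metric exceeds its threshold, or labeled not faulty although every metric does. The metric names, the threshold values, and the function names are hypothetical and are not taken from the paper.

```python
from typing import Dict, List

# Hypothetical thresholds for illustration only; not the values used in the paper.
THRESHOLDS: Dict[str, float] = {"loc": 65.0, "cyclomatic": 10.0, "essential": 4.0}


def is_class_outlier(metrics: Dict[str, float], faulty: bool,
                     thresholds: Dict[str, float] = THRESHOLDS) -> bool:
    """Assumed rule: the class label disagrees with what the metrics suggest."""
    exceeded = sum(1 for name, limit in thresholds.items() if metrics[name] > limit)
    if faulty:
        # Labeled faulty, yet no metric is above its threshold.
        return exceeded == 0
    # Labeled not faulty, yet every metric is above its threshold.
    return exceeded == len(thresholds)


def remove_class_outliers(modules: List[dict]) -> List[dict]:
    """Keep only the modules whose label is consistent with the assumed rule."""
    return [m for m in modules if not is_class_outlier(m["metrics"], m["faulty"])]


# Made-up example modules, not data from the NASA datasets.
modules = [
    {"name": "a", "metrics": {"loc": 120, "cyclomatic": 25, "essential": 9}, "faulty": False},
    {"name": "b", "metrics": {"loc": 10, "cyclomatic": 2, "essential": 1}, "faulty": True},
    {"name": "c", "metrics": {"loc": 120, "cyclomatic": 25, "essential": 9}, "faulty": True},
    {"name": "d", "metrics": {"loc": 10, "cyclomatic": 2, "essential": 1}, "faulty": False},
    {"name": "e", "metrics": {"loc": 80, "cyclomatic": 3, "essential": 1}, "faulty": False},
]

cleaned = remove_class_outliers(modules)
kept_names = [m["name"] for m in cleaned]
```

Under these assumptions, modules "a" and "b" would be filtered out before training a fault predictor, while "c", "d", and "e" would be kept. A real study would substitute the thresholds and the decision rule defined in the paper itself.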