Despite the importance of software project data quality, problematic data inevitably arise during data collection. Such data are outliers with abnormal values on certain attributes, which we call the abnormal attributes of outliers. Manually detecting outliers and their abnormal attributes is laborious and time consuming. Although a few existing approaches identify outliers and their abnormal attributes, they are not effective at (1) identifying the abnormal attributes when an outlier has abnormal values on more than a specific number of its attributes or (2) discovering accurate rules for detecting outliers and their abnormal attributes. In this paper, we propose a pattern-based outlier detection method that identifies abnormal attributes in software project data: after discovering reliable frequent patterns that reflect the typical characteristics of the software project data, outliers and their abnormal attributes are detected by matching the software project data against those patterns. Empirical studies were performed on three industrial data sets and 48 artificial data sets with injected outliers. The results demonstrate that our approach outperforms five other approaches by an average of 35.27% and 107.5% in detecting outliers and abnormal attributes, respectively, on the industrial data sets, and by an average of 35.44% and 46.57%, respectively, on the artificial data sets.
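The core idea in the abstract — mine frequent patterns that capture typical attribute combinations, then flag an attribute as abnormal when a record violates a pattern it otherwise satisfies — can be illustrated with a minimal Python sketch. This is not the authors' actual algorithm; the attribute names, the toy pattern miner, and the support threshold are simplifying assumptions for illustration only:

```python
from collections import Counter
from itertools import combinations

def mine_frequent_patterns(records, min_support):
    """Mine small frequent patterns: frozensets of (attribute, value) items
    that co-occur in at least `min_support` fraction of the records.
    (A stand-in for a real frequent-pattern miner such as FP-growth.)"""
    counts = Counter()
    for rec in records:
        items = sorted(rec.items())
        for size in (2, 3):  # tiny patterns only, for the sketch
            for combo in combinations(items, size):
                counts[frozenset(combo)] += 1
    n = len(records)
    return {p for p, c in counts.items() if c / n >= min_support}

def abnormal_attributes(record, patterns):
    """Flag an attribute when the record satisfies a frequent pattern except
    for exactly one item, and the record holds a different value there."""
    flagged = set()
    items = set(record.items())
    for pattern in patterns:
        missing = pattern - items
        if len(missing) == 1:
            attr, expected = next(iter(missing))
            if attr in record and record[attr] != expected:
                flagged.add(attr)
    return flagged

# Hypothetical project records; 'size', 'team', 'effort' are made-up attributes.
data = [{"size": "large", "team": "big", "effort": "high"}] * 9 \
     + [{"size": "large", "team": "big", "effort": "low"}]  # injected outlier
patterns = mine_frequent_patterns(data, min_support=0.8)
```

Matching the last record against the mined patterns flags `effort` as its abnormal attribute (the frequent patterns expect `effort = high` alongside `size = large` and `team = big`), while the typical records satisfy every frequent pattern and nothing is flagged.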