A study of subgroup discovery approaches for defect prediction

Authors:
Daniel Rodriguez;Roberto Ruiz;Jose C. Riquelme;Rachel Harrison
Venue:
Information and Software Technology
Year:
2013


Abstract

Context: Although many papers have been published on software defect prediction techniques, machine learning approaches have yet to be fully explored.

Objective: In this paper we suggest using a descriptive approach for defect prediction rather than the precise classification techniques that are usually adopted. This allows us to characterise defective modules with simple rules that practitioners can easily apply, delivering a practical (or engineering) approach rather than a highly accurate result.

Method: We describe two well-known subgroup discovery algorithms, the SD algorithm and the CN2-SD algorithm, and use them to obtain rules that identify defect-prone modules. The empirical work is performed with publicly available datasets from the PROMISE repository and object-oriented metrics from an Eclipse repository related to defect prediction. Subgroup discovery algorithms mitigate characteristics of datasets that hinder the applicability of classification algorithms and so remove the need for preprocessing techniques.

Results: The results show that the generated rules can be used to guide testing effort in order to improve the quality of software development projects. Such rules can indicate metrics, their threshold values and relationships between metrics of defective modules.

Conclusions: The induced rules are simple to use and easy to understand, as they provide a description rather than a complete classification of the whole dataset. Thus this paper represents an engineering approach to defect prediction, i.e., an approach that is useful in practice, easily understandable and applicable by practitioners.
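
To make the idea of a descriptive rule concrete, the following minimal Python sketch scores two candidate rules with weighted relative accuracy (WRAcc), a quality measure commonly associated with CN2-SD. The abstract does not say which quality measure, metrics or thresholds the authors used, so everything here is an assumption for illustration: the `Module` record, the `loc` and `cyclomatic` fields, the threshold values and the eight toy modules are all hypothetical and are not taken from the paper's datasets.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Module:
    """A toy software module (hypothetical fields, not the paper's data)."""
    loc: int          # lines of code
    cyclomatic: int   # cyclomatic complexity
    defective: bool   # target: was a defect reported?


def wracc(modules: List[Module], condition: Callable[[Module], bool]) -> float:
    """Weighted relative accuracy of the rule `condition -> defective`.

    WRAcc = p(cond) * (p(defective | cond) - p(defective)).
    It rewards rules that cover many modules (generality) and whose
    covered modules are defective more often than average (unusualness).
    """
    n = len(modules)
    covered = [m for m in modules if condition(m)]
    if n == 0 or not covered:
        return 0.0  # a rule covering nothing describes no subgroup
    p_cond = len(covered) / n
    p_target = sum(m.defective for m in modules) / n
    p_target_given_cond = sum(m.defective for m in covered) / len(covered)
    return p_cond * (p_target_given_cond - p_target)


# Hypothetical toy dataset: 8 modules, 4 of them defective.
modules = [
    Module(500, 20, True),
    Module(450, 15, True),
    Module(300, 12, True),
    Module(50, 11, True),
    Module(400, 5, False),
    Module(100, 3, False),
    Module(80, 2, False),
    Module(60, 4, False),
]


def rule_a(m: Module) -> bool:
    """Illustrative rule combining two metric thresholds."""
    return m.loc > 250 and m.cyclomatic > 10


def rule_b(m: Module) -> bool:
    """Illustrative rule with a single metric threshold."""
    return m.loc > 250


if __name__ == "__main__":
    for name, rule in [("loc > 250 AND cyclomatic > 10", rule_a),
                       ("loc > 250", rule_b)]:
        print(f"{name}: WRAcc = {wracc(modules, rule):.4f}")
```

On this toy data the two-metric rule scores higher than the single-threshold rule because every module it covers is defective. A rule of this kind describes one subgroup of defective modules and makes no attempt to classify the remaining modules, which is the distinction the abstract draws between description and classification.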