Data mining methods are becoming increasingly important in software engineering because they can support several aspects of the software development life cycle, such as software quality. In this work, we present a data mining approach that induces rules from static software metrics to characterise fault-prone modules. Because of the special characteristics of defect prediction data (imbalance, inconsistency, redundancy), not all classification algorithms can handle this task well. To deal with these problems, Subgroup Discovery (SD) algorithms can be used to find groups of statistically different data given a property of interest. We propose EDER-SD (Evolutionary Decision Rules for Subgroup Discovery), an SD algorithm based on evolutionary computation that induces rules describing only fault-prone modules. Rules are a well-known model representation that project managers and quality engineers can easily understand and apply, and they can thus help them develop software systems that can be justifiably trusted. Unlike other SD approaches, our algorithm works directly with continuous variables, because the conditions of the rules are defined using intervals. We describe the rules obtained by applying our algorithm to seven publicly available datasets from the PROMISE repository, showing that they are capable of characterising subgroups of fault-prone modules. We also compare our results with those of three other well-known SD algorithms; EDER-SD performs well in most cases.
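To make the idea of an interval-based rule concrete, the following minimal sketch shows how a rule whose conditions are intervals over continuous metrics might be applied to a set of modules and scored. It is not the EDER-SD implementation. The `IntervalRule` class, the metric names `loc` and `v_g`, the toy data, and the choice of weighted relative accuracy (WRAcc, a quality measure commonly used in subgroup discovery) are all assumptions made for illustration; the text above does not state which measures EDER-SD optimises.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

Module = Dict[str, float]  # metric name -> value (hypothetical representation)


@dataclass
class IntervalRule:
    """A rule 'IF every metric lies in its interval THEN fault-prone'.

    conditions maps a metric name to inclusive (lower, upper) bounds.
    This structure is an illustrative assumption, not EDER-SD's encoding.
    """
    conditions: Dict[str, Tuple[float, float]]

    def covers(self, module: Module) -> bool:
        # A module is covered only if all interval conditions hold.
        return all(lo <= module[attr] <= hi
                   for attr, (lo, hi) in self.conditions.items())


def wracc(rule: IntervalRule, modules: List[Module],
          defective: List[bool]) -> float:
    """Weighted relative accuracy of the subgroup described by the rule.

    WRAcc = coverage * (precision within subgroup - overall defect rate).
    """
    n = len(modules)
    if n == 0:
        return 0.0
    covered = [rule.covers(m) for m in modules]
    n_cov = sum(covered)
    if n_cov == 0:
        return 0.0
    n_pos = sum(defective)
    n_cov_pos = sum(1 for c, d in zip(covered, defective) if c and d)
    return (n_cov / n) * (n_cov_pos / n_cov - n_pos / n)


# Made-up toy data: lines of code and cyclomatic complexity per module.
TOY_MODULES: List[Module] = [
    {"loc": 120, "v_g": 15}, {"loc": 200, "v_g": 22},
    {"loc": 30, "v_g": 3}, {"loc": 45, "v_g": 4},
    {"loc": 150, "v_g": 12}, {"loc": 20, "v_g": 2},
    {"loc": 60, "v_g": 5}, {"loc": 25, "v_g": 1},
]
TOY_DEFECTIVE = [True, True, False, False, False, False, False, False]

# Hypothetical rule: large and complex modules are fault-prone.
TOY_RULE = IntervalRule({"loc": (100, 1000), "v_g": (10, 100)})

print(wracc(TOY_RULE, TOY_MODULES, TOY_DEFECTIVE))
```

A positive score means the covered subgroup contains a higher share of defective modules than the dataset as a whole; a search procedure, evolutionary or otherwise, would adjust the interval bounds to improve such a score.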