Timely detection of high-risk program modules in high-assurance software is critical for avoiding the severe consequences of operational failures. While software risk can originate from external sources, such as management or outsourcing, software quality is adversely affected when internal software risks are realized, such as improper practice of standard software processes or the lack of a defined software quality infrastructure. Practitioners employ various techniques to identify and rectify high-risk or low-quality program modules. The effectiveness of detecting such modules depends on the software measurements used, making feature selection an important step during software quality prediction. We use a wrapper-based feature ranking technique to select the optimal set of software metrics for building defect prediction models. We also address the adverse effects of class imbalance (very few low-quality modules compared to high-quality modules), a practical problem observed in high-assurance systems. Applying a data sampling technique followed by feature selection is a relatively novel contribution of our work. We present a comprehensive investigation of the impact of data sampling followed by attribute selection on defect predictors built with imbalanced data. The case study data are obtained from several real-world high-assurance software projects. The key results are that attribute selection is more efficient when applied after data sampling, and that defect prediction performance generally improves after applying data sampling and feature selection.
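The pipeline the abstract describes — balance the classes with data sampling first, then run wrapper-based feature selection — can be sketched in miniature as follows. This is an illustrative stand-in, not the paper's implementation: it uses random undersampling for the sampling step and a leave-one-out nearest-centroid classifier as the wrapped learner, and the dataset, function names, and parameters are all hypothetical.

```python
import random

# Hypothetical toy dataset: 4 software metrics per module, label 1 = fault-prone
# (the minority class), label 0 = not fault-prone (the majority class).
# Metric 0 is made weakly informative; metrics 1-3 are pure noise.
random.seed(0)
data = [([random.gauss(lbl, 1.0) if j == 0 else random.gauss(0.0, 1.0)
          for j in range(4)], lbl)
        for lbl in (0,) * 90 + (1,) * 10]

def random_undersample(rows, ratio=1.0):
    """Data sampling step: randomly discard majority-class rows until the
    two class sizes match (a simple stand-in for the sampling technique)."""
    minority = [r for r in rows if r[1] == 1]
    majority = [r for r in rows if r[1] == 0]
    kept = random.sample(majority, min(len(majority),
                                       int(len(minority) * ratio)))
    return minority + kept

def loo_accuracy(rows, feats):
    """Leave-one-out nearest-centroid accuracy on the chosen features.
    This is the 'wrapper' part: the classifier itself scores each subset."""
    correct = 0
    for i, (x, y) in enumerate(rows):
        rest = rows[:i] + rows[i + 1:]
        cents = {}
        for lbl in (0, 1):
            pts = [r[0] for r in rest if r[1] == lbl]
            cents[lbl] = [sum(p[f] for p in pts) / len(pts) for f in feats]
        pred = min((sum((x[f] - c[k]) ** 2 for k, f in enumerate(feats)), lbl)
                   for lbl, c in cents.items())[1]
        correct += pred == y
    return correct / len(rows)

def wrapper_forward_selection(rows, n_feats):
    """Greedy forward search: repeatedly add the feature whose inclusion
    most improves the wrapped classifier's accuracy; stop when none helps."""
    chosen, remaining, best = [], list(range(n_feats)), 0.0
    while remaining:
        score, f = max((loo_accuracy(rows, chosen + [f]), f)
                       for f in remaining)
        if score <= best:
            break
        best, chosen = score, chosen + [f]
        remaining.remove(f)
    return chosen, best

balanced = random_undersample(data)                     # sampling first...
feats, score = wrapper_forward_selection(balanced, 4)   # ...then selection
```

The ordering is the point: running the wrapper on the raw imbalanced data would let the classifier score subsets by simply predicting the majority class, so balancing first gives the feature search a meaningful signal.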