Predicting risky modules in open-source software for high-performance computing

Authors:
Amit A. Phadke; Edward B. Allen
Affiliations:
Mississippi State University; Mississippi State University
Venue:
Proceedings of the second international workshop on Software engineering for high performance computing system applications
Year:
2005

Abstract

This paper presents the position that software-quality modeling of open-source software for high-performance computing can identify modules that have a high risk of bugs. Given the source code for a recent release, a model can predict which modules are likely to have bugs, based on data from past releases. If a user knows which software modules correspond to functionality of interest, then risks to operations become apparent. If the risks are too great, the user may prefer not to upgrade to the most recent release.

Of course, such predictions are never perfect. After release, bugs are discovered. Some bugs are missed by the model, and some predicted errors do not occur. A successful model will be accurate enough for informed management action at the time of the predictions.

As evidence for this position, this paper summarizes a case study of the Portable Extensible Toolkit for Scientific Computation (PETSC), which is a mathematical library for high-performance computing. Data was drawn from source-code and configuration management logs. The accuracy of logistic-regression and decision-tree models indicated that the methodology is promising. The case study also illustrated several modeling issues.
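
The two kinds of prediction error mentioned in the abstract (bugs the model misses, and predicted errors that do not occur) can be made concrete with a small sketch. The code below is illustrative only and is not taken from the paper: the `Outcome` record, the module names, and the sample data are all hypothetical, and the paper's own accuracy measures may be defined differently.

```python
from dataclasses import dataclass


@dataclass
class Outcome:
    """One module's prediction and what was observed after release (hypothetical record)."""
    module: str            # hypothetical module name
    predicted_risky: bool  # model output at prediction time
    had_bug: bool          # whether a bug was discovered after release


def misclassification_rates(outcomes):
    """Return (missed_rate, false_alarm_rate).

    missed_rate: fraction of buggy modules the model predicted as not risky.
    false_alarm_rate: fraction of bug-free modules the model predicted as risky.
    """
    buggy = [o for o in outcomes if o.had_bug]
    clean = [o for o in outcomes if not o.had_bug]
    missed = sum(1 for o in buggy if not o.predicted_risky)
    false_alarms = sum(1 for o in clean if o.predicted_risky)
    missed_rate = missed / len(buggy) if buggy else 0.0
    false_alarm_rate = false_alarms / len(clean) if clean else 0.0
    return missed_rate, false_alarm_rate


# Made-up data for illustration; not results from the PETSC case study.
outcomes = [
    Outcome("mod_a", True, True),
    Outcome("mod_b", False, True),   # bug missed by the model
    Outcome("mod_c", True, False),   # predicted error that did not occur
    Outcome("mod_d", False, False),
    Outcome("mod_e", False, False),
    Outcome("mod_f", True, True),
]

missed_rate, false_alarm_rate = misclassification_rates(outcomes)
print(f"missed: {missed_rate:.2f}, false alarms: {false_alarm_rate:.2f}")
```

Reporting the two rates separately, rather than a single accuracy figure, reflects the abstract's point that the model only needs to be accurate enough to support a management decision such as whether to upgrade; which kind of error matters more depends on the user.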