Evaluating logistic regression models to estimate software project outcomes

Authors:
Narciso Cerpa;Matthew Bardeen;Barbara Kitchenham;June Verner
Affiliations:
Facultad de Ingeniería, Universidad de Talca, Curicó, Chile;Facultad de Ingeniería, Universidad de Talca, Curicó, Chile;School of Computing and Mathematics, Keele University, Staffordshire ST5 5BG, UK;Computer Science and Engineering, University of New South Wales, Sydney, Australia
Venue:
Information and Software Technology
Year:
2010


Abstract

Context: Software has been developed since the 1960s, but the success rate of software development projects remains low. During the development of software, the probability of success is affected by various practices or aspects. To date, it is not clear which of these aspects are more important in influencing project outcome.

Objective: In this research, we identify aspects that could influence project success, build prediction models based on these aspects using data collected from multiple companies, and then test their performance on data from a single organization.

Method: A survey-based empirical investigation was used to examine variables and factors that contribute to project outcome. Variables that were highly correlated with project success were selected, and the set of variables was reduced to three factors using principal components analysis. Logistic regression models were built for both the set of variables and the set of factors, using heterogeneous data collected from two different countries and a variety of organizations. We tested these models on a homogeneous hold-out dataset from one organization. We used receiver operating characteristic (ROC) analysis to compare the performance of the variable-based and factor-based models when applied to the homogeneous dataset.

Results: We found that using raw variables or factors in the logistic regression models did not make any significant difference in predictive capability. The prediction accuracy of these models is more balanced when the cut-off is set to the ratio of success to failures in the datasets used to build the models. We found that both the raw variable and factor-based models predict significantly better than random chance.

Conclusion: We conclude that an organization wishing to estimate whether a project will succeed or fail may use a model created from heterogeneous data derived from multiple organizations.
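The ROC comparison and the cut-off choice described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' code or data: the scores stand in for the predicted success probabilities of an already fitted logistic regression model, all values are invented, and the function names `auc` and `rates` are hypothetical. The sketch also assumes that the cut-off derived from the model-building data is the proportion of successful projects in that data, which is one possible reading of the abstract.

```python
def auc(scores, labels):
    """Area under the ROC curve, computed as the fraction of
    (success, failure) pairs in which the success has the higher score
    (ties count as half). 0.5 corresponds to random chance."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def rates(scores, labels, cutoff):
    """Return (sensitivity, specificity) when a project is predicted
    to succeed if its score is at or above the cut-off."""
    tp = sum(1 for s, y in zip(scores, labels) if y == 1 and s >= cutoff)
    tn = sum(1 for s, y in zip(scores, labels) if y == 0 and s < cutoff)
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    return tp / n_pos, tn / n_neg


# Invented outcomes of the model-building (heterogeneous) data:
# 1 = success, 0 = failure.
train_labels = [1, 1, 1, 1, 1, 1, 1, 0, 0, 0]

# Invented predicted probabilities and true outcomes for a hold-out set.
holdout_scores = [0.95, 0.85, 0.80, 0.75, 0.65, 0.60, 0.72, 0.55, 0.40, 0.68]
holdout_labels = [1, 1, 1, 1, 1, 0, 0, 0, 0, 1]

# Assumption: cut-off = proportion of successes in the training data.
data_cutoff = sum(train_labels) / len(train_labels)

print("AUC:", auc(holdout_scores, holdout_labels))
print("cut-off 0.5:", rates(holdout_scores, holdout_labels, 0.5))
print("data-driven cut-off:", rates(holdout_scores, holdout_labels, data_cutoff))
```

With these invented numbers, the default cut-off of 0.5 identifies every success but misclassifies most failures, while the data-driven cut-off trades some sensitivity for better specificity. This is the kind of balance the Results paragraph refers to, though the actual figures in the study will differ.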