Assessing Variation in Development Effort Consistency Using a Data Source with Missing Data

Authors:
John Moses;Malcolm Farrow
Affiliations:
School of Computing and Technology, University of Sunderland, UK SR6 0DD;School of Computing and Technology, University of Sunderland, UK SR6 0DD
Venue:
Software Quality Control
Year:
2005

Abstract

In this study the authors analyse Release 8.0 of the International Software Benchmarking Standards Group (ISBSG) data repository, which comprises project data from several different companies. The repository contains missing data, which must be handled appropriately; otherwise, inferences drawn from it may be biased and misleading. The authors re-examine a statistical model, fitted to a sample of 339 observations from the repository, that explained about 62% of the variability in actual software development effort (Summary Work Effort). This model included the covariates Adjusted Function Points and Maximum Team Size, and depended on Language Type (with categories 2nd, 3rd and 4th Generation Languages and Application Program Generators) and Development Type (enhancement, new development and re-development). The authors now use Bayesian inference and the Bayesian statistical simulation program BUGS to impute missing data. This avoids deleting observations with missing Maximum Team Size and increases the sample size to 616. Provided that imputation does not introduce distributional biases, the larger sample will increase the accuracy of inferences made from models that fit the data. With imputation, the authors identify models that fit the data and explain about 59% of the variability in actual effort. These models enable new inferences to be made about Language Type and Development Type. The sensitivity of the inferences to alternative distributions for imputing missing data is also considered. Furthermore, the authors consider the impact of these distributions on the explained variability of actual effort and show how valid effort estimates can be derived to improve estimate consistency.
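The contrast between deleting observations with a missing covariate and imputing that covariate can be illustrated with a minimal sketch. It is not the authors' BUGS model. It assumes a simple stand-in in which missing Maximum Team Size values are drawn from a log-normal distribution fitted to the observed values, ignoring the other covariates. The function name `impute_team_size`, the log-normal choice, the number of draws and the data are all hypothetical and synthetic; none of them come from the ISBSG repository or the paper.

```python
import math
import random
import statistics


def impute_team_size(team_sizes, rng, n_draws=20):
    """Return n_draws completed copies of team_sizes.

    Missing entries (None) are replaced by draws from a log-normal
    distribution fitted to the observed entries. This marginal model is
    an assumption of the sketch, not the model used in the paper.
    """
    logs = [math.log(t) for t in team_sizes if t is not None]
    mu = statistics.mean(logs)
    sigma = statistics.stdev(logs)
    return [
        [t if t is not None else math.exp(rng.gauss(mu, sigma))
         for t in team_sizes]
        for _ in range(n_draws)
    ]


# Synthetic example, not ISBSG data: None marks a missing Maximum Team Size.
rng = random.Random(0)
team_sizes = [3, None, 8, 5, None, 12, 4, None, 6, 9]

# Listwise deletion keeps only the rows where the covariate was observed.
complete_cases = [t for t in team_sizes if t is not None]

# Imputation keeps every row; each completed copy differs in the imputed cells.
completed = impute_team_size(team_sizes, rng)

print(len(complete_cases))  # rows left after listwise deletion
print(len(completed[0]))    # rows available after imputation
```

Repeating the draws with a different assumed distribution and comparing the fitted models across the completed copies is one simple way to probe how sensitive the inferences are to the imputation distribution.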