Maximising data retention from the ISBSG repository

  • Authors:
  • Kefu Deng; Stephen G. MacDonell

  • Affiliations:
  • School of Computing and Mathematical Sciences, Auckland University of Technology, Auckland, New Zealand (both authors)

  • Venue:
  • EASE'08 Proceedings of the 12th international conference on Evaluation and Assessment in Software Engineering
  • Year:
  • 2008

Abstract

BACKGROUND: In 1997 the International Software Benchmarking Standards Group (ISBSG) began to collect data on software projects. Since then they have provided copies of their repository to researchers and practitioners through a sequence of releases of increasing size. PROBLEM: Questions over the quality and completeness of the data in the repository have led some researchers to discard substantial proportions of the observations, and to discount the use of some variables in the modelling of, among other things, software development effort. In some cases the details of the discarded data have received little mention and minimal justification. METHOD: We describe the process we used in attempting to maximise the amount of data retained for modelling software development effort at the project level, based on previously completed projects that had been sized using IFPUG/NESMA function point analysis (FPA) and recorded in the repository. RESULTS: Through justified formalisation of the data set and domain-informed refinement we arrive at a final usable data set comprising 2862 (of 3024) observations across thirteen variables. CONCLUSION: A methodical approach to the pre-processing of data can help to ensure that as much data as possible is retained for modelling. Assuming that the data does reflect one or more underlying models, such retention should increase the likelihood of robust models being developed.
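The METHOD section describes filtering the repository down to completed, IFPUG/NESMA-sized projects with values present for the modelling variables. The sketch below is an illustrative (not the authors') rendering of that kind of retention filter; the field names (`status`, `count_approach`, `size_fp`, `effort_hours`, `team_size`) are hypothetical stand-ins, not actual ISBSG column names.

```python
# Hypothetical sketch of a retention filter in the spirit of the paper's
# method: keep only completed projects sized with IFPUG or NESMA FPA
# that have no missing values for the chosen modelling variables.
# All field names are illustrative, not real ISBSG columns.

MODELLING_VARS = ["size_fp", "effort_hours", "team_size"]  # illustrative subset

def retain_usable(projects):
    """Return only the observations usable for effort modelling."""
    usable = []
    for p in projects:
        if p.get("status") != "completed":          # keep completed projects only
            continue
        if p.get("count_approach") not in ("IFPUG", "NESMA"):  # sizing method filter
            continue
        if any(p.get(v) is None for v in MODELLING_VARS):      # no missing values
            continue
        usable.append(p)
    return usable

projects = [
    {"status": "completed", "count_approach": "IFPUG",
     "size_fp": 320, "effort_hours": 2100, "team_size": 5},
    {"status": "completed", "count_approach": "COSMIC",   # different sizing method
     "size_fp": 150, "effort_hours": 900, "team_size": 3},
    {"status": "in progress", "count_approach": "NESMA",  # not completed
     "size_fp": 80, "effort_hours": 400, "team_size": 2},
    {"status": "completed", "count_approach": "NESMA",
     "size_fp": 210, "effort_hours": None, "team_size": 4},  # missing effort
]

kept = retain_usable(projects)
# Only the first project passes every criterion in this toy data set.
```

The point of such a filter being explicit is precisely the paper's complaint: each discard rule is written down and justifiable, rather than data silently vanishing during pre-processing.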