Preliminary Data Analysis Methods in Software Estimation

  • Authors:
  • Qin Liu;Robert C. Mintram

  • Affiliations:
  • School of Informatics, University of Northumbria, UK;School of Informatics, University of Northumbria, UK

  • Venue:
  • Software Quality Control
  • Year:
  • 2005

Quantified Score

Hi-index 0.00

Visualization

Abstract

Software is quite often expensive to develop and can become a major cost factor in corporate information systems驴 budgets. With the variability of software characteristics and the continual emergence of new technologies the accurate prediction of software development costs is a critical problem within the project management context. In order to address this issue a large number of software cost prediction models have been proposed. Each model succeeds to some extent but they all encounter the same problem, i.e., the inconsistency and inadequacy of the historical data sets. Often a preliminary data analysis has not been performed and it is possible for the data to contain non-dominated or confounded variables. Moreover, some of the project attributes or their values are inappropriately out of date, for example the type of computer used for project development in the COCOMO 81 (Boehm, 1981) data set. This paper proposes a framework composed of a set of clearly identified steps that should be performed before a data set is used within a cost estimation model. This framework is based closely on a paradigm proposed by Maxwell (2002). Briefly, the framework applies a set of statistical approaches, that includes correlation coefficient analysis, Analysis of Variance and Chi-Square test, etc., to the data set in order to remove outliers and identify dominant variables. To ground the framework within a practical context the procedure is used to analyze the ISBSG (International Software Benchmarking Standards Group data--Release 8) data set. This is a frequently used accessible data collection containing information for 2,008 software projects. As a consequence of this analysis, 6 explanatory variables are extracted and evaluated.