Replicating studies on cross- vs single-company effort models using the ISBSG Database

Authors:
Emilia Mendes; Chris Lokan
Affiliations:
Computer Science Department, University of Auckland, Auckland, New Zealand Private Bag 92019; School of IT & EE, UNSW@ADFA, Canberra, Australia ACT 2600
Venue:
Empirical Software Engineering
Year:
2008

Abstract

In 2001 the ISBSG database was used by Jeffery et al. (Using public domain metrics to estimate software development effort. Proceedings Metrics'01, London, pp 16-27, 2001; S1) to compare the effort prediction accuracy between cross- and single-company effort models. Given that more than 2,000 projects were later volunteered to this database, in 2005 Mendes et al. (A replicated comparison of cross-company and within-company effort estimation models using the ISBSG Database, in Proceedings of Metrics'05, Como, 2005; S2) replicated S1 but obtained different results. The differences in results could have arisen from legitimate differences in data set patterns; however, they could also have arisen from differences in experimental procedure, given that S2 was unable to employ exactly the same experimental procedure used in S1 because S1's procedure was not fully documented. Recently, we applied S2's experimental procedure to the ISBSG database version used in S1 (release 6) to assess whether differences in experimental procedure contributed to the different results (Lokan and Mendes, Cross-company and single-company effort models using the ISBSG Database: a further replicated study, Proceedings of the ISESE'06, pp 75-84, 2006; S3). Our results corroborated those from S1, suggesting that differences in the results obtained by S2 were likely caused by legitimate differences in data set patterns. We have since been able to reconstruct the experimental procedure of S1 and therefore in this paper we present both S3 and also another study (S4), which applied the experimental procedure of S1 to the data set used in S2. By applying the experimental procedure of S2 to the data set used in S1 (study S3), and the experimental procedure of S1 to the data set used in S2 (study S4), we investigate the effect of all the variations between S1 and S2.
Our results for S4 support those of S3, suggesting that differences in data preparation and analysis procedures did not affect the outcome of the analysis. Thus, the different results of S1 and S2 are very likely due to fundamental differences in the data sets.