Investigating the use of chronological splitting to compare software cross-company and single-company effort predictions: a replicated study

Authors:
Emilia Mendes; Chris Lokan
Affiliations:
Computer Science Department, The University of Auckland, Auckland, New Zealand; School of IT&EE, UNSW@ADFA, Canberra, Australia
Venue:
EASE'09 Proceedings of the 13th international conference on Evaluation and Assessment in Software Engineering
Year:
2009

Citing (23):

Software engineering metrics and models

A Procedure for Analyzing Unbalanced Datasets
IEEE Transactions on Software Engineering

An assessment and comparison of common software cost estimation modeling techniques
Proceedings of the 21st international conference on Software engineering

A replicated assessment and comparison of common software cost modeling techniques
Proceedings of the 22nd international conference on Software engineering

An empirical study of maintenance and development estimation accuracy
Journal of Systems and Software

Using Public Domain Metrics To Estimate Software Development Effort
METRICS '01 Proceedings of the 7th International Symposium on Software Metrics

How Valuable is company-specific Data Compared to multi-company Data for Software Cost Estimation?
METRICS '02 Proceedings of the 8th International Symposium on Software Metrics

A Replicated Assessment of the Use of Adaptation Rules to Improve Web Cost Estimation
ISESE '03 Proceedings of the 2003 International Symposium on Empirical Software Engineering

Using Prior-Phase Effort Records for Re-estimation During Software Projects
METRICS '03 Proceedings of the 9th International Symposium on Software Metrics

Increasing the Accuracy and Reliability of Analogy-Based Cost Estimation with Extensive Project Feature Dimension Weighting
ISESE '04 Proceedings of the 2004 International Symposium on Empirical Software Engineering

Further Comparison of Cross-Company and Within-Company Effort Estimation Models for Web Applications
METRICS '04 Proceedings of the Software Metrics, 10th International Symposium

A Replicated Comparison of Cross-Company and Within-Company Effort Estimation Models Using the ISBSG Database
METRICS '05 Proceedings of the 11th IEEE International Software Metrics Symposium

An Empirical Analysis of Software Productivity over Time
METRICS '05 Proceedings of the 11th IEEE International Software Metrics Symposium

Optimal Project Feature Weights in Analogy-Based Cost Estimation: Improvement and Limitations
IEEE Transactions on Software Engineering

Cross-company and single-company effort models using the ISBSG database: a further replicated study
Proceedings of the 2006 ACM/IEEE international symposium on Empirical software engineering

Cross versus Within-Company Cost Estimation Studies: A Systematic Review
IEEE Transactions on Software Engineering

Effort Prediction in Iterative Software Development Processes -- Incremental Versus Global Prediction Models
ESEM '07 Proceedings of the First International Symposium on Empirical Software Engineering and Measurement

Building Software Cost Estimation Models using Homogenous Data
ESEM '07 Proceedings of the First International Symposium on Empirical Software Engineering and Measurement

Comparing Local and Global Software Effort Estimation Models -- Reflections on a Systematic Review
ESEM '07 Proceedings of the First International Symposium on Empirical Software Engineering and Measurement

Replicating studies on cross- vs single-company effort models using the ISBSG Database
Empirical Software Engineering

Cross-company vs. single-company web effort models using the Tukutuku database: An extended study
Journal of Systems and Software

Using genetic programming to improve software effort estimation based on general data sets
GECCO'03 Proceedings of the 2003 international conference on Genetic and evolutionary computation: PartII

Using chronological splitting to compare cross- and single-company effort models: further investigation
ACSC '09 Proceedings of the Thirty-Second Australasian Conference on Computer Science - Volume 91

Cited by (3):

Applying moving windows to software effort estimation
ESEM '09 Proceedings of the 2009 3rd International Symposium on Empirical Software Engineering and Measurement

Data accumulation and software effort prediction
Proceedings of the 2010 ACM-IEEE International Symposium on Empirical Software Engineering and Measurement

Can cross-company data improve performance in software effort estimation?
Proceedings of the 8th International Conference on Predictive Models in Software Engineering

Abstract

CONTEXT: Three previous studies have investigated the use of chronological splitting to compare cross-company with single-company effort predictions, all of them using release 10 of the ISBSG dataset. These studies therefore need to be replicated on different datasets, so that the patterns previously observed can be compared and contrasted and the use of chronological splitting can be better understood.

OBJECTIVE: The aim of this study is to replicate [17] using the same chronological splitting but a different database, the Finnish dataset.

METHOD: Chronological splitting was compared with two forms of cross-validation. The chronological splitting used was the project-by-project chronological split, in which a validation set contains a single project and a regression model is built from scratch, using as its training set the projects completed before the validation project's start date. We used 201 single-company projects and 593 cross-company projects from the Finnish dataset.

RESULTS: Single-company models gave significantly better predictions than cross-company models. Chronological splitting gave significantly worse accuracy than leave-one-out and leave-two-out cross-validation when based on single-company data, and similar accuracy when based on cross-company data.

CONCLUSIONS: Results for project-by-project splitting did not seem promising; however, in a real scenario, companies that use their own data can only apply some form of chronological splitting when obtaining effort estimates for new projects. We therefore urge the use of chronological splitting in effort estimation studies, so that more realistic results can be provided to inform industry.
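
The project-by-project chronological split described under METHOD can be illustrated with a minimal Python sketch. It is not the authors' implementation: the `Project` fields, the single-predictor log-linear regression on size, and the `min_train` threshold are all assumptions made for illustration, since the abstract does not specify the regression variables. The sketch contrasts the chronological split, which trains only on projects completed before the validation project's start date, with leave-one-out cross-validation, which trains on every other project regardless of date.

```python
import math
from dataclasses import dataclass
from datetime import date


@dataclass
class Project:
    # Hypothetical record layout; the abstract does not list the dataset's fields.
    start: date
    end: date
    size: float    # e.g. a functional size measure (assumed predictor)
    effort: float  # actual effort, in any consistent unit


def fit_log_linear(train):
    """Least-squares fit of ln(effort) = a + b * ln(size) (assumed model form)."""
    xs = [math.log(p.size) for p in train]
    ys = [math.log(p.effort) for p in train]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        # All training sizes identical: fall back to the mean of ln(effort).
        return mean_y, 0.0
    b = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sxx
    return mean_y - b * mean_x, b


def predict(model, project):
    a, b = model
    return math.exp(a + b * math.log(project.size))


def chronological_predictions(projects, min_train=2):
    """Project-by-project chronological split.

    Each project is validated against a model rebuilt from scratch on the
    projects that were completed before its start date. Projects with fewer
    than `min_train` earlier projects are skipped (an assumed threshold).
    """
    results = []
    for target in projects:
        train = [p for p in projects if p.end < target.start]
        if len(train) < min_train:
            continue
        results.append((target, predict(fit_log_linear(train), target)))
    return results


def leave_one_out_predictions(projects):
    """Leave-one-out cross-validation: train on all other projects, ignoring dates."""
    results = []
    for i, target in enumerate(projects):
        train = projects[:i] + projects[i + 1:]
        results.append((target, predict(fit_log_linear(train), target)))
    return results
```

Under these assumptions, the chronological variant yields fewer predictions than leave-one-out, because the earliest projects have no usable training history, and each model sees only data that would have been available when the estimate was needed.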