Investigating the use of chronological splitting to compare software cross-company and single-company effort predictions

Authors:
Chris Lokan;Emilia Mendes
Affiliations:
School of IT&EE, UNSW@ADFA, Canberra, Australia;Computer Science Department, The University of Auckland, Auckland, New Zealand
Venue:
EASE'08 Proceedings of the 12th international conference on Evaluation and Assessment in Software Engineering
Year:
2008

Citing 18
Cited 3

Software engineering metrics and models

Software engineering metrics and models
A Procedure for Analyzing Unbalanced Datasets

IEEE Transactions on Software Engineering
An assessment and comparison of common software cost estimation modeling techniques

Proceedings of the 21st international conference on Software engineering
A replicated assessment and comparison of common software cost modeling techniques

Proceedings of the 22nd international conference on Software engineering
Using Public Domain Metrics To Estimate Software Development Effort

METRICS '01 Proceedings of the 7th International Symposium on Software Metrics
How Valuable is company-specific Data Compared to multi-company Data for Software Cost Estimation?

METRICS '02 Proceedings of the 8th International Symposium on Software Metrics
A Replicated Assessment of the Use of Adaptation Rules to Improve Web Cost Estimation

ISESE '03 Proceedings of the 2003 International Symposium on Empirical Software Engineering
Using Prior-Phase Effort Records for Re-estimation During Software Projects

METRICS '03 Proceedings of the 9th International Symposium on Software Metrics
Further Comparison of Cross-Company and Within-Company Effort Estimation Models for Web Applications

METRICS '04 Proceedings of the Software Metrics, 10th International Symposium
A Replicated Comparison of Cross-Company and Within-Company Effort Estimation Models Using the ISBSG Database

METRICS '05 Proceedings of the 11th IEEE International Software Metrics Symposium
An Empirical Analysis of Software Productivity over Time

METRICS '05 Proceedings of the 11th IEEE International Software Metrics Symposium
Cross-company and single-company effort models using the ISBSG database: a further replicated study

Proceedings of the 2006 ACM/IEEE international symposium on Empirical software engineering
Cross versus Within-Company Cost Estimation Studies: A Systematic Review

IEEE Transactions on Software Engineering
Effort Prediction in Iterative Software Development Processes -- Incremental Versus Global Prediction Models

ESEM '07 Proceedings of the First International Symposium on Empirical Software Engineering and Measurement
Building Software Cost Estimation Models using Homogenous Data

ESEM '07 Proceedings of the First International Symposium on Empirical Software Engineering and Measurement
Replicating studies on cross- vs single-company effort models using the ISBSG Database

Empirical Software Engineering
Cross-company vs. single-company web effort models using the Tukutuku database: An extended study

Journal of Systems and Software
Using genetic programming to improve software effort estimation based on general data sets

GECCO'03 Proceedings of the 2003 international conference on Genetic and evolutionary computation: Part II

Applying moving windows to software effort estimation

ESEM '09 Proceedings of the 2009 3rd International Symposium on Empirical Software Engineering and Measurement
Using chronological splitting to compare cross- and single-company effort models: further investigation

ACSC '09 Proceedings of the Thirty-Second Australasian Conference on Computer Science - Volume 91
Can cross-company data improve performance in software effort estimation?

Proceedings of the 8th International Conference on Predictive Models in Software Engineering

Abstract

CONTEXT: Numerous studies have investigated the use of cross-company datasets to estimate effort for single-company projects. To date, however, only one has compared the effect of using a chronological split, rather than a random split, to assign projects to a training set and a validation set; it found no significant differences.

OBJECTIVE: This study extends [15] by using a project-by-project chronological split, and investigates how this type of split affects the results when comparing within-company with cross-company effort estimation.

METHOD: Chronological splitting was compared with two forms of cross-validation. We investigate a more realistic form of chronological splitting than the one used in [15]: each validation set contains a single project, and a regression model is built from scratch using as its training set the projects completed before the validation project's start date. We used 228 single-company projects and 678 cross-company projects from the ISBSG Release 10 repository.

RESULTS: We obtained contradictory results when comparing cross-company with single-company predictions for single-company projects. When results were compared using absolute residuals, there were no differences between cross- and single-company predictions, or between techniques. However, when results were compared using z values, chronological splitting favoured cross-company models, and both types of cross-validation favoured single-company models.

CONCLUSIONS: Results were promising when using project-by-project splitting because: i) they favoured cross-company models; and ii) this type of splitting mimics an effort estimation scenario in a real environment. However, these results were obtained using z values only. We therefore urge future studies comparing prediction models to report results obtained using both z values and absolute residuals, so that a full picture is provided.
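
The project-by-project chronological split described under METHOD can be illustrated with a minimal Python sketch. It is not the authors' implementation, and several parts are assumptions made for illustration only: the `Project` fields, the log-log form of the regression (ln effort on ln size), the `min_train` threshold, and the definition of z as the ratio of estimated to actual effort, which the abstract does not spell out. The `pool` argument can hold either the single-company projects or the cross-company projects, so the same function would produce both sets of predictions for the single-company targets.

```python
import math
from dataclasses import dataclass
from datetime import date


@dataclass
class Project:
    # Hypothetical record layout; the ISBSG repository has many more fields.
    start: date
    end: date
    size: float    # e.g. functional size
    effort: float  # e.g. person-hours


def fit_loglog(train):
    """Least-squares fit of ln(effort) = a + b * ln(size).

    Returns (a, b), or None when all sizes are equal and no slope exists.
    """
    xs = [math.log(p.size) for p in train]
    ys = [math.log(p.effort) for p in train]
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        return None
    slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sxx
    return mean_y - slope * mean_x, slope


def chronological_predictions(targets, pool, min_train=3):
    """For each target, train only on pool projects that finished before
    the target started, then predict the target's effort.

    Returns a list of (target, estimate, absolute_residual, z) tuples.
    min_train is an assumed cut-off, not a value taken from the paper.
    """
    results = []
    for target in targets:
        train = [p for p in pool if p is not target and p.end < target.start]
        if len(train) < min_train:
            continue  # too little history to build a model from scratch
        model = fit_loglog(train)
        if model is None:
            continue
        a, b = model
        estimate = math.exp(a + b * math.log(target.size))
        abs_residual = abs(target.effort - estimate)
        z = estimate / target.effort  # assumed definition of z
        results.append((target, estimate, abs_residual, z))
    return results


# Synthetic data: five consecutive one-month projects, effort = 10 * size.
demo = [
    Project(date(2020, month, 1), date(2020, month, 28), size, 10.0 * size)
    for month, size in enumerate([100.0, 200.0, 400.0, 150.0, 300.0], start=1)
]
results = chronological_predictions(demo, demo)
for target, estimate, abs_residual, z in results:
    print(target.start, round(estimate, 2), round(abs_residual, 4), round(z, 4))
```

Because each model sees only projects completed before the target's start date, early projects with too little history receive no prediction at all. A cross-validation split would not have this property, since it also trains on projects that finished later.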