As a standard practice, general effort estimation models are calibrated from large cross-company datasets. However, many of the records within such datasets come from companies that have calibrated the model to match their own local practices. Locally calibrated models are a double-edged sword: they often improve estimation accuracy for that particular organization, but they also encourage the growth of local biases. Such biases remain present when projects from that firm are included in a new cross-company dataset. Over time, these biases compound, and the increased heterogeneity degrades the reliability and accuracy of any general model derived from the data. In this paper, we propose a statistical measure of the exact level of heterogeneity of a cross-company dataset. In experimental tests, we measure the heterogeneity of two COCOMO-based datasets and demonstrate that one is more homogeneous than the other. Such a measure has potentially important implications for both model maintainers and model users. Furthermore, a heterogeneity measure can be used to inform users of the appropriate data handling techniques.