Measuring the heterogeneity of cross-company dataset

  • Authors:
  • Jia Chen; Ye Yang; Wen Zhang; Gregory Gay

  • Affiliations:
  • Chinese Academy of Sciences, Beijing, China; Chinese Academy of Sciences, Beijing, China; Chinese Academy of Sciences, Beijing, China; West Virginia University, Morgantown, WV

  • Venue:
  • Proceedings of the 11th International Conference on Product Focused Software
  • Year:
  • 2010

Abstract

As a standard practice, general effort estimation models are calibrated from large cross-company datasets. However, many of the records in such datasets come from companies that have calibrated the model to match their own local practices. Locally calibrated models are a double-edged sword: they often improve estimation accuracy for that particular organization, but they also encourage the growth of local biases. Such biases remain present when projects from that firm are included in a new cross-company dataset. Over time, these biases compound, and the increased heterogeneity degrades the reliability and accuracy of any general model derived from the data. In this paper, we propose a statistical measure of the exact level of heterogeneity of a cross-company dataset. In experimental tests, we measure the heterogeneity of two COCOMO-based datasets and demonstrate that one is more homogeneous than the other. Such a measure has potentially important implications for both model maintainers and model users. Furthermore, a heterogeneity measure can inform users of the appropriate data handling techniques.