Segmentation of software engineering datasets using the M5 algorithm

  • Authors:
  • D. Rodríguez;J. J. Cuadrado;M. A. Sicilia;R. Ruiz

  • Affiliations:
  • The University of Reading, Reading, UK;The University of Alcalá, Alcalá de Henares (Madrid), Spain;The University of Alcalá, Alcalá de Henares (Madrid), Spain;The University of Seville, Sevilla, Spain

  • Venue:
  • ICCS'06 Proceedings of the 6th international conference on Computational Science - Volume Part IV
  • Year:
  • 2006

Abstract

This paper reports an empirical study that uses clustering techniques to derive segmented models from software engineering repositories, with the aim of improving the accuracy of estimates. In particular, we used two datasets obtained from the International Software Benchmarking Standards Group (ISBSG) repository and created clusters using the M5 algorithm, associating each cluster with a linear model. We then compared the accuracy of the resulting estimates with that of classical multivariate linear regression and least median of squares regression. The results show that clustering improves the accuracy of the estimates. Furthermore, these techniques can help us to understand the datasets better, and they offer advantages to project managers while keeping the estimation process within reasonable complexity.
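
The idea of segmenting a dataset and attaching a linear model to each segment can be illustrated with a minimal sketch. It is not the M5 implementation used in the study: it makes a single split on one attribute, chosen by standard deviation reduction (the split criterion M5 is based on), and omits pruning and smoothing. The data are synthetic, not ISBSG records, and all names (`fit_line`, `best_split`, `sizes`, `efforts`) are hypothetical. Errors are measured on the training data for brevity, so the comparison only shows how a segmented fit relates to a single global line.

```python
import math
import random
import statistics


def fit_line(xs, ys):
    """Ordinary least squares for y = a + b*x; returns (a, b)."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    if sxx == 0:
        return my, 0.0  # degenerate segment: predict the mean
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    return my - b * mx, b


def best_split(xs, ys, min_leaf=10):
    """Threshold on x that maximises the standard deviation reduction of y."""
    pairs = sorted(zip(xs, ys))
    n = len(pairs)
    total_sd = statistics.pstdev(ys)
    best_thr, best_sdr = None, 0.0
    for i in range(min_leaf, n - min_leaf + 1):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # cannot split between identical x values
        left = [y for _, y in pairs[:i]]
        right = [y for _, y in pairs[i:]]
        sdr = (total_sd
               - len(left) / n * statistics.pstdev(left)
               - len(right) / n * statistics.pstdev(right))
        if sdr > best_sdr:
            best_thr = (pairs[i - 1][0] + pairs[i][0]) / 2
            best_sdr = sdr
    return best_thr


def rmse(preds, ys):
    """Root mean squared error between predictions and observed values."""
    return math.sqrt(statistics.fmean([(p - y) ** 2 for p, y in zip(preds, ys)]))


# Synthetic "size vs. effort" data with two regimes (an assumption made
# purely for illustration; these are not ISBSG projects).
random.seed(0)
sizes = [random.uniform(50, 1000) for _ in range(200)]
efforts = [(2 * s if s < 500 else 10 * s - 4000) + random.gauss(0, 100)
           for s in sizes]

# Baseline: one global linear model.
a, b = fit_line(sizes, efforts)
global_rmse = rmse([a + b * s for s in sizes], efforts)

# Segmented model: one split, one linear model per segment.
threshold = best_split(sizes, efforts)
if threshold is None:
    segmented_rmse = global_rmse  # no useful split found; keep the baseline
else:
    left = [(s, e) for s, e in zip(sizes, efforts) if s <= threshold]
    right = [(s, e) for s, e in zip(sizes, efforts) if s > threshold]
    la, lb = fit_line(*zip(*left))
    ra, rb = fit_line(*zip(*right))
    preds = [la + lb * s if s <= threshold else ra + rb * s for s in sizes]
    segmented_rmse = rmse(preds, efforts)

print(f"split at size = {threshold}")
print(f"global RMSE    = {global_rmse:.1f}")
print(f"segmented RMSE = {segmented_rmse:.1f}")
```

A fuller treatment would split recursively over several attributes, prune the tree, and evaluate on held-out projects with the accuracy measures of the study rather than in-sample RMSE.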