Discretization methods for NBC in effort estimation: an empirical comparison based on ISBSG projects

  • Authors:
  • Marta Fernández-Diego;José-María Torralba-Martínez

  • Affiliations:
  • Universitat Politècnica de València, Valencia, Spain;Universitat Politècnica de València, Valencia, Spain

  • Venue:
  • Proceedings of the ACM-IEEE international symposium on Empirical software engineering and measurement
  • Year:
  • 2012

Quantified Score

Hi-index 0.00

Visualization

Abstract

Background: Bayesian networks have been applied in many fields, including effort estimation in software engineering. Even though there are Bayesian inference algorithms than can handle continuous variables, performance tends to be better when these variables are discretized that when they are assumed to follow a specific distribution. On the other hand, the choice of the discretization method and the number of discretized intervals may lead to significantly different estimating results. However, discretization issues are seldom mentioned in software engineering effort estimation models. Aim: This paper seeks to show that discretization issues are important in terms of prediction accuracy while building a Naive Bayes Classifier (NBC) for estimating software effort. Method: For this purpose, a NBC model has been developed for software effort estimation based on ISBSG projects applying different discretization schemes (equal width intervals, equal frequency intervals, and k-means clustering) and using different number of intervals. Results: Regarding the NBC model built, the estimation accuracy of equal frequency discretization is only improved by k-means clustering with respect to Pred(0.25), although it reflects better the original distribution. Conclusions: Further experimentation should determine the potential of clustering methods already highlighted in other fields.