Discretization methods for NBC in effort estimation: an empirical comparison based on ISBSG projects

Authors:
Marta Fernández-Diego; José-María Torralba-Martínez
Affiliations:
Universitat Politècnica de València, Valencia, Spain; Universitat Politècnica de València, Valencia, Spain
Venue:
Proceedings of the ACM-IEEE International Symposium on Empirical Software Engineering and Measurement
Year:
2012



Abstract

Background: Bayesian networks have been applied in many fields, including effort estimation in software engineering. Even though there are Bayesian inference algorithms that can handle continuous variables, performance tends to be better when these variables are discretized than when they are assumed to follow a specific distribution. On the other hand, the choice of discretization method and the number of intervals may lead to significantly different estimation results. However, discretization issues are seldom mentioned in software engineering effort estimation models.

Aim: This paper seeks to show that discretization choices matter for prediction accuracy when building a Naive Bayes Classifier (NBC) for estimating software effort.

Method: An NBC model for software effort estimation was built from ISBSG projects, applying different discretization schemes (equal width intervals, equal frequency intervals, and k-means clustering) and different numbers of intervals.

Results: For the NBC model built, k-means clustering improves on the estimation accuracy of equal frequency discretization only with respect to Pred(0.25), although it better reflects the original distribution.

Conclusions: Further experimentation should determine the potential of clustering methods, which has already been highlighted in other fields.
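The three discretization schemes named in the abstract, and the Pred(0.25) accuracy measure, can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the effort values, the boundary conventions, and the way the k-means centres are initialised are all assumptions made for illustration. Pred(l) is taken in its usual sense, the fraction of estimates whose relative error |actual - predicted| / actual is at most l.

```python
from bisect import bisect_right


def equal_width_edges(values, k):
    """Interior cut points splitting [min, max] into k equally wide intervals."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / k
    return [lo + i * width for i in range(1, k)]


def equal_frequency_edges(values, k):
    """Interior cut points placing roughly len(values) / k points per interval."""
    ordered = sorted(values)
    n = len(ordered)
    return [ordered[(i * n) // k] for i in range(1, k)]


def kmeans_edges(values, k, iterations=100):
    """Interior cut points from 1-D k-means (Lloyd's algorithm).

    Assumption of this sketch: centres start at evenly spaced positions
    in the sorted data. Cut points are the midpoints between adjacent
    final centres.
    """
    ordered = sorted(values)
    n = len(ordered)
    centres = [ordered[(2 * i + 1) * n // (2 * k)] for i in range(k)]
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for v in ordered:
            nearest = min(range(k), key=lambda j: abs(v - centres[j]))
            clusters[nearest].append(v)
        # Keep the old centre if a cluster ends up empty.
        updated = [sum(c) / len(c) if c else centres[j]
                   for j, c in enumerate(clusters)]
        if updated == centres:
            break
        centres = updated
    centres.sort()
    return [(a + b) / 2 for a, b in zip(centres, centres[1:])]


def assign_bins(values, edges):
    """Map each value to the index of the interval it falls in."""
    return [bisect_right(edges, v) for v in values]


def pred(actual, predicted, level=0.25):
    """Fraction of estimates whose relative error is at most `level`."""
    hits = sum(abs(a - p) / a <= level for a, p in zip(actual, predicted))
    return hits / len(actual)


# Hypothetical, right-skewed effort values (not ISBSG data).
efforts = [100, 120, 150, 200, 400, 800, 1600, 3200]

print(assign_bins(efforts, equal_width_edges(efforts, 4)))      # [0, 0, 0, 0, 0, 0, 1, 3]
print(assign_bins(efforts, equal_frequency_edges(efforts, 4)))  # [0, 0, 1, 1, 2, 2, 3, 3]
print(assign_bins(efforts, kmeans_edges(efforts, 2)))           # [0, 0, 0, 0, 0, 0, 1, 1]
```

On skewed data like this, equal width intervals crowd most projects into the first bin and leave one bin empty, while equal frequency intervals balance the counts at the cost of very uneven interval widths. The k-means cut points follow where the values actually cluster, which is one way to read the abstract's remark about reflecting the original distribution.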