Analysis of Naive Bayes' assumptions on software fault data: An empirical study

Authors:
Burak Turhan; Ayse Bener
Affiliations:
Department of Computer Engineering, Bogazici University, 34342 Bebek, Istanbul, Turkey (both authors)
Venue:
Data & Knowledge Engineering
Year:
2009

Abstract

Software defect prediction helps reduce testing time by allocating testing resources effectively. For predicting defects in software, Naive Bayes outperforms a wide range of other methods. However, Naive Bayes assumes that attributes are independent and of equal importance. In this work, we analyze these two assumptions using public software defect data from NASA. Our analysis shows that the independence assumption is not harmful for software defect data when the data are pre-processed with PCA. Our results also indicate that assigning weights to static code attributes may significantly increase prediction performance while removing the need for feature subset selection.
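The abstract names two assumptions of Naive Bayes: independence and equal importance of attributes. It proposes attribute weighting as a way to relax the second one. The Python sketch below illustrates that idea and rests on several assumptions.

- Each numeric attribute has a Gaussian class-conditional density. The abstract does not state the density model.
- Each attribute's log-likelihood is multiplied by a weight w_j.
- The class name `WeightedGaussianNB`, its parameters, the toy data, and the weight values are all hypothetical.
- The abstract does not say how the study derives its weights. This is therefore not the authors' weighting method.

PCA pre-processing, which would first rotate the attributes into uncorrelated principal components, is not shown.

```python
import math
from collections import defaultdict


class WeightedGaussianNB:
    """Hypothetical sketch: Naive Bayes with Gaussian densities and per-attribute weights.

    A class c is scored as  log P(c) + sum_j w_j * log p(x_j | c).
    With every w_j = 1.0 this is standard Naive Bayes (equal importance);
    w_j = 0.0 ignores attribute j, so feature subset selection is the
    special case of 0/1 weights.
    """

    def __init__(self, weights=None, var_floor=1e-9):
        self.weights = weights      # hypothetical per-attribute weights; None means all 1.0
        self.var_floor = var_floor  # lower bound on variance for near-constant attributes

    def fit(self, X, y):
        by_class = defaultdict(list)
        for row, label in zip(X, y):
            by_class[label].append(row)
        n_attrs = len(X[0])
        if self.weights is None:
            self.weights = [1.0] * n_attrs
        if len(self.weights) != n_attrs:
            raise ValueError("need exactly one weight per attribute")
        self.priors, self.means, self.vars = {}, {}, {}
        for label, rows in by_class.items():
            self.priors[label] = len(rows) / len(X)
            cols = list(zip(*rows))  # column-wise view of this class's rows
            means = [sum(col) / len(col) for col in cols]
            # Maximum-likelihood (population) variance per attribute, floored.
            variances = [
                max(sum((v - m) ** 2 for v in col) / len(col), self.var_floor)
                for col, m in zip(cols, means)
            ]
            self.means[label], self.vars[label] = means, variances
        return self

    @staticmethod
    def _log_gaussian(x, mean, var):
        return -0.5 * math.log(2 * math.pi * var) - (x - mean) ** 2 / (2 * var)

    def score(self, row, label):
        """Weighted log-posterior of `label` for `row`, up to a shared constant."""
        s = math.log(self.priors[label])
        for w, x, m, v in zip(self.weights, row, self.means[label], self.vars[label]):
            s += w * self._log_gaussian(x, m, v)
        return s

    def predict(self, row):
        return max(self.priors, key=lambda label: self.score(row, label))


if __name__ == "__main__":
    # Toy, made-up data: two numeric attributes per module; label 1 = defective.
    X = [[1.0, 5.0], [1.2, 1.0], [0.8, 3.0], [3.0, 4.0], [3.2, 2.0], [2.8, 0.0]]
    y = [0, 0, 0, 1, 1, 1]
    equal = WeightedGaussianNB().fit(X, y)
    second_only = WeightedGaussianNB(weights=[0.0, 1.0]).fit(X, y)
    print(equal.predict([1.1, 2.0]))        # 0: the first attribute dominates
    print(second_only.predict([1.1, 2.0]))  # 1: only the second attribute counts
```

Setting every weight to 1.0 recovers the standard classifier. Setting weights to 0 or 1 reproduces feature subset selection. This is one way to read the abstract's claim that weighting can remove the need for a separate selection step.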