Evaluating the change of software fault behavior with dataset attributes based on categorical correlation

Authors:
Izzat Alsmadi;Hassan Najadat
Affiliations:
Yarmouk University, Faculty of CS and IT, Jordan;Jordan University of Science and Technology, Computer and IT Faculty, Jordan
Venue:
Advances in Engineering Software
Year:
2011

Abstract

The use of data mining in software engineering has been the subject of several research papers. Most of those papers used historical data for decision-making activities such as cost estimation and the prediction or estimation of product or project attributes. This paper addresses the ability to predict faulty software modules and to use statistics to correlate faulty modules with product attributes. Correlations between the attributes and the categorical variable (the class) are studied by generating a pool of records from each dataset, then repeatedly selecting two samples from the dataset and comparing them. Each selected pair of records is examined in terms of whether the module defect attribute changes from faulty to non-faulty (or the opposite) and how the value of each evaluated attribute changes between the two records (e.g. equal, larger or smaller). The goal was to determine whether certain attributes consistently affect the change of a module's state from faulty to non-faulty, or the opposite. Results indicated that this technique can be very useful for studying the correlation between each attribute and the defect status attribute. Another prediction algorithm was developed based on statistics of the module and the overall dataset. The algorithm gave each attribute true class and faulty class predictions. We found that dividing the prediction capability of each attribute into those two parts (i.e. correct and faulty module prediction) facilitates understanding the impact of attribute values on the class and hence improves the overall prediction relative to previous studies and data mining algorithms. Results were evaluated and compared with other algorithms and previous studies. ROC metrics were used to evaluate the performance of the developed metrics.
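The pairwise comparison described above can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `pairwise_change_counts`, the attribute names `loc` and `version`, and the four toy records are all hypothetical, and the sketch assumes that only pairs whose defect status differs are tallied.

```python
import random
from collections import Counter


def pairwise_change_counts(records, label_key, n_pairs=1000, seed=0):
    """Tally, per attribute, how its value differs between a non-faulty
    and a faulty record in randomly drawn pairs (hypothetical helper)."""
    rng = random.Random(seed)
    attributes = [k for k in records[0] if k != label_key]
    counts = {a: Counter() for a in attributes}
    for _ in range(n_pairs):
        first, second = rng.sample(records, 2)  # two distinct records
        if first[label_key] == second[label_key]:
            continue  # defect status did not change, nothing to tally
        # Orient the pair so the comparison always reads non-faulty -> faulty.
        clean, faulty = (first, second) if second[label_key] else (second, first)
        for a in attributes:
            if faulty[a] > clean[a]:
                counts[a]["larger"] += 1
            elif faulty[a] < clean[a]:
                counts[a]["smaller"] += 1
            else:
                counts[a]["equal"] += 1
    return counts


# Illustrative toy data only (not taken from any real dataset).
toy_records = [
    {"loc": 120, "version": 1, "defect": True},
    {"loc": 200, "version": 1, "defect": True},
    {"loc": 30, "version": 1, "defect": False},
    {"loc": 45, "version": 1, "defect": False},
]
counts = pairwise_change_counts(toy_records, "defect", n_pairs=200)
for attribute, tally in counts.items():
    print(attribute, dict(tally))
```

In this toy data, `loc` is always larger in the faulty record and `version` never changes, so an attribute whose tally is dominated by one direction is a candidate for being associated with the change in defect status.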
Results from the ROC metrics showed that accuracy, calculated traditionally as the number of correctly predicted records divided by the total number of records in the dataset, is not necessarily the best indicator of the predictive quality of a metric or algorithm. Such accuracy figures may be misleading if other metrics are not considered alongside them. The ROC metrics were able to show other important aspects of performance or accuracy.
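The point about accuracy can be illustrated with a small sketch. The data are hypothetical (100 modules, 10 of them faulty), and `roc_point` is an illustrative helper rather than anything defined in the paper. It computes accuracy together with the two coordinates of a ROC point, the true positive rate and the false positive rate.

```python
def roc_point(actual, predicted):
    """Return (accuracy, true positive rate, false positive rate)."""
    pairs = list(zip(actual, predicted))
    tp = sum(1 for a, p in pairs if a and p)          # faulty, flagged
    fn = sum(1 for a, p in pairs if a and not p)      # faulty, missed
    fp = sum(1 for a, p in pairs if not a and p)      # clean, flagged
    tn = sum(1 for a, p in pairs if not a and not p)  # clean, not flagged
    accuracy = (tp + tn) / len(pairs)
    tpr = tp / (tp + fn) if tp + fn else 0.0
    fpr = fp / (fp + tn) if fp + tn else 0.0
    return accuracy, tpr, fpr


# Hypothetical, imbalanced dataset: 10 faulty modules, 90 non-faulty.
actual = [True] * 10 + [False] * 90

# Predictor A never flags a module as faulty.
always_clean = [False] * 100

# Predictor B finds 8 of the 10 faulty modules but raises 15 false alarms.
detector = [True] * 8 + [False] * 2 + [True] * 15 + [False] * 75

for name, predicted in [("always_clean", always_clean), ("detector", detector)]:
    acc, tpr, fpr = roc_point(actual, predicted)
    print(f"{name}: accuracy={acc:.2f} TPR={tpr:.2f} FPR={fpr:.2f}")
```

Under these assumed numbers, the predictor that never reports a fault reaches 0.90 accuracy while detecting no faulty module, whereas the second predictor has lower accuracy (0.83) but a true positive rate of 0.80. Accuracy alone would rank them the wrong way round for fault prediction.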