The use of data mining in software engineering has been the subject of many research papers, most of which apply historical data to decision-making activities such as cost estimation and the prediction or estimation of product and project attributes. This paper addresses the prediction of fault-prone software modules and the statistical correlation between faulty modules and product attributes.

Correlations between the attributes and the categorical class variable are studied by generating a pool of records from each dataset and then repeatedly selecting two sample records and comparing them. Each pair is examined for a change in the module defect attribute (from faulty to non-faulty, or the reverse) together with the direction of the value change in every evaluated attribute (equal, larger, or smaller). The goal is to determine whether certain attributes consistently accompany a change in module state from faulty to non-faulty, or the opposite. Results indicate that this technique can be very useful for studying the correlation between each attribute and the defect-status attribute.

A prediction algorithm is also developed, based on statistics of the individual module and of the overall dataset. The algorithm assigns each attribute separate predictions for the true (non-faulty) class and the faulty class. We found that dividing each attribute's prediction capability into these two parts (i.e., correct and faulty module prediction) clarifies the impact of attribute values on the class and thereby improves overall prediction relative to previous studies and standard data mining algorithms. Results were evaluated and compared with other algorithms and earlier studies, using ROC metrics to assess the performance of the developed measures. These metrics show that accuracy computed in the traditional way, as the number of correctly predicted records divided by the total number of records in the dataset, does not necessarily give the best indication of a metric's or an algorithm's predictive power; on its own it can be misleading unless other measures are considered alongside it. The ROC metrics were able to expose other important aspects of performance and accuracy.
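The abstract does not include the authors' implementation, so the following is only a minimal Python sketch of the pairwise comparison technique as described above: record pairs are sampled, and for pairs whose defect label flips, each attribute is tallied as larger, smaller, or equal in the faulty record. The toy dataset, the attribute names (loc, complexity, branch_count), and the random sampling scheme are illustrative assumptions, not taken from the paper.

import random

# Hypothetical dataset: each record is a dict of numeric module metrics
# plus a boolean "defective" label. Attribute names are invented here.
records = [
    {"loc": 120, "complexity": 14, "branch_count": 22, "defective": True},
    {"loc": 35,  "complexity": 3,  "branch_count": 4,  "defective": False},
    {"loc": 410, "complexity": 31, "branch_count": 57, "defective": True},
    {"loc": 88,  "complexity": 7,  "branch_count": 11, "defective": False},
]
attributes = ["loc", "complexity", "branch_count"]

def compare_pairs(records, attributes, n_pairs=1000, seed=0):
    """For sampled record pairs whose class labels differ, tally whether
    each attribute is larger, smaller, or equal in the faulty record."""
    rng = random.Random(seed)
    tallies = {a: {"larger": 0, "smaller": 0, "equal": 0} for a in attributes}
    for _ in range(n_pairs):
        r1, r2 = rng.sample(records, 2)
        if r1["defective"] == r2["defective"]:
            continue  # only pairs where the class flips are informative here
        faulty, clean = (r1, r2) if r1["defective"] else (r2, r1)
        for a in attributes:
            if faulty[a] > clean[a]:
                tallies[a]["larger"] += 1
            elif faulty[a] < clean[a]:
                tallies[a]["smaller"] += 1
            else:
                tallies[a]["equal"] += 1
    return tallies

for attr, counts in compare_pairs(records, attributes).items():
    print(attr, counts)

An attribute whose "larger" count dominates across many sampled pairs would, under this reading, be one that consistently accompanies the faulty state.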
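The claim that traditionally computed accuracy can mislead is easy to make concrete on the class-imbalanced data typical of defect prediction. The sketch below is again illustrative, not the paper's actual evaluation: a degenerate classifier that marks every module non-faulty achieves high accuracy while detecting no faults at all, which ROC-style measures such as probability of detection (pd) and probability of false alarm (pf) expose immediately. The helper name confusion_rates and the 10%-faulty split are assumptions for the example.

def confusion_rates(actual, predicted):
    """Return accuracy, probability of detection (pd, recall on the
    faulty class), and probability of false alarm (pf)."""
    tp = sum(a and p for a, p in zip(actual, predicted))
    tn = sum((not a) and (not p) for a, p in zip(actual, predicted))
    fp = sum((not a) and p for a, p in zip(actual, predicted))
    fn = sum(a and (not p) for a, p in zip(actual, predicted))
    accuracy = (tp + tn) / len(actual)
    pd = tp / (tp + fn) if tp + fn else 0.0
    pf = fp / (fp + tn) if fp + tn else 0.0
    return accuracy, pd, pf

# 100 modules, only 10 of them faulty -- a typical imbalance.
actual = [True] * 10 + [False] * 90
always_clean = [False] * 100  # degenerate "predict non-faulty" model
acc, pd, pf = confusion_rates(actual, always_clean)
print(f"accuracy={acc:.2f}  pd={pd:.2f}  pf={pf:.2f}")
# accuracy=0.90 but pd=0.00: high accuracy, yet no fault is ever detected.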