Data preprocessing (transformation) plays an important role in data mining and machine learning. In this study, we investigate the effect of four different preprocessing methods on fault-proneness prediction, using nine data sets from the NASA Metrics Data Program (MDP) and ten classification algorithms. Our experiments indicate that log transformation rarely improves classification performance, whereas discretization affects the performance of many different algorithms. The impact also varies by algorithm: random forest, for example, performs better on the original and log-transformed data sets, while boosting and Naive Bayes perform significantly better on discretized data. We conclude that no general benefit can be expected from data transformations; instead, selected transformation techniques are recommended to boost the performance of specific classification algorithms.
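The two transformations named in the abstract can be sketched as follows. This is a minimal illustration, not the study's exact procedure: the log(1 + x) form and the equal-width binning scheme are assumptions, and the module-metric values are made up for the example.

```python
import numpy as np

def log_transform(X):
    """log(1 + x) transform, commonly used to soften the heavy right
    skew of static code metrics such as LOC or cyclomatic complexity."""
    return np.log1p(X)

def discretize(X, n_bins=10):
    """Equal-width discretization per feature: map each value to a bin
    index in [0, n_bins - 1]. (The binning scheme used in the study is
    an assumption here; other schemes, e.g. entropy-based, also exist.)"""
    X = np.asarray(X, dtype=float)
    lo, hi = X.min(axis=0), X.max(axis=0)
    # Guard against constant features to avoid division by zero.
    width = np.where(hi > lo, (hi - lo) / n_bins, 1.0)
    bins = np.floor((X - lo) / width).astype(int)
    return np.clip(bins, 0, n_bins - 1)

# Hypothetical module metrics: rows = modules, columns = [LOC, complexity].
X = np.array([[10.0, 1.0],
              [120.0, 4.0],
              [950.0, 22.0]])
X_log = log_transform(X)          # continuous, skew-reduced features
X_disc = discretize(X, n_bins=5)  # integer bin indices in [0, 4]
```

Either transformed matrix would then be fed to a classifier in place of `X`; per the abstract's findings, the discretized variant would be the one to try with boosting or Naive Bayes, and the original or log-transformed variant with random forest.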