Data preprocessing (transformation) plays an important role in data mining and machine learning. In this study, we investigate the effect of four different preprocessing methods on fault-proneness prediction, using nine data sets from the NASA Metrics Data Program (MDP) and ten classification algorithms. Our experiments indicate that log transformation rarely improves classification performance, whereas discretization affects the performance of many different algorithms. The impact also varies by algorithm: random forest, for example, performs better on the original and log-transformed data sets, while boosting and Naive Bayes perform significantly better on discretized data. We conclude that no general benefit can be expected from data transformations; instead, selected transformation techniques are recommended to boost the performance of specific classification algorithms.
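The two transformations named in the abstract can be sketched as follows. This is a minimal illustration, not the study's exact procedure: the log(1 + x) form and the equal-width binning scheme are assumptions, and the module-metric values are made up for the example.

```python
import numpy as np

def log_transform(X):
    """log(1 + x) transform, commonly used to soften the heavy right
    skew of static code metrics such as LOC or cyclomatic complexity."""
    return np.log1p(X)

def discretize(X, n_bins=10):
    """Equal-width discretization per feature: map each value to a bin
    index in [0, n_bins - 1]. (The binning scheme used in the study is
    an assumption here; other schemes, e.g. entropy-based, also exist.)"""
    X = np.asarray(X, dtype=float)
    lo, hi = X.min(axis=0), X.max(axis=0)
    # Guard against constant features to avoid division by zero.
    width = np.where(hi > lo, (hi - lo) / n_bins, 1.0)
    bins = np.floor((X - lo) / width).astype(int)
    return np.clip(bins, 0, n_bins - 1)

# Hypothetical module metrics: rows = modules, columns = [LOC, complexity].
X = np.array([[10.0, 1.0],
              [120.0, 4.0],
              [950.0, 22.0]])
X_log = log_transform(X)          # continuous, skew-reduced features
X_disc = discretize(X, n_bins=5)  # integer bin indices in [0, 4]
```

Either transformed matrix would then be fed to a classifier in place of `X`; per the abstract's findings, the discretized variant would be the one to try with boosting or Naive Bayes, and the original or log-transformed variant with random forest.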