Software engineers commonly face the problem of incomplete data, which can reduce the predictive accuracy of a system. Unfortunately, little research has systematically explored the impact of missing values, especially from the point of view of missing data handling. This has made various missing data techniques (MDTs) appear less significant. This paper describes a systematic comparison of seven MDTs using eight industrial datasets. Our findings from an empirical evaluation suggest that listwise deletion is the least effective technique for handling incomplete data, while multiple imputation achieves the highest accuracy rates. We further propose a combination of MDTs, built by randomizing a decision tree building algorithm, and show that it leads to a significant improvement in prediction performance for datasets with up to 50% missing values.
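To make the contrast between deleting and imputing concrete, the following minimal sketch compares listwise deletion with simple mean imputation on a toy dataset. The data, column meanings, and function names are hypothetical and chosen only for illustration. Mean imputation is used here as a simple stand-in for imputation in general; it is not the multiple imputation procedure or the ensemble method evaluated in the paper.

```python
from statistics import mean

# Hypothetical toy project data: (size_kloc, team_size, effort_months).
# None marks a missing value. These numbers are illustrative only.
rows = [
    (10.0, 4, 12.0),
    (25.0, None, 30.0),
    (None, 6, 20.0),
    (40.0, 8, 48.0),
]


def listwise_deletion(data):
    """Drop every row that contains at least one missing value."""
    return [r for r in data if all(v is not None for v in r)]


def mean_imputation(data):
    """Replace each missing value with the mean of the observed values in its column."""
    columns = list(zip(*data))
    col_means = [mean(v for v in col if v is not None) for col in columns]
    return [
        tuple(m if v is None else v for v, m in zip(r, col_means))
        for r in data
    ]


deleted = listwise_deletion(rows)
imputed = mean_imputation(rows)
print(len(deleted), len(imputed))
```

On this toy input, listwise deletion keeps only two of the four rows, while imputation retains all four. That loss of training data is one plausible reason deletion tends to perform poorly as the proportion of missing values grows.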