Journal of Intelligent Manufacturing
Missing data are common in surveys regardless of research field, undermining statistical analyses and biasing results. One solution is to use an imputation method, which recovers missing data by estimating replacement values. Previously, we have evaluated the hot-deck k-Nearest Neighbour (k-NN) method with Likert data in a software engineering context. In this paper, we extend the evaluation by benchmarking the method against four other imputation methods: Random Draw Substitution, Random Imputation, Median Imputation and Mode Imputation. By simulating both non-response and imputation, we obtain comparable performance measures for all methods. We discuss the performance of k-NN in the light of the other methods, but also for different values of k, different proportions of missing data, different neighbour selection strategies and different numbers of data attributes. Our results show that the k-NN method performs well, even when much data are missing, but has strong competition from both Median Imputation and Mode Imputation for our particular data. However, unlike these methods, k-NN has better performance with more data attributes. We suggest that a suitable value of k is approximately the square root of the number of complete cases, and that letting certain incomplete cases qualify as neighbours boosts the imputation ability of the method.
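As a rough illustration of the hot-deck k-NN idea described above, the following minimal sketch imputes missing Likert responses from the nearest complete cases. It is not the authors' implementation. The function name `knn_impute`, the use of Euclidean distance over the attributes the incomplete case has answered, the restriction of donors to complete cases, and the choice of the low median of the donors' values as the replacement are all assumptions made for illustration. Only the default k, roughly the square root of the number of complete cases, follows the suggestion in the abstract.

```python
import math
from statistics import median_low


def knn_impute(rows, k=None):
    """Return a copy of rows with None entries filled from the k nearest complete cases.

    rows: list of equal-length lists of Likert scores, with None marking a missing answer.
    k:    number of donor cases; defaults to about sqrt(number of complete cases).
    """
    # Assumption: only complete cases may act as donors.
    complete = [r for r in rows if None not in r]
    if not complete:
        raise ValueError("need at least one complete case")
    if k is None:
        k = max(1, round(math.sqrt(len(complete))))

    result = []
    for row in rows:
        if None not in row:
            result.append(list(row))
            continue

        # Compare cases only on the attributes this case has actually answered.
        observed = [i for i, v in enumerate(row) if v is not None]

        def distance(donor):
            return math.sqrt(sum((row[i] - donor[i]) ** 2 for i in observed))

        neighbours = sorted(complete, key=distance)[:k]

        # Assumption: the low median keeps the imputed value on the Likert scale.
        filled = [
            v if v is not None else median_low(d[i] for d in neighbours)
            for i, v in enumerate(row)
        ]
        result.append(filled)
    return result


survey = [
    [1, 1, 1],
    [1, 2, 1],
    [5, 5, 5],
    [5, 4, 5],
    [1, 1, None],  # incomplete case
]
imputed = knn_impute(survey)
```

With four complete cases the default is k = 2, so the incomplete case borrows its missing answer from the two low-scoring respondents. Letting some incomplete cases qualify as neighbours, as the abstract recommends, would require relaxing the donor filter; that variant is not shown here.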