Incomplete-case nearest neighbor imputation in software measurement data

Authors:
Jason Van Hulse; Taghi M. Khoshgoftaar
Affiliations:
Department of Computer and Electrical Engineering and Computer Science, Florida Atlantic University, Boca Raton, FL 33431, USA (both authors)
Venue:
Information Sciences: an International Journal
Year:
2014

Abstract

k-nearest neighbor imputation (kNNI) is one of the most popular methods for imputing missing values in empirical software engineering. kNNI typically uses only complete cases as candidate donors for imputation, a form called complete-case kNNI (CCkNNI). Although it often produces reasonable results, CCkNNI is severely limited when the amount of missing data is large, because the number of complete cases is then small. In response, a variant called incomplete-case k-nearest neighbor imputation (ICkNNI), which also draws donors from incomplete cases, has been proposed as an attractive alternative. This work presents a detailed simulation study comparing CCkNNI and ICkNNI on two different software measurement datasets. The empirical results show that using incomplete cases often increases the effectiveness of nearest neighbor imputation, especially at higher missingness levels, regardless of the type of missingness (i.e., how the missing values are distributed in the data).
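The abstract contrasts two donor pools for the same nearest neighbor procedure. As a rough illustration, the Python sketch below imputes a single missing value either from complete cases only (CCkNNI) or from any case usable for that particular recipient (ICkNNI). It is a minimal sketch under stated assumptions, not the authors' implementation: the ICkNNI donor rule used here (a donor must be observed on the target attribute and on every attribute the recipient has observed), the unweighted and unnormalized Euclidean distance, and averaging the k donors' values are illustrative choices, and the names `donor_pool` and `knn_impute_value` are hypothetical.

```python
import math
from typing import List, Optional, Sequence

Row = List[Optional[float]]  # None marks a missing value


def observed(row: Row) -> List[int]:
    """Indices of the attributes that are present in a row."""
    return [j for j, v in enumerate(row) if v is not None]


def donor_pool(data: Sequence[Row], recipient: Row, target: int,
               complete_case: bool) -> List[Row]:
    """Hypothetical helper: candidate donors for imputing recipient[target].

    complete_case=True  -> CCkNNI: only rows with no missing values at all.
    complete_case=False -> ICkNNI (assumed rule): any row observed on the
    target attribute and on every attribute the recipient has observed.
    """
    needed = observed(recipient)
    pool = []
    for row in data:
        if complete_case:
            usable = all(v is not None for v in row)
        else:
            usable = row[target] is not None and all(row[j] is not None for j in needed)
        if usable:
            pool.append(row)
    return pool


def knn_impute_value(data: Sequence[Row], recipient: Row, target: int,
                     k: int = 3, complete_case: bool = True) -> Optional[float]:
    """Mean target value of the k nearest donors, using Euclidean distance on
    the recipient's observed attributes. Returns None if no donor exists."""
    features = observed(recipient)  # recipient[target] is None, so excluded
    pool = donor_pool(data, recipient, target, complete_case)
    if not pool:
        return None

    def dist(row: Row) -> float:
        return math.sqrt(sum((recipient[j] - row[j]) ** 2 for j in features))

    nearest = sorted(pool, key=dist)[:k]
    return sum(row[target] for row in nearest) / len(nearest)


# Toy data (invented for illustration): heavy missingness, one complete case.
data: List[Row] = [
    [1.0, None, None],  # recipient: attribute 2 is to be imputed
    [1.1, None, 5.0],   # incomplete, but observed where this recipient needs it
    [0.9, 2.1, None],   # cannot donate attribute 2
    [5.0, 6.0, 9.0],    # the only complete case
]

print(knn_impute_value(data, data[0], target=2, k=1, complete_case=True))   # 9.0
print(knn_impute_value(data, data[0], target=2, k=1, complete_case=False))  # 5.0
```

Because only one row in the toy data is complete, CCkNNI is forced to use that distant row as its donor, while ICkNNI can also draw on the incomplete row that lies much closer to the recipient. This mirrors the abstract's point that the complete-case donor pool shrinks as missingness grows.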