The handling of missing values is a topic of growing interest in the software quality modeling domain. Data values may be absent from a dataset for numerous reasons, such as the inability to measure certain attributes. Because software engineering datasets are sometimes small, discarding observations (or program modules) with incomplete data is usually undesirable: deleting data can result in a significant loss of potentially valuable information. This is especially true when the missing data lies in an attribute that measures the quality of the program module, such as the number of faults observed in the module during testing and after release. We present a comprehensive experimental analysis of five commonly used imputation techniques. This work also considers three different mechanisms governing the distribution of missing values in a dataset and examines the impact of noise on the imputation process. To our knowledge, this is the first study to thoroughly evaluate the relationship between data quality and imputation. Further, our work is unique in that it employs a software engineering expert to oversee the evaluation of all procedures and to ensure that the results are not inadvertently influenced by poor-quality data. Based on a comprehensive set of carefully controlled experiments, we conclude that Bayesian multiple imputation and regression imputation are the most effective techniques, while mean imputation performs extremely poorly. Although a preliminary evaluation of Bayesian multiple imputation has been conducted in the empirical software engineering domain, this is the first work to provide a thorough and detailed analysis of this technique. Our studies also demonstrate conclusively that the presence of noisy data has a dramatic impact on the effectiveness of imputation techniques.
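To make the contrast between two of the techniques named above concrete, the following is a minimal sketch of mean imputation and single-predictor regression imputation. It is not the experimental setup of the study. The synthetic data (a hypothetical module-size attribute and fault count), the function names, and the choice to delete values completely at random are all assumptions made for illustration only.

```python
import random
import statistics


def mean_impute(values):
    """Replace None entries with the mean of the observed entries."""
    observed = [v for v in values if v is not None]
    mean = statistics.fmean(observed)
    return [mean if v is None else v for v in values]


def regression_impute(x, y):
    """Replace None entries in y with predictions from a least-squares
    line fitted on the complete (x, y) cases only."""
    pairs = [(xi, yi) for xi, yi in zip(x, y) if yi is not None]
    mx = statistics.fmean(xi for xi, _ in pairs)
    my = statistics.fmean(yi for _, yi in pairs)
    sxx = sum((xi - mx) ** 2 for xi, _ in pairs)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in pairs)
    slope = sxy / sxx
    intercept = my - slope * mx
    return [intercept + slope * xi if yi is None else yi
            for xi, yi in zip(x, y)]


def mean_abs_error(true, imputed, missing_idx):
    """Average absolute error over the positions that were imputed."""
    return statistics.fmean(abs(true[i] - imputed[i]) for i in missing_idx)


# Hypothetical synthetic data: x is a size-like attribute, y a fault-like
# attribute that depends linearly on x plus Gaussian noise.
rng = random.Random(0)
x = [rng.uniform(0, 100) for _ in range(200)]
y_true = [0.1 * xi + rng.gauss(0, 1) for xi in x]

# Assumed missingness: each y value is deleted with the same probability,
# independently of any attribute value (missing completely at random).
missing_idx = [i for i in range(len(y_true)) if rng.random() < 0.2]
missing_set = set(missing_idx)
y_obs = [None if i in missing_set else v for i, v in enumerate(y_true)]

mae_mean = mean_abs_error(y_true, mean_impute(y_obs), missing_idx)
mae_reg = mean_abs_error(y_true, regression_impute(x, y_obs), missing_idx)
print(f"mean imputation MAE:       {mae_mean:.3f}")
print(f"regression imputation MAE: {mae_reg:.3f}")
```

On data like this, where the incomplete attribute is strongly related to an observed one, regression imputation can exploit that relationship while mean imputation ignores it. The sketch says nothing about the other missingness mechanisms or about noisy data, both of which the study evaluates.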