Whilst there is a general consensus that quantitative approaches are an important part of successful software project management, there has been relatively little research into many of the obstacles to data collection and analysis in the real world. One feature that characterises many of the data sets we deal with is missing or highly questionable values. This problem is not unique to software engineering, so in this paper we explore the application of two existing data imputation techniques that have been used to good effect elsewhere. To assess the potential value of imputation we use two industrial data sets. Both are quite problematic from an effort modelling perspective because they contain few cases, have a significant number of missing values, and describe quite heterogeneous projects. The question we pose is: can imputation help? To answer it, we compare the quality of fit of effort models derived by stepwise regression on the raw data with that of models derived from data sets whose missing values were imputed by various techniques. In both data sets we find that k-Nearest Neighbour (k-NN) and sample mean imputation (SMI) significantly improve the model fit, with k-NN giving the best results. These results are consistent with other recently published results; consequently, we conclude that imputation can assist empirical software engineering.
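To make the two techniques named above concrete, the following is a minimal sketch of sample mean imputation and k-NN imputation on a small numeric table. It is an illustration under stated assumptions, not the procedure used in the paper: the function names, the use of None as the missing-value marker, the unscaled root-mean-square distance over shared columns, and the example project data are all hypothetical.

```python
import math

MISSING = None  # assumed marker for a missing value in this sketch


def mean_impute(rows):
    """Sample mean imputation: replace each missing value with the mean of
    the observed values in the same column. Assumes every column has at
    least one observed value."""
    n_cols = len(rows[0])
    means = []
    for j in range(n_cols):
        observed = [r[j] for r in rows if r[j] is not MISSING]
        means.append(sum(observed) / len(observed))
    return [[means[j] if r[j] is MISSING else r[j] for j in range(n_cols)]
            for r in rows]


def _distance(a, b):
    """Root-mean-square difference over the columns observed in both rows,
    or None when the rows share no observed column. Features are not
    rescaled here, which a real analysis would probably want."""
    shared = [(x, y) for x, y in zip(a, b)
              if x is not MISSING and y is not MISSING]
    if not shared:
        return None
    return math.sqrt(sum((x - y) ** 2 for x, y in shared) / len(shared))


def knn_impute(rows, k=2):
    """k-NN imputation: fill each missing value with the mean of that column
    over the k nearest rows in which the column is observed. A value stays
    missing if no comparable donor row exists."""
    result = [list(r) for r in rows]
    for i, row in enumerate(rows):
        for j, value in enumerate(row):
            if value is not MISSING:
                continue
            candidates = []
            for m, other in enumerate(rows):
                if m == i or other[j] is MISSING:
                    continue
                d = _distance(row, other)
                if d is not None:
                    candidates.append((d, other[j]))
            candidates.sort(key=lambda c: c[0])
            nearest = [v for _, v in candidates[:k]]
            if nearest:
                result[i][j] = sum(nearest) / len(nearest)
    return result


# Hypothetical project records: (size, team size, effort).
projects = [
    [10.0, 2.0, 100.0],
    [12.0, 2.0, 120.0],
    [50.0, 8.0, 900.0],
    [11.0, None, 110.0],  # team size is missing
]

print(mean_impute(projects)[3])
print(knn_impute(projects, k=2)[3])
```

In this toy data the incomplete project resembles the two small projects, so k-NN draws its fill-in value from them, whereas the column mean is pulled upward by the one large project. This is one intuition for why a neighbour-based method might fit heterogeneous project data better, though the sketch itself demonstrates nothing about the paper's data sets.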