Dealing with Missing Software Project Data

Authors:
M. H. Cartwright; M. J. Shepperd; Q. Song
Venue:
METRICS '03 Proceedings of the 9th International Symposium on Software Metrics
Year:
2003

Abstract

Whilst there is a general consensus that quantitative approaches are an important part of successful software project management, there has been relatively little research into many of the obstacles to data collection and analysis in the real world. One feature that characterises many of the data sets we deal with is missing or highly questionable values. Naturally, this problem is not unique to software engineering, so in this paper we explore the application of two existing data imputation techniques that have been used to good effect elsewhere. In order to assess the potential value of imputation, we use two industrial data sets. Both are quite problematic from an effort modelling perspective because they contain few cases, have a significant number of missing values, and the projects are quite heterogeneous. The question we pose is: can imputation help? To answer it, we compare the quality of fit of effort models derived by stepwise regression on the raw data with that of models derived from data sets with values imputed by various techniques. In both data sets we find that k-Nearest Neighbour (k-NN) and sample mean imputation (SMI) significantly improve the model fit, with k-NN giving the best results. These results are consistent with other recently published results; consequently, we conclude that imputation can assist empirical software engineering.
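
The two imputation techniques named in the abstract can be illustrated with a minimal sketch. The abstract does not give the distance measure, the value of k, or any data preprocessing, so everything below is an assumption made for illustration: the function names, the root-mean-square distance over shared observed features, the default k=2, and the toy project data are all hypothetical and are not taken from the paper.

```python
import numpy as np

def sample_mean_impute(data):
    """Replace each NaN with the mean of the observed values in its column."""
    data = np.asarray(data, dtype=float)
    col_means = np.nanmean(data, axis=0)  # column means, ignoring NaNs
    return np.where(np.isnan(data), col_means, data)

def knn_impute(data, k=2):
    """Replace each NaN with the mean of that feature over the k nearest
    cases that have the feature observed (assumed distance: root mean
    squared difference over the features both cases have observed)."""
    data = np.asarray(data, dtype=float)
    missing = np.isnan(data)
    imputed = data.copy()
    for i in np.where(missing.any(axis=1))[0]:
        candidates = []
        for j in range(len(data)):
            if j == i:
                continue
            shared = ~missing[i] & ~missing[j]
            if not shared.any():
                continue  # no common observed features, cannot compare
            diff = data[i, shared] - data[j, shared]
            candidates.append((float(np.sqrt(np.mean(diff ** 2))), j))
        candidates.sort()  # nearest cases first
        for col in np.where(missing[i])[0]:
            donors = [data[j, col] for _, j in candidates
                      if not missing[j, col]][:k]
            if donors:  # leave the NaN in place if no donor exists
                imputed[i, col] = np.mean(donors)
    return imputed

# Hypothetical projects: [size, team size, effort]; the last project
# has no recorded team size.
projects = np.array([
    [10.0, 2.0, 100.0],
    [12.0, 2.0, 120.0],
    [50.0, 8.0, 600.0],
    [11.0, np.nan, 110.0],
])

mean_filled = sample_mean_impute(projects)
knn_filled = knn_impute(projects, k=2)
print(mean_filled[3, 1], knn_filled[3, 1])
```

On this toy data, sample mean imputation fills the gap with 4.0, the mean over all three observed team sizes, whereas the k-NN sketch fills it with 2.0, the value shared by the two most similar projects. This sketch does not standardise features before computing distances; with real project data measured on very different scales, some form of scaling would probably be needed.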