Effort prediction is an important issue in software project management, and historical project data sets are frequently used to support it. However, these data sets often contain missing values, which make prediction more difficult. One common practice is to ignore the cases with missing data, but this makes the already small software project database even smaller and can further reduce prediction accuracy. The alternative is missing data imputation, for which many methods exist. Software data sets are frequently characterised by their small size, but sophisticated imputation methods tend to need larger data sets. For this reason we explore simple methods for imputing missing data in small project effort data sets. We propose MINI, a k-NN hot deck imputation method based on class mean imputation (CMI), to impute both continuous and nominal missing data in small data sets. We use an incremental approach to increase the variance of the population. To evaluate MINI, with the k-NN and CMI methods as benchmarks, we use data sets of 50 and 100 cases sampled from a larger industrial data set, with 10%, 15%, 20% and 30% of the data missing. We also simulate Missing Completely at Random (MCAR) and Missing at Random (MAR) missingness mechanisms. The results suggest that MINI outperforms both the CMI and k-NN methods. We conclude that this new imputation technique can be used to impute missing values in small data sets.
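The abstract names the benchmark techniques and missingness mechanisms but gives no implementation details, so the following is only a minimal sketch of those building blocks on a toy project data set. It shows class mean imputation (CMI), k-NN hot deck imputation, and MCAR/MAR missingness injection. It is not the authors' MINI method. The range-normalised distance, the use of a fully observed nominal attribute (`kind`) to define classes, the specific MAR rule, and all function and attribute names are assumptions made for illustration.

```python
import math
import random
from collections import Counter
from statistics import mean

# A project is a dict of attribute -> value; None marks a missing value.
# All names here are hypothetical, chosen for illustration only.

def is_num(v):
    return isinstance(v, (int, float)) and not isinstance(v, bool)

def summarize(values):
    """Mean for continuous values, mode (most frequent value) for nominal ones."""
    if all(is_num(v) for v in values):
        return mean(values)
    return Counter(values).most_common(1)[0][0]

def inject_mcar(records, attrs, rate, seed=0):
    """MCAR: each cell of `attrs` is blanked independently with probability `rate`."""
    rng = random.Random(seed)
    out = [dict(r) for r in records]
    for r in out:
        for a in attrs:
            if rng.random() < rate:
                r[a] = None
    return out

def inject_mar(records, attr, driver, rate, seed=0):
    """MAR (assumed rule): `attr` goes missing only when the fully observed `driver`
    is at or above its median, with probability min(1, 2*rate)."""
    rng = random.Random(seed)
    cut = sorted(r[driver] for r in records)[len(records) // 2]
    out = [dict(r) for r in records]
    for r in out:
        p = min(1.0, 2 * rate) if r[driver] >= cut else 0.0
        if rng.random() < p:
            r[attr] = None
    return out

def class_mean_impute(records, class_attr):
    """CMI: fill each gap with the mean/mode of observed values in the same class."""
    out = [dict(r) for r in records]
    for r in out:
        for a in list(r):
            if r[a] is None:
                pool = [s[a] for s in records
                        if s[class_attr] == r[class_attr] and s[a] is not None]
                if pool:  # leave the gap if the class has no observed value
                    r[a] = summarize(pool)
    return out

def distance(x, y, ranges):
    """Distance over attributes observed in both records: range-normalised
    difference for continuous attributes, 0/1 mismatch for nominal ones."""
    terms = []
    for a, xv in x.items():
        yv = y.get(a)
        if xv is None or yv is None:
            continue
        if is_num(xv):
            terms.append((abs(xv - yv) / ranges[a]) ** 2 if ranges[a] else 0.0)
        else:
            terms.append(0.0 if xv == yv else 1.0)
    return math.sqrt(sum(terms)) if terms else math.inf

def knn_hot_deck_impute(records, k=1):
    """k-NN hot deck: fill each gap from the k nearest donors observing that attribute."""
    numeric = {a for r in records for a, v in r.items() if is_num(v)}
    ranges = {}
    for a in numeric:
        vals = [r[a] for r in records if r.get(a) is not None]
        ranges[a] = max(vals) - min(vals)
    out = [dict(r) for r in records]
    for i, r in enumerate(out):
        for a in list(r):
            if r[a] is None:
                donors = [s for j, s in enumerate(records)
                          if j != i and s.get(a) is not None]
                donors.sort(key=lambda s: distance(records[i], s, ranges))
                if donors:
                    r[a] = summarize([s[a] for s in donors[:k]])
    return out

# Toy data (invented for illustration): one continuous and one nominal gap.
projects = [
    {"kind": "web", "size": 10.0, "lang": "java", "effort": 100.0},
    {"kind": "web", "size": 12.0, "lang": "java", "effort": 120.0},
    {"kind": "batch", "size": 50.0, "lang": "cobol", "effort": 520.0},
    {"kind": "batch", "size": 55.0, "lang": "cobol", "effort": None},
    {"kind": "web", "size": 11.0, "lang": None, "effort": 110.0},
]
cmi = class_mean_impute(projects, "kind")
knn = knn_hot_deck_impute(projects, k=1)
```

Both methods fill the effort gap with 520.0 and the language gap with "java" here. Raising `k` to 2 pulls in a web project as the second donor for the batch project, so its imputed effort becomes 315.0. This illustrates how k-NN donors can cross class boundaries that CMI respects.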