Background: Missing data, which is common in software effort datasets, is becoming an important problem in software effort prediction. Aims: In this paper, we adapt naïve Bayes and EM (Expectation Maximization) for software effort prediction and develop two embedded strategies, missing data toleration and missing data imputation, to handle the missing data in software effort datasets. Method: The missing data toleration strategy ignores missing values in software effort datasets, while the missing data imputation strategy uses observed values to impute missing values. Results: Experiments on the ISBSG and CSBSG datasets demonstrate that: 1) both proposed strategies outperform BPNN with classic imputation techniques such as MI and MINI; meanwhile, the imputation strategy outperforms the toleration strategy in most cases and produces the highest accuracy, 75.15%; 2) using unlabeled projects when training the prediction model significantly improves the effort prediction performance of naïve Bayes and EM with both strategies, especially when the ratio of the size of the training data to the size of the unlabeled data is at a relatively optimal level; 3) each class of software effort data corresponds exactly to one Gaussian component for both the ISBSG and CSBSG datasets. Conclusion: Although initial experiments on the ISBSG data set demonstrate some promising aspects of the proposed strategies, we cannot conclude that they generalize to all other software effort datasets.
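The two strategies named in the abstract can be illustrated with a minimal Gaussian naïve Bayes sketch. This is not the paper's implementation: the function names, the toy data, the use of `None` for a missing value, and the choice of class-conditional mean as the imputation rule are all assumptions made for illustration. The sketch also assumes every class has at least one observed value for every feature.

```python
import math

def fit_gaussian_nb(rows, labels):
    """Estimate class priors and per-feature (mean, variance), skipping None values."""
    model = {}
    n_features = len(rows[0])
    for c in set(labels):
        members = [r for r, y in zip(rows, labels) if y == c]
        stats = []
        for j in range(n_features):
            observed = [r[j] for r in members if r[j] is not None]
            mean = sum(observed) / len(observed)
            # Small constant keeps the variance strictly positive.
            var = sum((v - mean) ** 2 for v in observed) / len(observed) + 1e-6
            stats.append((mean, var))
        model[c] = (len(members) / len(rows), stats)
    return model

def predict_tolerate(model, row):
    """Toleration: a missing feature contributes nothing to the class score."""
    scores = {}
    for c, (prior, stats) in model.items():
        score = math.log(prior)
        for value, (mean, var) in zip(row, stats):
            if value is None:
                continue  # ignore the missing value instead of filling it in
            score += -0.5 * math.log(2 * math.pi * var) - (value - mean) ** 2 / (2 * var)
        scores[c] = score
    return max(scores, key=scores.get)

def impute(model, row, label):
    """Imputation (assumed rule): fill each missing value with the class mean."""
    _, stats = model[label]
    return [stats[j][0] if v is None else v for j, v in enumerate(row)]

# Hypothetical toy projects: two numeric features, two effort classes.
rows = [[1.0, 10.0], [1.2, None], [None, 11.0],
        [5.0, 20.0], [5.5, None], [4.8, 21.0]]
labels = ["low", "low", "low", "high", "high", "high"]
model = fit_gaussian_nb(rows, labels)
```

With this toy model, `predict_tolerate(model, [None, 20.0])` classifies a project using only its observed second feature, while `impute(model, [None, 20.0], "high")` fills the gap before any further processing. In an EM setting, the imputation step would be repeated with class labels estimated for the unlabeled projects; that loop is omitted here.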