Context: More than half the literature on software effort estimation (SEE) focuses on model comparisons. Each such comparison requires a sampling method (SM) to generate the train and test sets. Different authors use different SMs, such as leave-one-out (LOO), 3Way, and 10Way cross-validation. While LOO is a deterministic algorithm, the N-way methods use random selection to build their train and test sets. This introduces the problem of conclusion instability, where different authors rank effort estimators in different ways. Objective: To reduce conclusion instability by removing the effects of a sampling method's random test case generation. Method: Calculate bias and variance (B&V) values under the assumption that a learner trained on the whole dataset is the true model; then demonstrate that the B&V and runtime values for LOO are similar to those of the N-way methods by running 90 different algorithms on 20 different SEE datasets. For each algorithm, collect runtimes and B&V values under LOO, 3Way, and 10Way. Results: We observed that (1) the majority of the algorithms have statistically indistinguishable B&V values under the different SMs, and (2) the different SMs have similar runtimes. Conclusion: In terms of the B&V values they generate and their runtimes, there is no reason to prefer N-way over LOO. In terms of reproducibility, LOO removes one cause of conclusion instability (the random selection of train and test sets). Therefore, we deprecate N-way and endorse LOO validation for assessing effort models.
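As a rough illustration of how LOO and N-way sampling differ, and of how B&V values can be computed against a whole-dataset "true model", the Python sketch below uses a synthetic regression dataset, a linear learner, and simplified bias/variance formulas; all of these are illustrative assumptions, not the experimental setup or the exact definitions used in the study.

import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import LeaveOneOut, KFold

# Illustrative data standing in for a SEE dataset (project features -> effort).
X, y = make_regression(n_samples=60, n_features=5, noise=10.0, random_state=0)

# "True model": a learner trained on the whole dataset (the assumption above).
true_model = LinearRegression().fit(X, y)

def bias_variance(splitter, learner=LinearRegression):
    # Collect test-set predictions under a sampling method, compare them with
    # the true model's predictions (bias), and measure their spread (variance).
    # These definitions are simplified for illustration.
    preds, refs = [], []
    for train_idx, test_idx in splitter.split(X):
        model = learner().fit(X[train_idx], y[train_idx])
        preds.extend(model.predict(X[test_idx]))
        refs.extend(true_model.predict(X[test_idx]))
    preds, refs = np.array(preds), np.array(refs)
    bias = np.mean(np.abs(preds - refs))   # distance from the true model
    variance = np.var(preds)               # spread of the predictions
    return bias, variance

# LOO is deterministic; the N-way splits depend on the random shuffle seed.
print("LOO   :", bias_variance(LeaveOneOut()))
print("3Way  :", bias_variance(KFold(n_splits=3, shuffle=True, random_state=1)))
print("10Way :", bias_variance(KFold(n_splits=10, shuffle=True, random_state=1)))

Re-running the LOO line always produces the same splits and hence the same values, whereas the 3Way and 10Way results change whenever the shuffle seed changes; that randomness is the source of the conclusion instability discussed above.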