Context: Empirical evaluations of competing prediction systems in software engineering frequently yield conflicting results.
Objective: To reduce the inconsistency amongst validation study results and to provide a more formal foundation for interpreting results, with a particular focus on continuous prediction systems.
Method: A new framework is proposed for evaluating competing prediction systems based upon (1) an unbiased statistic, Standardised Accuracy, (2) testing the likelihood of a result relative to a baseline of random 'predictions', that is, guessing, and (3) calculation of effect sizes.
Results: Previously published empirical evaluations of prediction systems are re-examined and their original conclusions shown to be unsafe. Additionally, even the strongest results show no more than a medium effect size relative to random guessing.
Conclusions: Biased accuracy statistics such as MMRE are deprecated. By contrast, the new empirical validation framework leads to meaningful results. Such steps will assist future meta-analyses and provide more robust and usable recommendations to practitioners.
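To make the framework concrete, the following Python sketch shows one way the three ingredients could be computed for a set of actual and predicted effort values: a mean absolute residual (MAR), a Monte Carlo random-guessing baseline, a Standardised Accuracy figure, and an effect size relative to that baseline. The function name, the Monte Carlo construction of the guessing baseline, and the Glass's-Delta-style denominator are illustrative assumptions, not the paper's exact procedure.

```python
import numpy as np

def evaluate_against_guessing(actual, predicted, n_runs=1000, seed=0):
    """Sketch of the abstract's evaluation framework: (1) Standardised
    Accuracy, (2) a random-guessing baseline, (3) an effect size."""
    rng = np.random.default_rng(seed)
    actual = np.asarray(actual, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    n = len(actual)

    # Mean absolute residual of the prediction system under test.
    mar = np.mean(np.abs(actual - predicted))

    # Random-guessing baseline (P0): each case is "predicted" by the actual
    # value of a different, randomly chosen case; repeated n_runs times.
    guess_residuals = []
    for _ in range(n_runs):
        offsets = rng.integers(1, n, size=n)      # offsets in 1..n-1, so j != i
        idx = (np.arange(n) + offsets) % n
        guess_residuals.append(np.abs(actual - actual[idx]))
    guess_residuals = np.concatenate(guess_residuals)

    mar_p0 = guess_residuals.mean()
    sa = (1.0 - mar / mar_p0) * 100.0             # % improvement over guessing
    # Effect size relative to the guessing baseline (assumption: the
    # denominator is the standard deviation of the guessing residuals).
    delta = (mar_p0 - mar) / guess_residuals.std(ddof=1)
    return {"MAR": mar, "MAR_P0": mar_p0, "SA": sa, "Delta": delta}
```

Read this way, an SA close to zero indicates a prediction system doing little better than random guessing, while the effect size indicates whether any improvement over guessing is practically meaningful rather than merely present.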