Background: Many cost estimation papers follow the same pattern: propose a "new" estimation method, try it on one or two historical datasets, and "prove" that the new method outperforms linear regression.

Aim: This paper explains why this approach to model comparison is often invalid, and suggests that the PROMISE repository may be making things worse.

Method: We identify some of the theoretical problems with studies that compare different estimation models. We review some of the commonly used datasets from the viewpoint of the reliability of the data and the validity of the proposed linear regression models.

Discussion points: It is invalid to select one or two datasets to "prove" the validity of a new technique, because we cannot be sure that, of the many published datasets, those chosen are not the only ones that favour the new technique. When new models are compared with regression models, researchers need to understand how to apply regression analysis appropriately. The use of linear regression presupposes: a linear relationship between the dependent and independent variables; no significant outliers; no significant skewness; and no relationship between the variance of the dependent variable and the magnitude of the variable. If any of these conditions does not hold, standard statistical practice is to use robust regression or to transform the data. The logarithmic transformation is appropriate in many cases, and for the Desharnais dataset it gives better results than the regression model presented in the PROMISE repository.

Conclusions: Simplistic studies comparing data-intensive methods with linear regression are scientifically valueless if the regression techniques are applied incorrectly. They are also suspect if only a small number of datasets are used and the selection of those datasets is not scientifically justified.