Information and Software Technology
Producing accurate and reliable software effort estimates has long been a challenge for both academic research and the software industry. Data quality is an important factor affecting the accuracy of effort estimation methods. To assess its impact, we investigated how eliminating outliers affects the estimation accuracy of commonly used software effort estimation methods. Guided by three research questions, we analyzed the influence of outlier elimination on estimation accuracy by combining five outlier elimination methods (least trimmed squares, Cook's distance, k-means clustering, box plot, and Mantel leverage metric) with two effort estimation methods (least squares regression and estimation by analogy with varying parameters). Empirical experiments were performed on industrial data sets: ISBSG Release 9, the Bank and Stock data sets collected from financial companies, and the Desharnais data set from the PROMISE repository. The effect of the outlier elimination methods was further evaluated with statistical tests (the Friedman test and the Wilcoxon signed-rank test). The results derived from the evaluation criteria showed no substantial difference between effort estimates produced with and without outlier elimination. However, statistical analysis indicated that outlier elimination significantly improved estimation accuracy on the Stock data set for some combinations of outlier elimination and effort estimation methods. Moreover, although outlier elimination did not significantly improve accuracy on the other data sets, our graphical analysis of errors showed that it can increase the likelihood of producing more accurate effort estimates for new projects.
Therefore, from a practical point of view, software organizations should consider outlier elimination and analyze their effort estimation results in detail to improve the accuracy of their software effort estimates.
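To make the kind of comparison described above concrete, here is a minimal sketch, not the study's procedure. It pairs one of the listed filters, the box-plot rule, with the two estimator families on a single size predictor and scores them by MMRE. Several choices are assumptions for illustration only: the conventional 1.5 × IQR fences, filtering on effort alone, k = 2 nearest projects for analogy, and MMRE as the criterion, since the abstract does not name its criteria. The function names and the toy data are hypothetical. The Friedman and Wilcoxon tests are not included.

```python
import statistics

def boxplot_keep_mask(values, k=1.5):
    """Return True for values inside the box-plot fences [Q1 - k*IQR, Q3 + k*IQR]."""
    # statistics.quantiles uses the 'exclusive' method by default; other
    # quartile conventions can shift the fences slightly.
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [low <= v <= high for v in values]

def fit_least_squares(xs, ys):
    """Ordinary least squares fit of effort = a + b * size (one predictor)."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    return my - b * mx, b

def estimate_by_analogy(train_x, train_y, target_x, k=2):
    """Hypothetical analogy estimator: mean effort of the k projects closest in size."""
    nearest = sorted(zip(train_x, train_y), key=lambda p: abs(p[0] - target_x))[:k]
    return statistics.fmean(y for _, y in nearest)

def mmre(actual, predicted):
    """Mean magnitude of relative error: mean of |actual - predicted| / actual."""
    return statistics.fmean(abs(a - p) / a for a, p in zip(actual, predicted))

# Hypothetical toy data (size, effort); the last training project is a planted outlier.
train_size = [10, 20, 30, 40, 50, 60, 70, 80]
train_effort = [100, 200, 300, 400, 500, 600, 700, 5000]
test_size = [25, 45, 65]
test_effort = [250, 450, 650]

# Box-plot filter applied to the effort values only (an assumption for this sketch).
keep = boxplot_keep_mask(train_effort)
clean_size = [x for x, ok in zip(train_size, keep) if ok]
clean_effort = [y for y, ok in zip(train_effort, keep) if ok]

for label, xs, ys in [("raw", train_size, train_effort),
                      ("cleaned", clean_size, clean_effort)]:
    a, b = fit_least_squares(xs, ys)
    ls_pred = [a + b * x for x in test_size]
    ea_pred = [estimate_by_analogy(xs, ys, x, k=2) for x in test_size]
    print(f"{label}: LS MMRE={mmre(test_effort, ls_pred):.3f}, "
          f"EA MMRE={mmre(test_effort, ea_pred):.3f}")
```

On this contrived data the planted outlier pulls the regression line far off, so removing it lowers the least-squares MMRE. The analogy estimate does not change, because the outlier is never among the two nearest projects for these test sizes. This shows how the benefit of outlier elimination can depend on the estimation method. The toy numbers say nothing about the data sets or results reported above.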