Outlier elimination in construction of software metric models

Authors:
Victor K. Y. Chan;W. Eric Wong
Affiliations:
Macao Polytechnic Institute, Rua de Luis Gonzaga Gomes, Macau;University of Texas at Dallas, Richardson TX
Venue:
Proceedings of the 2007 ACM symposium on Applied computing
Year:
2007

Abstract

Software metric models relate various software metrics of software projects. Their purpose is to predict some of these metrics for future projects, given the other metrics for those projects. Constructing a software metric model means deriving such relationships, usually from a data sample of the relevant metrics for past software projects. Such a data sample often contains a few very extreme projects whose relationships among metrics deviate substantially from those of the remaining "mainstream" bulk of projects in the sample. These "outlier" projects exert considerable undue influence on the derivation of the relationships during model construction, so the derived relationships do not faithfully reflect the true "mainstream" relationships. The direct consequence is degraded prediction accuracy of the constructed models for future projects. To overcome this problem, we propose a methodology that identifies such outliers so that they can be eliminated prior to model construction. Our methodology uses least median of squares (LMS) regression to uncover outliers and is applicable regardless of the model construction approach used afterwards. We also conducted a case study applying our methodology, and the results show that it improves the prediction accuracy of most of the models experimented with in the study. We therefore recommend our methodology for future software metric model construction. This paper documents the methodology and the case study.
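The abstract does not spell out the exact procedure, so the following is only a minimal sketch, not the authors' method, of how LMS regression can expose outlier projects before a model is built. Assumptions made here for illustration: a single predictor metric x and a single response metric y (for example a size metric and an effort metric); LMS approximated by trying every line through two projects (elemental subsets) rather than solved exactly; and a project flagged when its residual exceeds 2.5 times the preliminary LMS scale estimate s0 = 1.4826(1 + 5/(n - p))·sqrt(med r²) associated with Rousseeuw and Leroy. The function names `lms_fit` and `lms_outliers` and the sample data are hypothetical.

```python
"""Minimal sketch: flag outlier projects with least median of squares (LMS) regression.

Assumptions (illustrative, not the paper's exact procedure): one predictor x and
one response y; LMS approximated by trying every line through two projects;
cutoff of 2.5 on robustly standardized residuals.
"""
import math
from itertools import combinations
from statistics import median


def lms_fit(xs, ys):
    """Return (intercept, slope) of the candidate line with the smallest median squared residual."""
    best = None
    for i, j in combinations(range(len(xs)), 2):
        if xs[i] == xs[j]:
            continue  # two projects with the same x do not define a finite slope
        slope = (ys[j] - ys[i]) / (xs[j] - xs[i])
        intercept = ys[i] - slope * xs[i]
        med_sq = median((y - intercept - slope * x) ** 2 for x, y in zip(xs, ys))
        if best is None or med_sq < best[0]:
            best = (med_sq, intercept, slope)
    if best is None:
        raise ValueError("need at least two distinct x values")
    return best[1], best[2]


def lms_outliers(xs, ys, cutoff=2.5):
    """Return indices of projects whose standardized LMS residual exceeds cutoff in magnitude."""
    n, p = len(xs), 2  # p = number of coefficients (intercept and slope)
    if n <= p:
        raise ValueError("need more projects than coefficients")
    intercept, slope = lms_fit(xs, ys)
    residuals = [y - intercept - slope * x for x, y in zip(xs, ys)]
    # Preliminary LMS scale estimate (Rousseeuw and Leroy, 1987).
    s0 = 1.4826 * (1 + 5 / (n - p)) * math.sqrt(median(r * r for r in residuals))
    if s0 == 0:
        # More than half the projects lie exactly on the fitted line:
        # treat every project off the line as an outlier.
        return [i for i, r in enumerate(residuals) if r != 0]
    return [i for i, r in enumerate(residuals) if abs(r) / s0 > cutoff]


if __name__ == "__main__":
    # Hypothetical data: y is roughly 1 + 2x, except the last project.
    xs = list(range(1, 11))
    ys = [3.1, 4.9, 7.2, 8.8, 11.0, 13.1, 14.9, 17.2, 18.8, 40.0]
    print(lms_outliers(xs, ys))
```

With this hypothetical data the deliberately inflated last project (index 9) is among the flagged indices. Because LMS minimizes the median rather than the sum of squared residuals, up to roughly half the projects can be extreme without dragging the fitted line toward them, which is why it suits outlier screening. In the workflow the abstract describes, the flagged projects would be removed before fitting whatever model is constructed next.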