Preliminary Data Analysis Methods in Software Estimation

Authors:
Qin Liu;Robert C. Mintram
Affiliations:
School of Informatics, University of Northumbria, UK;School of Informatics, University of Northumbria, UK
Venue:
Software Quality Control
Year:
2005

Citing 15
Cited 11

An empirical validation of software cost estimation models

Communications of the ACM
Software engineering metrics and models

Software engineering metrics and models
Learning from Examples: Generation and Evaluation of Decision Trees for Software Resource Analysis

IEEE Transactions on Software Engineering - Special Issue on Artificial Intelligence in Software Applications
A Pattern Recognition Approach for Software Engineering Data Analysis

IEEE Transactions on Software Engineering - Special issue on software measurement principles, techniques, and environments
Machine Learning Approaches to Estimating Software Development Effort

IEEE Transactions on Software Engineering
Software Development Productivity of European Space, Military, and Industrial Applications

IEEE Transactions on Software Engineering
A Procedure for Analyzing Unbalanced Datasets

IEEE Transactions on Software Engineering
Bayesian Analysis of Empirical Software Engineering Cost Models

IEEE Transactions on Software Engineering
Software metrics: roadmap

Proceedings of the Conference on The Future of Software Engineering
Software Engineering Economics

Software Engineering Economics
Measures for Excellence: Reliable Software on Time, within Budget

Measures for Excellence: Reliable Software on Time, within Budget
Empirically Guided Software Development Using Metric-Based Classification Trees

IEEE Software
Using Neural Networks in Reliability Prediction

IEEE Software
Status Report on Software Measurement

IEEE Software
Preliminary guidelines for empirical research in software engineering

IEEE Transactions on Software Engineering

Using industry based data sets in software engineering research

Proceedings of the 2006 international workshop on Summit on software engineering education
An empirical validation of a neural network model for software effort estimation

Expert Systems with Applications: An International Journal
Evaluation of preliminary data analysis framework in software cost estimation based on ISBSG R9 Data

Software Quality Control
Risk analysis in software development

AIC'08 Proceedings of the 8th conference on Applied informatics and communications
ONTOCOM Revisited: Towards Accurate Cost Predictions for Ontology Development Projects

ESWC 2009 Heraklion Proceedings of the 6th European Semantic Web Conference on The Semantic Web: Research and Applications
A study of the non-linear adjustment for analogy based software cost estimation

Empirical Software Engineering
A new regression based software cost estimation model using power values

IDEAL'07 Proceedings of the 8th international conference on Intelligent data engineering and automated learning
Measuring the heterogeneity of cross-company dataset

Proceedings of the 11th International Conference on Product Focused Software
Local bias and its impacts on the performance of parametric estimation models

Proceedings of the 7th International Conference on Predictive Models in Software Engineering
ONTOCOM: A reliable cost estimation method for ontology development projects

Web Semantics: Science, Services and Agents on the World Wide Web
Software Effort Estimation: Harmonizing Algorithms and Domain Knowledge in an Integrated Data Mining Approach

International Journal of Intelligent Information Technologies

Abstract

Software is often expensive to develop and can become a major cost factor in corporate information systems' budgets. Given the variability of software characteristics and the continual emergence of new technologies, accurately predicting software development costs is a critical problem in project management. To address this issue, a large number of software cost prediction models have been proposed. Each model succeeds to some extent, but they all encounter the same problem: the inconsistency and inadequacy of the historical data sets. Often no preliminary data analysis has been performed, so the data may contain non-dominated or confounded variables. Moreover, some project attributes or their values are out of date, for example the type of computer used for project development in the COCOMO 81 (Boehm, 1981) data set. This paper proposes a framework composed of a set of clearly identified steps that should be performed before a data set is used within a cost estimation model. The framework is based closely on a paradigm proposed by Maxwell (2002). Briefly, it applies a set of statistical techniques, including correlation coefficient analysis, analysis of variance (ANOVA), and the chi-square test, to the data set in order to remove outliers and identify dominant variables. To ground the framework in a practical context, the procedure is used to analyze the ISBSG (International Software Benchmarking Standards Group) Release 8 data set, a frequently used and accessible data collection containing information on 2,008 software projects. As a result of this analysis, 6 explanatory variables are extracted and evaluated.
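The statistical checks named in the abstract can be illustrated with a minimal sketch. This is not the authors' procedure: the function names, the 1.5 x IQR outlier rule, and the toy project records below are all assumptions made for illustration, and none of the values come from the ISBSG data. The sketch uses only the Python standard library and computes the test statistics themselves; turning them into p-values would need a statistics package.

```python
from collections import Counter
from math import sqrt
from statistics import mean, quantiles


def drop_iqr_outliers(rows, key, k=1.5):
    """Keep rows whose `key` value lies within k * IQR of the quartiles.

    The IQR rule is an assumed stand-in; the abstract does not say
    which outlier criterion the framework uses.
    """
    q1, _, q3 = quantiles([r[key] for r in rows], n=4)
    low, high = q1 - k * (q3 - q1), q3 + k * (q3 - q1)
    return [r for r in rows if low <= r[key] <= high]


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two numeric columns."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


def anova_f(groups):
    """One-way ANOVA F statistic for a list of numeric groups."""
    values = [v for g in groups for v in g]
    grand = mean(values)
    ss_between = sum(len(g) * (mean(g) - grand) ** 2 for g in groups)
    ss_within = sum((v - mean(g)) ** 2 for g in groups for v in g)
    df_between = len(groups) - 1
    df_within = len(values) - len(groups)
    return (ss_between / df_between) / (ss_within / df_within)


def chi_square(a, b):
    """Chi-square statistic of independence for two categorical columns."""
    n = len(a)
    observed = Counter(zip(a, b))
    row, col = Counter(a), Counter(b)
    total = 0.0
    for r in row:
        for c in col:
            expected = row[r] * col[c] / n
            total += (observed[(r, c)] - expected) ** 2 / expected
    return total


if __name__ == "__main__":
    # Hypothetical project records (invented values, not ISBSG data).
    projects = [
        {"size": 100, "effort": 10, "lang": "3GL", "platform": "PC"},
        {"size": 120, "effort": 11, "lang": "3GL", "platform": "PC"},
        {"size": 150, "effort": 12, "lang": "4GL", "platform": "MF"},
        {"size": 160, "effort": 12, "lang": "4GL", "platform": "MF"},
        {"size": 180, "effort": 13, "lang": "3GL", "platform": "PC"},
        {"size": 200, "effort": 14, "lang": "4GL", "platform": "MF"},
        {"size": 220, "effort": 15, "lang": "3GL", "platform": "MF"},
        {"size": 210, "effort": 500, "lang": "4GL", "platform": "PC"},
    ]
    clean = drop_iqr_outliers(projects, "effort")
    sizes = [p["size"] for p in clean]
    efforts = [p["effort"] for p in clean]
    by_lang = {}
    for p in clean:
        by_lang.setdefault(p["lang"], []).append(p["effort"])
    print("projects kept:", len(clean))
    print("r(size, effort):", pearson_r(sizes, efforts))
    print("F(effort by lang):", anova_f(list(by_lang.values())))
    print("chi2(lang, platform):",
          chi_square([p["lang"] for p in clean],
                     [p["platform"] for p in clean]))
```

Read this way, a numeric attribute that correlates strongly with effort, or a categorical attribute with a large F statistic, would be a candidate dominant variable, while a large chi-square value between two categorical attributes hints that they may be confounded. Any cut-off for "strong" or "large" is a choice the analyst has to make and is not specified here.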