A Procedure for Analyzing Unbalanced Datasets

Authors: Barbara Kitchenham
Affiliations: Keele Univ., Staffordshire, UK
Venue: IEEE Transactions on Software Engineering
Year: 1998

Abstract

This paper describes a procedure for analyzing unbalanced datasets that include many nominal- and ordinal-scale factors. Such datasets are often found in company datasets used for benchmarking and productivity assessment. The two major problems caused by lack of balance are that the impact of factors can be concealed and that spurious impacts can be observed. These effects are examined with the help of two small artificial datasets. The paper proposes a method of forward pass residual analysis to analyze such datasets. The analysis procedure is demonstrated on the artificial datasets and then applied to the COCOMO dataset. The paper ends with a discussion of the advantages and limitations of the analysis procedure.
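
The abstract does not spell out the steps of forward pass residual analysis, so the following is only a minimal sketch of one plausible reading: pick the factor that explains the most variation, subtract its group means, and repeat on the residuals. The function names, the tiny artificial dataset, and the `threshold` stopping rule are all assumptions made for illustration; the paper itself may rely on formal significance tests rather than a fixed cut-off.

```python
from collections import defaultdict

def group_means(values, labels):
    """Mean of `values` within each level of a nominal factor."""
    sums, counts = defaultdict(float), defaultdict(int)
    for v, g in zip(values, labels):
        sums[g] += v
        counts[g] += 1
    return {g: sums[g] / counts[g] for g in sums}

def explained_fraction(values, labels):
    """Share of the total sum of squares accounted for by the factor's group means."""
    grand = sum(values) / len(values)
    total = sum((v - grand) ** 2 for v in values)
    if total == 0:
        return 0.0
    means = group_means(values, labels)
    within = sum((v - means[g]) ** 2 for v, g in zip(values, labels))
    return 1.0 - within / total

def forward_pass(response, factors, threshold=0.2):
    """Greedy sketch: repeatedly remove the strongest remaining factor from the residuals.

    `threshold` is a hypothetical stopping rule, not taken from the paper.
    """
    residuals = list(response)
    remaining = dict(factors)
    selected = []
    while remaining:
        scores = {name: explained_fraction(residuals, labels)
                  for name, labels in remaining.items()}
        best = max(scores, key=scores.get)
        if scores[best] < threshold:
            break
        means = group_means(residuals, remaining[best])
        residuals = [v - means[g] for v, g in zip(residuals, remaining[best])]
        selected.append(best)
        del remaining[best]
    return selected, residuals

# Hypothetical unbalanced dataset: only factor A drives the response,
# but B is confounded with A because the cell counts are unequal (4, 1, 1, 4).
factors = {
    "A": ["a1"] * 5 + ["a2"] * 5,
    "B": ["b1"] * 4 + ["b2"] + ["b1"] + ["b2"] * 4,
}
response = [10.0 if a == "a1" else 20.0 for a in factors["A"]]

selected, residuals = forward_pass(response, factors)
print("B alone explains:", round(explained_fraction(response, factors["B"]), 2))
print("factors kept by the forward pass:", selected)
```

In this toy example, factor B appears to explain roughly a third of the variation when examined on its own, purely because it is confounded with A. Once the effect of A has been removed, nothing is left in the residuals for B to explain, so the forward pass keeps only A. This illustrates the spurious-impact problem named in the abstract; the concealed-impact problem is not shown here.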