Building a second opinion: learning cross-company data

Authors:
Ekrem Kocaguneli;Bojan Cukic;Tim Menzies;Huihua Lu
Affiliations:
West Virginia University, Morgantown;West Virginia University, Morgantown;West Virginia University, Morgantown;West Virginia University, Morgantown
Venue:
Proceedings of the 9th International Conference on Predictive Models in Software Engineering
Year:
2013

Abstract

Background: Developing and maintaining a software effort estimation (SEE) data set within a company (within data) is costly. Often, parts of the data, e.g. effort values, are missing or too difficult to collect. However, information about past projects, although incomplete, may be helpful when combined with SEE data sets from other companies (cross data).

Aim: To use cross data to aid within-company estimates and local experts, and to propose a synergy between semi-supervised, active and cross-company learning for software effort estimation.

Method: The proposed method: 1) summarizes the existing unlabeled within data; 2) uses cross data to provide pseudo-labels for the summarized within data; 3) uses steps 1 and 2 to provide an estimate for the within test data as an input for the local company experts. We use 21 data sets and compare the proposed method to existing state-of-the-art within- and cross-company effort estimation methods, evaluated by 7 different error measures.

Results: In 132 out of 147 settings (21 data sets × 7 error measures = 147 settings), the proposed method performs as well as the state-of-the-art methods. The proposed method also summarizes the past within data down to at most 15% of the original data.

Conclusion: It is important to look for synergies between cross-company and within-company effort estimation data, even when the latter is imperfect or sparse. In this research, we provide experts with a method that: 1) is competent (performs as well as prior within and cross data estimation methods); 2) reflects on local data (estimates come from the within data); 3) is succinct (summarizes within data down to 15% or less); and 4) is cheap (easy to build).
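The abstract does not name the techniques behind each of the three method steps. The Python sketch below therefore fills them with assumptions, purely to illustrate the data flow:

- Step 1 uses greedy farthest-point selection.
- Step 2 uses the median effort of the k nearest cross-company projects.
- Step 3 uses the nearest pseudo-labeled prototype.

The function names (`summarize`, `knn_estimate`, `second_opinion`) and parameters (`fraction`, `k_cross`) are hypothetical. Features are assumed to be numeric and already normalized to [0, 1]. This is not the authors' implementation.

```python
import math
from statistics import median


def dist(a, b):
    """Euclidean distance between two (assumed normalized) feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def summarize(rows, fraction=0.15):
    """Step 1 (assumed technique): keep at most `fraction` of the unlabeled
    within rows (but at least one) via greedy farthest-point selection."""
    budget = max(1, int(len(rows) * fraction))
    chosen = [0]  # start from the first row
    while len(chosen) < budget:
        remaining = [i for i in range(len(rows)) if i not in chosen]
        # add the row whose nearest already-chosen prototype is farthest away
        nxt = max(remaining,
                  key=lambda i: min(dist(rows[i], rows[j]) for j in chosen))
        chosen.append(nxt)
    return [rows[i] for i in chosen]


def knn_estimate(query, labeled, k):
    """Median effort of the k nearest (features, effort) pairs to `query`."""
    nearest = sorted(labeled, key=lambda fe: dist(query, fe[0]))[:k]
    return median(effort for _, effort in nearest)


def second_opinion(within_unlabeled, cross_labeled, within_test,
                   k_cross=3, fraction=0.15):
    """Hypothetical end-to-end sketch of the three steps in the abstract."""
    prototypes = summarize(within_unlabeled, fraction)              # step 1
    pseudo = [(p, knn_estimate(p, cross_labeled, k_cross))          # step 2
              for p in prototypes]
    # step 3: each within test project gets the pseudo-label of its
    # nearest prototype, offered to local experts as a second opinion
    return [knn_estimate(t, pseudo, k=1) for t in within_test]


if __name__ == "__main__":
    within = [[i / 19, (i % 4) / 3] for i in range(20)]  # toy unlabeled rows
    cross = [([0.0, 0.0], 10.0), ([1.0, 1.0], 90.0)]     # toy cross data
    print(len(summarize(within)))                        # 3 of 20 rows kept
    print(second_opinion(within, cross, [[0.05, 0.05], [0.95, 0.95]], k_cross=1))
```

Only the summarized prototypes are stored and labeled, which mirrors the abstract's "succinct" property. Note that `int(len(rows) * fraction)` rounds down, so the summary stays at or below 15% of the rows, except for very small data sets, where the sketch still keeps one prototype.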