How can we find data for quality prediction? Early in the life cycle, projects may lack the data needed to build such predictors. Prior work assumed that relevant training data was found nearest to the local project. But is this the best approach? This paper introduces the Peters filter, which is based on the following conjecture: when local data is scarce, more information exists in other projects. Accordingly, this filter selects training data via the structure of other projects. To assess the performance of the Peters filter, we compare it with two other approaches for quality prediction: within-company learning and cross-company learning with the Burak filter (the state-of-the-art relevancy filter). This paper finds that: 1) within-company predictors are weak for small data sets; 2) the Peters filter+cross-company builds better predictors than both within-company and the Burak filter+cross-company; and 3) the Peters filter builds 64% more useful predictors than both within-company and the Burak filter+cross-company approaches. Hence, we recommend the Peters filter for cross-company learning.
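The abstract contrasts two ways of choosing cross-company training rows: letting the local project pick its nearest neighbors, or letting the structure of the other projects drive the choice. The sketch below illustrates that contrast only. It is not the authors' implementation: the abstract does not specify either algorithm, so the function names, the Euclidean distance, the parameter `k`, and the "nominate, then keep the closest nominator" rule are all assumptions made for illustration.

```python
import math


def select_nearest_to_local(local, cross, k=1):
    """Local-driven selection (assumed reading of 'nearest to the local project').

    Each local row keeps its k nearest cross-company rows.
    Returns the sorted indices of the selected cross-company rows.
    """
    picked = set()
    for row in local:
        ranked = sorted(range(len(cross)), key=lambda i: math.dist(row, cross[i]))
        picked.update(ranked[:k])
    return sorted(picked)


def select_via_cross_structure(local, cross):
    """Cross-driven selection (hypothetical reading of 'via the structure of other projects').

    Each cross-company row nominates its nearest local row; each nominated
    local row then keeps only its closest nominator.
    Returns the sorted indices of the selected cross-company rows.
    """
    best = {}  # local index -> (distance, cross index) of the closest nominator
    for i, candidate in enumerate(cross):
        j = min(range(len(local)), key=lambda n: math.dist(candidate, local[n]))
        d = math.dist(candidate, local[j])
        if j not in best or d < best[j][0]:
            best[j] = (d, i)
    return sorted(i for _, i in best.values())


if __name__ == "__main__":
    # Toy feature vectors; real defect data would use static code metrics.
    local_rows = [(0.0, 0.0), (10.0, 10.0)]
    cross_rows = [(1.0, 0.0), (0.0, 2.0), (9.0, 9.0), (50.0, 50.0)]
    print(select_nearest_to_local(local_rows, cross_rows, k=1))
    print(select_via_cross_structure(local_rows, cross_rows))
```

In the cross-driven variant, the number of selected rows is bounded by the number of local rows that receive at least one nomination, so a scarce local data set never inflates the training set; whether this matches the paper's exact procedure is not stated in the abstract.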