A core assumption of any prediction model is that the distribution of the test data does not differ from that of the training data. Prediction models used in software engineering are no exception. In reality, this assumption can be violated in many ways, resulting in inconsistent and non-transferable observations across different cases. The goal of this paper is to explain the phenomenon of conclusion instability through the concept of dataset shift, from the perspective of software effort and fault prediction. Different types of dataset shift are explained with examples from software engineering, and techniques for addressing the associated problems are discussed. While dataset shift in the form of sample selection bias and imbalanced data is well known in software engineering research, understanding the other types is relevant for interpreting non-transferable results across different sites and studies. The software engineering community should be aware of, and account for, dataset-shift-related issues when evaluating the validity of research outcomes.
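
The core assumption above can be checked empirically before a model trained on one site's data is applied to another's. Below is a minimal sketch, not taken from the paper itself: the feature names, synthetic data, and significance threshold are illustrative assumptions. It flags per-feature covariate shift between a training set and a test set using a two-sample Kolmogorov-Smirnov test.

import numpy as np
from scipy.stats import ks_2samp

def detect_covariate_shift(X_train, X_test, feature_names, alpha=0.05):
    """Flag features whose train/test distributions differ (covariate shift).

    Applies a two-sample Kolmogorov-Smirnov test per feature; a small
    p-value suggests the training and test samples are unlikely to come
    from the same distribution, violating the core assumption above.
    """
    shifted = []
    for j, name in enumerate(feature_names):
        stat, p_value = ks_2samp(X_train[:, j], X_test[:, j])
        if p_value < alpha:
            shifted.append((name, stat, p_value))
    return shifted

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Hypothetical within-company training data vs. cross-company test
    # data: the "size" feature (column 0) is shifted, "complexity" is not.
    X_train = np.column_stack([rng.lognormal(3.0, 0.5, 500),
                               rng.normal(10, 2, 500)])
    X_test = np.column_stack([rng.lognormal(3.8, 0.7, 200),
                              rng.normal(10, 2, 200)])
    for name, stat, p in detect_covariate_shift(X_train, X_test,
                                                ["size", "complexity"]):
        print(f"{name}: KS={stat:.2f}, p={p:.4f} -> distribution shift")

If shift is detected, remedies of the kind the paper discusses, such as importance weighting of training instances or filtering the training set down to projects relevant to the test context, can be considered before trusting cross-site predictions.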