Background: In our previous research, we built defect prediction models using confirmation bias metrics. Because of confirmation bias, developers tend to write unit tests that show their programs run rather than tests that try to break their code. This, in turn, leads to an increase in defect density. The prediction model built using confirmation bias metrics performed as well as the models built with static code or churn metrics.

Aims: Collecting confirmation bias metrics may yield partially missing data because of developers' tight schedules, evaluation apprehension, and lack of motivation, as well as staff turnover. In this paper, we employ the Expectation-Maximization (EM) algorithm to impute missing confirmation bias data.

Method: We used four datasets from two large-scale companies. For each dataset, we generated all possible missing data configurations and then employed Roweis' EM algorithm to impute the missing data. We built defect prediction models using the imputed data and compared their performance with that of models that used complete data.

Results: In all datasets, when the percentage of missing data was at most 50% on average, our proposed model using imputed data performed comparably to the models that used complete data.

Conclusions: The missing data problem can arise when building defect prediction models. Our results show that, instead of discarding missing or noisy data (in our case, confirmation bias metrics), we can use effective techniques such as EM-based imputation to overcome this problem.
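To make the imputation step more concrete, the following is a minimal sketch of EM-style imputation with a low-rank (PCA) model, in the spirit of Roweis' EM algorithm for PCA. It is an illustration under stated assumptions, not the implementation used in the study: the function name `em_pca_impute`, its parameters, and the toy data are hypothetical, missing values are assumed to be encoded as NaN, and the E-step is simplified to refilling missing entries with the current low-rank reconstruction.

```python
import numpy as np

def em_pca_impute(X, n_components=1, n_iter=200, tol=1e-8, seed=0):
    """Fill NaN entries of X using an EM-style PCA model (illustrative sketch).

    Assumes rows are observations, columns are metrics, and every column
    has at least one observed value.
    """
    X = np.asarray(X, dtype=float)
    missing = np.isnan(X)

    # Initial guess: replace each missing entry with its column mean.
    col_means = np.nanmean(X, axis=0)
    filled = np.where(missing, col_means, X)

    # Random initial loading matrix C (n_features x n_components).
    rng = np.random.default_rng(seed)
    C = rng.standard_normal((X.shape[1], n_components))

    for _ in range(n_iter):
        mu = filled.mean(axis=0)
        Y = filled - mu

        # E-step: latent coordinates Z given C (least squares for C Z^T = Y^T).
        Z = np.linalg.lstsq(C, Y.T, rcond=None)[0].T

        # M-step: loadings C given Z (least squares for Z C^T = Y).
        C = np.linalg.lstsq(Z, Y, rcond=None)[0].T

        # Refill only the missing entries with the low-rank reconstruction.
        recon = Z @ C.T + mu
        new_filled = np.where(missing, recon, X)

        change = np.max(np.abs(new_filled - filled))
        filled = new_filled
        if change < tol:
            break

    return filled

# Toy usage: three correlated hypothetical metrics with two missing values.
toy = np.array([
    [1.0, 2.1, 0.9],
    [2.0, np.nan, 2.1],
    [3.0, 6.2, 2.9],
    [4.0, 7.9, np.nan],
    [5.0, 10.1, 5.0],
])
print(em_pca_impute(toy, n_components=1))
```

Observed entries are never modified; only the NaN positions are updated on each iteration. The imputed matrix could then be passed to any defect prediction learner in place of the incomplete data.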