An algorithmic approach to missing data problem in modeling human aspects in software development

Authors:
Gul Calikli;Ayse Bener
Affiliations:
Ryerson University;Ryerson University
Venue:
Proceedings of the 9th International Conference on Predictive Models in Software Engineering
Year:
2013

Citing 24

Cognitive bias in software engineering
Communications of the ACM

EM algorithms for PCA and SPCA
NIPS '97 Proceedings of the 1997 conference on Advances in neural information processing systems 10

Software Cost Estimation with Incomplete Data
IEEE Transactions on Software Engineering

Dealing with Missing Software Project Data
METRICS '03 Proceedings of the 9th International Symposium on Software Metrics

Benchmarking Attribute Selection Techniques for Discrete Class Data Mining
IEEE Transactions on Knowledge and Data Engineering

Studying Software Engineers: Data Collection Techniques for Software Field Studies
Empirical Software Engineering

Damped Newton Algorithms for Matrix Factorization with Missing Data
CVPR '05 Proceedings of the 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'05) - Volume 2

Ensemble of missing data techniques to improve software prediction accuracy
Proceedings of the 28th international conference on Software engineering

Categorical missing data imputation for software cost estimation by multinomial logistic regression
Journal of Systems and Software

Data Mining Static Code Attributes to Learn Defect Predictors
IEEE Transactions on Software Engineering

Using Developer Information as a Factor for Fault Prediction
PROMISE '07 Proceedings of the Third International Workshop on Predictor Models in Software Engineering

Modeling relationships at multiple scales to improve accuracy of large recommender systems
Proceedings of the 13th ACM SIGKDD international conference on Knowledge discovery and data mining

Handling Missing Values when Applying Classification Models
The Journal of Machine Learning Research

The influence of organizational structure on software quality: an empirical case study
Proceedings of the 30th international conference on Software engineering

Ensemble of software defect predictors: a case study
Proceedings of the Second ACM-IEEE international symposium on Empirical software engineering and measurement

Benchmarking Classification Models for Software Defect Prediction: A Proposed Framework and Novel Findings
IEEE Transactions on Software Engineering

Estimating 3D shape from degenerate sequences with missing data
Computer Vision and Image Understanding

Practical considerations in deploying AI for defect prediction: a case study within the Turkish telecommunication industry
PROMISE '09 Proceedings of the 5th International Conference on Predictor Models in Software Engineering

Putting It All Together: Using Socio-technical Networks to Predict Failures
ISSRE '09 Proceedings of the 2009 20th International Symposium on Software Reliability Engineering

An analysis of the effects of company culture, education and experience on confirmation bias levels of software developers and testers
Proceedings of the 32nd ACM/IEEE International Conference on Software Engineering - Volume 2

Missing data imputation using statistical and machine learning methods in a real breast cancer problem
Artificial Intelligence in Medicine

Empirical analyses of the factors affecting confirmation bias and the effects of confirmation bias on software developer/tester performance
Proceedings of the 6th International Conference on Predictive Models in Software Engineering

Different strokes for different folks: a case study on software metrics for different defect categories
Proceedings of the 2nd International Workshop on Emerging Trends in Software Metrics

Influence of confirmation biases of developers on software quality: an empirical study
Software Quality Control

Abstract

Background: In our previous research, we built defect prediction models using confirmation bias metrics. Because of confirmation bias, developers tend to write unit tests that show their programs run rather than tests that try to break their code. This, in turn, leads to an increase in defect density. The prediction model built using confirmation bias metrics performed as well as the models built with static code or churn metrics.

Aims: Collecting confirmation bias metrics may result in partially missing data because of developers' tight schedules, evaluation apprehension, and lack of motivation, as well as staff turnover. In this paper, we employ the Expectation-Maximization (EM) algorithm to impute missing confirmation bias data.

Method: We used four datasets from two large-scale companies. For each dataset, we generated all possible missing data configurations and then employed Roweis' EM algorithm to impute the missing data. We built defect prediction models using the imputed data and compared their performance with that of models built on complete data.

Results: In all datasets, when the percentage of missing data was at most 50% on average, our proposed models built on imputed data performed comparably to the models built on complete data.

Conclusions: We may encounter the "missing data" problem when building defect prediction models. Our results show that, instead of discarding missing or noisy data (in our case, confirmation bias metrics), we can use effective techniques such as EM-based imputation to overcome this problem.
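To make the imputation step concrete, the following is a minimal sketch of EM-style PCA imputation in the spirit of Roweis' algorithm. It is not the authors' implementation: the function name `em_pca_impute`, the number of components `k`, the fixed iteration count, and the toy matrix are all assumptions made for illustration, and the abstract does not say how the data were centered or how many components were used.

```python
import numpy as np


def em_pca_impute(Y, k=1, n_iter=200, seed=0):
    """Fill NaN entries of Y (rows = samples, columns = metrics).

    Hypothetical helper, not taken from the paper. Assumes every column
    has at least one observed value.
    """
    Y = np.asarray(Y, dtype=float)
    missing = np.isnan(Y)

    # Center each column on the mean of its observed values;
    # missing cells start at 0, i.e. at the column mean.
    mu = np.nanmean(Y, axis=0)
    Z = np.where(missing, 0.0, Y - mu)

    # Random initial loading matrix C (features x components).
    rng = np.random.default_rng(seed)
    C = rng.standard_normal((Y.shape[1], k))

    for _ in range(n_iter):
        # E-step: for each row, estimate latent coordinates x by least
        # squares using only the observed features of that row.
        X = np.zeros((Y.shape[0], k))
        for i in range(Y.shape[0]):
            obs = ~missing[i]
            if obs.any():
                X[i] = np.linalg.lstsq(C[obs], Z[i, obs], rcond=None)[0]

        # Replace missing cells with their current reconstruction C x.
        Z[missing] = (X @ C.T)[missing]

        # M-step: refit the loadings to the completed, centered data.
        C = np.linalg.lstsq(X, Z, rcond=None)[0].T

    # Observed cells are returned unchanged; only NaNs are filled.
    return np.where(missing, Z + mu, Y)


# Toy example (made-up numbers, not data from the study).
metrics = np.array([
    [1.0, 2.0],
    [2.0, 4.0],
    [3.0, np.nan],
    [4.0, 8.0],
    [5.0, 10.0],
])
completed = em_pca_impute(metrics, k=1)
print(completed)
```

A defect predictor would then be trained on `completed` in place of the original matrix. Repeating this for each pattern of missing cells corresponds loosely to the "missing data configurations" described in the Method, although the abstract does not specify how those configurations were enumerated.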