Software quality researchers build software quality models by recovering traceability links between bug reports in issue tracking repositories and source code files. All too often, however, the data stored in issue tracking repositories is not explicitly tagged or linked to source code. Researchers have to resort to heuristics to tag the data (e.g., to determine whether an issue is a bug report or a work item), or to link a piece of code to a particular issue or bug. Recent studies by Bird et al. and by Antoniol et al. suggest that quality models based on imperfect datasets, with missing links to the code and incorrectly tagged issues, exhibit biases that compromise the validity and generality of the models built on top of such datasets. In this study, we verify the effects of such biases for a commercial project that enforces strict development guidelines and rules on the quality of the data in its issue tracking repository. Our results show that even in such a disciplined setting, with a near-ideal dataset, biases still exist, leading us to conjecture that biases are more likely a symptom of the underlying software development process than an artifact of the heuristics used.
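
The heuristic tagging-and-linking step described above is commonly implemented by scanning commit messages for issue identifiers. Below is a minimal sketch of such a link-recovery heuristic in Python; the regex patterns and the `recover_links` helper are illustrative assumptions, not the actual procedure used in the study or by Bird et al.

```python
import re

# Illustrative SZZ-style heuristic: scan commit messages for issue identifiers.
# These patterns (e.g., "bug 123", "fixes #123", "PROJ-123") are assumptions
# for demonstration only, not the exact rules from the study.
ISSUE_ID_PATTERNS = [
    re.compile(r"\bbugs?[:#\s]*(\d+)", re.IGNORECASE),
    re.compile(r"\bfix(?:es|ed)?[:#\s]*#?(\d+)", re.IGNORECASE),
    re.compile(r"\b([A-Z][A-Z0-9]+-\d+)\b"),  # JIRA-style issue keys
]

def recover_links(commits):
    """Map each commit hash to the issue IDs mentioned in its message.

    `commits` is an iterable of (sha, message) pairs. Commits whose
    messages match no pattern remain unlinked -- the source of the
    "missing links" bias discussed in the abstract.
    """
    links = {}
    for sha, message in commits:
        ids = set()
        for pattern in ISSUE_ID_PATTERNS:
            ids.update(pattern.findall(message))
        if ids:
            links[sha] = ids
    return links

if __name__ == "__main__":
    sample = [
        ("a1b2c3", "Fix #4211: NPE in parser"),
        ("d4e5f6", "Refactor build scripts"),   # no link recoverable
        ("0789ab", "Backport fix for PROJ-982"),
    ]
    print(recover_links(sample))
    # {'a1b2c3': {'4211'}, '0789ab': {'PROJ-982'}}
```

Note how the second sample commit is silently dropped: any bug-fixing commit whose message omits an issue ID disappears from the recovered dataset, which is precisely the mechanism by which such heuristics were thought to introduce bias.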