LINKSTER: enabling efficient manual inspection and annotation of mined data
Proceedings of the eighteenth ACM SIGSOFT international symposium on Foundations of software engineering
Empirical studies of software defects rely on links between bug databases and program code repositories. This linkage is typically based on bug-fixes identified in developer-entered commit logs. Unfortunately, developers do not always report which commits perform bug-fixes. Prior work suggests that such links can be a biased sample of the entire population of fixed bugs. The validity of statistical hypothesis testing based on linked data could well be affected by this bias. Given the wide use of linked defect data, it is vital to gauge the nature and extent of the bias, and to try to develop testable theories and models of it. To do this, we must establish ground truth: manually analyze a complete version history corpus and identify which commits fix defects and which do not. This is a difficult task, requiring an expert to compare versions, analyze changes, find related bugs in the bug database, reverse-engineer missing links, and finally record their work for later use. This effort must be repeated for hundreds of commits to obtain a useful sample of reported and unreported bug-fix commits. We make several contributions. First, we present Linkster, a tool to facilitate link reverse-engineering. Second, we evaluate this tool, engaging a core developer of the Apache HTTP web server project to exhaustively annotate 493 commits that occurred during a six-week period. Finally, we analyze this comprehensive data set, showing that there are serious and consequential problems in the data.
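The commit-log linking heuristic described above can be sketched roughly as follows. This is a minimal illustration and not Linkster's implementation: the regular expression, the function name `link_commits`, and the sample commits and bug IDs are all assumptions made for the example. It shows how a bug-fix commit whose log message names no bug ID ends up without a link, which is the kind of unreported fix a manual annotator would have to recover.

```python
import re

# Hypothetical pattern for bug references in commit logs, e.g.
# "bug 101", "fixes #101", "PR 202". Real projects use varying conventions.
BUG_REF = re.compile(r"\b(?:bug|pr|fix(?:es|ed)?)\s*[:#]?\s*(\d+)", re.IGNORECASE)


def link_commits(commits, known_bug_ids):
    """Split commits into those linked to known bugs and those left unlinked.

    commits: iterable of (commit_id, log_message) pairs.
    known_bug_ids: set of integer IDs present in the bug database.
    Returns (linked, unlinked), where linked maps commit_id -> sorted bug IDs.
    """
    linked, unlinked = {}, []
    for commit_id, message in commits:
        # Keep only references that correspond to a real bug report.
        ids = {int(m) for m in BUG_REF.findall(message)} & known_bug_ids
        if ids:
            linked[commit_id] = sorted(ids)
        else:
            unlinked.append(commit_id)
    return linked, unlinked


# Made-up sample data: r4 is a fix, but its log names no bug ID.
commits = [
    ("r1", "Fix bug 101: null dereference in header parsing"),
    ("r2", "Tidy whitespace"),
    ("r3", "Handle empty header (see PR 202)"),
    ("r4", "Correct off-by-one in parser"),
]
linked, unlinked = link_commits(commits, known_bug_ids={101, 202, 303})
```

In this sample, `r4` lands in `unlinked` alongside the purely cosmetic `r2`, so the heuristic alone cannot tell an unreported fix from a non-fix; separating the two requires the manual inspection the abstract describes.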