Software engineering researchers have long been interested in where and why bugs occur in code, and in predicting where they might turn up next. Historical bug-occurrence data has been key to this research. Bug tracking systems and code version histories record when, how, and by whom bugs were fixed; from these sources, datasets that relate file changes to bug fixes can be extracted. These historical datasets can be used to test hypotheses about how bugs are introduced, and also to build statistical bug prediction models. Unfortunately, processes and humans are imperfect: only a fraction of bug fixes are actually labelled in source code version histories, and only that fraction becomes available for study in the extracted datasets. The question naturally arises: are the bug fixes recorded in these historical datasets a fair representation of the full population of bug fixes? In this paper, we investigate historical data from several software projects and find strong evidence of systematic bias. We then investigate the potential effects of "unfair, imbalanced" datasets on the performance of prediction techniques. We draw the lesson that bias is a critical problem that threatens both the effectiveness of processes that rely on biased datasets to build prediction models and the generalizability of hypotheses tested on biased data.
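A common way to extract such datasets is to scan commit messages for bug identifiers and link each matching commit, with the files it touched, to the corresponding bug report. The following toy example illustrates that linking step and one simple way to look for bias. It does not reproduce the paper's tooling, and everything in it is assumed for illustration: the `Commit` and `Bug` record types, the `BUG_ID_PATTERN` regex, the `severity` field, and the data. After linking commits to known bug IDs, it compares the fraction of fixed bugs that were linked across severity levels. A large gap between groups is the kind of systematic skew the abstract warns about. Note that the unlabelled fix (commit `c3`) simply never enters the dataset.

```python
from __future__ import annotations

import re
from collections import defaultdict
from dataclasses import dataclass

# Hypothetical labelling convention: "bug 123", "Bug #123", or "#123" in a commit message.
BUG_ID_PATTERN = re.compile(r"(?:\bbug\s*#?|#)(\d+)", re.IGNORECASE)


@dataclass
class Commit:
    sha: str
    message: str
    files: list[str]


@dataclass
class Bug:
    bug_id: int
    severity: str  # hypothetical tracker field, e.g. "critical" or "minor"


def link_commits_to_bugs(commits: list[Commit], bugs: list[Bug]) -> dict[int, set[str]]:
    """Map each fixed bug that some commit message references to the files those commits changed."""
    known = {b.bug_id for b in bugs}
    linked: dict[int, set[str]] = defaultdict(set)
    for commit in commits:
        for raw_id in BUG_ID_PATTERN.findall(commit.message):
            bug_id = int(raw_id)
            if bug_id in known:  # skip numbers that are not fixed bugs in the tracker
                linked[bug_id].update(commit.files)
    return dict(linked)


def linked_fraction_by_severity(bugs: list[Bug], linked: dict[int, set[str]]) -> dict[str, float]:
    """Fraction of fixed bugs at each severity level that ended up linked to a commit."""
    totals: dict[str, int] = defaultdict(int)
    hits: dict[str, int] = defaultdict(int)
    for bug in bugs:
        totals[bug.severity] += 1
        if bug.bug_id in linked:
            hits[bug.severity] += 1
    return {sev: hits[sev] / totals[sev] for sev in totals}


# Invented example data.
bugs = [Bug(1, "critical"), Bug(2, "critical"), Bug(3, "minor"), Bug(4, "minor")]
commits = [
    Commit("a1", "Fix crash on startup, bug 1", ["core/init.c"]),
    Commit("b2", "Fixes #2: null deref", ["core/io.c"]),
    Commit("c3", "tidy up warnings", ["ui/menu.c"]),  # a fix with no bug ID: never linked
    Commit("d4", "bug #3 typo in label", ["ui/label.c"]),
]
linked = link_commits_to_bugs(commits, bugs)
print(linked_fraction_by_severity(bugs, linked))  # → {'critical': 1.0, 'minor': 0.5}
```

In this invented data, every critical fix is linked but only half of the minor fixes are. A prediction model or hypothesis test built only on the linked records would therefore over-represent critical bugs. A real study would compare many more attributes than severity, and with proper statistical tests.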