In a bug tracking system, different testers or users may submit multiple reports on the same bug. Such reports, referred to as duplicates, add maintenance effort to triaging and fixing bugs. To identify duplicates accurately, in this paper we propose a retrieval function (REP) that measures the similarity between two bug reports. It fully utilizes the information available in a bug report, including not only the similarity of textual content in the summary and description fields, but also the similarity of non-textual fields such as product, component, and version. For more accurate measurement of textual similarity, we extend BM25F, an effective similarity formula from the information retrieval community, specifically for duplicate report retrieval. Lastly, we use a two-round stochastic gradient descent to automatically optimize REP for specific bug repositories in a supervised learning manner. We have validated our technique on three large software bug repositories from Mozilla, Eclipse and OpenOffice. The experiments show a 10–27% relative improvement in recall rate@k and a 17–23% relative improvement in mean average precision over our previous model. We also applied our technique to a very large dataset consisting of 209,058 reports from Eclipse, resulting in a recall rate@k of 37–71% and a mean average precision of 47%.
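
The two evaluation measures named above can be illustrated with a minimal sketch. It assumes the common setup in duplicate detection where each duplicate report has exactly one master report, so average precision reduces to the reciprocal of the master's rank. The abstract does not give these definitions, and the function names, report identifiers, and toy rankings below are hypothetical, chosen only for illustration.

```python
from typing import Dict, List


def recall_rate_at_k(rankings: Dict[str, List[str]],
                     masters: Dict[str, str], k: int) -> float:
    """Fraction of duplicate reports whose master is among the top-k candidates."""
    if not rankings:
        return 0.0
    hits = sum(1 for dup, ranked in rankings.items()
               if masters[dup] in ranked[:k])
    return hits / len(rankings)


def mean_average_precision(rankings: Dict[str, List[str]],
                           masters: Dict[str, str]) -> float:
    """Mean of 1/rank of the master report (assumes one master per duplicate).

    A duplicate whose master is missing from its candidate list contributes 0.
    """
    if not rankings:
        return 0.0
    total = 0.0
    for dup, ranked in rankings.items():
        master = masters[dup]
        if master in ranked:
            total += 1.0 / (ranked.index(master) + 1)
    return total / len(rankings)


# Hypothetical toy data: candidate lists returned for three duplicate reports.
rankings = {
    "d1": ["m1", "x", "y"],   # master ranked 1st
    "d2": ["x", "m2", "y"],   # master ranked 2nd
    "d3": ["x", "y", "z"],    # master not retrieved
}
masters = {"d1": "m1", "d2": "m2", "d3": "m3"}

recall_at_1 = recall_rate_at_k(rankings, masters, 1)   # 1 of 3 duplicates
recall_at_2 = recall_rate_at_k(rankings, masters, 2)   # 2 of 3 duplicates
map_score = mean_average_precision(rankings, masters)  # (1 + 1/2 + 0) / 3
```

Under this reading, recall rate@k can only grow as k increases, which is why it is reported as a range over k, while mean average precision is a single number.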