A discriminative model approach for accurate duplicate bug report retrieval

Authors:
Chengnian Sun;David Lo;Xiaoyin Wang;Jing Jiang;Siau-Cheng Khoo
Affiliations:
National University of Singapore;Singapore Management University;Peking University, Ministry of Education;Singapore Management University;National University of Singapore
Venue:
Proceedings of the 32nd ACM/IEEE International Conference on Software Engineering - Volume 1
Year:
2010



Abstract

Bug repositories are usually maintained in software projects. Testers or users submit bug reports to identify various issues with systems. Sometimes two or more bug reports correspond to the same defect. To address the problem of duplicate bug reports, a person called a triager needs to manually label these bug reports as duplicates and link them to their "master" reports for subsequent maintenance work. In practice, however, a considerable number of duplicate bug reports are submitted daily, and requiring triagers to label them manually can be highly time consuming. To address this issue, several techniques have recently been proposed that use various similarity-based metrics to detect candidate duplicate bug reports for manual verification. Automating triaging has proved challenging, as two reports of the same bug can be written in very different ways. There is still much room for improvement in the accuracy of the duplicate detection process. In this paper, we leverage recent advances in using discriminative models for information retrieval to detect duplicate bug reports more accurately. We have validated our approach on three large software bug repositories from Firefox, Eclipse, and OpenOffice. We show that our technique could result in 17–31%, 22–26%, and 35–43% relative improvement over state-of-the-art techniques on the OpenOffice, Firefox, and Eclipse datasets respectively, using commonly available natural language information only.
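
As an illustration of the similarity-based candidate retrieval that the abstract refers to, the following is a minimal sketch of a generic TF-IDF and cosine-similarity baseline. It is not the discriminative model proposed in the paper, whose details are not given in this record. The function names (`tokenize`, `tfidf_vectors`, `cosine`, `rank_candidates`), the smoothed IDF formula, and the sample reports are all assumptions made for the example.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase a report and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def tfidf_vectors(docs):
    """Build one sparse TF-IDF vector (a dict) per tokenized document."""
    n = len(docs)
    # Document frequency: number of documents containing each term.
    df = Counter(term for doc in docs for term in set(doc))
    # Smoothed IDF (an assumption of this sketch) keeps every weight positive.
    idf = {t: math.log((1 + n) / (1 + c)) + 1.0 for t, c in df.items()}
    return [{t: c * idf[t] for t, c in Counter(doc).items()} for doc in docs]


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def rank_candidates(new_report, existing_reports, top_k=3):
    """Return (index, score) pairs for the most similar existing reports."""
    docs = [tokenize(r) for r in existing_reports] + [tokenize(new_report)]
    vectors = tfidf_vectors(docs)
    query = vectors[-1]
    scored = [(cosine(query, vec), i) for i, vec in enumerate(vectors[:-1])]
    # Highest score first; ties are broken by the original report order.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [(i, score) for score, i in scored[:top_k]]


# Hypothetical reports, invented purely for this example.
existing = [
    "Browser crashes when opening a PDF attachment",
    "Toolbar icons are missing after update",
    "Spell checker ignores custom dictionary",
]
new_report = "Crash on opening PDF attachment in browser"

for index, score in rank_candidates(new_report, existing):
    print(index, round(score, 3), existing[index])
```

In a workflow like the one the abstract describes, the top-ranked candidates would be shown to a triager for manual verification rather than being marked as duplicates automatically.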