AUSUM: approach for unsupervised bug report summarization

Authors:
Senthil Mani; Rose Catherine; Vibha Singhal Sinha; Avinava Dubey
Affiliations:
IBM Research - India; IBM Research - India; IBM Research - India; IBM Research - India
Venue:
Proceedings of the ACM SIGSOFT 20th International Symposium on the Foundations of Software Engineering
Year:
2012

Abstract

In most software projects, resolved bugs are archived for future reference. These bug reports contain valuable information on the reported problem, its investigation, and its resolution. When triaging bugs, developers look for how similar problems were resolved in the past. A search over the bug repository gives the developer a set of recommended bugs to look into. However, the developer still needs to manually peruse the contents of the recommended bugs, which may vary in size from a couple of lines to thousands of lines. Automatic summarization of bug reports is one way to reduce the amount of data a developer needs to go through. Prior work has presented learning-based approaches for bug summarization. These approaches have the disadvantage of requiring a large training set and of being biased towards the data on which the model was learnt. In fact, maximum efficacy was reported when the model was trained and tested on bug reports from the same project. In this paper, we present the results of applying four unsupervised summarization techniques to bug summarization. Industrial bug reports typically contain a large amount of noise (email dumps, chat transcripts, core dumps): sentences that are useless from the perspective of summarization. This noise derails the unsupervised approaches, which are optimized to work on more well-formed documents. We present an approach for noise reduction, which improves the precision of summarization over the base technique (by 4% to 24% across subjects and base techniques). Importantly, with noise reduction applied, two of the unsupervised techniques became scalable to large bug reports.
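The pipeline described in the abstract (filter out noisy sentences, then apply an unsupervised extractive summarizer) can be illustrated with a minimal sketch. This is not the authors' implementation: the noise heuristic in `is_noise`, the bag-of-words centroid scorer, and all function names are assumptions chosen for illustration, and the abstract does not specify how noise is detected or which four techniques were evaluated.

```python
import math
import re
from collections import Counter


def tokenize(sentence):
    """Lowercase alphabetic tokens; a deliberately simple tokenizer."""
    return re.findall(r"[a-z]+", sentence.lower())


def is_noise(sentence):
    """Hypothetical heuristic (an assumption, not the paper's method):
    flag lines that look like stack frames, email headers, or hex dumps,
    and lines with too few words to carry a statement."""
    pattern = r"\s*(at\s+[\w.$]+\(|from:|to:|subject:|0x[0-9a-f]+)"
    if re.match(pattern, sentence, re.IGNORECASE):
        return True
    return len(tokenize(sentence)) < 3


def cosine(a, b):
    """Cosine similarity between two term-frequency Counters."""
    dot = sum(a[t] * b[t] for t in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def summarize(sentences, k=2):
    """Drop noisy sentences, then keep the k sentences closest to the
    term-frequency centroid of the remainder, in original order."""
    kept = [s for s in sentences if not is_noise(s)]
    bags = [Counter(tokenize(s)) for s in kept]
    centroid = Counter()
    for bag in bags:
        centroid.update(bag)
    ranked = sorted(range(len(kept)),
                    key=lambda i: cosine(bags[i], centroid),
                    reverse=True)
    return [kept[i] for i in sorted(ranked[:k])]


if __name__ == "__main__":
    report = [
        "The application crashes when saving a large file.",
        "at com.example.Foo.bar(Foo.java:42)",
        "From: alice@example.com",
        "Saving a large file crashes because the buffer is too small.",
    ]
    for line in summarize(report, k=2):
        print(line)
```

Filtering before scoring also shrinks the input that the summarizer has to process, which is in the spirit of the scalability observation in the abstract, although the actual mechanism in the paper may differ.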