Retrieval from software libraries for bug localization: a comparative study of generic and composite text models

Authors:
Shivani Rao;Avinash Kak
Affiliations:
Purdue University, West Lafayette, USA;Purdue University, West Lafayette, USA
Venue:
Proceedings of the 8th Working Conference on Mining Software Repositories
Year:
2011

Abstract

We compare five generic text models, and certain composite variations of them, for retrieval from large software libraries for the purpose of bug localization. The generic models are the Unigram Model (UM), the Vector Space Model (VSM), the Latent Semantic Analysis Model (LSA), the Latent Dirichlet Allocation Model (LDA), and the Cluster Based Document Model (CBDM). The task is to locate the files that are relevant to a bug that a software developer has reported as a textual description. Our study uses iBUGS, a benchmark bug localization dataset with 75 KLOC and 291 bugs. A major conclusion of the comparison is that simple text models such as UM and VSM retrieve the relevant files from a library more effectively than more sophisticated models such as LDA. Retrieval effectiveness was measured with two kinds of metrics: (1) Mean Average Precision and (2) rank-based metrics. Using the SCORE metric, we also compare the retrieval effectiveness of the models in our study with that of some other bug localization tools.
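To make the retrieval task and the first metric concrete, here is a minimal Python sketch, not the authors' implementation. It ranks source files against a bug description with a small vector space model and then scores the ranking with Mean Average Precision. The TF-IDF weighting, the cosine similarity, the function names, and the toy file names are all assumptions made for illustration; the abstract does not specify them.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Return TF-IDF weight dicts for tokenized docs, plus the idf table.

    Assumption: raw term counts and a smoothed log idf; the paper's exact
    weighting scheme is not given in the abstract.
    """
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    idf = {t: math.log(n / df[t]) + 1.0 for t in df}
    vecs = [{t: c * idf[t] for t, c in Counter(doc).items()} for doc in docs]
    return vecs, idf


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def rank_files(files, query):
    """Rank file names by similarity of their tokens to the bug description."""
    names = list(files)
    vecs, idf = tfidf_vectors([files[name] for name in names])
    # Query terms that never occur in the library carry no signal here.
    q = {t: c * idf[t] for t, c in Counter(query).items() if t in idf}
    scores = {name: cosine(q, vec) for name, vec in zip(names, vecs)}
    return sorted(names, key=lambda name: scores[name], reverse=True)


def average_precision(ranked, relevant):
    """Mean of precision@k taken at each rank k that holds a relevant file."""
    hits, total = 0, 0.0
    for k, name in enumerate(ranked, start=1):
        if name in relevant:
            hits += 1
            total += hits / k
    return total / len(relevant) if relevant else 0.0


def mean_average_precision(runs):
    """Average the per-bug average precision over (ranking, relevant) pairs."""
    return sum(average_precision(r, rel) for r, rel in runs) / len(runs)


# Hypothetical toy library: file name -> tokens extracted from the file.
files = {
    "Parser.java": ["parse", "token", "error"],
    "Logger.java": ["log", "write", "file"],
    "Lexer.java": ["token", "scan", "char"],
}
bug_report = ["parse", "error", "token"]
ranking = rank_files(files, bug_report)
```

In this toy setup, `ranking` places `Parser.java` first because it shares every term with the bug description. Passing one `(ranking, relevant_files)` pair per bug to `mean_average_precision` gives a single score for a model over a whole benchmark, which is the kind of aggregate comparison the abstract describes.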