Bug localization using latent Dirichlet allocation

Authors:
Stacy K. Lukins;Nicholas A. Kraft;Letha H. Etzkorn
Affiliations:
Computer Science Department, University of Alabama in Huntsville, Huntsville, AL 35899, USA;Department of Computer Science, University of Alabama, Tuscaloosa, AL 35487-0290, USA;Computer Science Department, University of Alabama in Huntsville, Huntsville, AL 35899, USA
Venue:
Information and Software Technology
Year:
2010

Abstract

Context: Some recent static techniques for automatic bug localization have been built around modern information retrieval (IR) models such as latent semantic indexing (LSI). Latent Dirichlet allocation (LDA) is a generative statistical model that has significant advantages, in modularity and extensibility, over both LSI and probabilistic LSI (pLSI). Moreover, LDA has been shown to be effective in topic-model-based information retrieval. In this paper, we present a static LDA-based technique for automatic bug localization and evaluate its effectiveness.

Objective: We evaluate the accuracy and scalability of the LDA-based technique and investigate whether it is suitable for use with open-source software systems of varying size, including those developed using agile methods.

Method: We present five case studies designed to determine the accuracy and scalability of the LDA-based technique, as well as its relationships to software system size and to source code stability. The studies examine over 300 bugs across more than 25 iterations of three software systems.

Results: The results of the studies show that the LDA-based technique maintains sufficient accuracy across all bugs in a single iteration of a software system and is scalable to a large number of bugs across multiple revisions of two software systems. The results also indicate that the accuracy of the LDA-based technique is not affected by the size of the subject software system or by the stability of its source code base.

Conclusion: We conclude that an effective static technique for automatic bug localization can be built around LDA. We also conclude that there is no significant relationship between the accuracy of the LDA-based technique and the size of the subject software system or the stability of its source code base. Thus, the LDA-based technique is widely applicable.
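
The abstract does not describe the mechanics of the technique, so the following is only a minimal sketch of how LDA-based retrieval over source code can work in general, not the authors' implementation. It fits LDA with a small collapsed Gibbs sampler and then ranks code units by the probability of the query words under each unit's topic mixture. The toy corpus, the method names, the hyperparameters (`alpha`, `beta`, number of topics, iteration count), and the query-likelihood scoring rule are all assumptions made for illustration.

```python
import random

def fit_lda(docs, num_topics, alpha=0.1, beta=0.01, iters=200, seed=0):
    """Fit LDA to tokenized documents with collapsed Gibbs sampling.

    Returns (theta, phi): per-document topic mixtures and per-topic
    word distributions. Hyperparameter defaults are illustrative only.
    """
    rng = random.Random(seed)
    vocab = sorted({w for words in docs for w in words})
    vocab_size = len(vocab)
    doc_topic = [[0] * num_topics for _ in docs]                # counts n[d][k]
    topic_word = [dict.fromkeys(vocab, 0) for _ in range(num_topics)]  # n[k][w]
    topic_total = [0] * num_topics                              # words per topic
    assign = []
    # Start from a random topic assignment for every word occurrence.
    for d, words in enumerate(docs):
        zs = []
        for w in words:
            k = rng.randrange(num_topics)
            zs.append(k)
            doc_topic[d][k] += 1
            topic_word[k][w] += 1
            topic_total[k] += 1
        assign.append(zs)
    # Resample each assignment conditioned on all the others.
    for _ in range(iters):
        for d, words in enumerate(docs):
            for i, w in enumerate(words):
                k = assign[d][i]
                doc_topic[d][k] -= 1
                topic_word[k][w] -= 1
                topic_total[k] -= 1
                weights = [
                    (doc_topic[d][t] + alpha)
                    * (topic_word[t][w] + beta)
                    / (topic_total[t] + vocab_size * beta)
                    for t in range(num_topics)
                ]
                k = rng.choices(range(num_topics), weights=weights)[0]
                assign[d][i] = k
                doc_topic[d][k] += 1
                topic_word[k][w] += 1
                topic_total[k] += 1
    theta = [
        [(n + alpha) / (len(words) + num_topics * alpha) for n in doc_topic[d]]
        for d, words in enumerate(docs)
    ]
    phi = [
        {w: (topic_word[k][w] + beta) / (topic_total[k] + vocab_size * beta)
         for w in vocab}
        for k in range(num_topics)
    ]
    return theta, phi

def rank(query, names, theta, phi):
    """Rank documents by the likelihood of the query under each topic mixture."""
    scored = []
    for name, doc_theta in zip(names, theta):
        score = 1.0
        for w in query:
            if w in phi[0]:  # ignore query words absent from the corpus
                score *= sum(p * topic[w] for p, topic in zip(doc_theta, phi))
        scored.append((name, score))
    return sorted(scored, key=lambda item: item[1], reverse=True)

# Hypothetical corpus: each "document" is the bag of identifier and
# comment words extracted from one method.
names = ["Editor.scrollTo", "Editor.moveCursor", "Parser.readToken", "Parser.parseExpr"]
docs = [
    "scroll view line offset cursor visible scroll".split(),
    "cursor move line column selection cursor".split(),
    "token read char buffer lexer token".split(),
    "parse expr token operator precedence parse".split(),
]
theta, phi = fit_lda(docs, num_topics=2)

# Hypothetical query built from the words of a bug report.
ranking = rank(["cursor", "scroll"], names, theta, phi)
for name, score in ranking:
    print(f"{name}: {score:.6f}")
```

A developer would inspect the ranked methods from the top down. On a corpus this small the ranking depends heavily on the random seed, so the sketch illustrates the data flow rather than the accuracy reported in the paper.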