A static technique for fault localization using character n-gram based information retrieval model

Authors:
Sangeeta Lal;Ashish Sureka
Affiliations:
Indraprastha Institute of Information Technology (IIIT-Delhi), India;Indraprastha Institute of Information Technology (IIIT-Delhi), India
Venue:
Proceedings of the 5th India Software Engineering Conference
Year:
2012




Abstract

Bug (or fault) localization is the process of identifying the specific location(s) or region(s) of source code (at various granularity levels, such as directory path, file, method, or statement) that are faulty and need to be modified to repair a defect. Fault localization is a routine task in corrective software maintenance. Given the increasing size and complexity of current software applications, automated solutions for fault localization can significantly reduce human effort and software maintenance cost. We present a static technique for fault localization based on a character n-gram based Information Retrieval (IR) model. We frame bug localization as the task of searching for the document(s) relevant to a given query, and investigate the application of character-level n-gram based textual features derived from bug reports and source-code file attributes. We implement the proposed IR model and evaluate its performance on datasets downloaded from two popular open-source projects (JBoss and Apache). We conduct a series of experiments to validate our hypothesis and present evidence that the proposed approach is effective. The accuracy of the proposed approach is measured using the SCORE and MAP (Mean Average Precision) metrics, both commonly used for the task of fault localization. Experimental results show that the median value of the SCORE metric is 99.03% for the JBoss dataset and 93.70% for the Apache dataset. We observe that for 16.16% of the bug reports in the JBoss dataset and for 10.67% of the bug reports in the Apache dataset, the average precision value (computed at all recall levels) is between 0.9 and 1.0.
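
As a rough illustration of the idea described in the abstract, the following minimal sketch treats a bug report as a query, ranks source files by character n-gram overlap, and computes average precision for the resulting ranking. It is not the authors' implementation: the abstract does not specify the similarity function, so the cosine similarity over raw n-gram counts, the choice of n = 3, the function names, and the toy file contents are all assumptions made for this example.

```python
from collections import Counter
from math import sqrt


def char_ngrams(text, n=3):
    """Count the overlapping character n-grams of a lowercased string."""
    s = text.lower()
    return Counter(s[i:i + n] for i in range(len(s) - n + 1))


def cosine(a, b):
    """Cosine similarity between two n-gram count vectors (assumed weighting)."""
    dot = sum(count * b[gram] for gram, count in a.items())
    norm = sqrt(sum(c * c for c in a.values())) * sqrt(sum(c * c for c in b.values()))
    return dot / norm if norm else 0.0


def rank_files(bug_report, files, n=3):
    """Return file names sorted from most to least similar to the bug report."""
    query = char_ngrams(bug_report, n)
    scores = {name: cosine(query, char_ngrams(text, n)) for name, text in files.items()}
    return sorted(scores, key=scores.get, reverse=True)


def average_precision(ranking, relevant):
    """Mean of the precision values at each rank where a relevant file appears."""
    hits, total = 0, 0.0
    for rank, name in enumerate(ranking, start=1):
        if name in relevant:
            hits += 1
            total += hits / rank
    return total / len(relevant) if relevant else 0.0


# Hypothetical toy corpus: file name -> text drawn from file attributes.
files = {
    "ConnectionPool.java": "class ConnectionPool getConnection releaseConnection timeout",
    "LoginServlet.java": "class LoginServlet doPost username password session",
    "XmlParser.java": "class XmlParser parseDocument element attribute",
}
report = "Connection pool times out when getConnection is called"

ranking = rank_files(report, files)
print(ranking)
print(average_precision(ranking, {"ConnectionPool.java"}))
```

MAP is then the mean of `average_precision` over all bug reports in a dataset. Character n-grams match on substrings, so a report mentioning "times out" can still partially match an identifier such as `timeout` without any word-level tokenization or stemming.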