A static technique for fault localization using character n-gram based information retrieval model

  • Authors:
  • Sangeeta Lal;Ashish Sureka

  • Affiliations:
  • Indraprastha Institute of Information Technology, (IIIT-Delhi), India;Indraprastha Institute of Information Technology, (IIIT-Delhi), India

  • Venue:
  • Proceedings of the 5th India Software Engineering Conference
  • Year:
  • 2012

Abstract

Bug or fault localization is the process of identifying the specific location(s) or region(s) of source code (at various granularity levels such as the directory path, file, method or statement) that are faulty and need to be modified to repair a defect. Fault localization is a routine task in corrective software maintenance. Due to the increasing size and complexity of current software applications, automated solutions for fault localization can significantly reduce human effort and software maintenance cost. We present a static technique for fault localization based on a character n-gram based Information Retrieval (IR) model. We frame bug localization as the task of searching for the relevant document(s) for a given query, where the bug report acts as the query and the source-code files act as the documents, and investigate the application of character-level n-gram based textual features derived from bug reports and source-code file attributes. We implement the proposed IR model and evaluate its performance on datasets downloaded from two popular open-source projects (JBoss and Apache). We conduct a series of experiments to validate our hypothesis and present evidence that the proposed approach is effective. The accuracy of the proposed approach is measured in terms of the SCORE and MAP (Mean Average Precision) metrics, both commonly used for the task of fault localization. Experimental results reveal that the median values of the SCORE metric for the JBoss and Apache datasets are 99.03% and 93.70% respectively. We observe that for 16.16% of the bug reports in the JBoss dataset and for 10.67% of the bug reports in the Apache dataset, the average precision value (computed at all recall levels) is between 0.9 and 1.0.
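
The retrieval framing described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not specify the similarity function, the n-gram length, or the file attributes used, so the sketch assumes character trigrams, raw count vectors, and cosine similarity. All function names and the sample file contents are hypothetical. The `average_precision` helper follows the standard IR definition, from which MAP is obtained by averaging over all bug reports.

```python
from collections import Counter
from math import sqrt


def char_ngrams(text, n=3):
    """Count overlapping character n-grams of a lowercased string."""
    text = text.lower()
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))


def cosine(a, b):
    """Cosine similarity between two n-gram count vectors."""
    dot = sum(count * b[gram] for gram, count in a.items() if gram in b)
    norm = sqrt(sum(c * c for c in a.values())) * sqrt(sum(c * c for c in b.values()))
    return dot / norm if norm else 0.0


def rank_files(bug_report, files, n=3):
    """Rank file paths by similarity to the bug report (the query).

    `files` maps a path to the text associated with that file
    (assumed here to be any textual attribute, e.g. name or contents).
    Ties are broken alphabetically by path.
    """
    query = char_ngrams(bug_report, n)
    scored = [(cosine(query, char_ngrams(text, n)), path)
              for path, text in files.items()]
    return [path for _, path in sorted(scored, key=lambda s: (-s[0], s[1]))]


def average_precision(ranking, relevant):
    """Standard average precision of a ranking against a set of relevant files."""
    hits, total = 0, 0.0
    for rank, path in enumerate(ranking, start=1):
        if path in relevant:
            hits += 1
            total += hits / rank  # precision at this recall point
    return total / len(relevant) if relevant else 0.0


# Hypothetical toy data, for illustration only.
files = {
    "src/TxManager.java": "TransactionManager commit rollback timeout",
    "src/Logger.java": "Logger appender format level",
    "src/Cache.java": "cache eviction policy",
}
ranking = rank_files("transaction timeout on rollback", files)
ap = average_precision(ranking, {"src/TxManager.java"})
```

Character n-grams match on substrings rather than whole tokens, so a query word such as "transaction" still overlaps with an identifier such as `TransactionManager` without any tokenization or stemming step.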