Using evolution patterns to find duplicated bugs

  • Authors:
  • E. James Whitehead, Jr.; Kai Pan

  • Affiliations:
  • University of California, Santa Cruz; University of California, Santa Cruz

  • Venue:
  • Using evolution patterns to find duplicated bugs
  • Year:
  • 2006

Abstract

The widespread use of software configuration management repositories to record the evolution of a software project raises the possibility of mining these repositories to better understand how developers fix software defects. A repository that records a project's change history contains many changes in which developers fix bugs (known as bug fix changes), as opposed to adding new features or refactoring source code. Bug fix changes are interesting because they provide not only the source code of a bug but also the source code of its fix. This dissertation defines 27 automatically extractable static bug fix patterns, based on the syntax components and context of the source code involved in bug fix changes. Instances of these patterns are extracted from the configuration management repositories of seven open source projects, all written in Java (ArgoUML, Columba, Eclipse, JEdit, Scarab, Lucene, and MegaMek). The defined patterns cover 45.7% to 63.6% of the total bug fix changes in these projects. Two classes of static bug fix pattern instances, those related to if statements and method calls, together account for 44.6% to 60.3% of all observed static bug fix patterns. Several analyses were performed on the pattern instances extracted from these projects.

This dissertation also presents a bug finding algorithm, BugMem, which finds duplicated bugs using bug fix memories: a project-specific bug and fix knowledge base developed by analyzing the history of bug fixes. Because the approach is a learning process, the identified bug patterns are project-specific, the number of bug patterns grows as the software evolves, and high-level project-specific bugs can be detected. The algorithm and tool were assessed by evaluating whether duplicated bugs and fixes in project histories could be found in the bug fix memories. Analysis of five open source projects (ArgoUML, Columba, Eclipse, JEdit, and Scarab) shows that 17.5% to 32.4% of bugs appear repeatedly in the memories, and 7.5% to 13.0% of bug and fix pairs are found in the memories. The results demonstrate that project-specific bug fix patterns occur frequently enough to be useful as a bug detection technique. Furthermore, for the bug and fix pairs, it is possible both to detect the bug and to provide a strong suggestion for the fix.
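
To make the idea of a bug fix memory concrete, the following is a minimal sketch in Python of a project-specific store that learns (buggy code, fixed code) pairs from past bug fix changes and then checks new code against them. It is an illustration only, not BugMem itself: the abstract does not describe how BugMem normalizes or matches code, so the `BugFixMemory` class, the `normalize` function, the token-level matching, and the sample Java lines are all assumptions made for this example.

```python
import re
from collections import defaultdict

# Assumption: code is compared as a sequence of simple tokens so that
# whitespace-only differences do not prevent a match.
_TOKEN = re.compile(r"[A-Za-z_]\w*|\d+|\S")


def normalize(line: str) -> str:
    """Return a whitespace-insensitive token string for a line of code."""
    return " ".join(_TOKEN.findall(line))


class BugFixMemory:
    """Hypothetical project-specific store of (buggy code, fixed code) pairs."""

    def __init__(self):
        # normalized buggy line -> {normalized fix: number of times recorded}
        self._fixes = defaultdict(lambda: defaultdict(int))

    def learn(self, buggy: str, fixed: str) -> None:
        """Record one bug fix change taken from the project history."""
        self._fixes[normalize(buggy)][normalize(fixed)] += 1

    def check(self, line: str):
        """Return the most frequently recorded fix if the line matches a
        remembered bug, otherwise None."""
        known = self._fixes.get(normalize(line))
        if not known:
            return None
        return max(known, key=known.get)


memory = BugFixMemory()
memory.learn(
    "if (node.getParent() == root)",
    "if (node != null && node.getParent() == root)",
)

# A later change reintroduces the same bug with different spacing.
suggestion = memory.check("if (node.getParent()==root)")
unknown = memory.check("return count + 1;")
```

In this sketch, a hit from `check` corresponds to detecting a duplicated bug, and the returned value plays the role of the suggested fix; a line with no match in the memory yields `None`. Because the memory is filled only from one project's history, it grows as more bug fix changes are recorded, which mirrors the learning process described in the abstract.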