Scalable and systematic detection of buggy inconsistencies in source code

  • Authors:
  • Mark Gabel; Junfeng Yang; Yuan Yu; Moises Goldszmidt; Zhendong Su

  • Affiliations:
  • University of California at Davis, Davis, CA, USA; Columbia University, New York, NY, USA; Microsoft Research, Silicon Valley, Mountain View, CA, USA; Microsoft Research, Silicon Valley, Mountain View, CA, USA; University of California at Davis, Davis, CA, USA

  • Venue:
  • Proceedings of the ACM international conference on Object oriented programming systems languages and applications
  • Year:
  • 2010

Abstract

Software developers often duplicate source code to replicate functionality. This practice can hinder the maintenance of a software project: bugs may arise when two identical code segments are edited inconsistently. This paper presents DejaVu, a highly scalable system for detecting these general syntactic inconsistency bugs. DejaVu operates in two phases. Given a target code base, a parallel /inconsistent clone analysis/ first enumerates all groups of source code fragments that are similar but not identical. Next, an extensible /buggy change analysis/ framework refines these results, separating each group of inconsistent fragments into a fine-grained set of inconsistent changes and classifying each as benign or buggy. On a 75+ million line pre-production commercial code base, DejaVu executed in under five hours and produced a report of over 8,000 potential bugs. Our analysis of a sizable random sample suggests with high likelihood that this report contains at least 2,000 true bugs and 1,000 code smells. These bugs draw from a diverse class of software defects and are often simple to correct: syntactic inconsistencies both indicate problems and suggest solutions.
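
To make the idea of an inconsistent clone concrete, the following is a minimal Python sketch, not DejaVu's actual algorithm, which the abstract does not specify. It assumes a naive regex tokenizer, a pairwise comparison with the standard library's `difflib.SequenceMatcher`, and a hypothetical similarity threshold of 0.8. The names `tokenize` and `inconsistent_pairs` are invented for illustration. The sketch reports fragment pairs that are similar but not identical, together with the token-level changes between them, and makes no attempt to classify a change as benign or buggy.

```python
import re
from difflib import SequenceMatcher
from itertools import combinations

# Assumption: a crude lexer (identifiers, integers, single symbols).
TOKEN = re.compile(r"[A-Za-z_]\w*|\d+|\S")


def tokenize(fragment):
    """Split a code fragment into a flat list of tokens."""
    return TOKEN.findall(fragment)


def inconsistent_pairs(fragments, threshold=0.8):
    """Yield (i, j, changes) for pairs that are similar but not identical.

    `threshold` is a hypothetical cutoff chosen for illustration only.
    Each change is (opcode, tokens_in_i, tokens_in_j).
    """
    tokens = [tokenize(f) for f in fragments]
    for i, j in combinations(range(len(tokens)), 2):
        matcher = SequenceMatcher(None, tokens[i], tokens[j], autojunk=False)
        ratio = matcher.ratio()
        # Identical fragments (ratio == 1.0) are ordinary clones, not
        # inconsistent ones, so they are skipped.
        if threshold <= ratio < 1.0:
            changes = [
                (tag, tokens[i][a1:a2], tokens[j][b1:b2])
                for tag, a1, a2, b1, b2 in matcher.get_opcodes()
                if tag != "equal"
            ]
            yield i, j, changes


if __name__ == "__main__":
    fragments = [
        "if (p != NULL) { p->len = n; free(p); }",
        "if (p != NULL) { p->len = n; free(q); }",  # inconsistent edit
        "return 0;",
    ]
    for i, j, changes in inconsistent_pairs(fragments):
        print(i, j, changes)
```

Comparing every pair of fragments is quadratic and would not approach the scale reported in the abstract; the sketch only illustrates what "similar but not identical" means and how a group of fragments can be reduced to a fine-grained set of changes.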