Scalable and systematic detection of buggy inconsistencies in source code

Authors:
Mark Gabel;Junfeng Yang;Yuan Yu;Moises Goldszmidt;Zhendong Su
Affiliations:
University of California at Davis, Davis, CA, USA;Columbia University, New York, NY, USA;Microsoft Research, Silicon Valley, Mountain View, CA, USA;Microsoft Research, Silicon Valley, Mountain View, CA, USA;University of California at Davis, Davis, CA, USA
Venue:
Proceedings of the ACM international conference on Object oriented programming systems languages and applications
Year:
2010


Abstract

Software developers often duplicate source code to replicate functionality. This practice can hinder the maintenance of a software project: bugs may arise when two identical code segments are edited inconsistently. This paper presents DejaVu, a highly scalable system for detecting these general syntactic inconsistency bugs. DejaVu operates in two phases. Given a target code base, a parallel /inconsistent clone analysis/ first enumerates all groups of source code fragments that are similar but not identical. Next, an extensible /buggy change analysis/ framework refines these results: it separates each group of inconsistent fragments into a fine-grained set of inconsistent changes and classifies each change as benign or buggy. On a pre-production commercial code base of more than 75 million lines, DejaVu executed in under five hours and produced a report of over 8,000 potential bugs. Our analysis of a sizable random sample suggests with high likelihood that this report contains at least 2,000 true bugs and 1,000 code smells. These bugs come from a diverse class of software defects and are often simple to correct: syntactic inconsistencies both indicate problems and suggest solutions.
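The two phases described above can be illustrated with a small, self-contained Python sketch. It is not DejaVu's implementation, and the paper's own clone detector and change classifiers are not reproduced here. The sketch makes the following substitutions:

- Token-set Jaccard similarity stands in for the inconsistent clone analysis.
- Python's standard `difflib.SequenceMatcher` splits a pair of fragments into fine-grained token edits.
- A hypothetical one-rule classifier treats pure identifier renames as benign and flags everything else.

The names `tokens`, `inconsistent_clone_pairs`, `inconsistent_changes`, `classify`, the 0.8 threshold and the toy fragments are all illustrative assumptions. The pairwise loop is also sequential rather than parallel.

```python
import difflib
import itertools
import re


def tokens(fragment: str) -> list[str]:
    # Hypothetical tokenizer: identifiers, integer literals, and single
    # non-space characters (so ">=" becomes ">" followed by "=").
    return re.findall(r"[A-Za-z_]\w*|\d+|\S", fragment)


def jaccard(a: list[str], b: list[str]) -> float:
    # Similarity of the two token *sets* (stand-in for a real clone detector).
    sa, sb = set(a), set(b)
    return len(sa & sb) / len(sa | sb) if sa | sb else 1.0


def inconsistent_clone_pairs(fragments: dict[str, str], threshold: float = 0.8):
    """Phase 1 (sketch): yield pairs that are similar but not identical."""
    toks = {name: tokens(src) for name, src in fragments.items()}
    for a, b in itertools.combinations(sorted(fragments), 2):
        if toks[a] != toks[b] and jaccard(toks[a], toks[b]) >= threshold:
            yield a, b


def inconsistent_changes(src_a: str, src_b: str):
    """Phase 2a (sketch): split one inconsistent pair into token-level edits."""
    ta, tb = tokens(src_a), tokens(src_b)
    matcher = difflib.SequenceMatcher(a=ta, b=tb, autojunk=False)
    for op, i1, i2, j1, j2 in matcher.get_opcodes():
        if op != "equal":
            yield op, ta[i1:i2], tb[j1:j2]


def classify(change) -> str:
    """Phase 2b (sketch): hypothetical rule, not DejaVu's classifiers.

    A replacement that swaps identifiers one-for-one is treated as a benign
    rename; every other inconsistent change is reported as suspicious.
    """
    op, old, new = change
    if op == "replace" and len(old) == len(new) and all(t.isidentifier() for t in old + new):
        return "benign"
    return "suspicious"


# Toy fragments: f2 changes the bounds check, f3 only renames `len` to `n`.
fragments = {
    "f1": "if (len > MAX) return -1; buf[len] = 0;",
    "f2": "if (len >= MAX) return -1; buf[len] = 0;",
    "f3": "if (n > MAX) return -1; buf[n] = 0;",
}

for a, b in inconsistent_clone_pairs(fragments):
    for change in inconsistent_changes(fragments[a], fragments[b]):
        print(a, b, change, classify(change))
```

On these toy fragments, the sketch pairs all three. The `len > MAX` versus `len >= MAX` difference surfaces as a single inserted `=` token and is flagged as suspicious. The consistent rename of `len` to `n` is classified as benign. A real system would need far more robust clone matching and change classification to operate at the scale described above.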