Tree-pattern-based duplicate code detection

Authors:
Hyo-Sub Lee; Kyung-Goo Doh
Affiliations:
Hanyang University, Ansan, South Korea; Hanyang University, Ansan, South Korea
Venue:
Proceedings of the ACM first international workshop on Data-intensive software management and mining
Year:
2009

Abstract

This paper presents a tree-pattern-based method for automatically and accurately finding code clones in program files. Duplicate tree patterns are first collected using an anti-unification algorithm and redundancy-free exhaustive comparisons, and are then clustered. For speed, the algorithm is designed so that no comparison is repeated; for accuracy, it still examines every possible pair of tree patterns. The method preserves the syntactic structure of the code in its tree-pattern clusters, which gives it the flexibility to find different types of clones without sacrificing precision.
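
The abstract does not spell out the algorithm, so the following is only a minimal sketch of the general idea, not the authors' implementation. It assumes syntax trees are encoded as nested tuples `(label, child, ...)` with string leaves, uses a hypothetical `HOLE` marker for positions where two trees differ, and stands in for "redundancy-free exhaustive comparison" with a plain enumeration of unordered pairs. The names `anti_unify`, `size`, `duplicate_patterns`, and the `min_size` threshold are all illustrative assumptions.

```python
from itertools import combinations

# Assumed placeholder for a position where two trees differ.
# (A real leaf spelled "?" would clash with it; this sketch ignores that.)
HOLE = "?"


def anti_unify(a, b):
    """Return the most specific pattern that generalizes both trees.

    A tree is either a leaf string or a tuple (label, child, child, ...).
    """
    if a == b:
        return a
    same_shape = (
        isinstance(a, tuple)
        and isinstance(b, tuple)
        and a[0] == b[0]
        and len(a) == len(b)
    )
    if same_shape:
        # Keep the shared label and generalize the children pairwise.
        return (a[0],) + tuple(anti_unify(x, y) for x, y in zip(a[1:], b[1:]))
    return HOLE


def size(pattern):
    """Count the nodes of a pattern that are not holes."""
    if pattern == HOLE:
        return 0
    if isinstance(pattern, tuple):
        return 1 + sum(size(child) for child in pattern[1:])
    return 1


def duplicate_patterns(trees, min_size=3):
    """Compare every unordered pair exactly once; cluster by shared pattern."""
    clusters = {}
    for (i, a), (j, b) in combinations(enumerate(trees), 2):
        pattern = anti_unify(a, b)
        if size(pattern) >= min_size:
            clusters.setdefault(pattern, set()).update((i, j))
    return clusters


trees = [
    ("assign", "x", ("add", "a", "1")),
    ("assign", "y", ("add", "b", "1")),
    ("call", "print", "x"),
]
clusters = duplicate_patterns(trees)
```

Here the first two trees share the pattern `("assign", "?", ("add", "?", "1"))` and land in one cluster, while the third generalizes with either of them only to a bare hole and is discarded. Because the shared pattern is itself a tree, the cluster keeps the syntactic structure of the cloned code, which is the property the abstract emphasizes.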