Practical language-independent detection of near-miss clones

Authors:
James R. Cordy;Thomas R. Dean;Nikita Synytskyy
Affiliations:
Queen's University;Queen's University;University of Waterloo
Venue:
CASCON '04 Proceedings of the 2004 conference of the Centre for Advanced Studies on Collaborative research
Year:
2004

Abstract

Previous research shows that most software systems contain significant amounts of duplicated, or cloned, code. Some clones are exact duplicates of each other, while others differ in small details only. We designate these almost-perfect clones as "near-miss" clones. While technically difficult, detection of near-miss clones has many benefits, both academic and practical. Finding these clones can give us better insight into the way developers maintain and reuse code, and we can also parameterize and remove near-miss clones to reduce overall source code size and decrease system complexity. This paper presents a simple, general and practical way to detect near-miss clones, and summarizes the results of its application to two production websites. We use standard lexical comparison tools coupled with language-specific extractors to locate potential clones. Our approach separates code comparisons from code understanding, and makes the comparisons language independent. This makes it easy to adapt to different programming languages.
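
The separation described in the abstract, a language-specific extractor feeding a language-independent lexical comparison, can be illustrated with a minimal sketch. This is not the authors' implementation. The blank-line-based `extract_candidates` function is a hypothetical stand-in for a real language-specific extractor, Python's `difflib` stands in for the standard lexical comparison tool, and the 0.7 threshold is an arbitrary assumption chosen only for illustration.

```python
import difflib
import re
from itertools import combinations


def extract_candidates(source):
    """Hypothetical extractor: treat each blank-line-separated block as a
    potential clone and normalize it to a list of stripped, non-empty lines.
    A real extractor would understand the syntax of the target language."""
    blocks = re.split(r"\n\s*\n", source.strip())
    return [
        [line.strip() for line in block.splitlines() if line.strip()]
        for block in blocks
    ]


def similarity(a, b):
    """Language-independent step: line-by-line lexical comparison.
    Returns 2 * (matching lines) / (total lines in both fragments)."""
    return difflib.SequenceMatcher(None, a, b, autojunk=False).ratio()


def find_near_miss_clones(candidates, threshold=0.7):
    """Report index pairs of candidates whose similarity meets the
    (assumed) threshold. Exact clones score 1.0; near-miss clones score
    somewhat below that."""
    pairs = []
    for (i, a), (j, b) in combinations(enumerate(candidates), 2):
        score = similarity(a, b)
        if score >= threshold:
            pairs.append((i, j, score))
    return pairs
```

Only `extract_candidates` would need to change to support another language; the comparison step operates on normalized lines and knows nothing about the language they came from. Note that the pairwise loop is quadratic in the number of candidates, which is acceptable for a sketch but would need attention on large systems.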