Scalable clone detection using description logic

Authors:
Philipp Schugerl
Affiliations:
Concordia University, Montreal, PQ, Canada
Venue:
Proceedings of the 5th International Workshop on Software Clones
Year:
2011


Abstract

The semantic web is slowly transforming the web as we know it into a machine-understandable pool of information that can be consumed and reasoned about by various clients. Source code is no exception to this trend, and various communities have proposed standards to share code as linked data. With the availability of large amounts of open source code published in publicly accessible repositories and the introduction of massively horizontally scaling frameworks and cloud computing infrastructure, a new era of software mining across information silos is reshaping the software engineering landscape. The previously unreachable goal of analyzing code at a global level, and therefore detecting global software clones, has become manageable. Description logic and semantic web reasoners have so far played only a minor role in this transformation and are mainly used to model source code data. In this paper, we introduce a clone detection algorithm that uses a semantic web reasoner and is built on the Hadoop MapReduce framework, allowing it to scale horizontally to large amounts of data. We also define a novel, compact clone model that considers only control blocks and the data types used, while still yielding clone detection results similar to those of more complex representations. To validate our approach, we compare our algorithm to some of the leading clone detection tools (CCFinder, JCD and Simian) and show differences in performance and detection precision.
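
The abstract does not spell out the clone model or the map and reduce steps, so the following is only a minimal sketch of the general idea, not the paper's algorithm. It assumes a hypothetical fragment representation (an identifier, the ordered kinds of control blocks, and the data types used) and treats two fragments as clone candidates when those two features match exactly. It runs in plain Python rather than on Hadoop, and it replaces the semantic web reasoner with simple key equality; the names `map_fragment` and `reduce_candidates` are illustrative.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

# Hypothetical fragment: (fragment id, control-block kinds in order, data types used).
Fragment = Tuple[str, List[str], List[str]]
# Signature: (ordered control blocks, sorted set of used types).
Signature = Tuple[Tuple[str, ...], Tuple[str, ...]]


def map_fragment(fragment: Fragment) -> Tuple[Signature, str]:
    """Map step (assumed): emit a (signature, fragment id) pair."""
    frag_id, control_blocks, used_types = fragment
    # Identifiers, literals, and statement details are deliberately ignored.
    signature = (tuple(control_blocks), tuple(sorted(set(used_types))))
    return signature, frag_id


def reduce_candidates(pairs: Iterable[Tuple[Signature, str]]) -> List[List[str]]:
    """Reduce step (assumed): group ids by signature, keep groups of two or more."""
    groups: Dict[Signature, List[str]] = defaultdict(list)
    for signature, frag_id in pairs:
        groups[signature].append(frag_id)
    return [sorted(ids) for ids in groups.values() if len(ids) > 1]


# Made-up example fragments, for illustration only.
fragments: List[Fragment] = [
    ("A.sum", ["for", "if"], ["int", "List"]),
    ("B.total", ["for", "if"], ["List", "int"]),
    ("C.find", ["while"], ["String"]),
]

clone_groups = reduce_candidates(map_fragment(f) for f in fragments)
print(clone_groups)
```

Because each fragment is mapped independently and grouping depends only on the emitted key, this shape is the kind of computation a MapReduce framework can distribute across many machines; how the paper's reasoner-based matching relaxes exact equality is not described in the abstract.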