The semantic web is slowly transforming the web as we know it into a machine-understandable pool of information that can be consumed and reasoned about by various clients. Source code is no exception to this trend, and various communities have proposed standards to share code as linked data. With the availability of large amounts of open source code in publicly accessible repositories, and with the introduction of massively horizontally scalable frameworks and cloud computing infrastructure, a new era of software mining across information silos is reshaping the software engineering landscape. The previously unreachable goal of analyzing code at a global level, and therefore of detecting global software clones, has become manageable. Description logic and semantic web reasoners have so far played only a minor role in this transformation and are mainly used to model source code data. In this paper, we introduce a clone detection algorithm that uses a semantic web reasoner and is built on the Hadoop map-reduce framework, so that it can scale horizontally to large amounts of data. We also define a novel and compact clone model that considers only control blocks and the data types used, while still yielding clone detection results similar to those of more complex representations. To validate our approach, we compared our algorithm to some of the leading clone detection tools (CCFinder, JCD and Simian) and report differences in performance and detection precision.
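
To make the idea of a compact clone model combined with a map/reduce grouping step more concrete, the following is a minimal in-memory sketch in Python. It is an illustration under stated assumptions, not the algorithm of the paper: the `Fragment` type, its fields, and the exact-match grouping key are hypothetical, the Hadoop framework is replaced by plain Python functions, and the semantic web reasoner is left out entirely.

```python
from collections import defaultdict
from typing import FrozenSet, Iterable, Iterator, List, NamedTuple, Tuple


class Fragment(NamedTuple):
    """Hypothetical abstraction of one code unit (for example a method).

    Only the ordered control blocks and the set of used data types are kept,
    mirroring the kind of compact model the abstract describes.
    """
    location: str
    control_blocks: Tuple[str, ...]
    data_types: FrozenSet[str]


Key = Tuple[Tuple[str, ...], FrozenSet[str]]


def map_phase(fragments: Iterable[Fragment]) -> Iterator[Tuple[Key, str]]:
    """Emit one (key, location) pair per fragment, as a mapper would."""
    for frag in fragments:
        # Assumption: two fragments are clone candidates when both the
        # control-block sequence and the data-type set match exactly.
        yield (frag.control_blocks, frag.data_types), frag.location


def reduce_phase(pairs: Iterable[Tuple[Key, str]]) -> List[List[str]]:
    """Group locations by key and keep groups with more than one member."""
    groups = defaultdict(list)
    for key, location in pairs:
        groups[key].append(location)
    return [sorted(locs) for locs in groups.values() if len(locs) > 1]


def detect_clones(fragments: Iterable[Fragment]) -> List[List[str]]:
    """Run the map and reduce steps in sequence on a single machine."""
    return reduce_phase(map_phase(fragments))


if __name__ == "__main__":
    sample = [
        Fragment("A.java:sum", ("for", "if"), frozenset({"int", "List"})),
        Fragment("B.java:total", ("for", "if"), frozenset({"List", "int"})),
        Fragment("C.java:find", ("while",), frozenset({"String"})),
    ]
    print(detect_clones(sample))
```

In this sketch the two fragments with identical control blocks and data types fall into one clone group, while the third fragment is discarded. In a distributed setting the grouping by key would be performed by the framework's shuffle step between mappers and reducers, which is what allows such a scheme to scale horizontally.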