An information retrieval process to aid in the analysis of code clones

Authors:
Robert Tairas; Jeff Gray
Affiliations:
Department of Computer and Information Sciences, University of Alabama at Birmingham, Birmingham, USA 35294; Department of Computer and Information Sciences, University of Alabama at Birmingham, Birmingham, USA 35294
Venue:
Empirical Software Engineering
Year:
2009

Abstract

Static analysis tools have automated the search for code clones, which are duplicated or similar code fragments in a program. However, a clone detection tool can report a very large number of clones when the source code being searched is large. Programmers may have difficulty comprehending such extensive results, which can hinder their ability to maintain the identified clones. Latent Semantic Indexing (LSI) is an information retrieval technique that attempts to find relationships in a corpus by analyzing the documents in the corpus and the terms in those documents. In this paper, LSI is used to cluster clone classes that were first identified by a clone detection tool. The goal is to detect trends and associations among the clustered clone classes and to determine whether they provide additional insight that assists in the maintenance of clones. The approach is evaluated experimentally using a sequence of tools chained together to analyze clones detected in the Microsoft Windows NT kernel source code.
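
To make the idea of clustering clone classes with LSI more concrete, the following is a minimal sketch, not the authors' tool chain. It assumes each clone class has already been reduced to a list of terms (for example, identifiers taken from its fragments). The term lists, the choice of k = 2 dimensions, the 0.9 similarity threshold, and the single-link grouping step are all illustrative assumptions; the abstract does not specify any of them.

```python
import numpy as np

def lsi_document_vectors(docs, k):
    """Project each document (a list of terms) into a k-dimensional LSI space."""
    vocab = sorted({term for doc in docs for term in doc})
    row = {term: i for i, term in enumerate(vocab)}
    # Term-document matrix of raw counts: rows are terms, columns are documents.
    A = np.zeros((len(vocab), len(docs)))
    for j, doc in enumerate(docs):
        for term in doc:
            A[row[term], j] += 1.0
    # Truncated SVD: keep only the k largest singular values.
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    return Vt[:k].T * s[:k]  # one row per document

def cosine_similarity_matrix(X):
    """Pairwise cosine similarity between the rows of X."""
    norms = np.linalg.norm(X, axis=1, keepdims=True)
    norms[norms == 0] = 1.0  # avoid division by zero for empty documents
    Xn = X / norms
    return Xn @ Xn.T

def group_by_similarity(sim, threshold):
    """Single-link grouping: documents joined by a chain of
    similarities >= threshold receive the same label."""
    n = sim.shape[0]
    labels = [-1] * n
    next_label = 0
    for i in range(n):
        if labels[i] != -1:
            continue
        labels[i] = next_label
        stack = [i]
        while stack:
            u = stack.pop()
            for v in range(n):
                if labels[v] == -1 and sim[u, v] >= threshold:
                    labels[v] = next_label
                    stack.append(v)
        next_label += 1
    return labels

# Hypothetical term lists standing in for four clone classes.
clone_classes = [
    ["list", "insert", "entry", "head", "status"],
    ["list", "remove", "entry", "head", "status"],
    ["file", "read", "buffer", "irp", "status"],
    ["file", "write", "buffer", "length", "status"],
]

vectors = lsi_document_vectors(clone_classes, k=2)
sim = cosine_similarity_matrix(vectors)
labels = group_by_similarity(sim, threshold=0.9)
print(labels)
```

In this toy input the two list-handling clone classes fall into one group and the two file-handling classes into another, even though all four share the term `status`. A practical pipeline would likely add term weighting (such as tf-idf) and a more robust clustering step.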