Identification of High-Level Concept Clones in Source Code

Authors:
Andrian Marcus; Jonathan I. Maletic
Venue:
Proceedings of the 16th IEEE international conference on Automated software engineering
Year:
2001

Abstract

Source code duplication occurs frequently within large software systems. Pieces of source code, functions, and data types are often duplicated in part, or in whole, for a variety of reasons. Programmers may simply be reusing a piece of code via copy and paste or they may be "re-inventing the wheel".

Previous research on the detection of clones is mainly focused on identifying pieces of code with similar (or nearly similar) structure. Our approach is to examine the source code text (comments and identifiers) and identify implementations of similar high-level concepts (e.g., abstract data types). The approach uses an information retrieval technique (i.e., latent semantic indexing) to statically analyze the software system and determine semantic similarities between source code documents (i.e., functions, files, or code segments). These similarity measures are used to drive the clone detection process.

The intention of our approach is to enhance and augment existing clone detection methods that are based on structural analysis. This synergistic use of methods will improve the quality of clone detection. A set of experiments is presented that demonstrates the use of the semantic similarity measure to identify clones within a version of NCSA Mosaic.
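
The general idea of comparing source code documents through latent semantic indexing can be illustrated with a minimal sketch. It is not the authors' implementation: the function names (tokenize, lsi_similarity), the raw term counts used as weights, the choice of k = 2 latent dimensions, and the three toy C snippets are all assumptions made for illustration. The sketch splits comments and identifiers into words, builds a term-by-document matrix, truncates its singular value decomposition, and compares documents by cosine similarity in the reduced space.

```python
import re
import numpy as np


def tokenize(source):
    """Extract lowercase words from comments and identifiers.

    Identifiers are split on underscores and camelCase boundaries,
    so "list_insert" and "addNode" both contribute plain words.
    """
    tokens = []
    for word in re.findall(r"[A-Za-z]+", source):
        parts = re.findall(r"[A-Z]?[a-z]+|[A-Z]+(?![a-z])", word)
        tokens.extend(p.lower() for p in parts)
    return tokens


def lsi_similarity(documents, k=2):
    """Return a document-by-document cosine similarity matrix
    computed in a k-dimensional latent space (assumed setup:
    raw term counts, no stop-word removal, no term weighting)."""
    vocab = sorted({t for d in documents for t in tokenize(d)})
    row = {t: i for i, t in enumerate(vocab)}

    # Term-by-document matrix: one row per term, one column per document.
    A = np.zeros((len(vocab), len(documents)))
    for j, doc in enumerate(documents):
        for t in tokenize(doc):
            A[row[t], j] += 1.0

    # Truncated SVD: keep only the k largest singular values.
    _, s, vt = np.linalg.svd(A, full_matrices=False)
    k = min(k, len(s))
    docs = (np.diag(s[:k]) @ vt[:k]).T  # one row per document

    # Cosine similarity between the reduced document vectors.
    norms = np.linalg.norm(docs, axis=1, keepdims=True)
    norms[norms == 0] = 1.0
    unit = docs / norms
    return unit @ unit.T


# Hypothetical documents: two list operations written in different
# styles, and one unrelated networking function.
snippets = [
    "/* insert a node into the linked list */ void list_insert(List *list, Node *node);",
    "// add node to linked list\nvoid addNode(LinkedList *l, Node *n);",
    "/* parse the url and open a socket connection */ int open_url(char *url, int port);",
]

sim = lsi_similarity(snippets, k=2)
print(np.round(sim, 2))
```

In this toy setting the two list snippets should score much closer to each other than either does to the networking snippet, even though their structure and identifier style differ. A pair of documents with high similarity would then be a candidate for closer inspection, possibly in combination with a structural clone detector.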