Cluster: A Fast Tool to Identify Groups of Similar Programs

Authors:
Casey Carter; Nicholas Tran
Venue:
COCOON '02 Proceedings of the 8th Annual International Conference on Computing and Combinatorics
Year:
2002



Abstract

cluster is a tool to partition a large pool of C programs into groups according to structural similarity. Its method involves calculating an alignment score for each program against a mosaic made of randomly selected code fragments of fixed size from the pool. The scores are then grouped together so that the distance between two adjacent members of a group is at most some threshold value. cluster is effective in identifying tight clusters of similar programs and is capable of distributing its workload over a network of workstations to achieve very fast running times. As a tool, cluster is highly configurable: the user can adjust its alignment scoring scheme and clustering threshold as well as obtain visual alignments of programs suspected to be similar.
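The pipeline described in the abstract (build a mosaic, score each program against it, then group nearby scores) can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes programs are already reduced to token sequences, and it substitutes a plain global alignment (Needleman-Wunsch) with made-up match, mismatch, and gap weights for whatever scoring scheme cluster actually uses. The names build_mosaic, alignment_score, and group_by_threshold are hypothetical.

```python
import random
from typing import Dict, List, Sequence


def build_mosaic(pool: Sequence[Sequence[str]], fragment_len: int,
                 n_fragments: int, seed: int = 0) -> List[str]:
    """Concatenate randomly chosen fixed-size fragments drawn from the pool.

    Assumption: each program in the pool is a sequence of tokens.
    """
    rng = random.Random(seed)
    mosaic: List[str] = []
    for _ in range(n_fragments):
        tokens = rng.choice(pool)
        # Pick a start so the fragment fits (or take the whole program if short).
        start = rng.randrange(max(1, len(tokens) - fragment_len + 1))
        mosaic.extend(tokens[start:start + fragment_len])
    return mosaic


def alignment_score(a: Sequence[str], b: Sequence[str], match: int = 1,
                    mismatch: int = -1, gap: int = -1) -> int:
    """Global alignment score, keeping only one previous row in memory.

    Assumption: the weights are illustrative stand-ins for the
    user-adjustable scoring scheme mentioned in the abstract.
    """
    prev = [j * gap for j in range(len(b) + 1)]
    for i, x in enumerate(a, 1):
        cur = [i * gap]
        for j, y in enumerate(b, 1):
            diag = prev[j - 1] + (match if x == y else mismatch)
            cur.append(max(diag, prev[j] + gap, cur[j - 1] + gap))
        prev = cur
    return prev[-1]


def group_by_threshold(scores: Dict[str, int], threshold: int) -> List[List[str]]:
    """Sort programs by score and start a new group whenever the gap
    between two adjacent scores exceeds the threshold."""
    ordered = sorted(scores, key=lambda name: (scores[name], name))
    groups: List[List[str]] = []
    for name in ordered:
        if groups and scores[name] - scores[groups[-1][-1]] <= threshold:
            groups[-1].append(name)
        else:
            groups.append([name])
    return groups
```

In this sketch, each program would be scored once with alignment_score(program_tokens, mosaic), and the resulting name-to-score mapping passed to group_by_threshold. Because every program is compared only against the mosaic rather than against every other program, the scoring step is independent per program, which is what makes it easy to spread across several machines.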