Cluster: A Fast Tool to Identify Groups of Similar Programs

  • Authors:
  • Casey Carter; Nicholas Tran

  • Venue:
  • COCOON '02 Proceedings of the 8th Annual International Conference on Computing and Combinatorics
  • Year:
  • 2002

Abstract

cluster is a tool for partitioning a large pool of C programs into groups by structural similarity. It computes an alignment score for each program against a mosaic built from randomly selected, fixed-size code fragments drawn from the pool. The scores are then grouped so that, within each group, adjacent scores differ by at most a threshold value. cluster is effective at identifying tight clusters of similar programs and can distribute its workload over a network of workstations to achieve very fast running times. As a tool, cluster is highly configurable: the user can adjust its alignment scoring scheme and clustering threshold, and can obtain visual alignments of programs suspected to be similar.
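The pipeline described in the abstract (mosaic construction, per-program alignment scoring, threshold-based grouping of scores) can be sketched in a few functions. This is a minimal illustration, not the authors' implementation. The abstract does not specify the alignment algorithm, the tokenization, the fragment size, or the default scores. So this sketch assumes programs are already tokenized into lists of strings, and it uses token-level Smith-Waterman local alignment with a linear gap penalty. All function names and parameter values (`build_mosaic`, `local_alignment_score`, `group_scores`, `match=2`, and so on) are hypothetical.

```python
import random
from typing import Dict, List, Sequence


def build_mosaic(pool: List[Sequence[str]], fragment_size: int,
                 num_fragments: int, seed: int = 0) -> List[str]:
    """Concatenate randomly chosen fixed-size token fragments from the pool.

    Hypothetical helper: the paper's actual fragment selection may differ.
    """
    rng = random.Random(seed)
    candidates = [p for p in pool if len(p) >= fragment_size]
    if not candidates:
        raise ValueError("no program is long enough for the fragment size")
    mosaic: List[str] = []
    for _ in range(num_fragments):
        prog = rng.choice(candidates)
        start = rng.randrange(len(prog) - fragment_size + 1)
        mosaic.extend(prog[start:start + fragment_size])
    return mosaic


def local_alignment_score(a: Sequence[str], b: Sequence[str],
                          match: int = 2, mismatch: int = -1,
                          gap: int = -1) -> int:
    """Smith-Waterman local alignment score with a linear gap penalty.

    Assumed scoring scheme; cluster lets the user configure its own.
    """
    prev = [0] * (len(b) + 1)
    best = 0
    for i in range(1, len(a) + 1):
        cur = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            cur[j] = max(0, diag, prev[j] + gap, cur[j - 1] + gap)
            best = max(best, cur[j])
        prev = cur
    return best


def group_scores(scores: Dict[str, float], threshold: float) -> List[List[str]]:
    """Sort programs by score and start a new group whenever the gap to the
    previous score exceeds the threshold, so adjacent members of a group
    differ by at most `threshold`."""
    groups: List[List[str]] = []
    prev_score = None
    for name, s in sorted(scores.items(), key=lambda kv: kv[1]):
        if prev_score is None or s - prev_score > threshold:
            groups.append([])
        groups[-1].append(name)
        prev_score = s
    return groups
```

For example, `group_scores({"a": 10, "b": 11, "c": 20, "d": 21.5}, 2)` yields two groups, `["a", "b"]` and `["c", "d"]`. Each program's score against the mosaic is computed independently of the others. That independence is plausibly what makes distributing the workload across workstations straightforward, though the abstract does not describe how cluster does this.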