Parameterized Duplication in Strings: Algorithms and an Application to Software Maintenance

  • Authors:
  • Brenda S. Baker

  • Venue:
  • SIAM Journal on Computing
  • Year:
  • 1997

Abstract

As an aid in software maintenance, it would be useful to be able to track down duplication in large software systems efficiently. Duplication in code often takes the form of sections of code that are the same except for a systematic change of parameters such as identifiers and constants. To model such parameterized duplication in code, this paper introduces the notions of parameterized strings (p-strings) and parameterized matches (p-matches) of p-strings. A data structure called a parameterized suffix tree is defined to aid in searching for p-matches. For fixed alphabets, algorithms are given to construct a parameterized suffix tree in linear time and to find all maximal p-matches over a threshold length in a p-string in time linear in the size of the input plus the number of matches reported. The algorithms have been implemented, and experimental results show that they perform well on C code.
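To make the notion of a parameterized match concrete, here is a minimal sketch in Python. It assumes a token sequence in which a caller-supplied set `params` marks the parameter symbols (e.g. identifiers) and every other token is a fixed symbol. Each parameter occurrence is replaced by the distance to the previous occurrence of the same parameter (0 if it is the first), and fixed tokens are kept as-is; two equal-length sequences p-match, meaning one can be turned into the other by a one-to-one renaming of parameters, exactly when these encodings are equal. The function names `prev_encode`, `p_match`, and `find_p_occurrences` are hypothetical, and the search function is a naive O(nm) baseline for illustration only, not the linear-time parameterized suffix tree algorithm described in the paper.

```python
from typing import AbstractSet, Hashable, List, Sequence, Tuple

# A tagged encoding avoids collisions between fixed tokens and distances:
# ("F", token) for a fixed symbol, ("P", distance) for a parameter.
Encoded = Tuple[str, Hashable]


def prev_encode(s: Sequence[Hashable], params: AbstractSet[Hashable]) -> List[Encoded]:
    """Replace each parameter by the distance back to its previous occurrence
    (0 for a first occurrence); keep fixed symbols unchanged."""
    last_seen = {}
    out: List[Encoded] = []
    for i, tok in enumerate(s):
        if tok in params:
            out.append(("P", i - last_seen[tok] if tok in last_seen else 0))
            last_seen[tok] = i
        else:
            out.append(("F", tok))
    return out


def p_match(a: Sequence[Hashable], b: Sequence[Hashable],
            params: AbstractSet[Hashable]) -> bool:
    """True if a and b are equal up to a one-to-one renaming of parameters."""
    return len(a) == len(b) and prev_encode(a, params) == prev_encode(b, params)


def find_p_occurrences(text: Sequence[Hashable], pattern: Sequence[Hashable],
                       params: AbstractSet[Hashable]) -> List[int]:
    """Naive search: start positions where a window of text p-matches pattern.
    The encoding is recomputed per window, because the encoding of a substring
    is not simply a slice of the encoding of the whole text."""
    m = len(pattern)
    target = prev_encode(pattern, params)
    return [i for i in range(len(text) - m + 1)
            if prev_encode(text[i:i + m], params) == target]


if __name__ == "__main__":
    params = {"x", "y", "a", "b"}  # assumed: identifiers act as parameters
    s1 = ["x", "=", "y", "+", "x"]
    s2 = ["a", "=", "b", "+", "a"]  # same shape: x->a, y->b
    s3 = ["a", "=", "a", "+", "b"]  # different reuse pattern
    print(p_match(s1, s2, params))  # True
    print(p_match(s1, s3, params))  # False
    text = s1 + [";"] + s2
    print(find_p_occurrences(text, s1, params))  # [0, 6]
```

The tagged tuples are a design choice for this sketch: they keep a fixed token that happens to be an integer from being confused with a parameter distance.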