Performance and Architectural Issues for String Matching

  • Authors:
  • Merrill E. Iseman; Dennis E. Shasha

  • Affiliations:
  • AT&T Bell Labs, Holmdel, NJ; New York Univ., New York

  • Venue:
  • IEEE Transactions on Computers
  • Year:
  • 1990

Abstract

The authors add special heuristics to the Knuth-Morris-Pratt algorithm to reduce the time and space required for string matching. They compare their hardware-based approach with the software approaches embodied in the Unix grep and fgrep commands. Simulation results show that the hardware approach can provide a 25- to 500-fold performance improvement, depending on the complexity of the query, and that it is fast enough, even in the presence of variable-length 'don't cares', to keep up with a 20-million-character/second disk. The approach compares favorably with other hardware designs in speed and space. The proposed hardware implementation requires 10 kB of one-cycle static memory, 28 single-character comparators, four 16-bit adders, and control logic for four finite-state machines with a term-matcher controller. Adding hardware beyond this configuration produces negligible performance improvements for queries with up to 80 terms, about half of which have variable-length 'don't cares'.
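The abstract does not spell out the authors' heuristics, so the sketch below shows only the textbook Knuth-Morris-Pratt matcher they build on, as a software model rather than the proposed hardware. It also includes a hypothetical helper, `match_with_gaps`, which assumes a query is a sequence of terms separated by variable-length 'don't cares' and chains KMP searches to test whether the terms occur in order. All function names are illustrative assumptions. For scale, keeping up with a 20-million-character/second disk leaves a budget of 1 / (2 × 10^7) s = 50 ns per character.

```python
from typing import List


def failure_function(pattern: str) -> List[int]:
    """fail[i] = length of the longest proper border of pattern[:i + 1]."""
    fail = [0] * len(pattern)
    k = 0  # length of the current matched border
    for i in range(1, len(pattern)):
        # Fall back through shorter borders until the next character extends one.
        while k > 0 and pattern[i] != pattern[k]:
            k = fail[k - 1]
        if pattern[i] == pattern[k]:
            k += 1
        fail[i] = k
    return fail


def kmp_find(text: str, pattern: str, start: int = 0) -> int:
    """Return the index just past the first occurrence of pattern in
    text[start:], or -1 if there is none. Each text character is read once."""
    if not pattern:
        return start
    fail = failure_function(pattern)
    k = 0  # number of pattern characters currently matched
    for i in range(start, len(text)):
        while k > 0 and text[i] != pattern[k]:
            k = fail[k - 1]  # reuse the partial match instead of backing up the text
        if text[i] == pattern[k]:
            k += 1
        if k == len(pattern):
            return i + 1
    return -1


def match_with_gaps(text: str, terms: List[str]) -> bool:
    """Hypothetical helper: True if the terms occur in order, separated by
    gaps of arbitrary length (variable-length 'don't cares').
    Taking the earliest-ending occurrence of each term leaves the longest
    possible suffix for the remaining terms, so the greedy scan is exact."""
    pos = 0
    for term in terms:
        end = kmp_find(text, term, pos)
        if end < 0:
            return False
        pos = end
    return True


if __name__ == "__main__":
    print(failure_function("ababaca"))
    print(kmp_find("abababacaba", "ababaca"))
    print(match_with_gaps("the quick brown fox", ["quick", "fox"]))
```

The failure function is what lets KMP avoid re-reading text characters after a mismatch, which is the property that makes a fixed per-character time budget plausible; whether the authors' hardware uses this exact table layout is not stated in the abstract.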