Performance and Architectural Issues for String Matching

Authors:
Merrill E. Iseman; Dennis E. Shasha
Affiliations:
AT&T Bell Labs, Holmdel, NJ; New York Univ., New York
Venue:
IEEE Transactions on Computers
Year:
1990

Abstract

The authors add special heuristics to the Knuth-Morris-Pratt algorithm to reduce the time and space required for string matching. They compare their hardware-based approach with the software approaches embodied in the Unix grep and fgrep commands. Simulation results show that the hardware approach can provide a 25- to 500-fold performance improvement, depending on the complexity of the query, and that it is fast enough, even in the presence of variable-length 'don't cares', to keep up with a disk delivering 20 million characters per second. The approach compares favorably with other hardware designs in both speed and space. The proposed hardware implementation requires 10 kB of one-cycle static memory, 28 single-character comparators, four 16-bit adders, and control logic for four finite-state machines with a term-matcher controller. Beyond that configuration, additional hardware produces negligible performance improvement for queries with up to 80 terms, about half of which contain variable-length 'don't cares'.
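
For readers unfamiliar with the baseline, the following is a minimal software sketch of the textbook Knuth-Morris-Pratt algorithm that the abstract names as the starting point. It does not reproduce the authors' heuristics or their hardware design, which the abstract does not describe. The function names (failure_table, kmp_find, match_with_gaps) are hypothetical, and treating a variable-length "don't care" as an arbitrary gap between consecutive terms is an assumption made only for illustration.

```python
def failure_table(pattern):
    """Textbook KMP failure table: fail[i] is the length of the longest
    proper prefix of pattern[:i + 1] that is also a suffix of it."""
    fail = [0] * len(pattern)
    k = 0  # length of the currently matched prefix
    for i in range(1, len(pattern)):
        while k > 0 and pattern[i] != pattern[k]:
            k = fail[k - 1]  # fall back to a shorter border
        if pattern[i] == pattern[k]:
            k += 1
        fail[i] = k
    return fail


def kmp_find(text, pattern, start=0):
    """Return the index of the first match of pattern in text at or
    after start, or -1 if there is none. Each text character is read
    once, which is what makes the scan suitable for streaming input."""
    if not pattern:
        return start if start <= len(text) else -1
    fail = failure_table(pattern)
    k = 0  # number of pattern characters matched so far
    for i in range(start, len(text)):
        while k > 0 and text[i] != pattern[k]:
            k = fail[k - 1]
        if text[i] == pattern[k]:
            k += 1
        if k == len(pattern):
            return i - k + 1
    return -1


def match_with_gaps(text, terms):
    """Assumed reading of variable-length "don't cares": the terms must
    occur in order, separated by gaps of any length (including zero)."""
    pos = 0
    for term in terms:
        hit = kmp_find(text, term, pos)
        if hit < 0:
            return False
        pos = hit + len(term)  # the next term must start after this one
    return True
```

Under these assumptions, kmp_find("abcabcabd", "abcabd") locates the match at index 3, and match_with_gaps("string matching hardware", ["str", "match", "ware"]) succeeds because each term is found after the end of the previous one.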