Associative/parallel processors for searching very large textual data bases

Authors:
R. M. Bird;J. C. Tu;R. M. Worthy
Affiliations:
Operating Systems, Inc., Woodland Hills, CA;Operating Systems, Inc., Woodland Hills, CA;Operating Systems, Inc., Woodland Hills, CA
Venue:
CAW '77 Proceedings of the 3rd workshop on Computer architecture : Non-numeric processing
Year:
1977

Abstract

This paper describes an approach to solving a major problem in the information processing sciences: that of searching very large (5-50 billion characters) data bases of unstructured free-text for random queries within a reasonable time and at an affordable price. The need by information specialists and knowledge workers for large, fast, low-cost text and document retrieval systems is growing rapidly. Conventional approaches to the problem have usually depended upon expensive, general purpose computers, upon special preprocessing of the textual data (e.g., file inverting, indexing, abstracting, etc.), and upon elaborate, costly software. The resulting retrieval systems often cost hundreds of dollars per query, and the full scanning of an uninverted, unstructured billion byte textual data base could take hours of computer services. However, in spite of these restrictions, such full text search systems have proved useful and even indispensable for many applications. Computer technology of the late 1960's and the 1970's, in both hardware and software (e.g., minicomputers, low-cost, high density disk storage, “chip” electronics, natural language query systems, etc.), has made it practical to build special purpose, low-cost text retrieval systems. Such a system has been built, tested, and is now in a production stage. The system, called the Associative File Processor (AFP), utilizes a conventional minicomputer (DEC's PDP-11/45) for control, off-the-shelf high density disks for storage, a special purpose parallel search module as a text term detector, and query and retrieval software. The AFP is currently being field tested at two sites. Full text, parallel searches on un-preprocessed textual data bases are being performed at effective matching rates of 4 billion bytes per second (8K byte key memory times 500 Kbyte/second data stream). Estimated costs are 10 to 25 cents per query for a one billion byte data base. The costs per query and the time for searching increase in a linear fashion as the data base grows. A basic architecture for the AFP is described and an implemented version is discussed. A more powerful term detector module is also under development. This system is designed around a finite state automaton algorithm.
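
The arithmetic behind the quoted figures can be checked with a short sketch. It is illustrative only and rests on assumptions not stated in the abstract: that "8K" means 8 × 1024 bytes, that "500 Kbyte/second" means 500,000 bytes per second, and that "linear" cost growth means strict proportionality to data base size with no fixed overhead. The function and constant names are hypothetical.

```python
# Assumption: "8K byte key memory" = 8 * 1024 bytes.
KEY_MEMORY_BYTES = 8 * 1024
# Assumption: "500 Kbyte/second data stream" = 500,000 bytes per second.
STREAM_BYTES_PER_SEC = 500_000


def effective_matching_rate(key_bytes: int, stream_rate: int) -> int:
    """Key memory size times stream rate, as in the abstract's parenthetical."""
    return key_bytes * stream_rate


def query_cost_range(db_bytes: float) -> tuple[float, float]:
    """Scale the quoted 10-25 cents per billion bytes proportionally.

    Assumption: cost is strictly proportional to data base size.
    Returns (low, high) in dollars.
    """
    billions = db_bytes / 1e9
    return (0.10 * billions, 0.25 * billions)


rate = effective_matching_rate(KEY_MEMORY_BYTES, STREAM_BYTES_PER_SEC)
print(f"effective matching rate: {rate:,} bytes per second")

for size in (1e9, 5e9, 50e9):
    low, high = query_cost_range(size)
    print(f"{size / 1e9:.0f} billion bytes: ${low:.2f} to ${high:.2f} per query")
```

Under these assumptions the product comes to roughly 4.1 billion, consistent with the "4 billion bytes per second" quoted above. The cost figures for 5 and 50 billion bytes are extrapolations from the one-billion-byte estimate, not values reported by the authors.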