A parallel indexed algorithm for information retrieval

Authors:
C. Stanfill;R. Thau;D. Waltz
Affiliations:
Thinking Machines Corporation, 245 First Street, Cambridge MA;Thinking Machines Corporation, 245 First Street, Cambridge MA;Thinking Machines Corporation, 245 First Street, Cambridge MA
Venue:
SIGIR '89 Proceedings of the 12th annual international ACM SIGIR conference on Research and development in information retrieval
Year:
1989


Abstract

In this paper we present a parallel document ranking algorithm suitable for use on databases of 1-1000 GB, resident on primary or secondary storage. The algorithm is based on inverted indexes, and has two advantages over a previously published parallel algorithm for retrieval based on signature files. First, it permits the employment of ranking strategies which cannot be easily implemented using signature files, specifically methods which depend on document-term weighting. Second, it permits the interactive searching of databases resident on secondary storage. The algorithm is evaluated via a mixture of analytic and simulation techniques, with a particular focus on how cost-effectiveness and efficiency change as the size of the database, number of processors, and cost of memory are altered. In particular, we find that if the ratio of the number of processors and/or disks to the size of the database is held constant, then the cost-effectiveness of the resulting system remains constant. Furthermore, for a given size of database, there is a number of processors which optimizes cost-effectiveness. Estimated response times are also presented. Using these methods, it appears that cost-effective interactive access to databases in the 100-1000 GB range can be achieved using current technology.
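
The abstract names two ingredients: ranking from inverted indexes with document-term weights, and spreading the work across processors. The sketch below illustrates those general ideas only and is not the authors' algorithm, which the abstract does not spell out. All names (`build_index`, `rank`, `rank_partitioned`), the weighting formula (term frequency divided by document length), and the scheme of splitting documents into shards that stand in for processors are assumptions made for illustration.

```python
from collections import defaultdict


def build_index(docs):
    """Build an inverted index: term -> list of (doc_id, weight) postings.

    The weight is term frequency divided by document length. This is an
    assumed, illustrative document-term weight, not the paper's formula.
    """
    index = defaultdict(list)
    for doc_id, text in docs.items():
        tokens = text.lower().split()
        counts = defaultdict(int)
        for token in tokens:
            counts[token] += 1
        for term, count in counts.items():
            index[term].append((doc_id, count / len(tokens)))
    return index


def rank(index, query, k=3):
    """Score documents by summing the weights of matching query terms."""
    scores = defaultdict(float)
    for term in set(query.lower().split()):
        for doc_id, weight in index.get(term, ()):
            scores[doc_id] += weight
    # Highest score first; ties are broken by the smaller document id.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))[:k]


def rank_partitioned(shard_indexes, query, k=3):
    """Rank each shard independently, then merge the partial top-k lists.

    Each shard stands in for one processor (an assumption). The loop runs
    serially here; the per-shard calls are independent of one another.
    """
    partial = []
    for shard_index in shard_indexes:
        partial.extend(rank(shard_index, query, k))
    return sorted(partial, key=lambda item: (-item[1], item[0]))[:k]


if __name__ == "__main__":
    docs = {
        1: "parallel text search",
        2: "inverted index search search",
        3: "signature files",
    }
    print(rank(build_index(docs), "search"))
    shards = [build_index({1: docs[1]}), build_index({2: docs[2], 3: docs[3]})]
    print(rank_partitioned(shards, "search", k=2))
```

Because the assumed weight depends only on the document itself, each shard can score its own documents without consulting the others, and merging the per-shard top-k lists gives the same ranking as a single global index. A weight that uses collection-wide statistics, such as inverse document frequency, would need those statistics shared across shards first.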