Partitioned posting files: a parallel inverted file structure for information retrieval

Authors:
C. Stanfill
Affiliations:
Thinking Machines Corporation, 245 First Street, Cambridge MA
Venue:
SIGIR '90 Proceedings of the 13th annual international ACM SIGIR conference on Research and development in information retrieval
Year:
1989

Abstract

This paper describes algorithms and data structures for applying a parallel computer to information retrieval. Previous work described an implementation based on overlap-encoded signatures. That system was limited by (1) the need to keep the signatures in primary memory and (2) the difficulty of implementing document-term weighting. Overcoming these limitations requires adapting the inverted index techniques used on serial machines. The most obvious adaptation, also previously described, requires data to be sent between processors at query time. Because interprocessor communication is generally slower than local computation, an algorithm that avoids such communication might be faster. This paper presents a data structure, called a partitioned posting file, in which the interprocessor communication takes place at database-construction time, so that no data movement is needed at query time. Algorithms for constructing the data structure are also described. Performance characteristics and storage overhead are established by benchmarking against a synthetic database.
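
The general idea can be illustrated with a minimal Python sketch. It is not the paper's implementation: the round-robin assignment of documents to processors, the raw term-frequency score, and the names `build_partitioned_postings`, `local_scores`, and `search` are all assumptions made for illustration. Each simulated processor holds the postings for its own documents only, so scoring a query touches local data only; the sole cross-processor step here is the final merge of already-computed scores.

```python
from collections import defaultdict


def build_partitioned_postings(docs, num_procs):
    """Build one posting table per simulated processor.

    docs maps an integer doc_id to its list of terms. All data placement
    happens here, at construction time. Assumption: documents are assigned
    round-robin (doc_id % num_procs); the paper may use another scheme.
    """
    partitions = [defaultdict(list) for _ in range(num_procs)]
    for doc_id in sorted(docs):
        proc = doc_id % num_procs
        counts = defaultdict(int)
        for term in docs[doc_id]:
            counts[term] += 1
        # Store (doc_id, term frequency) on the processor owning the document.
        for term, tf in counts.items():
            partitions[proc][term].append((doc_id, tf))
    return partitions


def local_scores(partition, query_terms):
    """Score documents using only the postings held by one processor."""
    scores = defaultdict(int)
    for term in query_terms:
        # .get avoids creating empty entries for absent terms.
        for doc_id, tf in partition.get(term, []):
            scores[doc_id] += tf  # assumption: score is summed term frequency
    return dict(scores)


def search(partitions, query_terms):
    """Score every partition independently, then merge the results."""
    merged = {}
    for partition in partitions:
        # Each doc_id lives in exactly one partition, so keys never collide.
        merged.update(local_scores(partition, query_terms))
    # Highest score first; ties broken by ascending doc_id.
    return sorted(merged.items(), key=lambda kv: (-kv[1], kv[0]))
```

In a real parallel setting the loop in `search` would run concurrently, one iteration per processor; the sequential loop is only a stand-in for that.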