This paper describes algorithms and data structures for applying a parallel computer to information retrieval. Previous work described an implementation based on overlap-encoded signatures. That system was limited by (1) the need to keep the signatures in primary memory and (2) the difficulty of implementing document-term weighting. Overcoming these limitations requires adapting the inverted-index techniques used on serial machines. The most obvious adaptation, also previously described, requires data to be sent between processors at query time. Since interprocessor communication is generally slower than local computation, an algorithm that avoids such communication might be faster. This paper presents a data structure, called a partitioned posting file, in which the interprocessor communication takes place at database-construction time, so that no data movement is needed at query time. Algorithms for constructing the data structure are also described. Performance characteristics and storage overhead are established by benchmarking against a synthetic database.
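The idea of moving communication to construction time can be illustrated with a small single-process simulation. This is only a sketch under stated assumptions, not the layout or algorithm from the paper: it assumes postings are assigned to processors by document ID (`doc_id % num_procs`), that weights are raw term frequencies, and that each "processor" is just a Python dictionary. The names `build_partitions`, `local_scores`, and `search` are hypothetical.

```python
from collections import defaultdict


def build_partitions(docs, num_procs):
    """Build one local inverted index per simulated processor.

    docs maps an integer doc_id to a list of terms. All postings for a
    document are placed in the partition that owns that document, so the
    redistribution of data happens here, at construction time.
    """
    partitions = [defaultdict(list) for _ in range(num_procs)]
    for doc_id, terms in docs.items():
        proc = doc_id % num_procs  # assumed assignment rule, not from the paper
        counts = defaultdict(int)
        for term in terms:
            counts[term] += 1
        for term, tf in counts.items():
            partitions[proc][term].append((doc_id, tf))
    return partitions


def local_scores(partition, query_terms):
    """Score documents using only the postings held in one partition."""
    scores = defaultdict(int)
    for term in query_terms:
        for doc_id, tf in partition.get(term, []):
            scores[doc_id] += tf  # assumed weighting: raw term frequency
    return dict(scores)


def search(partitions, query_terms):
    """Run the query on every partition, then gather the final scores."""
    merged = {}
    for partition in partitions:
        # Partitions own disjoint document sets, so keys never collide.
        merged.update(local_scores(partition, query_terms))
    return sorted(merged.items(), key=lambda kv: (-kv[1], kv[0]))


docs = {
    0: ["parallel", "text", "search"],
    1: ["parallel", "parallel", "index"],
    2: ["text", "index"],
    3: ["signature", "file"],
}
partitions = build_partitions(docs, num_procs=2)
results = search(partitions, ["parallel", "index"])
```

In this sketch each partition computes complete scores for its own documents without reading postings held elsewhere. The final gather of scores in `search` is a simplification of the simulation and moves only results, not posting data.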