In this paper, we present efficient, scalable, and portable parallel algorithms for the off-line clustering, on-line retrieval, and update phases of the Text Retrieval (TR) problem. The algorithms are based on the vector space model and use clustering to organize and handle a dynamic document collection. They run on the Coarse-Grained Multicomputer (CGM) and/or Bulk Synchronous Parallel (BSP) models, which capture the characteristics of a parallel machine in a few parameters. To the best of our knowledge, our parallel retrieval algorithms are the first to be analyzed under these parallel models. For every phase of the proposed algorithms, we analytically determine the communication and computation costs, thereby formally proving the efficiency of the proposed solutions. We also prove that our technique for the on-line retrieval phase performs very well compared with alternative approaches in the typical multiuser information retrieval (IR) setting, where many user queries are submitted to the IR system concurrently. Finally, we discuss external memory issues and show how our techniques can be adapted to the case where processors have limited main memory but enough disk capacity to hold their local data.
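The abstract does not spell out the algorithms. The following Python sketch is only a rough, sequential illustration of the kind of computation involved: cluster-based vector-space retrieval for a batch of concurrent queries, with clusters spread over `p` simulated processors. It is not the authors' CGM/BSP algorithm. The sketch rests on several assumptions:

- The term weights (e.g. tf-idf) are already computed.
- The clustering is a disjoint partition produced by some off-line step not shown here.
- Clusters are placed round-robin on processors.
- The names `batch_retrieve`, `c` (clusters searched per query) and `k` (documents returned per query) are hypothetical.

```python
import heapq
import math
from typing import Dict, List

Vector = Dict[str, float]  # sparse vector: term -> weight (assumed precomputed, e.g. tf-idf)


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity of two sparse vectors; 0.0 if either has zero norm."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def centroid(vectors: List[Vector]) -> Vector:
    """Component-wise mean of a non-empty list of sparse vectors."""
    acc: Vector = {}
    for v in vectors:
        for t, w in v.items():
            acc[t] = acc.get(t, 0.0) + w
    return {t: w / len(vectors) for t, w in acc.items()}


def batch_retrieve(queries: List[Vector], docs: Dict[str, Vector],
                   clusters: List[List[str]], p: int, c: int, k: int) -> List[List[str]]:
    """Hypothetical sketch of batched cluster-based retrieval on p simulated processors.

    Assumes `clusters` is a disjoint partition of the doc ids (off-line clustering
    is not done here). Cluster i lives on processor i % p (an illustrative choice).
    For each query, the c clusters with the most similar centroids are searched
    and the ids of the k best-scoring documents are returned.
    """
    cents = [centroid([docs[d] for d in members]) for members in clusters]
    owner = [i % p for i in range(len(clusters))]

    # Local work: each processor scores its own centroids against every query.
    cent_scores = [[[] for _ in queries] for _ in range(p)]
    for i, cen in enumerate(cents):
        for qi, q in enumerate(queries):
            cent_scores[owner[i]][qi].append((cosine(q, cen), i))

    # Exchange: combine all centroid scores and pick the c best clusters per query.
    chosen = []
    for qi in range(len(queries)):
        all_scores = [s for proc in cent_scores for s in proc[qi]]
        chosen.append([i for _, i in heapq.nlargest(c, all_scores)])

    # Local work: each owner scores the members of its chosen clusters.
    doc_scores = [[[] for _ in queries] for _ in range(p)]
    for qi, q in enumerate(queries):
        for i in chosen[qi]:
            for d in clusters[i]:
                doc_scores[owner[i]][qi].append((cosine(q, docs[d]), d))

    # Merge: each processor contributes at most k candidates per query.
    results = []
    for qi in range(len(queries)):
        cands = [hit for proc in doc_scores for hit in heapq.nlargest(k, proc[qi])]
        results.append([d for _, d in heapq.nlargest(k, cands)])
    return results


if __name__ == "__main__":
    docs = {
        "d1": {"parallel": 1.0, "bsp": 1.0},
        "d2": {"parallel": 1.0, "cgm": 1.0},
        "d3": {"retrieval": 1.0, "cluster": 1.0},
        "d4": {"retrieval": 1.0, "query": 1.0},
    }
    clusters = [["d1", "d2"], ["d3", "d4"]]
    queries = [{"bsp": 1.0}, {"query": 1.0}]
    print(batch_retrieve(queries, docs, clusters, p=2, c=1, k=1))  # [['d1'], ['d4']]
```

Each processor keeps only a local top-k before the final merge. This bounds the data sent per query to at most `p * k` candidates, which is the kind of communication volume a BSP/CGM analysis would charge for. Batching several queries into the same steps spreads the synchronization cost over them.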