Sync/Async parallel search for the efficient design and construction of web search engines

Authors:
Mauricio Marin; Veronica Gil-Costa; Carolina Bonacic; Ricardo Baeza-Yates; Isaac D. Scherson
Affiliations:
Yahoo! Research Latin America, Santiago, Chile and Informatic Engineering Department, University of Santiago of Chile, Chile; Yahoo! Research Latin America, Santiago, Chile and Informatic Department, National University of San Luis, Argentina; Computer Architecture Department, Complutense University of Madrid, Spain; Yahoo! Research Latin America, Santiago, Chile; Department of Computer Science, University of California, Irvine, CA 92697, United States
Venue:
Parallel Computing
Year:
2010

Abstract

A parallel query processing method is proposed for the design and construction of web search engines that must cope efficiently with dynamic variations in query traffic. The method supports the efficient use of different distributed indexing and query processing strategies in server clusters consisting of multiple computational/storage nodes. It also improves the utilization of local and distributed hardware resources by automatically reorganizing parallel computations to exploit the advantages of two modes of operation: a newly proposed synchronous mode and the standard asynchronous computing mode. Switching between modes is facilitated by a round-robin strategy devised to grant each query a fair share of the hardware resources and to predict query throughput properly. Performance is evaluated experimentally, and two case studies show how to develop efficient parallel query processing algorithms for large-scale search engines based on the proposed paradigm.
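
The fair-share idea behind the round-robin strategy can be illustrated with a small sketch. This is not the authors' algorithm, only a generic round-robin scheduler under stated assumptions: each query is reduced to a hypothetical count of "work units", every active query receives the same fixed `quantum` of units per round, and the names `round_robin`, `queries`, and `quantum` are invented for this illustration.

```python
from collections import deque


def round_robin(queries, quantum):
    """Process queries in rounds, giving each active query `quantum` work units per round.

    `queries` maps a query id to its total work units (a hypothetical cost
    measure, not taken from the paper). Returns a dict mapping each query id
    to the round number in which it completed.
    """
    if quantum <= 0:
        raise ValueError("quantum must be positive")

    pending = deque(queries.items())
    finished = {}
    round_no = 0

    while pending:
        round_no += 1
        # Visit every query that is active at the start of this round exactly once.
        for _ in range(len(pending)):
            qid, remaining = pending.popleft()
            remaining -= quantum
            if remaining > 0:
                pending.append((qid, remaining))  # needs another round
            else:
                finished[qid] = round_no  # query completes in this round

    return finished


if __name__ == "__main__":
    # Three queries with different (assumed) costs and a quantum of 2 units.
    print(round_robin({"q1": 5, "q2": 2, "q3": 9}, quantum=2))
```

Because every query advances by the same amount per round, a query needing `w` units finishes in round `ceil(w / quantum)` regardless of how many other queries are active. In this simplified model, that property makes completion rounds, and hence throughput, easy to predict.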