A pipelined architecture for distributed text query evaluation

Authors:
Alistair Moffat;William Webber;Justin Zobel;Ricardo Baeza-Yates
Affiliations:
Department of Computer Science and Software Engineering, The University of Melbourne, Melbourne, Australia 3010;Department of Computer Science and Software Engineering, The University of Melbourne, Melbourne, Australia 3010 and School of Computer Science and Information Technology, RMIT University, Melbourn ...;School of Computer Science and Information Technology, RMIT University, Melbourne, Australia 3001;Center for Web Research, Department of Computer Science, University of Chile, Santiago, Chile and Yahoo! Research, Barcelona, Spain
Venue:
Information Retrieval
Year:
2007


Abstract

Two principal query-evaluation methodologies have been described for cluster-based implementation of distributed information retrieval systems: document partitioning and term partitioning. In a document-partitioned system, each of the processors hosts a subset of the documents in the collection, and executes every query against its local sub-collection. In a term-partitioned system, each of the processors hosts a subset of the inverted lists that make up the index of the collection, and serves them to a central machine as they are required for query evaluation.

In this paper we introduce a pipelined query-evaluation methodology, based on a term-partitioned index, in which partially evaluated queries are passed amongst the set of processors that host the query terms. This arrangement retains the disk read benefits of term partitioning, but more effectively shares the computational load. We compare the three methodologies experimentally, and show that term distribution is inefficient and scales poorly. The new pipelined approach offers efficient memory utilization and efficient use of disk accesses, but suffers from problems with load balancing between nodes. Until these problems are resolved, document partitioning remains the preferred method.
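
The three methodologies described in the abstract can be contrasted with a toy, single-process sketch. It is an illustration only, not the authors' implementation: the function names, the CRC32-based assignment of terms to nodes, the round-robin assignment of documents to nodes, and the scoring rule (summed term frequency) are all assumptions made for this example, and the "nodes" are simulated with ordinary Python data structures rather than a cluster.

```python
import zlib
from collections import Counter, defaultdict


def build_index(docs):
    """Map each term to {doc_id: term frequency} (a toy inverted index)."""
    index = defaultdict(dict)
    for doc_id, text in docs.items():
        for term, tf in Counter(text.lower().split()).items():
            index[term][doc_id] = tf
    return index


def top_k(scores, k):
    """Return the k best (doc_id, score) pairs, ties broken by doc_id."""
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))[:k]


def node_of(term, n_nodes):
    """Assumed term-to-node assignment: a deterministic hash of the term."""
    return zlib.crc32(term.encode("utf-8")) % n_nodes


def partition_terms(docs, n_nodes):
    """Give each simulated node the inverted lists of the terms it hosts."""
    hosted = [dict() for _ in range(n_nodes)]
    for term, postings in build_index(docs).items():
        hosted[node_of(term, n_nodes)][term] = postings
    return hosted


def document_partitioned(docs, query, n_nodes, k):
    """Every node runs the whole query on its own documents; results are merged."""
    partial = []
    for node in range(n_nodes):
        local_docs = {d: t for d, t in docs.items() if d % n_nodes == node}
        local_index = build_index(local_docs)
        scores = Counter()
        for term in query:
            for doc_id, tf in local_index.get(term, {}).items():
                scores[doc_id] += tf
        partial.extend(top_k(scores, k))
    return top_k(dict(partial), k)


def term_partitioned(docs, query, n_nodes, k):
    """Nodes only serve inverted lists; one central machine does all scoring."""
    hosted = partition_terms(docs, n_nodes)
    scores = Counter()
    for term in query:
        # The hosting node ships the whole list to the central machine.
        postings = hosted[node_of(term, n_nodes)].get(term, {})
        for doc_id, tf in postings.items():
            scores[doc_id] += tf
    return top_k(scores, k)


def pipelined(docs, query, n_nodes, k):
    """A partially evaluated query (the bundle) visits each node hosting a term."""
    hosted = partition_terms(docs, n_nodes)
    route = sorted({node_of(term, n_nodes) for term in query})
    bundle = Counter()  # accumulators passed from node to node
    for node in route:
        for term in query:
            if node_of(term, n_nodes) == node:
                # Scoring happens on the node that hosts the list.
                for doc_id, tf in hosted[node].get(term, {}).items():
                    bundle[doc_id] += tf
    return top_k(bundle, k)
```

With this simplified score, which does not depend on collection-wide statistics, all three functions return the same ranking for a given query; what differs is where the inverted lists are read and where the accumulation work is done, which is the trade-off the abstract discusses.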