Two principal query-evaluation methodologies have been described for cluster-based implementations of distributed information retrieval systems: document partitioning and term partitioning. In a document-partitioned system, each processor hosts a subset of the documents in the collection and executes every query against its local sub-collection. In a term-partitioned system, each processor hosts a subset of the inverted lists that make up the index of the collection, and serves them to a central machine as they are required for query evaluation.

In this paper we introduce a pipelined query-evaluation methodology, based on a term-partitioned index, in which partially evaluated queries are passed amongst the set of processors that host the query terms. This arrangement retains the disk-read benefits of term partitioning but shares the computational load more effectively. We compare the three methodologies experimentally and show that term partitioning is inefficient and scales poorly. The new pipelined approach makes efficient use of memory and of disk accesses, but suffers from load-balancing problems between nodes. Until these problems are resolved, document partitioning remains the preferred method.
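The following is a minimal Python sketch contrasting the three query-evaluation methodologies described above. It assumes a toy in-memory collection, a hypothetical scoring function (within-document frequency times ln(1 + N/df), computed from collection-wide statistics so all three methods rank identically), and round-robin assignment of documents and terms to nodes. Network messages are simulated by ordinary function calls, and the names `doc_partitioned`, `term_partitioned`, and `pipelined` are illustrative only, not the authors' implementation. The point to notice is what moves between machines: local top-k answers (document partitioning), whole inverted lists (term partitioning), or a bundle of partial scores that travels from node to node (pipelined).

```python
import heapq
import math
from collections import Counter

# Toy collection (illustrative only): doc_id -> text.
DOCS = {
    0: "distributed retrieval with inverted files",
    1: "term partitioned inverted index",
    2: "document partitioned retrieval system",
    3: "pipelined query evaluation over term partitioned index",
    4: "load balancing in distributed query evaluation",
}


def build_index(docs):
    """Return an inverted index: term -> {doc_id: within-document frequency}."""
    index = {}
    for doc_id, text in docs.items():
        for term, freq in Counter(text.split()).items():
            index.setdefault(term, {})[doc_id] = freq
    return index


GLOBAL = build_index(DOCS)
N = len(DOCS)


def weight(term):
    # Hypothetical term weight ln(1 + N/df), taken from collection-wide statistics.
    return math.log(1 + N / len(GLOBAL[term])) if term in GLOBAL else 0.0


def top_k(acc, k):
    # Highest score first, ties broken by smaller doc_id. Scores are rounded so a
    # different floating-point summation order cannot reorder tied documents.
    return heapq.nsmallest(k, acc.items(), key=lambda x: (-round(x[1], 9), x[0]))


def term_nodes(n_nodes):
    # Hypothetical term assignment: round-robin over the sorted vocabulary.
    owner = {t: i % n_nodes for i, t in enumerate(sorted(GLOBAL))}
    nodes = [{t: GLOBAL[t] for t in GLOBAL if owner[t] == i} for i in range(n_nodes)]
    return owner, nodes


def doc_partitioned(query, k, n_nodes=2):
    # Each node indexes its own subset of the documents.
    shards = [build_index({d: s for d, s in DOCS.items() if d % n_nodes == i})
              for i in range(n_nodes)]
    merged = {}
    for shard in shards:  # in a real cluster these loops run in parallel
        acc = {}
        for t in query:
            for d, f in shard.get(t, {}).items():
                acc[d] = acc.get(d, 0.0) + f * weight(t)
        merged.update(top_k(acc, k))  # each node returns only its local top-k
    return top_k(merged, k)  # the central machine merges the local answers


def term_partitioned(query, k, n_nodes=2):
    owner, nodes = term_nodes(n_nodes)
    acc = {}
    for t in query:
        if t not in owner:
            continue
        postings = nodes[owner[t]][t]  # the owning node ships the whole list
        for d, f in postings.items():  # all scoring happens centrally
            acc[d] = acc.get(d, 0.0) + f * weight(t)
    return top_k(acc, k)


def pipelined(query, k, n_nodes=2):
    owner, nodes = term_nodes(n_nodes)
    route = sorted({owner[t] for t in query if t in owner})  # nodes holding query terms
    acc = {}  # the partially evaluated query travels along the route
    for node in route:
        for t in query:
            if owner.get(t) == node:  # process only the lists stored locally
                for d, f in nodes[node][t].items():
                    acc[d] = acc.get(d, 0.0) + f * weight(t)
    return top_k(acc, k)  # the last node on the route extracts the answer


if __name__ == "__main__":
    query = ["term", "partitioned", "index"]
    for method in (doc_partitioned, term_partitioned, pipelined):
        print(method.__name__, [d for d, _ in method(query, k=3)])
```

All three functions return the same ranking for a given query; they differ only in where the work happens. In `term_partitioned` the central machine does every score update, which mirrors the bottleneck the abstract reports, while in `pipelined` each node scores only its own lists. The route order here (ascending node number) is an arbitrary choice for the sketch; a real system might order nodes differently, for example to process the shortest lists first.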