Large-scale web and text retrieval systems deal with amounts of data that greatly exceed the capacity of any single machine. To handle the necessary data volumes and query throughput rates, parallel systems are used, in which the document and index data are split across tightly clustered distributed computing systems. The index data can be distributed either by document or by term. In this paper we examine methods for load balancing in term-distributed parallel architectures, and propose a suite of techniques for reducing net querying costs. In combination, the techniques we describe allow a 30% improvement in query throughput when tested on an eight-node parallel computer system.
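To make the distinction between document-distributed and term-distributed indexes concrete, the following is a minimal sketch and not the method evaluated in the paper. It builds a toy inverted index, splits it both ways, and counts how many postings each node would read for a query stream, which is one simple proxy for load. All function names, the round-robin document assignment, the CRC-based term assignment, and the integer document ids are assumptions made for illustration.

```python
import zlib
from collections import defaultdict


def build_index(docs):
    """Map each term to a sorted list of the document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in text.lower().split():
            index[term].add(doc_id)
    return {term: sorted(ids) for term, ids in index.items()}


def partition_by_document(docs, num_nodes):
    """Document distribution: each node indexes its own subset of documents.

    Assumes integer document ids, assigned round-robin (hypothetical choice).
    """
    shards = [dict() for _ in range(num_nodes)]
    for doc_id, text in docs.items():
        shards[doc_id % num_nodes][doc_id] = text
    return [build_index(shard) for shard in shards]


def crc_assign(term, num_nodes):
    """Deterministic term-to-node mapping (an assumed, naive policy)."""
    return zlib.crc32(term.encode("utf-8")) % num_nodes


def partition_by_term(docs, num_nodes, assign=crc_assign):
    """Term distribution: each node holds the full posting lists of its terms."""
    nodes = [dict() for _ in range(num_nodes)]
    for term, postings in build_index(docs).items():
        nodes[assign(term, num_nodes)][term] = postings
    return nodes


def node_load(nodes, queries):
    """Count the postings each node must read to serve the query stream."""
    load = [0] * len(nodes)
    for query in queries:
        for term in query.lower().split():
            for i, node in enumerate(nodes):
                load[i] += len(node.get(term, []))
    return load


if __name__ == "__main__":
    docs = {
        0: "parallel inverted file",
        1: "inverted index partitioning",
        2: "parallel query processing",
        3: "query load balancing",
    }
    queries = ["parallel query", "inverted index"]
    print("by document:", node_load(partition_by_document(docs, 2), queries))
    print("by term:    ", node_load(partition_by_term(docs, 2), queries))
```

In this sketch the total number of postings read is the same under both schemes. What changes is how that work is spread across nodes: under term distribution a node's load depends on which query terms it happens to hold, which is why the assignment policy matters for balance.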