Strategies for building distributed information retrieval systems
Information Processing and Management: an International Journal
Prototyping a distributed information retrieval system that uses statistical ranking
Information Processing and Management: an International Journal
Inverted File Partitioning Schemes in Multiple Disk Systems
IEEE Transactions on Parallel and Distributed Systems
Efficient distributed algorithms to build inverted files
Proceedings of the 22nd annual international ACM SIGIR conference on Research and development in information retrieval
ACM Transactions on Information Systems (TOIS)
PDIS '93 Proceedings of the second international conference on Parallel and distributed information systems
Methodologies for Distributed Information Retrieval
ICDCS '98 Proceedings of the The 18th International Conference on Distributed Computing Systems
Efficient Query Evaluation on Large Textual Collections in a Peer-to-Peer Environment
P2P '05 Proceedings of the Fifth IEEE International Conference on Peer-to-Peer Computing
ACM Transactions on Information Systems (TOIS)
Load balancing for term-distributed parallel retrieval
SIGIR '06 Proceedings of the 29th annual international ACM SIGIR conference on Research and development in information retrieval
A pipelined architecture for distributed text query evaluation
Information Retrieval
Query-driven indexing for scalable peer-to-peer text retrieval
Future Generation Computer Systems
Scheduling Intersection Queries in Term Partitioned Inverted Files
Euro-Par '08 Proceedings of the 14th international Euro-Par conference on Parallel Processing
Two-Dimensional Distributed Inverted Files
SPIRE '09 Proceedings of the 16th International Symposium on String Processing and Information Retrieval
Mining Query Logs: Turning Search Usage Data into Knowledge
Foundations and Trends in Information Retrieval
Load and storage balanced posting file partitioning for parallel information retrieval
Journal of Systems and Software
A combined semi-pipelined query processing architecture for distributed full-text retrieval
WISE'10 Proceedings of the 11th international conference on Web information systems engineering
Assigning documents to master sites in distributed search
Proceedings of the 20th ACM international conference on Information and knowledge management
Replicated partitioning for undirected hypergraphs
Journal of Parallel and Distributed Computing
Scalable search platform: improving pipelined query processing for distributed full-text retrieval
Proceedings of the 21st international conference companion on World Wide Web
Intra-query concurrent pipelined processing for distributed full-text retrieval
ECIR'12 Proceedings of the 34th European conference on Advances in Information Retrieval
Distributed search based on self-indexed compressed text
Information Processing and Management: an International Journal
Improving the performance of pipelined query processing with skipping
WISE'12 Proceedings of the 13th international conference on Web Information Systems Engineering
A term-based inverted index partitioning model for efficient distributed query processing
ACM Transactions on the Web (TWEB)
Hi-index | 0.00 |
Large-scale Parallel Web Search Engines (WSEs) needs to adopt a strategy for partitioning the inverted index among a set of parallel server nodes. In this paper we are interested in devising an effective term-partitioning strategy, according to which the global vocabulary of terms and the associated inverted lists are split into disjoint subsets, and assigned to distinct servers. Due to the workload imbalance caused by the skewed distribution of terms in user queries, finding an effective partitioning strategy is considered a very complex task. In this paper we first formally introduce Term Partitioning as a new optimization problem. Then we show how the knowledge mined from past WSE query logs can be profitably used to discover good solutions of this problem. Finally, we report many results to show that we are able to effectively reduce both the average number of servers activated per each query, along with the workload imbalance. Experiments are conducted on large query logs of real WSEs.