Inverted File Partitioning Schemes in Multiple Disk Systems
IEEE Transactions on Parallel and Distributed Systems
Recent directions in netlist partitioning: a survey
Integration, the VLSI Journal
Query performance for tightly coupled distributed digital libraries
Proceedings of the third ACM conference on Digital libraries
A Fast and High Quality Multilevel Scheme for Partitioning Irregular Graphs
SIAM Journal on Scientific Computing
Hypergraph-Partitioning-Based Decomposition for Parallel Sparse-Matrix Vector Multiplication
IEEE Transactions on Parallel and Distributed Systems
PDIS '93 Proceedings of the second international conference on Parallel and distributed information systems
Parallel Generation of Inverted Files for Distributed Text Collections
SCCC '98 Proceedings of the XVIII International Conference of the Chilean Computer Science Society
Parallel Search using Partitioned Inverted Files
SPIRE '00 Proceedings of the Seventh International Symposium on String Processing Information Retrieval (SPIRE'00)
Graphs and Hypergraphs
Efficient Query Evaluation on Large Textual Collections in a Peer-to-Peer Environment
P2P '05 Proceedings of the Fifth IEEE International Conference on Peer-to-Peer Computing
Inverted files for text search engines
ACM Computing Surveys (CSUR)
Load balancing for term-distributed parallel retrieval
SIGIR '06 Proceedings of the 29th annual international ACM SIGIR conference on Research and development in information retrieval
Analyzing imbalance among homogeneous index servers in a web search system
Information Processing and Management: an International Journal
A pipelined architecture for distributed text query evaluation
Information Retrieval
Multi-level direct K-way hypergraph partitioning with multiple constraints and fixed vertices
Journal of Parallel and Distributed Computing
Mining query logs to optimize index partitioning in parallel web search engines
Proceedings of the 2nd international conference on Scalable information systems
Inverted index compression and query processing with optimized document ordering
Proceedings of the 18th international conference on World wide web
Improved techniques for result caching in web search engines
Proceedings of the 18th international conference on World wide web
A refreshing perspective of search engine caching
Proceedings of the 19th international conference on World wide web
Load and storage balanced posting file partitioning for parallel information retrieval
Journal of Systems and Software
A combined semi-pipelined query processing architecture for distributed full-text retrieval
WISE'10 Proceedings of the 11th international conference on Web information systems engineering
ISCIS'06 Proceedings of the 21st international conference on Computer and Information Sciences
Replicated partitioning for undirected hypergraphs
Journal of Parallel and Distributed Computing
Intra-query concurrent pipelined processing for distributed full-text retrieval
ECIR'12 Proceedings of the 34th European conference on Advances in Information Retrieval
A Parallel Framework for In-Memory Construction of Term-Partitioned Inverted Indexes
The Computer Journal
Improving the performance of pipelined query processing with skipping
WISE'12 Proceedings of the 13th international conference on Web Information Systems Engineering
Hi-index | 0.00 |
In a shared-nothing, distributed text retrieval system, queries are processed over an inverted index that is partitioned among a number of index servers. In practice, the index is either document-based or term-based partitioned. This choice is made depending on the properties of the underlying hardware infrastructure, query traffic distribution, and some performance and availability constraints. In query processing on retrieval systems that adopt a term-based index partitioning strategy, the high communication overhead due to the transfer of large amounts of data from the index servers forms a major performance bottleneck, deteriorating the scalability of the entire distributed retrieval system. In this work, to alleviate this problem, we propose a novel inverted index partitioning model that relies on hypergraph partitioning. In the proposed model, concurrently accessed index entries are assigned to the same index servers, based on the inverted index access patterns extracted from the past query logs. The model aims to minimize the communication overhead that will be incurred by future queries while maintaining the computational load balance among the index servers. We evaluate the performance of the proposed model through extensive experiments using a real-life text collection and a search query sample. Our results show that considerable performance gains can be achieved relative to the term-based index partitioning strategies previously proposed in literature. In most cases, however, the performance remains inferior to that attained by document-based partitioning.