The performance of parallel query processing in a cluster of index servers is crucial for modern web search systems. In this setting, the response time depends essentially on how long the slowest server takes to generate its partial ranked answer. Previous approaches investigate performance issues in this context using simulation, analytical modeling, experimentation, or a combination of these. However, they simply assume balanced execution times among homogeneous servers (for instance, by distributing the document collection uniformly among them), a scenario that we did not observe in our experiments. On the contrary, we found that even when the document collection is evenly distributed among index servers, correlations between the frequency of a term in the query log and the size of its corresponding inverted list lead to imbalances in query execution times at these servers, because these correlations affect disk caching behavior. Furthermore, the size of main memory at each server relative to its disk space usage, and the number of servers participating in parallel query processing, also affect the imbalance of local query execution times. These findings have not been reported before and, we believe, are of interest to the research community.
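As a rough illustration of why imbalance matters, the sketch below treats the response time of a query as the maximum of the local execution times at the index servers. The function names, the optional `merge_overhead` parameter, the max-to-mean imbalance ratio, and the sample timings are all assumptions made for this example; none of them come from the paper.

```python
from statistics import mean


def query_response_time(server_times, merge_overhead=0.0):
    """Response time of one query: the slowest server's local execution
    time plus an optional (hypothetical) overhead for merging answers."""
    if not server_times:
        raise ValueError("at least one index server is required")
    return max(server_times) + merge_overhead


def imbalance(server_times):
    """Ratio of the slowest local execution time to the mean.
    A value of 1.0 means perfectly balanced servers (illustrative metric)."""
    return max(server_times) / mean(server_times)


# Made-up local execution times in milliseconds; both lists sum to 40 ms.
balanced = [10.0, 10.0, 10.0, 10.0]
skewed = [4.0, 6.0, 10.0, 20.0]

for name, times in (("balanced", balanced), ("skewed", skewed)):
    print(name, query_response_time(times), imbalance(times))
```

With the same total work, the skewed cluster answers in 20 ms instead of 10 ms, because the query waits for its slowest server. This is the effect that a balanced-execution assumption would hide.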