Web search engines maintain large-scale inverted indexes that are queried thousands of times per second by users eager for information. To cope with this query load, search engines prune their index to keep only the documents that are likely to be returned as top results, and use this pruned index to compute the first batches of results. While this approach can improve performance by reducing the size of the index, computing the top results only from the pruned index may significantly degrade result quality: if a document should be in the top results but was not included in the pruned index, it will be placed behind the results computed from the pruned index. Given the fierce competition in the online search market, this phenomenon is clearly undesirable. In this paper, we study how to avoid any degradation of result quality due to pruning-based performance optimization while still realizing most of its benefit. Our contribution is a number of modifications to the pruning techniques used to create the pruned index, and a new result computation algorithm that guarantees that the top-matching pages are always placed at the top of the search results, even though we compute the first batch from the pruned index most of the time. We also show how to determine the optimal size of a pruned index, and we experimentally evaluate our algorithms on a collection of 130 million Web pages.
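The abstract does not spell out the result computation algorithm, so the following is only a minimal sketch of one way a correctness guarantee of this kind could work, not the authors' method. It assumes a toy scoring model (a document's score is the sum of its per-term weights), a pruned index that keeps every posting whose weight reaches a positive per-term threshold, and queries made of distinct terms that all have a threshold. All names (`build_pruned_index`, `answer`, `thresholds`) and the sample data are hypothetical.

```python
from collections import defaultdict


def build_pruned_index(full_index, thresholds):
    """Keep only postings whose weight reaches the per-term threshold."""
    return {
        term: {doc: w for doc, w in postings.items() if w >= thresholds[term]}
        for term, postings in full_index.items()
    }


def top_k(index, query, k):
    """Exhaustive top-k over an index; score = sum of term weights."""
    scores = defaultdict(float)
    for term in query:
        for doc, w in index.get(term, {}).items():
            scores[doc] += w
    # Ties are broken by document id so the ranking is deterministic.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))[:k]


def answer(query, k, full_index, pruned_index, thresholds):
    """Return (results, source); source is 'pruned' or 'full'."""
    seen = defaultdict(dict)
    for term in query:
        for doc, w in pruned_index.get(term, {}).items():
            seen[doc][term] = w

    exact = {}
    # A document absent from every pruned list scores strictly below
    # the sum of the thresholds (assumed positive).
    best_other = sum(thresholds[t] for t in query)
    for doc, weights in seen.items():
        if len(weights) == len(query):
            # Found in every query term's pruned list: score is exact.
            exact[doc] = sum(weights.values())
        else:
            # Missing postings weigh strictly less than their threshold.
            bound = sum(weights.get(t, thresholds[t]) for t in query)
            best_other = max(best_other, bound)

    ranked = sorted(exact.items(), key=lambda item: (-item[1], item[0]))
    # Safe only if k exact scores exist and the k-th one is at least
    # the bound on every document whose score is not fully known.
    if len(ranked) >= k and ranked[k - 1][1] >= best_other:
        return ranked[:k], "pruned"
    return top_k(full_index, query, k), "full"


# Hypothetical toy data: term -> {doc_id: weight}.
full_index = {
    "web": {1: 0.9, 2: 0.8, 3: 0.2, 4: 0.1},
    "search": {1: 0.7, 2: 0.9, 3: 0.1, 5: 0.3},
}
thresholds = {"web": 0.5, "search": 0.5}
pruned_index = build_pruned_index(full_index, thresholds)

for k in (2, 3):
    results, source = answer(["web", "search"], k, full_index,
                             pruned_index, thresholds)
    print(k, source, [doc for doc, _ in results])
```

In this toy data, the two-result query is served from the pruned index, while asking for three results triggers the fallback to the full index. Either way, the returned ranking matches what the full index alone would produce, which is the property the abstract describes.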