Large web search engines have to answer thousands of queries per second with interactive response times. Due to the sizes of the data sets involved, often in the range of multiple terabytes, a single query may require the processing of hundreds of megabytes or more of index data. To keep up with this immense workload, large search engines employ clusters of hundreds or thousands of machines, and a number of techniques such as caching, index compression, and index and query pruning are used to improve scalability. In particular, two-level caching techniques cache results of repeated identical queries at the frontend, while index data for frequently used query terms are cached in each node at a lower level.

We propose and evaluate a three-level caching scheme that adds an intermediate level of caching for additional performance gains. This intermediate level attempts to exploit frequently occurring pairs of terms by caching intersections or projections of the corresponding inverted lists. We propose and study several offline and online algorithms for the resulting weighted caching problem, which turns out to be surprisingly rich in structure. Our experimental evaluation based on a large web crawl and real search engine query log shows significant performance gains for the best schemes, both in isolation and in combination with the other caching levels. We also observe that a careful selection of cache admission and eviction policies is crucial for best overall performance.
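The three cache levels described above can be illustrated with a minimal Python sketch. It is not the scheme evaluated in the paper: the class `ThreeLevelCache`, its method names, the in-memory `index` dictionary standing in for disk, and the purely conjunctive (unranked) query semantics are all assumptions made for illustration. The caches are unbounded dictionaries, so the admission and eviction policies that the abstract calls crucial are deliberately left out, and pairs are admitted by an explicit call.

```python
from itertools import combinations


def intersect(a, b):
    """Intersect two sorted lists of document IDs with a linear merge."""
    out, i, j = [], 0, 0
    while i < len(a) and j < len(b):
        if a[i] == b[j]:
            out.append(a[i])
            i += 1
            j += 1
        elif a[i] < b[j]:
            i += 1
        else:
            j += 1
    return out


class ThreeLevelCache:
    """Toy model (hypothetical): result, pair-intersection, and list caches."""

    def __init__(self, index):
        self.index = index   # term -> sorted doc IDs; stands in for disk
        self.results = {}    # level 1: query terms -> result list
        self.pairs = {}      # level 2: (term, term) -> cached intersection
        self.lists = {}      # level 3: term -> inverted list
        self.hits = {"result": 0, "pair": 0, "list": 0, "disk": 0}

    def _list(self, term):
        # Serve the inverted list from the list cache, else fetch from "disk".
        if term in self.lists:
            self.hits["list"] += 1
        else:
            self.hits["disk"] += 1
            self.lists[term] = self.index.get(term, [])
        return self.lists[term]

    def cache_pair(self, t1, t2):
        """Admit one term pair; a real policy would weigh cost and benefit."""
        key = tuple(sorted((t1, t2)))
        self.pairs[key] = intersect(self._list(t1), self._list(t2))

    def query(self, terms):
        key = tuple(sorted(set(terms)))
        if key in self.results:                # level 1 hit
            self.hits["result"] += 1
            return self.results[key]
        current, remaining = None, set(key)
        for pair in combinations(key, 2):      # level 2: reuse cached pairs
            if pair in self.pairs:
                self.hits["pair"] += 1
                cached = self.pairs[pair]
                current = cached if current is None else intersect(current, cached)
                remaining -= set(pair)
        for term in sorted(remaining):         # level 3 or disk for the rest
            postings = self._list(term)
            current = postings if current is None else intersect(current, postings)
        self.results[key] = current if current is not None else []
        return self.results[key]


# Usage with a made-up three-term index.
toy_index = {"web": [1, 2, 3, 5, 8], "search": [2, 3, 5, 9], "engine": [3, 5, 7]}
cache = ThreeLevelCache(toy_index)
cache.cache_pair("web", "search")
print(cache.query(["web", "search", "engine"]))  # [3, 5]
```

In this sketch a three-term query that contains a cached pair only needs the one remaining inverted list, which is the kind of saving the intermediate level aims for. Choosing which pairs to keep under a space budget is the weighted caching problem the abstract refers to, and it is not addressed here.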