Term proximity scoring is an established means of improving the result quality of full-text queries in information retrieval. Integrating proximity scores into efficient query processing, however, has not been studied equally well. Existing methods rely on precomputed lists of documents in which tuples of terms, usually pairs, occur together; these lists typically make the index far larger than a term-only index. This article introduces a joint framework for trading off index size and result quality, and provides optimization techniques for tuning precomputed indexes towards either maximal result quality or maximal query processing performance under controlled result quality, given an upper bound on the index size. The framework also allows lists for pairs to be selectively materialized based on a query log, which further reduces index size. Extensive experiments with two large text collections demonstrate runtime improvements of more than one order of magnitude over existing text-based processing techniques with reasonable index sizes.
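To make the idea of query-log-driven materialization under a size bound more concrete, the following is a minimal sketch and not the article's actual optimization technique. It assumes a simple greedy heuristic: term pairs are ranked by how often they co-occur in logged queries per unit of estimated list size, and pair lists are picked until the budget is exhausted. The names `pair_frequencies`, `select_pairs`, `list_sizes`, and `budget` are hypothetical and introduced only for illustration.

```python
from collections import Counter
from itertools import combinations


def pair_frequencies(query_log):
    """Count in how many logged queries each unordered term pair appears."""
    counts = Counter()
    for query in query_log:
        # Deduplicate and sort terms so each pair has one canonical form.
        terms = sorted(set(query.lower().split()))
        counts.update(combinations(terms, 2))
    return counts


def select_pairs(query_log, list_sizes, budget):
    """Greedily choose pair lists to materialize within a size budget.

    Assumption (not from the article): benefit is approximated by query-log
    frequency, and cost by the estimated size of the pair list.
    """
    counts = pair_frequencies(query_log)
    # Only pairs with a known, positive list size are candidates.
    candidates = [p for p in counts if list_sizes.get(p, 0) > 0]
    # Highest frequency per unit of size first; ties broken alphabetically.
    candidates.sort(key=lambda p: (-counts[p] / list_sizes[p], p))

    chosen, used = [], 0
    for pair in candidates:
        size = list_sizes[pair]
        if used + size <= budget:
            chosen.append(pair)
            used += size
    return chosen, used


# Hypothetical query log and list sizes (arbitrary units).
query_log = ["new york hotel", "new york pizza", "cheap hotel"]
list_sizes = {
    ("new", "york"): 40,
    ("hotel", "new"): 50,
    ("cheap", "hotel"): 10,
}
chosen, used = select_pairs(query_log, list_sizes, budget=60)
print(chosen, used)
```

In this toy setting the small, frequently useful lists are chosen first, and the large `("hotel", "new")` list is skipped because it would exceed the budget. A real system would also have to weigh the effect of each list on result quality, which this sketch ignores.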