Efficient text proximity search

Authors:
Ralf Schenkel;Andreas Broschart;Seungwon Hwang;Martin Theobald;Gerhard Weikum
Affiliations:
Max-Planck-Institut für Informatik, Saarbrücken, Germany;Max-Planck-Institut für Informatik, Saarbrücken, Germany;POSTECH, Korea;Stanford University;Max-Planck-Institut für Informatik, Saarbrücken, Germany
Venue:
SPIRE'07 Proceedings of the 14th international conference on String processing and information retrieval
Year:
2007

Abstract

In addition to purely occurrence-based relevance models, term proximity has frequently been used to enhance the retrieval quality of keyword-oriented retrieval systems. While there has been work on effective scoring functions that incorporate proximity, there has been little work on algorithms or access methods for evaluating them efficiently. This paper presents an efficient evaluation framework that includes a proximity scoring function integrated within a top-k query engine for text retrieval. We propose precomputed and materialized index structures that boost performance. Extensive experiments on a very large text benchmark collection demonstrate the increased retrieval effectiveness and efficiency of our framework. In combination with static index pruning for the proximity lists, our algorithm improves performance by two orders of magnitude compared to a term-based top-k evaluation, while also significantly improving result quality.
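
To make the ingredients named in the abstract more concrete, the following is a minimal sketch of how a precomputed, statically pruned term-pair proximity index might be built and queried. It is not the paper's method: the inverse-square distance weighting, the function names (pair_proximity, build_pair_index, top_k), the keep parameter, and the toy documents are all assumptions made for illustration. The query step is a plain score aggregation rather than an early-terminating top-k engine, and indexing every term pair per document is only feasible at toy scale.

```python
from collections import defaultdict
from itertools import combinations
import heapq


def pair_proximity(pos_a, pos_b):
    """Sum 1/d^2 over all position pairs of two distinct terms.

    The inverse-square weighting is an illustrative assumption,
    not the scoring function used in the paper.
    """
    return sum(1.0 / (a - b) ** 2 for a in pos_a for b in pos_b)


def build_pair_index(docs, keep=2):
    """Precompute one proximity list per unordered term pair.

    Each list holds (score, doc_id) entries sorted by descending score
    and is statically pruned to its `keep` best entries.
    """
    index = defaultdict(list)
    for doc_id, text in docs.items():
        # Record the token positions of every term in this document.
        positions = defaultdict(list)
        for i, token in enumerate(text.lower().split()):
            positions[token].append(i)
        # Materialize a proximity score for every pair of distinct terms.
        for t1, t2 in combinations(sorted(positions), 2):
            score = pair_proximity(positions[t1], positions[t2])
            index[(t1, t2)].append((score, doc_id))
    # Static pruning: keep only the highest-scoring entries of each list.
    return {pair: heapq.nlargest(keep, entries) for pair, entries in index.items()}


def top_k(index, query_terms, k=1):
    """Aggregate the pruned pair lists of a query and return the k best documents."""
    totals = defaultdict(float)
    for pair in combinations(sorted(set(query_terms)), 2):
        for score, doc_id in index.get(pair, []):
            totals[doc_id] += score
    return heapq.nlargest(k, totals.items(), key=lambda item: item[1])


# Hypothetical toy collection, used only to exercise the sketch.
DOCS = {
    "d1": "text proximity search",
    "d2": "text retrieval with fast search",
    "d3": "search engines index text",
}
INDEX = build_pair_index(DOCS, keep=2)
print(top_k(INDEX, ["text", "search"], k=1))  # [('d1', 0.25)]
```

In this toy run, pruning to two entries drops d2 from the ("search", "text") list because its two query terms are farthest apart. Shortening lists this way is what reduces query-time work, at the risk of losing low-scoring candidates.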