Out of the Box Phrase Indexing

Authors:
Frederik Transier; Peter Sanders
Affiliations:
SAP NetWeaver EIM TREX, SAP AG, Walldorf, Germany and University of Karlsruhe, Karlsruhe, Germany; University of Karlsruhe, Karlsruhe, Germany
Venue:
SPIRE '08 Proceedings of the 15th International Symposium on String Processing and Information Retrieval
Year:
2008

Abstract

We present a method for optimizing phrase search based on inverted indexes. Our approach adds selected (two-term) phrases to an existing index. Whereas competing approaches are often based on the analysis of query logs, our approach works out of the box and uses only the information contained in the index. Our method is also competitive in terms of query performance and can even improve on other approaches for difficult queries. Moreover, it gives performance guarantees for arbitrary queries. Further, we propose using a phrase index as a substitute for the positional index of an in-memory search engine working with short documents. We support our conclusions with experiments using a high-performance main-memory search engine. We also give evidence that classical disk-based systems can profit from our approach.
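
The general idea of adding two-term phrases to an existing inverted index can be sketched as follows. This is a minimal illustration, not the authors' implementation: the abstract does not state how phrases are selected, so the rule used here (materialize a pair when both terms occur in at least min_doc_freq documents) is an assumption, and all function names are hypothetical. The sketch does reflect one property stated in the abstract: the phrase index is derived only from information already in the index, without a query log.

```python
from collections import defaultdict


def build_positional_index(docs):
    """Map each term to {doc_id: [positions]} (a plain positional inverted index)."""
    index = defaultdict(dict)
    for doc_id, text in enumerate(docs):
        for pos, term in enumerate(text.lower().split()):
            index[term].setdefault(doc_id, []).append(pos)
    return index


def phrase_docs_positional(index, first, second):
    """Return docs where `second` directly follows `first`, using position lists."""
    postings_a = index.get(first, {})
    postings_b = index.get(second, {})
    hits = set()
    for doc_id in postings_a.keys() & postings_b.keys():
        next_positions = set(postings_b[doc_id])
        if any(p + 1 in next_positions for p in postings_a[doc_id]):
            hits.add(doc_id)
    return hits


def build_phrase_index(index, min_doc_freq):
    """Materialize two-term phrases from the index alone.

    ASSUMPTION: the selection rule (both terms appear in at least
    `min_doc_freq` documents) is illustrative only; the source does
    not specify the criterion.
    """
    frequent = [t for t, postings in index.items() if len(postings) >= min_doc_freq]
    phrase_index = {}
    for a in frequent:
        for b in frequent:
            docs = phrase_docs_positional(index, a, b)
            if docs:
                phrase_index[(a, b)] = docs
    return phrase_index


def phrase_query(index, phrase_index, first, second):
    """Answer a two-term phrase query, preferring the precomputed phrase list."""
    if (first, second) in phrase_index:
        return phrase_index[(first, second)]
    return phrase_docs_positional(index, first, second)


docs = ["the quick brown fox", "the lazy dog and the quick cat", "quick the fox"]
index = build_positional_index(docs)
phrase_index = build_phrase_index(index, min_doc_freq=3)
print(phrase_query(index, phrase_index, "the", "quick"))
print(phrase_query(index, phrase_index, "brown", "fox"))
```

In this toy setup, pairs of frequent terms are answered by a single lookup, while rarer pairs fall back to intersecting position lists, which is cheap because their lists are short.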