Efficient phrase querying with an auxiliary index

Authors:
Dirk Bahle; Hugh E. Williams; Justin Zobel
Affiliations:
RMIT University, Melbourne, Australia;RMIT University, Melbourne, Australia;RMIT University, Melbourne, Australia
Venue:
SIGIR '02 Proceedings of the 25th annual international ACM SIGIR conference on Research and development in information retrieval
Year:
2002

Abstract

Search engines need to evaluate queries extremely fast, a challenging task given the vast quantities of data being indexed. A significant proportion of the queries posed to search engines involve phrases. In this paper we consider how phrase queries can be efficiently supported with low disk overheads. Previous research has shown that phrase queries can be rapidly evaluated using nextword indexes, but these indexes are twice as large as conventional inverted files. We propose combining nextword indexes with inverted files as a solution to this problem. Our experiments show that combined use of an auxiliary nextword index and a conventional inverted file allows evaluation of phrase queries in half the time required to evaluate such queries with an inverted file alone, while the space overhead is only 10% of the size of the inverted file. Further time savings are available with only slight increases in disk requirements.
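The combination described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes that the auxiliary nextword index stores word pairs only when the first word belongs to a small set of common words, and that all other query terms are resolved through a positional inverted file. The names `build_indexes`, `phrase_query`, and `common_words` are hypothetical, and the in-memory dictionaries stand in for compressed on-disk structures.

```python
from collections import defaultdict


def tokenize(text):
    """Lowercase whitespace tokenizer (an assumption; any tokenizer would do)."""
    return text.lower().split()


def build_indexes(docs, common_words):
    """Build a positional inverted file and an auxiliary nextword index.

    The nextword index only holds pairs whose first word is in common_words
    (an assumed selection rule, chosen for illustration).
    """
    inverted = defaultdict(lambda: defaultdict(list))  # term -> doc -> positions
    nextword = defaultdict(lambda: defaultdict(list))  # (first, next) -> doc -> positions of first
    for doc_id, text in enumerate(docs):
        tokens = tokenize(text)
        for pos, term in enumerate(tokens):
            inverted[term][doc_id].append(pos)
            if term in common_words and pos + 1 < len(tokens):
                nextword[(term, tokens[pos + 1])][doc_id].append(pos)
    return inverted, nextword


def phrase_query(phrase, inverted, nextword, common_words):
    """Return the sorted ids of documents containing the phrase verbatim."""
    terms = tokenize(phrase)
    # Cover the phrase with lookups: a nextword pair where the first word is
    # common, a plain inverted list otherwise. Each lookup is (offset, postings).
    lookups = []
    i = 0
    while i < len(terms):
        if terms[i] in common_words and i + 1 < len(terms):
            lookups.append((i, nextword.get((terms[i], terms[i + 1]), {})))
            i += 2
        else:
            lookups.append((i, inverted.get(terms[i], {})))
            i += 1
    if not lookups:
        return []
    # Start with the lookup that matches the fewest documents.
    lookups.sort(key=lambda lookup: len(lookup[1]))
    offset, postings = lookups[0]
    # Map each document to the candidate start positions of the phrase.
    candidates = {doc: {p - offset for p in positions}
                  for doc, positions in postings.items()}
    for offset, postings in lookups[1:]:
        narrowed = {}
        for doc, starts in candidates.items():
            if doc in postings:
                aligned = starts & {p - offset for p in postings[doc]}
                if aligned:
                    narrowed[doc] = aligned
        candidates = narrowed
    return sorted(candidates)


docs = [
    "the quick brown fox",
    "quick the brown fox jumps",
    "the brown fox and the quick dog",
]
common_words = {"the", "and"}
inverted, nextword = build_indexes(docs, common_words)
print(phrase_query("the brown fox", inverted, nextword, common_words))
```

In this sketch a common word such as "the" is never looked up through its long inverted list when another word follows it in the query. The much shorter list for the pair is used instead, which is the intuition behind pairing a small auxiliary index with the conventional one.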