Can phrase indexing help to process non-phrase queries?

Authors:
Mingjie Zhu;Shuming Shi;Nenghai Yu;Ji-Rong Wen
Affiliations:
University of Science and Technology of China, Hefei, China;Microsoft Research Asia, Beijing, China;University of Science and Technology of China, Hefei, China;Microsoft Research Asia, Beijing, China
Venue:
Proceedings of the 17th ACM conference on Information and knowledge management
Year:
2008

Citing 20
Cited 8

Filtered document retrieval with frequency-sorted indexes

Journal of the American Society for Information Science
The anatomy of a large-scale hypertextual Web search engine

WWW7 Proceedings of the seventh international conference on World Wide Web 7
Combining fuzzy information from multiple systems

Journal of Computer and System Sciences
Managing gigabytes (2nd ed.): compressing and indexing documents and images

Managing gigabytes (2nd ed.): compressing and indexing documents and images
Optimal aggregation algorithms for middleware

PODS '01 Proceedings of the twentieth ACM SIGMOD-SIGACT-SIGART symposium on Principles of database systems
Vector-space ranking with effective early termination

Proceedings of the 24th annual international ACM SIGIR conference on Research and development in information retrieval
Efficient phrase querying with an auxiliary index

SIGIR '02 Proceedings of the 25th annual international ACM SIGIR conference on Research and development in information retrieval
Combining fuzzy information: an overview

ACM SIGMOD Record
Fast phrase querying with combined indexes

ACM Transactions on Information Systems (TOIS)
Improving Web search efficiency via a locality based static pruning method

WWW '05 Proceedings of the 14th international conference on World Wide Web
Three-level caching for efficient query processing in large Web search engines

WWW '05 Proceedings of the 14th international conference on World Wide Web
Simplified similarity scoring using term ranks

Proceedings of the 28th annual international ACM SIGIR conference on Research and development in information retrieval
Inverted files for text search engines

ACM Computing Surveys (CSUR)
Pruned query evaluation using pre-computed impacts

SIGIR '06 Proceedings of the 29th annual international ACM SIGIR conference on Research and development in information retrieval
Pruning strategies for mixed-mode querying

CIKM '06 Proceedings of the 15th ACM international conference on Information and knowledge management
Pruning policies for two-tiered inverted index with correctness guarantee

SIGIR '07 Proceedings of the 30th annual international ACM SIGIR conference on Research and development in information retrieval
Optimized query execution in large search engines with global page ordering

VLDB '03 Proceedings of the 29th international conference on Very large data bases - Volume 29
Top-k query evaluation with probabilistic guarantees

VLDB '04 Proceedings of the Thirtieth international conference on Very large data bases - Volume 30
Effective top-k computation in retrieving structured documents with term-proximity support

Proceedings of the sixteenth ACM conference on Conference on information and knowledge management
Term proximity scoring for keyword-based retrieval systems

ECIR'03 Proceedings of the 25th European conference on IR research

Granular Computing for Text Mining: New Research Challenges and Opportunities

RSFDGrC '09 Proceedings of the 12th International Conference on Rough Sets, Fuzzy Sets, Data Mining and Granular Computing
Revisiting globally sorted indexes for efficient document retrieval

Proceedings of the third ACM international conference on Web search and data mining
Efficient term proximity search with term-pair indexes

CIKM '10 Proceedings of the 19th ACM international conference on Information and knowledge management
Upper-bound approximations for dynamic pruning

ACM Transactions on Information Systems (TOIS)
Query efficiency prediction for dynamic pruning

Proceedings of the 9th workshop on Large-scale and distributed informational retrieval
High-performance processing of text queries with tunable pruned term and term pair indexes

ACM Transactions on Information Systems (TOIS)
Optimized top-k processing with global page scores on block-max indexes

Proceedings of the fifth ACM international conference on Web search and data mining
Entity linking at the tail: sparse signals, unknown entities, and phrase models

Proceedings of the 7th ACM international conference on Web search and data mining

Quantified Score

Hi-index	0.00

Visualization

Abstract

Modern web search engines, while indexing billions of web pages, are expected to process queries and return results in a very short time. Many approaches have been proposed for efficiently computing top-k query results, but most of them ignore one key factor in the ranking functions of commercial search engines - term-proximity, which is the metric of the distance between query terms in a document. When term-proximity is included in ranking functions, most of the existing top-k algorithms will become inefficient. To address this problem, in this paper we propose to build a compact phrase index to speed up the search process when incorporating the term-proximity factor. The compact phrase index can help more accurately estimate the score upper bounds of unknown documents. The size of the phrase index is controlled by including a small portion of phrases which are possibly helpful for improving search performance. Phrase index has been used to process phrase queries in existing work. It is, however, to the best of our knowledge, the first time that phrase index is used to improve the performance of generic queries. Experimental results show that, compared with the state-of-the-art top-k computation approaches, our approach can reduce average query processing time to 1/5 for typical setttings.