Efficient phrase querying with flat position index

Authors:
Dongdong Shan;Wayne Xin Zhao;Jing He;Rui Yan;Hongfei Yan;Xiaoming Li
Affiliations:
Peking University, Beijing, China;Peking University, Beijing, China;Peking University, Beijing, China;Peking University, Beijing, China;Peking University, Beijing, China;Peking University, Beijing, China
Venue:
Proceedings of the 20th ACM international conference on Information and knowledge management
Year:
2011

Abstract

A large proportion of search engine queries contain phrases, namely sequences of adjacent words. In this paper, we propose using a flat position index (a.k.a. schema-independent index) for phrase query evaluation. In a flat position index, the entire document collection is viewed as one huge sequence of tokens. Each token is represented by a flat position, which is its unique offset from the beginning of the collection. Each indexed term is associated with a list of the flat positions at which that term occurs in the sequence. To recover DocIDs from flat positions efficiently, we propose a novel cache-sensitive look-up table (CSLT), which is much faster than existing search algorithms. Experiments on the TREC GOV2 collection show that, compared with a traditional word-level index, a flat position index reduces index size and speeds up phrase querying substantially.
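The idea of a flat position index can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names (`build_flat_index`, `doc_of`, `phrase_query`) are hypothetical, the index is uncompressed and held in memory, and DocID recovery uses a plain binary search over document start offsets as a stand-in, since the abstract does not describe the internals of the CSLT.

```python
from bisect import bisect_right
from collections import defaultdict


def build_flat_index(docs):
    """Build a flat position index from a list of tokenized documents.

    Returns (postings, doc_starts): postings maps each term to the sorted
    list of its flat positions; doc_starts[i] is the flat position of the
    first token of document i.
    """
    postings = defaultdict(list)
    doc_starts = []
    pos = 0
    for tokens in docs:
        doc_starts.append(pos)
        for tok in tokens:
            postings[tok].append(pos)
            pos += 1
    return postings, doc_starts


def doc_of(flat_pos, doc_starts):
    """Recover the DocID of a flat position.

    Assumption: ordinary binary search, used here in place of the CSLT.
    """
    return bisect_right(doc_starts, flat_pos) - 1


def phrase_query(phrase, postings, doc_starts):
    """Return (doc_id, flat_position) for every occurrence of the phrase."""
    if not phrase:
        return []
    # Position sets of the second and later terms, for adjacency checks.
    later = [set(postings.get(term, ())) for term in phrase[1:]]
    hits = []
    for start in postings.get(phrase[0], ()):
        adjacent = all(start + i + 1 in s for i, s in enumerate(later))
        if not adjacent:
            continue
        first_doc = doc_of(start, doc_starts)
        last_doc = doc_of(start + len(phrase) - 1, doc_starts)
        # Reject matches that straddle a document boundary.
        if first_doc == last_doc:
            hits.append((first_doc, start))
    return hits


docs = [["new", "york", "city"], ["york", "new"], ["new", "york"]]
postings, doc_starts = build_flat_index(docs)
print(phrase_query(["new", "york"], postings, doc_starts))
```

Because positions are global, adjacency is checked without consulting DocIDs at all; the DocID is only needed once per candidate match, both to report it and to discard matches that span two documents (such as the "new" ending the second document followed by the "new york" of the third).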