Efficient phrase querying with flat position index

  • Authors:
  • Dongdong Shan;Wayne Xin Zhao;Jing He;Rui Yan;Hongfei Yan;Xiaoming Li

  • Affiliations:
  • Peking University, Beijing, China;Peking University, Beijing, China;Peking University, Beijing, China;Peking University, Beijing, China;Peking University, Beijing, China;Peking University, Beijing, China

  • Venue:
  • Proceedings of the 20th ACM international conference on Information and knowledge management
  • Year:
  • 2011

Quantified Score

Hi-index 0.00

Visualization

Abstract

A large proportion of search engine queries contain phrases,namely a sequence of adjacent words. In this paper, we propose to use flat position index (a.k.a schema-independent index) for phrase query evaluation. In the flat position index, the entire document collection is viewed as a huge sequence of tokens. Each token is represented by one flat position, which is a unique position offset from the beginning of the collection. Each indexed term is associated with a list of the flat positions about that term in the sequence. To recover DocID from flat positions efficiently, we propose a novel cache sensitive look-up table (CSLT), which is much faster than existing search algorithms. Experiments on TREC GOV2 data collection show that flat position index can reduce the index size and speed up phrase querying substantially, compared with traditional word-level index.