Effective top-k computation with term-proximity support

  • Authors:
  • Mingjie Zhu; Shuming Shi; Mingjing Li; Ji-Rong Wen

  • Affiliations:
  • University of Science and Technology of China, Hefei, 230026, China; Microsoft Research Asia, Beijing 100080, China; University of Science and Technology of China, Hefei, 230026, China; Microsoft Research Asia, Beijing 100080, China

  • Venue:
  • Information Processing and Management: an International Journal
  • Year:
  • 2009

Abstract

Modern web search engines are expected to return the top-k results efficiently. Although many dynamic index pruning strategies have been proposed for efficient top-k computation, most of them tend to ignore important factors in ranking functions, such as term-proximity (the distance between query terms within a document). In our recent work [Zhu, M., Shi, S., Li, M., & Wen, J. (2007). Effective top-k computation in retrieving structured documents with term-proximity support. In Proceedings of the 16th CIKM conference (pp. 771-780)], we demonstrated that, when term-proximity is incorporated into ranking functions, most existing index structures and top-k strategies become quite inefficient. To solve this problem, we built an inverted index based on web page structure and proposed corresponding query processing strategies. The experimental results indicated that the proposed index structures and query processing strategies significantly improve top-k efficiency. In this paper, we study additional techniques to further improve the efficiency of top-k computation. We propose a Proximity-Probe Heuristic to make our top-k algorithms more efficient. We also evaluate the efficiency of our approaches under various settings (linear or non-linear ranking functions, exact or approximate top-k processing, etc.).
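To make the notion of term-proximity concrete, the following is a minimal illustrative sketch, not the ranking function or the Proximity-Probe Heuristic from the paper. It assumes each document stores a sorted list of token positions per query term, measures proximity as the length of the smallest window that contains every query term, and adds a hypothetical bonus to a precomputed base score. The names `min_cover_span`, `score`, `top_k`, and the `weight` parameter are assumptions introduced only for this example.

```python
import heapq


def min_cover_span(positions):
    """Length (in tokens) of the smallest window containing every query term.

    positions: one sorted list of token positions per query term.
    Returns None if any query term is missing from the document.
    """
    if not positions or any(not p for p in positions):
        return None
    # Heap entries: (position, term index, offset into that term's list).
    heap = [(p[0], i, 0) for i, p in enumerate(positions)]
    heapq.heapify(heap)
    current_max = max(p[0] for p in positions)
    best = current_max - heap[0][0] + 1
    while True:
        pos, i, j = heapq.heappop(heap)
        best = min(best, current_max - pos + 1)
        if j + 1 == len(positions[i]):
            # The leftmost term cannot advance, so no smaller window exists.
            return best
        nxt = positions[i][j + 1]
        current_max = max(current_max, nxt)
        heapq.heappush(heap, (nxt, i, j + 1))


def score(base, positions, weight=1.0):
    """Hypothetical score: base relevance plus a bonus for tight windows."""
    span = min_cover_span(positions)
    bonus = 0.0 if span is None else weight * len(positions) / span
    return base + bonus


def top_k(docs, k):
    """docs maps doc_id -> (base_score, positions); returns the k best ids."""
    return heapq.nlargest(k, docs, key=lambda d: score(*docs[d]))


docs = {
    "a": (1.0, [[1], [2]]),    # query terms adjacent
    "b": (1.2, [[1], [50]]),   # higher base score, terms far apart
    "c": (0.5, [[3], [4]]),    # low base score, terms adjacent
}
print(top_k(docs, 2))
```

This sketch scores every document exhaustively. Because the proximity bonus depends on positions inside each document, it cannot be read off a list sorted by per-term scores alone, which gives one intuition for why pruning strategies designed without proximity may lose efficiency once it is added.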