A fast algorithm for the generalized k-keyword proximity problem given keyword offsets

  • Authors: Sung-Ryul Kim; Inbok Lee; Kunsoo Park
  • Affiliations: WiseNut Inc., Division of Internet & Media and Center for Aerospace System Integration Technology, Konkuk University, Republic of Korea; School of Computer Science & Engineering, Seoul National University, Republic of Korea; School of Computer Science & Engineering, Seoul National University, Republic of Korea
  • Venue: Information Processing Letters
  • Year: 2004

Abstract

When searching for information on the Web, it is often necessary to use one of the available search engines. Because the number of results is quite large for most queries, we need some measure of relevance with respect to the query. One of the most important relevance factors is the proximity score, i.e., how close together the keywords appear in a given document. A basic proximity score is given by the size of the smallest range containing all the keywords in the query. We generalize the proximity score to include many practically important cases and present an O(n log k)-time algorithm for the generalized problem, where k is the number of keywords and n is the number of occurrences of the keywords in a document.
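
To make the basic proximity score concrete, the following is a minimal sketch of one standard way to find the smallest range containing all keywords when each keyword's offsets are given as a sorted list. It is not the authors' algorithm and does not cover the generalized problem, whose details the abstract does not give. The function name `smallest_range` and the input layout (one sorted list of offsets per keyword) are assumptions made for illustration.

```python
import heapq


def smallest_range(offsets):
    """Return (start, end) of the smallest range that contains at least one
    occurrence of every keyword, or None if some keyword never occurs.

    `offsets` is a hypothetical input layout: a list of k sorted lists,
    where offsets[i] holds the positions of keyword i in the document.
    """
    if not offsets or any(not positions for positions in offsets):
        return None

    # Heap entries: (position, keyword index, index within that keyword's list).
    heap = [(positions[0], i, 0) for i, positions in enumerate(offsets)]
    heapq.heapify(heap)
    current_max = max(positions[0] for positions in offsets)
    best = (heap[0][0], current_max)

    while True:
        pos, i, j = heapq.heappop(heap)
        # The window [pos, current_max] covers one occurrence of each keyword.
        if current_max - pos < best[1] - best[0]:
            best = (pos, current_max)
        # Once the keyword holding the left end has no later occurrence,
        # no further window can cover all keywords.
        if j + 1 == len(offsets[i]):
            return best
        nxt = offsets[i][j + 1]
        current_max = max(current_max, nxt)
        heapq.heappush(heap, (nxt, i, j + 1))


print(smallest_range([[1, 9, 20], [4, 12], [7, 10, 30]]))
```

Each of the n occurrences is pushed to and popped from a heap of at most k entries, so this sketch runs in O(n log k) time for the basic score, the same bound the abstract states for the generalized problem.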