A fast algorithm for the generalized k-keyword proximity problem given keyword offsets

Authors:
Sung-Ryul Kim; Inbok Lee; Kunsoo Park
Affiliations:
WiseNut Inc., Division of Internet & Media and Center for Aerospace System Integration Technology, Konkuk University, Republic of Korea; School of Computer Science & Engineering, Seoul National University, Republic of Korea; School of Computer Science & Engineering, Seoul National University, Republic of Korea
Venue:
Information Processing Letters
Year:
2004

Cited by:

A Simple and Compact Algorithm for the RMQ and Its Application to the Longest Common Repeat Problem

ICANNGA '07 Proceedings of the 8th international conference on Adaptive and Natural Computing Algorithms, Part I
A Simple Algorithm for Finding Exact Common Repeats

IEICE - Transactions on Information and Systems
Implementing and evaluating phrasal query suggestions for proximity search

Information Systems
An algorithm for the generalized k-keyword proximity problem and finding longest repetitive substring in a set of strings

ICCS'06 Proceedings of the 6th international conference on Computational Science - Volume Part IV
Linear time algorithm for the generalised longest common repeat problem

SPIRE'05 Proceedings of the 12th international conference on String Processing and Information Retrieval

Abstract

When searching for information on the Web, it is often necessary to use one of the available search engines. Because the number of results is quite large for most queries, some measure of relevance to the query is needed. One of the most important relevance factors is the proximity score, i.e., how close together the keywords appear in a given document. A basic proximity score is the size of the smallest range containing all the keywords in the query. We generalize the proximity score to include many practically important cases and present an O(n log k)-time algorithm for the generalized problem, where k is the number of keywords and n is the number of occurrences of the keywords in a document.
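As a hedged illustration of the basic proximity score only (not the generalized problem the paper addresses, and not claimed to be the authors' algorithm), the sketch below assumes each keyword's offsets in the document are given as a sorted list of integer positions. It uses a standard k-way merge technique: a min-heap holds one current occurrence per keyword, so the window from the heap minimum to the running maximum always covers all k keywords. Each of the n occurrences is pushed and popped at most once, giving O(n log k) time. The function name `smallest_covering_range` and the input layout are hypothetical.

```python
import heapq
from typing import Dict, List, Optional, Tuple


def smallest_covering_range(offsets: Dict[str, List[int]]) -> Optional[Tuple[int, int]]:
    """Return (lo, hi) of a smallest position range containing at least one
    occurrence of every keyword, or None if some keyword never occurs.

    Assumption (hypothetical input format): offsets maps each keyword to a
    sorted list of its occurrence positions in the document.
    """
    if not offsets or any(len(pos) == 0 for pos in offsets.values()):
        return None

    # Heap entries are (position, keyword, index into that keyword's list);
    # exactly one entry per keyword is in the heap at any time.
    heap: List[Tuple[int, str, int]] = []
    current_max = None
    for kw, pos in offsets.items():
        heapq.heappush(heap, (pos[0], kw, 0))
        current_max = pos[0] if current_max is None else max(current_max, pos[0])

    best: Optional[Tuple[int, int]] = None
    while True:
        # The window [heap minimum, current_max] covers every keyword.
        lo, kw, i = heapq.heappop(heap)
        if best is None or current_max - lo < best[1] - best[0]:
            best = (lo, current_max)
        pos = offsets[kw]
        if i + 1 == len(pos):
            # This keyword has no later occurrence, so no further window
            # can cover all keywords with a larger left end.
            return best
        nxt = pos[i + 1]
        heapq.heappush(heap, (nxt, kw, i + 1))
        current_max = max(current_max, nxt)
```

For example, with `{"web": [1, 10, 20], "search": [4, 15], "engine": [7, 16]}` the function returns `(15, 20)`, a range of size 5 that contains `search` at 15, `engine` at 16, and `web` at 20.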