When searching for information on the Web, it is often necessary to use one of the available search engines. Because the number of results is quite large for most queries, we need some measure of each result's relevance to the query. One of the most important relevance factors is the proximity score, i.e., how close together the keywords appear in a given document. A basic proximity score is the size of the smallest range containing all the keywords in the query. We generalize the proximity score to cover many practically important cases and present an O(n log k)-time algorithm for the generalized problem, where k is the number of keywords and n is the total number of occurrences of the keywords in a document.
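To make the basic proximity score concrete, the following is a minimal sketch of one standard way to find the smallest range containing all keywords. It covers only the basic score and is not the generalized algorithm described in the abstract. It assumes the occurrences of each keyword are already available as a sorted list of word offsets; the function name `smallest_range` and the input layout are illustrative choices, not taken from the source.

```python
import heapq


def smallest_range(positions):
    """Return (start, end) of the smallest window that contains at least
    one occurrence of every keyword, or None if some keyword is absent.

    positions: list of k sorted lists of offsets, one list per keyword
    (a hypothetical input format chosen for this sketch).
    """
    if not positions or any(not p for p in positions):
        return None

    # Heap entries: (offset, keyword index, index within that keyword's list).
    heap = [(p[0], i, 0) for i, p in enumerate(positions)]
    heapq.heapify(heap)
    current_max = max(p[0] for p in positions)
    best = (heap[0][0], current_max)

    while True:
        lo, i, j = heapq.heappop(heap)
        # The window [lo, current_max] holds one occurrence of each keyword.
        if current_max - lo < best[1] - best[0]:
            best = (lo, current_max)
        # Once the keyword with the smallest offset has no further
        # occurrences, no window starting later can cover it.
        if j + 1 == len(positions[i]):
            return best
        nxt = positions[i][j + 1]
        current_max = max(current_max, nxt)
        heapq.heappush(heap, (nxt, i, j + 1))
```

Each of the n occurrences enters and leaves a heap of size k at most once, so this sketch runs in O(n log k) time for the basic score. The proximity score would then be derived from the returned endpoints, for example as `end - start`.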