Text Retrieval by Using k-word Proximity Search

Authors:
Kunihiko Sadakane;Hiroshi Imai
Venue:
DANTE '99 Proceedings of the 1999 International Symposium on Database Applications in Non-Traditional Environments
Year:
1999

Abstract

When searching a huge collection of documents, we often specify several keywords and use conjunctive queries to narrow the results. Although every retrieved document contains all the keywords, the positions of the keywords are usually not considered, so the results include some meaningless documents. It is therefore effective to rank documents according to the proximity of the keywords within them. This ranking can be regarded as a kind of text data mining.

In this paper, we propose two algorithms for finding documents in which all the given keywords appear in neighboring places. One is based on the plane-sweep algorithm and the other on a divide-and-conquer approach. Both algorithms run in O(n log n) time, where n is the number of occurrences of the given keywords. We run the plane-sweep algorithm on a large collection of HTML files and verify its effectiveness.
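The abstract does not spell out the sweep itself. As a rough illustration only, the Python sketch below assumes a document is a sequence of word positions, that each keyword's occurrence positions are given as a list, and that no two keywords share a position. It sweeps over all n occurrences in position order and reports every minimal interval that contains all k keywords. Sorting dominates, so it runs in O(n log n) time. The names `minimal_spans` and `best_span_length` are hypothetical, and this is a generic sliding-window sweep, not necessarily the authors' exact plane-sweep algorithm.

```python
def minimal_spans(occurrences):
    """Return every minimal (start, end) interval containing all keywords.

    occurrences: list of k position lists, one per keyword (hypothetical
    input format). Assumes no two keywords occur at the same position.
    An interval is minimal if no proper sub-interval covers all keywords.
    """
    k = len(occurrences)
    if k == 0 or any(not plist for plist in occurrences):
        return []
    # Merge all n occurrences into one position-ordered stream: O(n log n).
    events = sorted((pos, kw) for kw, plist in enumerate(occurrences) for pos in plist)
    counts = [0] * k   # occurrences of each keyword inside the current window
    covered = 0        # number of distinct keywords inside the window
    left = 0           # index of the window's leftmost event
    spans = []
    for pos_r, kw_r in events:
        if counts[kw_r] == 0:
            covered += 1
        counts[kw_r] += 1
        if covered < k:
            continue
        # Drop occurrences at the left end that are duplicated further right.
        while counts[events[left][1]] > 1:
            counts[events[left][1]] -= 1
            left += 1
        # The window is minimal only if its right end is also the sole
        # occurrence of that keyword; otherwise a shorter window exists.
        if counts[kw_r] == 1:
            spans.append((events[left][0], pos_r))
    return spans


def best_span_length(occurrences):
    """Length of the tightest minimal interval, or None if a keyword is missing.

    A hypothetical proximity score: smaller means the keywords are closer.
    """
    return min((end - start for start, end in minimal_spans(occurrences)), default=None)


if __name__ == "__main__":
    # Keyword 0 at positions 0 and 3, keyword 1 at 2 and 5, keyword 2 at 4.
    occ = [[0, 3], [2, 5], [4]]
    print(minimal_spans(occ))
    print(best_span_length(occ))
```

With the occurrence lists `[[0, 3], [2, 5], [4]]`, the sweep reports the minimal intervals `(2, 4)` and `(3, 5)`, and `best_span_length` returns 2. A ranker built on this sketch could order documents by that score, placing documents whose keywords lie close together first.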