A tree algorithm for nearest neighbor searching in document retrieval systems

Authors:
Caroline M. Eastman; Stephen F. Weiss
Venue:
SIGIR '78 Proceedings of the 1st annual international ACM SIGIR conference on Information storage and retrieval
Year:
1978

Abstract

The problem of finding nearest neighbors to a query in a document collection is a special case of associative retrieval, in which searches are performed using more than one key. A nearest-neighbor associative retrieval algorithm, suitable for document retrieval using similarity matching, is described. The basic structure used is a binary tree; at each node, a set of keys (concepts) is tested to select the most promising branch. Backtracking to initially rejected branches is allowed and often necessary. Under certain conditions, the search time required by this algorithm is $O((\log_2 N)^k)$, where $N$ is the number of documents and $k$ is a system-dependent parameter. A series of experiments with a small collection confirms the predictions made using the analytic model; $k$ is approximately 4 in this situation. This algorithm is compared with two other searching algorithms: sequential search and clustered search. For large collections, the average search time for this algorithm is less than that for a sequential search and greater than that for a clustered search. However, the clustered search, unlike the sequential search and this algorithm, does not guarantee that the near neighbors found are actually the nearest neighbors.
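The abstract does not say how the keys at each node are chosen or how a branch is judged "most promising", so the following is only a minimal sketch of the general idea (tree descent with backtracking), not the authors' procedure. It assumes documents and queries are sets of index terms, similarity is the raw term overlap |q ∩ d|, and each node stores the union of the terms in its subtree, so that |q ∩ union| is an upper bound on the similarity of any document below it. The split rule in `build`, the `leaf_size` parameter, and the names `build`, `nearest`, and `Node` are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import FrozenSet, List, Optional, Tuple

Doc = Tuple[int, FrozenSet[str]]  # (document id, set of index terms)


@dataclass
class Node:
    terms: FrozenSet[str]                          # union of all terms in this subtree
    docs: List[Doc] = field(default_factory=list)  # populated only at leaves
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def build(docs: List[Doc], leaf_size: int = 2) -> Node:
    """Hypothetical split rule: partition on the term whose document
    frequency is closest to half of the current set."""
    union = frozenset().union(*(t for _, t in docs))
    if len(docs) <= leaf_size:
        return Node(union, docs=docs)
    half = len(docs) / 2
    best_term, best_gap = None, None
    for term in sorted(union):
        df = sum(1 for _, t in docs if term in t)
        if 0 < df < len(docs):  # the term must actually split the set
            gap = abs(df - half)
            if best_gap is None or gap < best_gap:
                best_term, best_gap = term, gap
    if best_term is None:  # all documents have identical term sets
        return Node(union, docs=docs)
    with_t = [d for d in docs if best_term in d[1]]
    without_t = [d for d in docs if best_term not in d[1]]
    return Node(union, left=build(with_t, leaf_size), right=build(without_t, leaf_size))


def nearest(root: Node, query: FrozenSet[str]) -> Tuple[Optional[int], int, int]:
    """Return (doc id, overlap score, documents examined) for a best match."""
    best_id: Optional[int] = None
    best_score = -1
    examined = 0

    def visit(node: Node) -> None:
        nonlocal best_id, best_score, examined
        if node.left is None or node.right is None:  # leaf: score each document
            for doc_id, terms in node.docs:
                examined += 1
                score = len(query & terms)
                if score > best_score:
                    best_id, best_score = doc_id, score
            return
        # Descend into the branch with the higher upper bound first.
        children = sorted((node.left, node.right),
                          key=lambda c: len(query & c.terms), reverse=True)
        for child in children:
            # Backtrack into a branch only if its bound could beat the best so far.
            if len(query & child.terms) > best_score:
                visit(child)

    visit(root)
    return best_id, best_score, examined


docs = [
    (0, frozenset({"retrieval", "tree", "search"})),
    (1, frozenset({"cluster", "retrieval"})),
    (2, frozenset({"tree", "binary", "search"})),
    (3, frozenset({"query", "vector"})),
    (4, frozenset({"cluster", "vector", "search"})),
    (5, frozenset({"binary", "tree"})),
]
root = build(docs)
print(nearest(root, frozenset({"binary", "tree", "search"})))  # → (2, 3, 1)
```

A branch is skipped only when its bound cannot exceed the best score already found, so the returned score always equals what a sequential scan over all documents would give; this is the guarantee the abstract contrasts with clustered search. In this toy run only one of the six documents is scored, but how many are skipped in general depends on the tree shape and the query.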