A tree algorithm for nearest neighbor searching in document retrieval systems.
The problem of finding the nearest neighbors to a query in a document collection is a special case of associative retrieval, in which searches are performed using more than one key. A nearest-neighbor associative retrieval algorithm, suitable for document retrieval using similarity matching, is described. The basic structure used is a binary tree; at each node, a set of keys (concepts) is tested to select the most promising branch. Backtracking to initially rejected branches is allowed and often necessary. Under certain conditions, the search time required by this algorithm is $O((\log_2 N)^k)$, where N is the number of documents and k is a system-dependent parameter. A series of experiments with a small collection confirms the predictions made using the analytic model; k is approximately 4 in this situation. This algorithm is compared with two other searching algorithms: sequential search and clustered search. For large collections, the average search time for this algorithm is less than that for a sequential search and greater than that for a clustered search. However, the clustered search, unlike the sequential search and this algorithm, does not guarantee that the near neighbors found are actually the nearest neighbors.
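The tree search with backtracking described above can be illustrated with a minimal sketch. This is not the paper's algorithm: the similarity measure (number of shared concepts), the split rule, the per-subtree upper bound, and all names (`Node`, `build`, `nearest`, `leaf_size`) are assumptions chosen for illustration. Each node tests one key to pick the more promising branch first; the rejected branch is revisited only if its upper bound could still beat the best match found so far, which keeps the result exact.

```python
from dataclasses import dataclass
from typing import FrozenSet, List, Optional, Tuple

Doc = Tuple[int, FrozenSet[str]]  # (document id, set of concepts)


@dataclass
class Node:
    concepts: FrozenSet[str]            # union of all concepts in this subtree
    docs: List[Doc]                     # non-empty only at leaves
    test_key: Optional[str] = None      # key tested at an internal node
    with_key: Optional["Node"] = None   # documents containing test_key
    without_key: Optional["Node"] = None


def build(docs: List[Doc], leaf_size: int = 2) -> Node:
    """Build a binary tree over a non-empty document list (hypothetical split rule)."""
    union = frozenset().union(*(c for _, c in docs))
    if len(docs) <= leaf_size:
        return Node(union, docs)

    # Assumed heuristic: pick the key whose presence splits the documents most evenly.
    def imbalance(key: str) -> int:
        n = sum(key in c for _, c in docs)
        return abs(2 * n - len(docs))

    key = min(sorted(union), key=imbalance)
    has = [d for d in docs if key in d[1]]
    lacks = [d for d in docs if key not in d[1]]
    if not has or not lacks:            # no key separates these documents
        return Node(union, docs)
    return Node(union, [], key, build(has, leaf_size), build(lacks, leaf_size))


def nearest(root: Node, query: FrozenSet[str]) -> Tuple[Optional[int], int]:
    """Return (doc id, shared-concept count) of a nearest neighbor."""
    best_score, best_id = -1, None

    def visit(node: Node) -> None:
        nonlocal best_score, best_id
        # Upper bound: no document below can share more concepts than the union does.
        if len(query & node.concepts) <= best_score:
            return                      # prune: this subtree cannot improve the match
        if node.test_key is None:       # leaf: score every document directly
            for doc_id, concepts in node.docs:
                score = len(query & concepts)
                if score > best_score:
                    best_score, best_id = score, doc_id
            return
        # Test the key to choose the more promising branch first.
        if node.test_key in query:
            first, second = node.with_key, node.without_key
        else:
            first, second = node.without_key, node.with_key
        visit(first)
        visit(second)                   # backtrack; pruned on entry if hopeless

    visit(root)
    return best_id, best_score
```

Because a branch is skipped only when its bound cannot exceed the current best, this sketch returns the same best score as a sequential scan, in contrast to a clustered search that stops at the most promising cluster. On the stated complexity: ignoring constant factors, with k = 4 the quantity (log2 N)^4 equals N at N = 65,536 (16^4 = 2^16) and grows more slowly beyond that, which is consistent with the advantage over sequential search appearing only for large collections.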