Improved Query Matching Using kd-Trees: A Latent Semantic Indexing Enhancement

Authors:
M. K. Hughey; M. W. Berry
Affiliations:
Department of Computer Science, University of Tennessee, Knoxville, TN 37996-1301, USA (both authors)
Venue:
Information Retrieval
Year:
2000

Abstract

Efficient information searching and retrieval methods are needed to navigate the ever-increasing volume of digital information. Traditional lexical information retrieval methods can be inefficient and often return inaccurate results. To overcome problems such as polysemy and synonymy, concept-based retrieval methods have been developed. One such method is Latent Semantic Indexing (LSI), a vector-space model that uses the singular value decomposition (SVD) of a term-by-document matrix to represent terms and documents in a k-dimensional space. As with other vector-space models, LSI attempts to exploit the underlying semantic structure of word usage in documents. During the query matching phase of LSI, a user's query is first projected into the k-dimensional term-document space and then compared to all terms and documents represented there. Using a similarity measure, the nearest (most relevant) terms and documents are identified and returned to the user. The current LSI query matching method therefore requires that the similarity measure be computed between the query and every term and document in the vector space. In this paper, the kd-tree searching algorithm is used within a recent LSI implementation to reduce the time and computational complexity of query matching. The kd-tree data structure stores the term and document vectors so that only those terms and documents most likely to qualify as nearest neighbors of the query are examined and retrieved.
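The contrast between exhaustive query matching and kd-tree search can be illustrated with a small sketch. This is a hypothetical illustration, not the authors' implementation. It assumes the term and document vectors have already been computed in the k-dimensional LSI space, and it normalizes them to unit length so that ranking by Euclidean distance (which the kd-tree prunes on) gives the same order as ranking by cosine similarity, since ||a - b||^2 = 2 - 2 cos(a, b) for unit vectors. The tree splits at the median and cycles through the coordinate axes, as in Bentley's original kd-tree. The names `build`, `knn`, and `brute_force`, and the random toy vectors, are illustrative assumptions.

```python
import heapq
import math
import random
from typing import List, Optional, Sequence, Tuple

Vector = Tuple[float, ...]


def normalize(v: Sequence[float]) -> Vector:
    """Scale a non-zero vector to unit length.

    For unit vectors ||a - b||^2 = 2 - 2*cos(a, b), so ranking by Euclidean
    distance gives the same order as ranking by cosine similarity.
    """
    n = math.sqrt(sum(x * x for x in v))
    return tuple(x / n for x in v)


def sq_dist(a: Vector, b: Vector) -> float:
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


class Node:
    """One kd-tree node: a stored vector, its term/document index, and a split axis."""
    __slots__ = ("point", "index", "axis", "left", "right")

    def __init__(self, point: Vector, index: int, axis: int,
                 left: Optional["Node"], right: Optional["Node"]) -> None:
        self.point, self.index, self.axis = point, index, axis
        self.left, self.right = left, right


def build(items: List[Tuple[Vector, int]], depth: int = 0) -> Optional[Node]:
    """Build a kd-tree, splitting at the median and cycling through the axes."""
    if not items:
        return None
    axis = depth % len(items[0][0])
    items = sorted(items, key=lambda it: it[0][axis])
    mid = len(items) // 2
    return Node(items[mid][0], items[mid][1], axis,
                build(items[:mid], depth + 1),
                build(items[mid + 1:], depth + 1))


def knn(root: Optional[Node], query: Vector, m: int) -> List[Tuple[float, int]]:
    """Return the m nearest (squared distance, index) pairs, closest first."""
    best: List[Tuple[float, int]] = []  # max-heap of (-squared distance, index)

    def visit(node: Optional[Node]) -> None:
        if node is None:
            return
        d2 = sq_dist(node.point, query)
        if len(best) < m:
            heapq.heappush(best, (-d2, node.index))
        elif d2 < -best[0][0]:
            heapq.heapreplace(best, (-d2, node.index))
        diff = query[node.axis] - node.point[node.axis]
        near, far = (node.left, node.right) if diff < 0 else (node.right, node.left)
        visit(near)
        # Cross the splitting plane only if it is closer to the query than the
        # current m-th best match; otherwise the whole far subtree is skipped.
        if len(best) < m or diff * diff < -best[0][0]:
            visit(far)

    visit(root)
    return sorted((-neg, i) for neg, i in best)


def brute_force(points: List[Vector], query: Vector, m: int) -> List[Tuple[float, int]]:
    """Exhaustive matching: compare the query against every stored vector."""
    return sorted((sq_dist(p, query), i) for i, p in enumerate(points))[:m]


if __name__ == "__main__":
    random.seed(0)
    k = 5  # hypothetical toy dimensionality for the LSI space
    docs = [normalize([random.gauss(0, 1) for _ in range(k)]) for _ in range(1000)]
    tree = build([(v, i) for i, v in enumerate(docs)])
    q = normalize([random.gauss(0, 1) for _ in range(k)])
    assert knn(tree, q, 10) == brute_force(docs, q, 10)
    print("10 nearest documents:", [i for _, i in knn(tree, q, 10)])
```

Here `brute_force` plays the role of the exhaustive matching described above, while `knn` returns the same neighbors but skips any subtree whose splitting plane lies farther from the query than the current m-th best distance. How much work the pruning saves depends on the dimensionality and distribution of the vectors; kd-tree pruning is known to weaken as the number of dimensions grows.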