Abstract: Latent Semantic Indexing (LSI), a vector space-based approach to information retrieval, has proven effective at correlating and retrieving relevant documents. While much work has been published on LSI, most of it addresses the algorithmic or theoretical basis of the model; little, if any, addresses practical implementation issues. In this paper, we describe a production-level implementation of LSI. The system integrates components for document collection and preprocessing, singular value decomposition (SVD), multilingual processing, and a tree-based access method for similarity querying. We discuss the implementation issues encountered while developing the system. In particular, we address scalability issues in the query engine and in other components of the system, and we present the lessons learned.
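For readers unfamiliar with the core LSI step the abstract refers to, the following is a minimal sketch of the standard textbook formulation, not the system described in the paper. It assumes NumPy and a small dense term-by-document matrix. The function names (`build_lsi`, `fold_in_query`, `cosine_scores`) are hypothetical, and the linear scan over documents is a stand-in for the tree-based access method the abstract mentions.

```python
import numpy as np


def build_lsi(term_doc, k):
    """Rank-k truncated SVD of a term-by-document matrix (terms are rows).

    Returns (U_k, s_k, doc_vecs), where each row of doc_vecs is one
    document expressed in the k-dimensional latent space.
    """
    U, s, Vt = np.linalg.svd(term_doc, full_matrices=False)
    U_k, s_k = U[:, :k], s[:k]
    # Document coordinates V_k * Sigma_k, which equals A^T U_k.
    doc_vecs = Vt[:k, :].T * s_k
    return U_k, s_k, doc_vecs


def fold_in_query(U_k, query_vec):
    """Project a raw term-space query into the latent space (q^T U_k).

    This uses the same scaling as doc_vecs, so a query identical to a
    document column lands on that document's latent vector.
    """
    return query_vec @ U_k


def cosine_scores(doc_vecs, query_latent, eps=1e-12):
    """Cosine similarity between the query and every document (linear scan)."""
    doc_norms = np.linalg.norm(doc_vecs, axis=1)
    q_norm = np.linalg.norm(query_latent)
    # eps guards against division by zero for all-zero vectors.
    return (doc_vecs @ query_latent) / (doc_norms * q_norm + eps)


# Toy collection (illustrative data only).
# Rows: car, auto, engine, flower, petal. Columns: four documents.
A = np.array([
    [1, 0, 0, 0],   # car
    [0, 1, 0, 0],   # auto
    [1, 1, 0, 0],   # engine
    [0, 0, 1, 1],   # flower
    [0, 0, 1, 0],   # petal
], dtype=float)

U_k, s_k, docs = build_lsi(A, k=2)
query = np.array([1, 0, 0, 0, 0], dtype=float)   # the single term "car"
scores = cosine_scores(docs, fold_in_query(U_k, query))
ranking = np.argsort(-scores)                     # best match first
```

In this toy collection the second document contains "auto" and "engine" but not "car", yet it scores as high as the first document for the query "car", because both load on the same latent dimension. A full dense SVD and a linear scan are only practical at this scale; the scalability concerns raised in the abstract are exactly what this sketch leaves out.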