Text collections represented in the LSI model are hard to search efficiently (i.e. quickly), since no indexing method exists for LSI matrices. The inverted file, often used in both the boolean and the classic vector model, cannot be used effectively because query vectors in the LSI model are dense. One possible route to efficient search in LSI matrices is to use metric access methods (MAMs). Instead of the cosine measure, MAMs can use the deviation metric as an equivalent dissimilarity measure for query processing. However, the intrinsic dimensionality of collections represented by LSI matrices is often high, which degrades the search performance of MAMs. In this paper we introduce σ-LSI, a modification of LSI in which we artificially decrease the intrinsic dimensionality of LSI matrices. This is achieved by adjusting the singular values produced by the SVD. We show that suitable adjustments can dramatically improve the efficiency of searching with MAMs, while precision/recall values are preserved or become only slightly worse.
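The ideas in the abstract can be illustrated with a minimal Python sketch, under several assumptions that the abstract itself does not confirm. The deviation metric is taken to be the angle between two vectors (the arccosine of the cosine measure), which ranks documents in the same order as the cosine measure. Intrinsic dimensionality is estimated as mu^2 / (2 * sigma^2) over pairwise distances, a common estimate in the metric-space literature. The adjustment of singular values is shown as a simple power function; this choice, the function names, and the toy numbers are all hypothetical and are not taken from the paper.

```python
import math
from itertools import combinations
from statistics import mean, pvariance


def cosine(x, y):
    """Cosine similarity of two non-zero vectors."""
    dot = sum(a * b for a, b in zip(x, y))
    norm = math.sqrt(sum(a * a for a in x)) * math.sqrt(sum(b * b for b in y))
    return dot / norm


def deviation(x, y):
    """Angle between x and y in radians (assumed form of the deviation metric)."""
    # Clamp to [-1, 1] so rounding error cannot push acos out of its domain.
    return math.acos(max(-1.0, min(1.0, cosine(x, y))))


def adjust_singular_values(s, power):
    """Hypothetical adjustment: raise each singular value to `power`.

    The abstract does not say how the values are adjusted; this is an assumption.
    """
    return [v ** power for v in s]


def document_vectors(s, vt):
    """Documents in concept space: the columns of diag(s) * vt."""
    k, n = len(vt), len(vt[0])
    return [[s[i] * vt[i][j] for i in range(k)] for j in range(n)]


def intrinsic_dimensionality(points, dist):
    """Estimate rho = mu^2 / (2 * sigma^2) from all pairwise distances."""
    d = [dist(p, q) for p, q in combinations(points, 2)]
    return mean(d) ** 2 / (2 * pvariance(d))


# Toy inputs standing in for the output of an SVD routine (not a real SVD).
s = [3.0, 2.0, 1.0]
vt = [
    [0.5, 0.4, 0.6, 0.3, 0.4],
    [0.1, -0.6, 0.2, 0.7, -0.3],
    [-0.4, 0.2, 0.5, -0.1, 0.6],
]

docs = document_vectors(s, vt)
query = [1.0, 0.5, 0.2]

# Ranking by descending cosine and by ascending deviation should agree.
by_cos = sorted(range(len(docs)), key=lambda j: -cosine(query, docs[j]))
by_dev = sorted(range(len(docs)), key=lambda j: deviation(query, docs[j]))
print("rankings agree:", by_cos == by_dev)

# Compare the estimated intrinsic dimensionality before and after adjustment.
for power in (1.0, 2.0):
    adjusted = document_vectors(adjust_singular_values(s, power), vt)
    print(power, intrinsic_dimensionality(adjusted, deviation))
```

Because the angle satisfies the triangle inequality, it can be used by a metric access method where the cosine measure cannot. Whether a given adjustment actually lowers the estimate, and what it costs in precision and recall, has to be measured on the real collection.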