Variable latent semantic indexing

Authors:
Anirban Dasgupta;Ravi Kumar;Prabhakar Raghavan;Andrew Tomkins
Affiliations:
Cornell University, Ithaca, NY;IBM Almaden Research Center, San Jose, CA;Yahoo!, Research Labs, Sunnyvale, CA;IBM Almaden Research Center, San Jose, CA
Venue:
Proceedings of the eleventh ACM SIGKDD international conference on Knowledge discovery in data mining
Year:
2005

Abstract

Latent Semantic Indexing (LSI) is a classical method for producing optimal (in the Frobenius-norm sense) low-rank approximations of a term-document matrix. However, for a particular query distribution, the approximation thus produced need not be optimal. We propose VLSI, a new query-dependent (or "variable") low-rank approximation that minimizes approximation error for any specified query distribution. With this tool, it is possible to tailor the LSI technique to particular settings, often resulting in vastly improved approximations at much lower dimensionality. We validate this method via a series of experiments on classical corpora, showing that VLSI typically performs similarly to LSI while using an order of magnitude fewer dimensions.
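The abstract does not spell out VLSI's objective, so the following is a minimal sketch under assumptions of our own. It is one natural formalization and not necessarily the paper's exact algorithm. We measure the error on a query q as the squared error of its score vector, ||q^T (A - X)||^2, and average it over the query distribution, which gives tr((A - X)^T C (A - X)) with C = E[q q^T]. We further assume that every query is a single term i drawn with probability p_i > 0, so C = diag(p) and the objective becomes a row-weighted Frobenius error. Its rank-k minimizer is obtained by scaling row i of A by sqrt(p_i), truncating the SVD, and then undoing the scaling. All function names are hypothetical, and plain power iteration stands in for a production SVD routine.

```python
import math
import random

Matrix = list[list[float]]


def matmul(X: Matrix, Y: Matrix) -> Matrix:
    # Plain triple loop; adequate for the toy sizes used here.
    return [[sum(X[i][t] * Y[t][j] for t in range(len(Y))) for j in range(len(Y[0]))]
            for i in range(len(X))]


def transpose(X: Matrix) -> Matrix:
    return [list(col) for col in zip(*X)]


def top_left_singular_vectors(B: Matrix, k: int, iters: int = 2000, seed: int = 0) -> Matrix:
    """Top-k eigenvectors of B B^T (the leading left singular vectors of B),
    found by power iteration with Gram-Schmidt deflation. Assumes k <= rank(B)."""
    G = matmul(B, transpose(B))
    m = len(G)
    rng = random.Random(seed)
    found: Matrix = []
    for _ in range(k):
        v = [rng.uniform(-1.0, 1.0) for _ in range(m)]
        for _ in range(iters):
            v = [sum(G[i][j] * v[j] for j in range(m)) for i in range(m)]
            for u in found:  # keep v orthogonal to the vectors already found
                d = sum(a * b for a, b in zip(u, v))
                v = [a - d * b for a, b in zip(v, u)]
            norm = math.sqrt(sum(a * a for a in v))
            v = [a / norm for a in v]
        found.append(v)
    return found


def rank_k_approximation(B: Matrix, k: int) -> Matrix:
    """Eckart-Young: the best rank-k approximation of B is U_k U_k^T B."""
    Ut = top_left_singular_vectors(B, k)         # k x m, rows are u_1..u_k
    return matmul(transpose(Ut), matmul(Ut, B))  # U_k (U_k^T B), m x n


def lsi(A: Matrix, k: int) -> Matrix:
    # Classical LSI: unweighted truncated SVD of the term-document matrix.
    return rank_k_approximation(A, k)


def vlsi_single_term(A: Matrix, p: list[float], k: int) -> Matrix:
    """Query-weighted rank-k approximation, ASSUMING each query is a single
    term i drawn with probability p[i] > 0, so that E[q q^T] = diag(p)."""
    s = [math.sqrt(pi) for pi in p]
    B = [[s[i] * a for a in row] for i, row in enumerate(A)]  # diag(sqrt p) A
    Bk = rank_k_approximation(B, k)
    return [[a / s[i] for a in row] for i, row in enumerate(Bk)]  # undo the scaling


def expected_query_error(A: Matrix, X: Matrix, p: list[float]) -> float:
    # E_q ||q^T (A - X)||^2 for single-term queries q = e_i with probability p[i].
    return sum(pi * sum((a - x) ** 2 for a, x in zip(ra, rx))
               for pi, ra, rx in zip(p, A, X))


def frobenius_error(A: Matrix, X: Matrix) -> float:
    # ||A - X||_F^2, the objective that plain LSI minimizes.
    return sum((a - x) ** 2 for ra, rx in zip(A, X) for a, x in zip(ra, rx))


if __name__ == "__main__":
    # Hypothetical 5-term x 4-document counts; queries are dominated by the last term.
    A = [[2, 0, 1, 0], [1, 3, 0, 0], [0, 1, 4, 1], [0, 0, 1, 2], [3, 1, 0, 1]]
    p = [0.05, 0.05, 0.1, 0.1, 0.7]
    X_lsi, X_vlsi = lsi(A, 1), vlsi_single_term(A, p, 1)
    print("expected query error  LSI:", expected_query_error(A, X_lsi, p))
    print("expected query error VLSI:", expected_query_error(A, X_vlsi, p))
```

Each method is optimal for its own objective. LSI minimizes the unweighted Frobenius error, and the weighted variant minimizes the expected query error under p. With uniform p, the two coincide, so VLSI reduces to LSI. The more skewed p is, the more the rank-k budget shifts toward frequently queried terms.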