Why spectral retrieval works

Authors:
Holger Bast; Debapriyo Majumdar
Affiliations:
Max-Planck-Institut für Informatik, Saarbrücken, Germany; Max-Planck-Institut für Informatik, Saarbrücken, Germany
Venue:
Proceedings of the 28th annual international ACM SIGIR conference on Research and development in information retrieval
Year:
2005

Abstract

We argue that the ability to identify pairs of related terms is at the heart of what makes spectral retrieval work in practice. Schemes such as latent semantic indexing (LSI) and its descendants have this ability in the sense that they can be viewed as computing a matrix of term-term relatedness scores, which is then used to expand the given documents (not the queries). For almost all existing spectral retrieval schemes, this matrix of relatedness scores depends on a fixed low-dimensional subspace of the original term space. We instead vary the dimension and study, for each term pair, the resulting curve of relatedness scores. We find that it is the shape of this curve that indicates term-pair relatedness, and not any individual relatedness score on the curve. We derive two simple, parameterless algorithms that detect this shape and that consistently outperform previous methods on a number of test collections. Our curves also shed light on the effectiveness of three fundamental types of variations of the basic LSI scheme.
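
To make the notion of a relatedness curve concrete, the following is a minimal sketch and not the authors' implementation. It assumes, since the abstract gives no formula, that the term-term relatedness matrix for dimension k is U_k U_k^T, where U_k holds the first k left singular vectors of the term-document matrix. The function name `relatedness_curves`, the tolerance `tol`, and the toy matrix are hypothetical, chosen only for illustration. The sketch computes the curves only; it does not attempt the two shape-detecting algorithms mentioned above.

```python
import numpy as np


def relatedness_curves(A, tol=1e-10):
    """Return C with C[k-1, i, j] = entry (i, j) of U_k U_k^T, for k = 1..rank(A).

    A is a term-document matrix (rows = terms, columns = documents).
    Assumption: relatedness at dimension k is taken to be U_k U_k^T.
    """
    U, s, _ = np.linalg.svd(A, full_matrices=False)
    rank = int(np.sum(s > tol * s[0]))       # drop numerically zero directions
    U = U[:, :rank]
    # outer[k] is the rank-one matrix u_k u_k^T for the k-th singular vector
    outer = np.einsum("ik,jk->kij", U, U)
    # cumulative sums over k give U_k U_k^T for every k at once
    return np.cumsum(outer, axis=0)


# Toy term-document matrix (hypothetical data, 5 terms x 5 documents).
terms = ["car", "automobile", "engine", "flower", "petal"]
A = np.array([
    [1, 0, 1, 0, 0],   # car
    [0, 1, 1, 0, 0],   # automobile
    [1, 1, 1, 0, 0],   # engine
    [0, 0, 0, 1, 1],   # flower
    [0, 0, 0, 1, 0],   # petal
], dtype=float)

curves = relatedness_curves(A)

# Print the relatedness curve of one term pair as the dimension k grows.
i, j = terms.index("car"), terms.index("automobile")
for k, score in enumerate(curves[:, i, j], start=1):
    print(f"k={k}: relatedness(car, automobile) = {score:+.3f}")
```

Each slice `curves[:, i, j]` is the curve for one term pair. A scheme with a fixed dimension reads off a single point of that curve, whereas the approach described in the abstract looks at the shape of the whole curve.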