We argue that the ability to identify pairs of related terms is at the heart of what makes spectral retrieval work in practice. Schemes such as latent semantic indexing (LSI) and its descendants have this ability in the sense that they can be viewed as computing a matrix of term-term relatedness scores, which is then used to expand the given documents (not the queries). For almost all existing spectral retrieval schemes, this matrix of relatedness scores depends on a fixed low-dimensional subspace of the original term space. We instead vary the dimension and study, for each term pair, the resulting curve of relatedness scores. We find that it is the shape of this curve that indicates term-pair relatedness, not any individual relatedness score on the curve. We derive two simple, parameterless algorithms that detect this shape and that consistently outperform previous methods on a number of test collections. Our curves also shed light on the effectiveness of three fundamental types of variations of the basic LSI scheme.
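To make the notion of a relatedness curve concrete, the following is a minimal sketch and not the algorithms of the paper. It assumes that the relatedness matrix for subspace dimension k is U_k U_k^T, where U_k holds the top k left singular vectors of the term-document matrix; this is one common reading of LSI, and the abstract itself does not fix the formula. The function names, the toy matrix, and the term labels are illustrative assumptions.

```python
import numpy as np


def relatedness_curve(A, i, j):
    """Relatedness of terms i and j for every subspace dimension k = 1..r.

    Assumption: the relatedness matrix at dimension k is U_k U_k^T, with
    U_k the top-k left singular vectors of the term-document matrix A.
    """
    # Columns of U are ordered by decreasing singular value.
    U, _, _ = np.linalg.svd(A, full_matrices=False)
    # Entry (i, j) of U_k U_k^T is the sum of U[i, l] * U[j, l] over l < k,
    # so one cumulative sum gives the score for every k at once.
    return np.cumsum(U[i, :] * U[j, :])


def expand_documents(A, k):
    """Expand the documents (columns of A) with the dimension-k relatedness matrix."""
    U, _, _ = np.linalg.svd(A, full_matrices=False)
    U_k = U[:, :k]
    return U_k @ U_k.T @ A


# Toy term-document matrix (rows = terms, columns = documents); made-up counts.
A = np.array([
    [1, 1, 0, 0, 0],   # term 0, e.g. "car"
    [1, 0, 1, 0, 0],   # term 1, e.g. "automobile"
    [0, 0, 0, 1, 1],   # term 2, e.g. "banana"
    [0, 1, 1, 0, 1],   # term 3, e.g. "engine"
], dtype=float)

curve_a = relatedness_curve(A, 0, 1)  # one score per dimension k = 1..4
curve_b = relatedness_curve(A, 0, 2)
expanded = expand_documents(A, 2)     # documents expanded at a fixed k = 2
```

A fixed-dimension scheme reads off a single point of such a curve, whereas the approach described above looks at the curve as a whole. In this toy setup the matrix has no more terms than documents, so at full dimension U U^T is the identity and every curve for two distinct terms ends at zero, which is one reason a single score can be uninformative.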