Eigenvalue-based model selection during latent semantic indexing

Authors: Miles Efron
Affiliations: School of Information, The University of Texas, Austin, 1 University Station D7000, Austin, TX 78712-0390
Venue: Journal of the American Society for Information Science and Technology
Year: 2005

Abstract

This study describes amended parallel analysis (APA), a novel method for model selection in unsupervised learning problems such as information retrieval (IR). At issue is the selection of k, the number of dimensions retained under latent semantic indexing (LSI). APA is an elaboration of Horn's parallel analysis, which advocates retaining those eigenvalues that are larger than we would expect under term independence. APA operates by deriving confidence intervals on these “null” eigenvalues, so the technique amounts to a series of nonparametric hypothesis tests on the eigenvalues of the correlation matrix. APA is tested alongside four established dimensionality estimators on six standard IR test collections, and the resulting estimates are evaluated with respect to two IR performance metrics. Results on simulated data are also reported. In both rounds of experimentation APA performs well: it predicts the best value of k in 3 of 12 observations, gives good predictions in several others, and never offers the worst estimate of optimal dimensionality. © 2005 Wiley Periodicals, Inc.
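The contrast between Horn's rule and a confidence-interval rule can be illustrated with a minimal sketch in Python with NumPy. It assumes a dense documents-by-terms matrix with no term weighting. It also approximates the term-independence null by permuting each term's column across documents. Finally, it reads APA's confidence interval as the empirical upper percentile (here 95%) of the permuted eigenvalues at each rank. The function names (`choose_k` and its helpers), the permutation scheme, and the 95% level are illustrative assumptions, not the paper's published procedure.

```python
import numpy as np


def sorted_corr_eigenvalues(X: np.ndarray) -> np.ndarray:
    """Eigenvalues of the term-term correlation matrix, largest first.

    X is assumed to be a documents x terms matrix (one column per term).
    """
    corr = np.corrcoef(X, rowvar=False)      # terms x terms
    return np.linalg.eigvalsh(corr)[::-1]    # eigvalsh returns ascending order


def null_eigenvalues(X: np.ndarray, n_perm: int, rng: np.random.Generator) -> np.ndarray:
    """Eigenvalues expected under term independence (one row per permutation).

    Shuffling each term's column independently across documents keeps every
    term's marginal distribution but destroys the correlations between terms.
    """
    out = np.empty((n_perm, X.shape[1]))
    for i in range(n_perm):
        permuted = np.column_stack([rng.permutation(col) for col in X.T])
        out[i] = sorted_corr_eigenvalues(permuted)
    return out


def count_leading(observed: np.ndarray, threshold: np.ndarray) -> int:
    """Number of leading eigenvalues that exceed the rank-matched threshold."""
    k = 0
    while k < len(observed) and observed[k] > threshold[k]:
        k += 1
    return k


def choose_k(X: np.ndarray, n_perm: int = 100, level: float = 0.95, seed: int = 0):
    """Return (k_horn, k_apa) for a documents x terms matrix X (hypothetical API)."""
    X = X[:, X.std(axis=0) > 0]              # constant terms have undefined correlation
    rng = np.random.default_rng(seed)
    observed = sorted_corr_eigenvalues(X)
    null = null_eigenvalues(X, n_perm, rng)
    horn = null.mean(axis=0)                  # Horn: mean null eigenvalue per rank
    apa = np.quantile(null, level, axis=0)    # assumed APA reading: upper CI bound per rank
    return count_leading(observed, horn), count_leading(observed, apa)


# Synthetic example: 300 documents, three latent factors, each driving five terms.
rng = np.random.default_rng(1)
factors = rng.normal(size=(300, 3))
loadings = np.kron(np.eye(3), np.ones((1, 5)))   # 3 x 15 block loadings
X = factors @ loadings + 0.5 * rng.normal(size=(300, 15))

k_horn, k_apa = choose_k(X)
print("Horn k =", k_horn, "| APA-style k =", k_apa)
```

The synthetic matrix plants three blocks of five strongly correlated terms, so both rules should keep the three planted dimensions. On real collections the two rules can diverge. Because the upper percentile of the null distribution typically lies above its mean, the percentile-based threshold will usually retain no more dimensions than Horn's mean threshold.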