This poster introduces a novel approach to information retrieval that uses statistical model averaging to improve latent semantic indexing (LSI). Instead of choosing a single dimensionality $k$ for LSI, we propose using several models of differing dimensionality to inform retrieval. To manage this ensemble, we weight each model's contribution in inverse proportion to its AIC (Akaike information criterion). Because AIC estimates a model's expected Kullback-Leibler divergence from the distribution that generated the data, each model contributes according to how close it is estimated to be to that distribution. We present results on three standard IR test collections, demonstrating significant improvement over both the traditional vector space model and single-model LSI.
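The weighting step can be illustrated with a minimal sketch. It takes the phrase "inversely proportional to its AIC" literally, normalizing $1/\mathrm{AIC}_i$ so the weights sum to one, and then averages per-model query-document scores. The abstract does not give the exact formula, so this reading is an assumption, as are the function names `inverse_aic_weights` and `averaged_scores`, the requirement that AIC values be positive, and the choice to combine models by a weighted sum of similarity scores.

```python
def inverse_aic_weights(aics):
    """Return weights proportional to 1/AIC, normalized to sum to 1.

    Assumption: every AIC value is strictly positive. The abstract does
    not state how non-positive values would be handled.
    """
    if not aics:
        raise ValueError("at least one model is required")
    if any(a <= 0 for a in aics):
        raise ValueError("this sketch assumes strictly positive AIC values")
    inverses = [1.0 / a for a in aics]
    total = sum(inverses)
    return [v / total for v in inverses]


def averaged_scores(scores_per_model, aics):
    """Combine per-model document scores into one weighted score per document.

    scores_per_model[i][d] is the (hypothetical) query-document similarity
    of document d under the i-th LSI model, one model per dimensionality k.
    """
    if len(scores_per_model) != len(aics):
        raise ValueError("need exactly one AIC value per model")
    weights = inverse_aic_weights(aics)
    n_docs = len(scores_per_model[0])
    return [
        sum(w * scores[d] for w, scores in zip(weights, scores_per_model))
        for d in range(n_docs)
    ]


if __name__ == "__main__":
    # Three hypothetical LSI models (for example k = 50, 100, 200) scoring two documents.
    example_aics = [100.0, 200.0, 400.0]
    example_scores = [[0.9, 0.1], [0.6, 0.3], [0.2, 0.8]]
    print(inverse_aic_weights(example_aics))
    print(averaged_scores(example_scores, example_aics))
```

Under this reading, a model with a lower AIC receives a larger share of the final score. A different normalization would change only `inverse_aic_weights`, leaving the score-combination step untouched.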