Information Processing and Management: an International Journal
Rocchio relevance feedback and latent semantic indexing (LSI) are well-known extensions of the vector space model for information retrieval (IR). This paper analyzes the statistical relationship between these extensions. The analysis focuses on each method's basis in least-squares optimization. Noting that LSI and Rocchio relevance feedback both alter the vector space model in a way that is in some sense least-squares optimal, we ask: what is the relationship between LSI's and Rocchio's notions of optimality? What does this relationship imply for IR? Using an analytical approach, we argue that Rocchio relevance feedback is optimal if we understand retrieval as a simplified classification problem. On the other hand, LSI's motivation comes to the fore if we understand it as a biased regression technique, where projection onto a low-dimensional orthogonal subspace of the documents reduces model variance.
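As a concrete reference point for the Rocchio side of this comparison, the following is a minimal sketch of the classic Rocchio query update, in which the query moves toward the centroid of judged-relevant documents and away from the centroid of judged-nonrelevant ones. It is not taken from the paper: the function names, the plain-list vector representation, and the default weights `alpha`, `beta`, and `gamma` are illustrative assumptions. LSI's low-rank projection is not shown here.

```python
from typing import List, Sequence

Vector = List[float]


def centroid(vectors: Sequence[Sequence[float]], dim: int) -> Vector:
    """Return the mean of equal-length vectors, or a zero vector if none are given."""
    if not vectors:
        return [0.0] * dim
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


def rocchio(
    query: Sequence[float],
    relevant: Sequence[Sequence[float]],
    nonrelevant: Sequence[Sequence[float]],
    alpha: float = 1.0,   # weight on the original query (assumed default)
    beta: float = 0.75,   # weight on the relevant centroid (assumed default)
    gamma: float = 0.15,  # weight on the nonrelevant centroid (assumed default)
) -> Vector:
    """Classic Rocchio update: alpha*q + beta*mean(relevant) - gamma*mean(nonrelevant)."""
    dim = len(query)
    rel_c = centroid(relevant, dim)
    non_c = centroid(nonrelevant, dim)
    return [alpha * q + beta * r - gamma * n for q, r, n in zip(query, rel_c, non_c)]


if __name__ == "__main__":
    # Toy three-term vocabulary; all document vectors are made up for illustration.
    q = [1.0, 0.0, 0.0]
    rel = [[1.0, 1.0, 0.0], [1.0, 0.0, 0.0]]
    non = [[0.0, 0.0, 1.0]]
    print(rocchio(q, rel, non, alpha=1.0, beta=0.5, gamma=0.25))
```

With `alpha = 0` and `beta = gamma`, the updated query points along the difference between the two centroids, which is the form usually meant when Rocchio feedback is described as optimal for separating relevant from nonrelevant documents.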