Term-weighting approaches in automatic text retrieval
Information Processing and Management: an International Journal
A semidiscrete matrix decomposition for latent semantic indexing information retrieval
ACM Transactions on Information Systems (TOIS)
A re-examination of text categorization methods
Proceedings of the 22nd annual international ACM SIGIR conference on Research and development in information retrieval
Probabilistic latent semantic indexing
Proceedings of the 22nd annual international ACM SIGIR conference on Research and development in information retrieval
A similarity-based probability model for latent semantic indexing
Proceedings of the 22nd annual international ACM SIGIR conference on Research and development in information retrieval
Latent semantic indexing: a probabilistic analysis
Journal of Computer and System Sciences - Special issue on the seventeenth ACM SIGACT-SIGMOD-SIGART symposium on principles of database systems
A vector space model for automatic indexing
Communications of the ACM
Information Retrieval
Exploiting hierarchical domain structure to compute similarity
ACM Transactions on Information Systems (TOIS)
Taking a new look at the latent semantic analysis approach to information retrieval
Computational information retrieval
Unitary operators on the document space
Journal of the American Society for Information Science and Technology - Mathematical, logical, and formal methods in information retrieval
Latent concepts and the number orthogonal factors in latent semantic analysis
Proceedings of the 26th annual international ACM SIGIR conference on Research and development in informaion retrieval
Choosing the word most typical in context using a lexical co-occurrence network
ACL '98 Proceedings of the 35th Annual Meeting of the Association for Computational Linguistics and Eighth Conference of the European Chapter of the Association for Computational Linguistics
A probabilistic model for Latent Semantic Indexing: Research Articles
Journal of the American Society for Information Science and Technology
Proceedings of the 28th annual international ACM SIGIR conference on Research and development in information retrieval
Term norm distribution and its effects on latent semantic indexing
Information Processing and Management: an International Journal
Introduction to Information Retrieval
Introduction to Information Retrieval
Geometric and topological approaches to semantic text retrieval
Geometric and topological approaches to semantic text retrieval
A topology-based semantic location model for indoor applications
Proceedings of the 16th ACM SIGSPATIAL international conference on Advances in geographic information systems
Unified linear subspace approach to semantic analysis
Journal of the American Society for Information Science and Technology
A framework for understanding Latent Semantic Indexing (LSI) performance
Information Processing and Management: an International Journal - Special issue: Formal methods for information retrieval
Hi-index | 0.00 |
The method of latent semantic indexing (LSI) is well-known for tackling the synonymy and polysemy problems in information retrieval; however, its performance can be very different for various datasets, and the questions of what characteristics of a dataset and why these characteristics contribute to this difference have not been fully understood. In this article, we propose that the mathematical structure of simplexes can be attached to a term-document matrix in the vector space model (VSM) for information retrieval. The Q-analysis devised by R.H. Atkin ([1974]) may then be applied to effect an analysis of the topological structure of the simplexes and their corresponding dataset. Experimental results of this analysis reveal that there is a correlation between the effectiveness of LSI and the topological structure of the dataset. By using the information obtained from the topological analysis, we develop a new method to explore the semantic information in a dataset. Experimental results show that our method can enhance the performance of VSM for datasets over which LSI is not effective. © 2010 Wiley Periodicals, Inc.