Latent Semantic Indexing (LSI), when applied to a semantic space built on text collections, improves information retrieval, information filtering, and word sense disambiguation. A new dual probability model based on similarity concepts is introduced to provide a deeper understanding of LSI. Semantic associations can be quantitatively characterized by their statistical significance, the likelihood. Semantic dimensions containing redundant and noisy information can be separated out and should be ignored because of their negative contribution to the overall statistical significance. LSI is the optimal solution of the model. The peak in the likelihood curve indicates the existence of an intrinsic semantic dimension. The importance of LSI dimensions follows the Zipf distribution, indicating that LSI dimensions represent latent concepts. The document frequency of words follows the Zipf distribution, and the number of distinct words follows a log-normal distribution. Experiments on five standard document collections confirm and illustrate the analysis. © 2005 Wiley Periodicals, Inc.
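
For readers unfamiliar with the underlying computation, the following is a minimal sketch of standard LSI as a truncated singular value decomposition of a term-document matrix. It does not implement the dual probability model or the likelihood curve described in the abstract. The function names `lsi` and `doc_similarity`, the toy matrix, and the choice `k=2` are assumptions made purely for illustration.

```python
import numpy as np

def lsi(term_doc, k):
    """Rank-k LSI via truncated SVD; returns (U_k, s_k, Vt_k)."""
    # Thin SVD: term_doc = U @ diag(s) @ Vt, singular values in descending order.
    U, s, Vt = np.linalg.svd(term_doc, full_matrices=False)
    # Keep only the k leading semantic dimensions.
    return U[:, :k], s[:k], Vt[:k, :]

def doc_similarity(s_k, Vt_k):
    """Cosine similarity between documents in the k-dimensional LSI space."""
    docs = (s_k[:, None] * Vt_k).T              # one row per document
    norms = np.linalg.norm(docs, axis=1, keepdims=True)
    docs = docs / np.where(norms == 0, 1.0, norms)  # avoid division by zero
    return docs @ docs.T

# Toy term-document counts (rows: terms, columns: documents).
# The values are invented for illustration only.
X = np.array([
    [2, 1, 0, 0],   # "car"
    [1, 2, 0, 0],   # "automobile"
    [0, 0, 2, 1],   # "flower"
    [0, 0, 1, 2],   # "petal"
], dtype=float)

U_k, s_k, Vt_k = lsi(X, k=2)
sim = doc_similarity(s_k, Vt_k)
print(np.round(s_k, 3))   # leading singular values
print(np.round(sim, 3))   # document-by-document cosine similarities
```

In this toy case the two retained dimensions separate the vehicle documents from the plant documents, so documents that share a topic become maximally similar while documents on different topics become orthogonal. How many dimensions to retain is the question the abstract's likelihood analysis addresses; this sketch simply fixes `k` by hand.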