A probabilistic model for Latent Semantic Indexing

  • Authors:
  • Chris H. Q. Ding

  • Affiliations:
  • Lawrence Berkeley National Laboratory, University of California, Berkeley, CA 94720

  • Venue:
  • Journal of the American Society for Information Science and Technology
  • Year:
  • 2005


Abstract

Latent Semantic Indexing (LSI), when applied to the semantic space built from text collections, improves information retrieval, information filtering, and word sense disambiguation. A new dual probability model based on similarity concepts is introduced to provide a deeper understanding of LSI. Semantic associations can be quantitatively characterized by their statistical significance, the likelihood. Semantic dimensions containing redundant and noisy information can be separated out and should be ignored because of their negative contribution to the overall statistical significance. LSI is the optimal solution of the model. The peak in the likelihood curve indicates the existence of an intrinsic semantic dimension. The importance of LSI dimensions follows a Zipf distribution, indicating that LSI dimensions represent latent concepts. The document frequency of words follows a Zipf distribution, and the number of distinct words follows a log-normal distribution. Experiments on five standard document collections confirm and illustrate the analysis. © 2005 Wiley Periodicals, Inc.
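The sketch below is only an illustration of standard LSI via truncated SVD, not the paper's dual probability model or likelihood analysis. The toy term-document matrix, the choice of k = 2, and the use of NumPy are all assumptions made for the example; it simply shows the rank-k approximation the abstract refers to and the singular values whose decay the paper relates to a Zipf-like distribution of dimension importance.

```python
# Illustrative sketch: generic LSI via truncated SVD on a toy
# term-document matrix. The matrix, k, and library choice are
# assumptions for demonstration, not the paper's own setup.
import numpy as np

# Toy term-document count matrix (rows = terms, columns = documents).
A = np.array([
    [2, 0, 1, 0, 0],
    [1, 1, 0, 0, 0],
    [0, 2, 0, 1, 0],
    [0, 0, 1, 2, 1],
    [0, 0, 0, 1, 2],
], dtype=float)

# SVD; LSI keeps only the top-k singular triplets.
U, s, Vt = np.linalg.svd(A, full_matrices=False)

k = 2  # assumed number of semantic dimensions for this toy example
A_k = U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]   # rank-k LSI approximation

# Document coordinates in the k-dimensional latent semantic space.
doc_coords = np.diag(s[:k]) @ Vt[:k, :]

# Singular values measure the importance of each LSI dimension; the
# abstract reports that on real collections this importance follows a
# Zipf distribution, i.e. roughly a straight line in a log-log plot
# of s_i against i.
for i, val in enumerate(s, start=1):
    print(f"dimension {i}: singular value {val:.3f}")
```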