Latent semantic indexing is an optimal special case of multidimensional scaling
SIGIR '92 Proceedings of the 15th annual international ACM SIGIR conference on Research and development in information retrieval
Automating the assignment of submitted manuscripts to reviewers
SIGIR '92 Proceedings of the 15th annual international ACM SIGIR conference on Research and development in information retrieval
Personalized information delivery: an analysis of information filtering methods
Communications of the ACM - Special issue on information filtering
Information Processing and Management: an International Journal
Assessing agreement on classification tasks: the kappa statistic
Computational Linguistics
A similarity-based probability model for latent semantic indexing
Proceedings of the 22nd annual international ACM SIGIR conference on Research and development in information retrieval
Document clustering using word clusters via the information bottleneck method
SIGIR '00 Proceedings of the 23rd annual international ACM SIGIR conference on Research and development in information retrieval
Latent semantic space: iterative scaling improves precision of inter-document similarity measurement
SIGIR '00 Proceedings of the 23rd annual international ACM SIGIR conference on Research and development in information retrieval
Latent semantic indexing: a probabilistic analysis
Journal of Computer and System Sciences - Special issue on the seventeenth ACM SIGACT-SIGMOD-SIGART symposium on principles of database systems
STOC '01 Proceedings of the thirty-third annual ACM symposium on Theory of computing
Large-Scale SVD and Subspace-Based Methods for Information Retrieval
IRREGULAR '98 Proceedings of the 5th International Symposium on Solving Irregularly Structured Problems in Parallel
Discourse Segmentation in Aid of Document Summarization
HICSS '00 Proceedings of the 33rd Hawaii International Conference on System Sciences - Volume 3
The document representation problem: an analysis of LSI and iterative residual rescaling
Multilingual Document Clustering, Topic Extraction and Data Transformations
EPIA '01 Proceedings of the 10th Portuguese Conference on Artificial Intelligence on Progress in Artificial Intelligence, Knowledge Extraction, Multi-agent Systems, Logic Programming and Constraint Solving
Locality preserving indexing for document representation
Proceedings of the 27th annual international ACM SIGIR conference on Research and development in information retrieval
Visualization-enabled multi-document summarization by Iterative Residual Rescaling
Natural Language Engineering
A probabilistic model for Latent Semantic Indexing: Research Articles
Journal of the American Society for Information Science and Technology
Orthogonal locality preserving indexing
Proceedings of the 28th annual international ACM SIGIR conference on Research and development in information retrieval
Term norm distribution and its effects on latent semantic indexing
Information Processing and Management: an International Journal
Automatically classifying emails into activities
Proceedings of the 11th international conference on Intelligent user interfaces
A Unified View on Clustering Binary Data
Machine Learning
Principles of hash-based text retrieval
SIGIR '07 Proceedings of the 30th annual international ACM SIGIR conference on Research and development in information retrieval
The uncovering of hidden structures by Latent Semantic Analysis
Information Sciences: an International Journal
Augmenting the power of LSI in text retrieval: Singular value rescaling
Data & Knowledge Engineering
Clustering based on matrix approximation: a unifying view
Knowledge and Information Systems
Activity-centric email: a machine learning approach
AAAI'06 proceedings of the 21st national conference on Artificial intelligence - Volume 2
Update summarization based on novel topic distribution
Proceedings of the 9th ACM symposium on Document engineering
Incremental aspect models for mining document streams
PKDD'06 Proceedings of the 10th European conference on Principle and Practice of Knowledge Discovery in Databases
Evaluation of two systems on multi-class multi-label document classification
ISMIS'05 Proceedings of the 15th international conference on Foundations of Intelligent Systems
Partial-update dimensionality reduction for accumulating co-occurrence events
Pattern Recognition Letters
We consider the problem of creating document representations in which inter-document similarity measurements correspond to semantic similarity. We first present a novel subspace-based framework for formalizing this task. Using this framework, we derive a new analysis of Latent Semantic Indexing (LSI), showing a precise relationship between its performance and the uniformity of the underlying distribution of documents over topics. This analysis helps explain the improvements gained by Ando's (2000) Iterative Residual Rescaling (IRR) algorithm: IRR can compensate for distributional non-uniformity. A further benefit of our framework is that it provides a well-motivated, effective method for automatically determining the rescaling factor that IRR depends on, leading to further improvements. A series of experiments over various settings and with several evaluation metrics validates our claims.
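The abstract does not spell out either algorithm, so the following is only a minimal sketch of how residual rescaling is commonly described, not the authors' implementation. It assumes a term-by-document matrix `D` with one document per column, and the names `irr_basis`, `k`, and `q` are hypothetical. Each round scales every residual column by its length raised to the power `q`, takes the dominant left singular vector of the rescaled residuals as the next basis vector, and projects that direction out of the residuals. Under these assumptions, `q = 0` applies no rescaling and the basis spans the same subspace as the top `k` left singular vectors used by LSI.

```python
import numpy as np


def irr_basis(D, k, q):
    """Sketch of a residual-rescaling basis (illustrative, not the paper's code).

    D: terms x documents matrix, one document per column.
    k: number of basis vectors to extract.
    q: rescaling factor; q = 0 means no rescaling.
    Returns a terms x k matrix with orthonormal columns.
    """
    R = np.asarray(D, dtype=float).copy()  # residuals start as the documents
    basis = []
    for _ in range(k):
        # Scale each residual column by its Euclidean length to the power q,
        # so documents that are poorly represented so far get more weight.
        lengths = np.linalg.norm(R, axis=0)
        rescaled = R * lengths ** q
        # The dominant left singular vector of the rescaled residuals
        # becomes the next basis vector.
        U, _, _ = np.linalg.svd(rescaled, full_matrices=False)
        b = U[:, 0]
        basis.append(b)
        # Remove the component along b from every (unscaled) residual.
        R = R - np.outer(b, b @ R)
    return np.column_stack(basis)


# Toy usage: project documents into the k-dimensional subspace.
rng = np.random.default_rng(0)
D = rng.random((6, 5))          # 6 terms, 5 documents (made-up data)
B = irr_basis(D, k=3, q=2.0)
doc_vectors = B.T @ D           # k x documents representation
```

Inter-document similarity would then be measured between columns of `doc_vectors`, for example with the cosine. How `q` should be chosen automatically is the subject of the work itself and is not reproduced here.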