The vast amount of textual information available today is useless unless it can be effectively and efficiently searched. The goal in information retrieval is to find documents that are relevant to a given user query. We can represent a document collection by a matrix whose (i, j) entry is nonzero only if the ith term appears in the jth document; thus each document corresponds to a column vector. The query is also represented as a column vector whose ith entry is nonzero only if the ith term appears in the query. We score each document for relevance by taking the inner product of its column vector with the query vector; the highest-scoring documents are considered the most relevant. Unfortunately, this method does not necessarily retrieve all relevant documents because it is based on literal term matching. Latent semantic indexing (LSI) replaces the document matrix with an approximation generated by the truncated singular-value decomposition (SVD). This method has been shown to overcome many difficulties associated with literal term matching. In this article we propose replacing the SVD with the semidiscrete decomposition (SDD). We will describe the SDD approximation, show how to compute it, and compare SDD-based LSI to SVD-based LSI. We will show that SDD-based LSI does as well as SVD-based LSI in terms of document retrieval while requiring only one-twentieth the storage and one-half the time to compute each query. We will also show how to update the SDD approximation when documents are added to or deleted from the document collection.
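The Python sketch below illustrates these ideas under stated assumptions. First, it scores documents by taking inner products of their columns with a query. Second, it builds a rank-k semidiscrete approximation A ≈ X_k D_k Y_k^T, in which every entry of X_k and Y_k is -1, 0, or 1 and D_k is diagonal with positive entries. The construction is greedy. It adds one rank-one term d x y^T at a time and alternates between choosing the best discrete x for a fixed y and the best discrete y for a fixed x. Several choices here are illustrative assumptions rather than the article's exact method:

* the function names;
* the starting vector (the column with the largest residual norm);
* the fixed number of inner iterations.

Query scores are computed as ((q^T X_k) D_k) Y_k^T, so the approximate matrix is never formed.

```python
from typing import List, Tuple

Matrix = List[List[float]]


def inner_product_scores(A: Matrix, q: List[float]) -> List[float]:
    """Literal-matching score of document j: inner product of column j of A with q."""
    return [sum(q[i] * A[i][j] for i in range(len(A))) for j in range(len(A[0]))]


def best_discrete_vector(s: List[float]) -> List[int]:
    """Return x in {-1, 0, 1}^len(s) maximizing (x^T s)^2 / ||x||^2.

    The optimum keeps the J largest-magnitude entries of s (with matching signs),
    where J maximizes (sum of those J magnitudes)^2 / J.
    """
    order = sorted(range(len(s)), key=lambda i: -abs(s[i]))
    best_j, best_val, running = 1, -1.0, 0.0
    for j, idx in enumerate(order, start=1):
        running += abs(s[idx])
        if running * running / j > best_val:
            best_j, best_val = j, running * running / j
    x = [0] * len(s)
    for idx in order[:best_j]:
        x[idx] = 1 if s[idx] >= 0 else -1
    return x


def sdd(A: Matrix, k: int, inner_iters: int = 10
        ) -> Tuple[List[List[int]], List[float], List[List[int]]]:
    """Greedy rank-k SDD sketch: A ~ sum_l d_l * x_l y_l^T, x_l and y_l in {-1,0,1}.

    Assumes inner_iters >= 1. Returns lists of x vectors, weights d, and y vectors.
    """
    m, n = len(A), len(A[0])
    R = [row[:] for row in A]  # residual matrix, updated after each rank-one term
    X, D, Y = [], [], []
    for _ in range(k):
        col_norms = [sum(R[i][j] ** 2 for i in range(m)) for j in range(n)]
        if max(col_norms) == 0.0:
            break  # residual is exactly zero: nothing left to approximate
        # Assumed starting guess: unit vector at the column with largest residual norm.
        y = [0] * n
        y[col_norms.index(max(col_norms))] = 1
        for _ in range(inner_iters):
            s = [sum(R[i][j] * y[j] for j in range(n)) for i in range(m)]  # s = R y
            x = best_discrete_vector(s)
            t = [sum(R[i][j] * x[i] for i in range(m)) for j in range(n)]  # t = R^T x
            y = best_discrete_vector(t)
        # Optimal weight for fixed x, y: d = x^T R y / (||x||^2 ||y||^2); d > 0 here.
        num = sum(x[i] * R[i][j] * y[j] for i in range(m) for j in range(n))
        d = num / (sum(v * v for v in x) * sum(v * v for v in y))
        if d == 0.0:
            break
        for i in range(m):
            for j in range(n):
                R[i][j] -= d * x[i] * y[j]  # remove the new rank-one term
        X.append(x)
        D.append(d)
        Y.append(y)
    return X, D, Y


def sdd_scores(X: List[List[int]], D: List[float], Y: List[List[int]],
               q: List[float]) -> List[float]:
    """Approximate scores q^T (X D Y^T), computed as ((q^T X) D) Y^T. Needs len(Y) >= 1."""
    w = [d * sum(qi * xi for qi, xi in zip(q, x)) for x, d in zip(X, D)]
    return [sum(wl * y[j] for wl, y in zip(w, Y)) for j in range(len(Y[0]))]


def residual_norm(A: Matrix, X: List[List[int]], D: List[float],
                  Y: List[List[int]]) -> float:
    """Frobenius norm of A - X D Y^T."""
    return sum((A[i][j] - sum(d * x[i] * y[j] for x, d, y in zip(X, D, Y))) ** 2
               for i in range(len(A)) for j in range(len(A[0]))) ** 0.5


if __name__ == "__main__":
    # Hypothetical 4-term x 3-document matrix and query, for illustration only.
    A = [[1.0, 0.0, 1.0],
         [0.0, 1.0, 1.0],
         [1.0, 1.0, 0.0],
         [0.0, 0.0, 1.0]]
    q = [1.0, 0.0, 1.0, 0.0]
    X, D, Y = sdd(A, k=3)
    print(inner_product_scores(A, q))  # [2.0, 1.0, 1.0]
    print(sdd_scores(X, D, Y, q), residual_norm(A, X, D, Y))
```

Each entry of X_k and Y_k takes one of only three values, so it can be packed into two bits. This is what makes the SDD factors far more compact than the dense floating-point singular vectors of a truncated SVD. Each greedy step reduces the squared residual by (x^T R y)^2 / (||x||^2 ||y||^2), so the approximation error never increases as k grows.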