Using latent semantic analysis to improve access to textual information

Authors:
S. T. Dumais; G. W. Furnas; T. K. Landauer; S. Deerwester; R. Harshman
Affiliations:
Bell Communications Research; Bell Communications Research; Bell Communications Research; Univ. of Chicago, Chicago, IL; Univ. of Western Ontario
Venue:
CHI '88 Proceedings of the SIGCHI Conference on Human Factors in Computing Systems
Year:
1988

Citing 11
Cited 88

An evaluation of retrieval effectiveness for a full-text document-retrieval system

Communications of the ACM
DOMAIN/DELPHI: retrieving documents online

CHI '86 Proceedings of the SIGCHI Conference on Human Factors in Computing Systems
Intelligent information-sharing systems

Communications of the ACM
Hypertext: An Introduction and Survey

Computer
The vocabulary problem in human-system communication

Communications of the ACM
The cluster hypothesis revisited

SIGIR '85 Proceedings of the 8th annual international ACM SIGIR conference on Research and development in information retrieval
Automatic Document Classification

Journal of the ACM (JACM)
Optimization criteria for checkpoint placement

Communications of the ACM
Introduction to Modern Information Retrieval

Introduction to Modern Information Retrieval
Computer Methods for Mathematical Computations

Computer Methods for Mathematical Computations
WEIRD: An approach to concept-based information retrieval

SIGIR '78 Proceedings of the 1st annual international ACM SIGIR conference on Information storage and retrieval

Formative design evaluation of superbook

ACM Transactions on Information Systems (TOIS)
Knowledge-based search tactics for an intelligent intermediary system

ACM Transactions on Information Systems (TOIS)
The constituent object parser: syntactic structure matching for information retrieval

ACM Transactions on Information Systems (TOIS)
Behavioral evaluation and analysis of a hypertext browser

CHI '89 Proceedings of the SIGCHI Conference on Human Factors in Computing Systems
VISAR: a system for inference and navigation of hypertext

HYPERTEXT '89 Proceedings of the second annual ACM conference on Hypertext
Using latent semantic indexing for information filtering

COCS '90 Proceedings of the ACM SIGOIS and IEEE CS TC-OA conference on Office information systems
Distributed representations in a text based information retrieval system: a new way of using the vector space model

SIGIR '91 Proceedings of the 14th annual international ACM SIGIR conference on Research and development in information retrieval
Indexing hypertext documents in context

HYPERTEXT '91 Proceedings of the third annual ACM conference on Hypertext
Bead: explorations in information visualization

SIGIR '92 Proceedings of the 15th annual international ACM SIGIR conference on Research and development in information retrieval
Converting a textbook to hypertext

ACM Transactions on Information Systems (TOIS)
An interface for navigating clustered document sets returned by queries

COCS '93 Proceedings of the conference on Organizational computing systems
Dynabook revisited—portable computers past, present and future

Communications of the ACM
Learning subjective relevance to facilitate information access

CIKM '95 Proceedings of the fourth international conference on Information and knowledge management
Improving human-proceedings interaction: indexing the CHI index

CHI '95 Conference Companion on Human Factors in Computing Systems
Integrating system design and organizational learning

ACM SIGOIS Bulletin
What the query told the link: the integration of hypertext and information retrieval

HYPERTEXT '97 Proceedings of the eighth ACM conference on Hypertext
Latent semantic indexing: a probabilistic analysis

PODS '98 Proceedings of the seventeenth ACM SIGACT-SIGMOD-SIGART symposium on Principles of database systems
Applications of linear algebra in information retrieval and hypertext analysis

PODS '99 Proceedings of the eighteenth ACM SIGMOD-SIGACT-SIGART symposium on Principles of database systems
Information retrieval algorithms: a survey

SODA '97 Proceedings of the eighth annual ACM-SIAM symposium on Discrete algorithms
Practical evaluation of IR within automated classification systems

Proceedings of the eighth international conference on Information and knowledge management
User interactions with everyday applications as context for just-in-time information access

Proceedings of the 5th international conference on Intelligent user interfaces
Searching the Web

ACM Transactions on Internet Technology (TOIT)
Latent semantic linking over homogeneous repositories

DocEng '01 Proceedings of the 2001 ACM Symposium on Document engineering
Polynomial-time approximation schemes for geometric min-sum median clustering

Journal of the ACM (JACM)
An infrastructure for open latent semantic linking

Proceedings of the thirteenth ACM conference on Hypertext and hypermedia
An information retrieval model based on vector space method by supervised learning

Information Processing and Management: an International Journal
Do agents need understanding?

IEEE Expert: Intelligent Systems and Their Applications
Supporting on-line resource discovery in the context of ongoing tasks with proactive software assistants

International Journal of Human-Computer Studies - Special issue on Awareness and the WWW
Organisational Information Management and Knowledge Discovery in Email within Mailing Lists

IDEAL '02 Proceedings of the Third International Conference on Intelligent Data Engineering and Automated Learning
Inferring Demographic Attributes of Anonymous Internet Users

WEBKDD '99 Revised Papers from the International Workshop on Web Usage Analysis and User Profiling
Taking a new look at the latent semantic analysis approach to information retrieval

Computational information retrieval
The influence of semantics in IR using LSI and K-means clustering techniques

ISICT '03 Proceedings of the 1st international symposium on Information and communication technologies
Accurate methods for the statistics of surprise and coincidence

Computational Linguistics - Special issue on using large corpora: I
Contextual spelling correction using latent semantic analysis

ANLC '97 Proceedings of the fifth conference on Applied natural language processing
Fast monte-carlo algorithms for finding low-rank approximations

Journal of the ACM (JACM)
A comparison of LSA, wordNet and PMI-IR for predicting user click behavior

Proceedings of the SIGCHI Conference on Human Factors in Computing Systems
Findex: search result categories help users when document ranking fails

Proceedings of the SIGCHI Conference on Human Factors in Computing Systems
Why is it difficult to find comprehensive information? Implications of information scatter for search and design: Research Articles

Journal of the American Society for Information Science and Technology
Incorporating context in text analysis by interactive activation with competition artificial neural networks

Information Processing and Management: an International Journal
SimFusion: measuring similarity using unified relationship matrix

Proceedings of the 28th annual international ACM SIGIR conference on Research and development in information retrieval
Variable latent semantic indexing

Proceedings of the eleventh ACM SIGKDD international conference on Knowledge discovery in data mining
Extracting significant words from corpora for ontology extraction

Proceedings of the 3rd international conference on Knowledge capture
Higher-Order Web Link Analysis Using Multilinear Algebra

ICDM '05 Proceedings of the Fifth IEEE International Conference on Data Mining
Strategy hubs: Domain portals to help find comprehensive information

Journal of the American Society for Information Science and Technology
Features for unsupervised document classification

COLING-02 proceedings of the 6th conference on Natural language learning - Volume 20
Informing system design through organizational learning

ICLS '96 Proceedings of the 1996 international conference on Learning sciences
Efficient unsupervised recursive word segmentation using minimum description length

COLING '04 Proceedings of the 20th international conference on Computational Linguistics
Semantically enhanced user modeling

Proceedings of the 2007 ACM symposium on Applied computing
Cross-language information retrieval using PARAFAC2

Proceedings of the 13th ACM SIGKDD international conference on Knowledge discovery and data mining
Visual islands: intuitive browsing of visual search results

CIVR '08 Proceedings of the 2008 international conference on Content-based image and video retrieval
Towards Trust-Based Acquisition of Unverifiable Information

CIA '08 Proceedings of the 12th international workshop on Cooperative Information Agents XII
Active post-refined multimodality video semantic concept detection with tensor representation

MM '08 Proceedings of the 16th ACM international conference on Multimedia
Multi-modality video shot clustering with tensor representation

Multimedia Tools and Applications
A social network approach to resolving group-level conflict in context-aware services

Expert Systems with Applications: An International Journal
Service Selection in Business Service Ecosystem

Service-Oriented Computing --- ICSOC 2008 Workshops
Trust-Aided Acquisition Of Unverifiable Information

Proceedings of the 2008 conference on ECAI 2008: 18th European Conference on Artificial Intelligence
Parallel latent semantic analysis using a graphics processing unit

Proceedings of the 11th Annual Conference Companion on Genetic and Evolutionary Computation Conference: Late Breaking Papers
An application of latent semantic analysis to word sense discrimination for words with related and unrelated meanings

EdAppsNLP '09 Proceedings of the Fourth Workshop on Innovative Use of NLP for Building Educational Applications
Findex: improving search result use through automatic filtering categories

Interacting with Computers
Measuring Semantic Closeness of Ontologically Demarcated Resources

Fundamenta Informaticae - Advances in Artificial Intelligence and Applications
A novel web text mining method based on semantic polarity analysis

WiCOM'09 Proceedings of the 5th International Conference on Wireless communications, networking and mobile computing
Scatter matters: Regularities and implications for the scatter of healthcare information on the Web

Journal of the American Society for Information Science and Technology
Spectral methods for matrices and tensors

Proceedings of the forty-second ACM symposium on Theory of computing
Word representations: a simple and general method for semi-supervised learning

ACL '10 Proceedings of the 48th Annual Meeting of the Association for Computational Linguistics
Decomposing background topics from keywords by principal component pursuit

CIKM '10 Proceedings of the 19th ACM international conference on Information and knowledge management
Temporal Link Prediction Using Matrix and Tensor Factorizations

ACM Transactions on Knowledge Discovery from Data (TKDD)
Text clustering based on LSA-HGSOM

WISM'11 Proceedings of the 2011 international conference on Web information systems and mining - Volume Part II
Pattern recognition in multivariate time series: dissertation proposal

Proceedings of the 4th workshop on Workshop for Ph.D. students in information & knowledge management
A similarity reinforcement algorithm for heterogeneous web pages

APWeb'05 Proceedings of the 7th Asia-Pacific web conference on Web Technologies Research and Development
Video semantic concept detection using multi-modality subspace correlation propagation

MMM'07 Proceedings of the 13th international conference on Multimedia Modeling - Volume Part I
Concept chain based text clustering

CIS'05 Proceedings of the 2005 international conference on Computational Intelligence and Security - Volume Part I
Natural language query vs. keyword search: effects of task complexity on search performance, participant perceptions, and preferences

INTERACT'05 Proceedings of the 2005 IFIP TC13 international conference on Human-Computer Interaction
A similarity-aware multiagent-based web content management scheme

ICMLC'05 Proceedings of the 4th international conference on Advances in Machine Learning and Cybernetics
A conscientious rival penalized competitive learning text clustering algorithm

ISNN'06 Proceedings of the Third international conference on Advances in Neural Networks - Volume Part I
Web image clustering with reduced keywords and weighted bipartite spectral graph partitioning

PCM'06 Proceedings of the 7th Pacific Rim conference on Advances in Multimedia Information Processing
WordNet-Based word sense disambiguation for learning user profiles

EWMF'05/KDO'05 Proceedings of the 2005 joint international conference on Semantics, Web and Mining
Exploiting lexical knowledge in learning user profiles for intelligent information access to digital collections

ICADL'05 Proceedings of the 8th international conference on Asian Digital Libraries: implementing strategies and sharing experiences
Using predicate-argument structures for context-dependent opinion retrieval

ADMA'11 Proceedings of the 7th international conference on Advanced Data Mining and Applications - Volume Part II
Link prediction on evolving data using tensor factorization

PAKDD'11 Proceedings of the 15th international conference on New Frontiers in Applied Data Mining
Semantic-based opinion retrieval using predicate-argument structures and subjective adjectives

AIRS'11 Proceedings of the 7th Asia conference on Information Retrieval Technology
A conceptual representation of documents and queries for information retrieval systems by using light ontologies

Expert Systems with Applications: An International Journal
A domain independent framework to extract and aggregate analogous features in online reviews

CICLing'12 Proceedings of the 13th international conference on Computational Linguistics and Intelligent Text Processing - Volume Part I
Guides for hypertext: an overview

Artificial Intelligence in Medicine
Memory-restricted latent semantic analysis to accumulate term-document co-occurrence events

Pattern Recognition Letters
Can predicate-argument structures be used for contextual opinion retrieval from blogs?

World Wide Web
Partial-update dimensionality reduction for accumulating co-occurrence events

Pattern Recognition Letters

Abstract

This paper describes a new approach for dealing with the vocabulary problem in human-computer interaction. Most approaches to retrieving textual materials depend on a lexical match between words in users' requests and those in or assigned to database objects. Because of the tremendous diversity in the words people use to describe the same object, lexical matching methods are necessarily incomplete and imprecise [5]. The latent semantic indexing approach tries to overcome these problems by automatically organizing text objects into a semantic structure more appropriate for matching user requests. This is done by taking advantage of implicit higher-order structure in the association of terms with text objects. The particular technique used is singular-value decomposition, in which a large term by text-object matrix is decomposed into a set of about 50 to 150 orthogonal factors from which the original matrix can be approximated by linear combination. Terms and objects are represented by 50 to 150 dimensional vectors and matched against user queries in this “semantic” space. Initial tests find this completely automatic method widely applicable and a promising way to improve users' access to many kinds of textual materials, or to objects and services for which textual descriptions are available.
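
The pipeline the abstract describes (a term by text-object matrix, a truncated singular-value decomposition, and matching of queries in the reduced space) can be sketched as follows. This is a minimal illustration, not the authors' implementation. The function names, the raw-count weighting, the whitespace tokenizer, the query "fold-in" formula (q^T U_k S_k^{-1}), and the cosine measure are all assumptions chosen for the sketch, and the toy example uses k = 2 where the abstract reports about 50 to 150 factors.

```python
import numpy as np

def build_term_doc_matrix(docs):
    """Build a raw-count term-by-document matrix (assumed weighting)."""
    vocab = sorted({w for d in docs for w in d.lower().split()})
    index = {w: i for i, w in enumerate(vocab)}
    X = np.zeros((len(vocab), len(docs)))
    for j, d in enumerate(docs):
        for w in d.lower().split():
            X[index[w], j] += 1.0
    return X, index

def lsi_fit(X, k):
    """Keep the k largest singular triplets of X (truncated SVD)."""
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    return U[:, :k], s[:k], Vt[:k, :]

def fold_in_query(query, index, U_k, s_k):
    """Project a query into the k-dimensional space as q^T U_k S_k^{-1}.

    Words outside the vocabulary are ignored (an assumption of this sketch).
    """
    q = np.zeros(U_k.shape[0])
    for w in query.lower().split():
        if w in index:
            q[index[w]] += 1.0
    return (q @ U_k) / s_k

def rank_documents(q_hat, Vt_k):
    """Return document indices sorted by cosine similarity, plus the scores."""
    doc_vecs = Vt_k.T  # one row per document
    norms = np.linalg.norm(doc_vecs, axis=1) * np.linalg.norm(q_hat)
    sims = (doc_vecs @ q_hat) / np.where(norms == 0, 1.0, norms)
    return np.argsort(-sims), sims

# Hypothetical toy collection: two documents about interfaces, two about graphs.
docs = [
    "human computer interaction interface",
    "computer interface user system",
    "graph tree minors",
    "tree graph paths",
]
X, index = build_term_doc_matrix(docs)
U_k, s_k, Vt_k = lsi_fit(X, k=2)

# "human" occurs only in the first document, yet the second document can
# still score highly because it shares other terms with the first.
order, sims = rank_documents(fold_in_query("human", index, U_k, s_k), Vt_k)
print(order, np.round(sims, 3))
```

In this toy collection the query word "human" never appears in the second document, so a purely lexical match would miss it; in the reduced space the two interface documents lie close together, which is the kind of effect the abstract attributes to the higher-order structure in term and text-object associations.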