Fast Extraction of Semantic Features from a Latent Semantic Indexed Text Corpus

Authors:
A. Kabán;M. A. Girolami
Affiliations:
Laboratory of Computer and Information Science, Helsinki University of Technology, P.O. Box 5400, FIN-02015 HUT, Finland. E-mail: ata@james.hut.fi;Laboratory of Computer and Information Science, Helsinki University of Technology, P.O. Box 5400, FIN-02015 HUT, Finland. E-mail: ata@james.hut.fi
Venue:
Neural Processing Letters
Year:
2002

Abstract

This paper proposes a projection-based symmetrical factorisation method for extracting semantic features from collections of text documents stored in a Latent Semantic space. Preliminary experimental results demonstrate that this yields a representation comparable to that provided by a novel probabilistic approach, which reconsiders the entire indexing problem of text documents and works directly in the original high-dimensional vector-space representation of text. The projection index employed is derived here from the a priori constraints on the problem. The principal advantage of this approach is computational efficiency, obtained by exploiting Latent Semantic Indexing as a preprocessing stage. Simulation results on subsets of the 20-Newsgroups text corpus in various settings are provided.
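
As a rough illustration of the preprocessing stage mentioned in the abstract, the sketch below projects a toy term-by-document matrix into a low-dimensional Latent Semantic space using a truncated SVD. It is not the paper's method: the function name `lsi_project`, the toy counts, and the choice of k = 2 are all assumptions made for this example, and the projection-based factorisation itself is not shown. The point is only that any later feature extraction would operate on k coordinates per document rather than on the full vocabulary dimension.

```python
import numpy as np

def lsi_project(term_doc, k):
    """Hypothetical helper: map documents to a k-dimensional latent space via truncated SVD."""
    # Thin SVD of the term-by-document matrix: term_doc = U @ diag(s) @ Vt
    U, s, Vt = np.linalg.svd(term_doc, full_matrices=False)
    # Keep only the k largest singular triplets
    U_k, s_k, Vt_k = U[:, :k], s[:k], Vt[:k, :]
    # Column j holds the latent coordinates of document j
    docs_latent = s_k[:, None] * Vt_k
    return U_k, docs_latent

# Toy term-by-document count matrix (6 terms x 4 documents); values are made up
X = np.array([
    [2, 1, 0, 0],
    [1, 2, 0, 0],
    [1, 1, 0, 0],
    [0, 0, 2, 1],
    [0, 0, 1, 2],
    [0, 0, 1, 1],
], dtype=float)

U_k, Z = lsi_project(X, k=2)
print(Z.shape)  # documents are now described by 2 coordinates instead of 6
```

In this sketch the latent coordinates `Z` equal `U_k.T @ X`, so a new document vector could be folded into the same space by the same multiplication.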