Fast Extraction of Semantic Features from a Latent Semantic Indexed Text Corpus

  • Authors:
  • A. Kabán;M. A. Girolami

  • Affiliations:
  • Laboratory of Computer and Information Science, Helsinki University of Technology, P.O. Box 5400, FIN-02015 HUT, Finland. E-mail: ata@james.hut.fi;Laboratory of Computer and Information Science, Helsinki University of Technology, P.O. Box 5400, FIN-02015 HUT, Finland. E-mail: ata@james.hut.fi

  • Venue:
  • Neural Processing Letters
  • Year:
  • 2002


Abstract

This paper proposes a projection-based symmetrical factorisation method for extracting semantic features from collections of text documents stored in a Latent Semantic space. Preliminary experimental results demonstrate that it yields a representation comparable to that provided by a novel probabilistic approach, which reconsiders the entire indexing problem of text documents and works directly in the original high-dimensional vector-space representation of text. The projection index employed is derived here from the a priori constraints on the problem. The principal advantage of this approach is computational efficiency, which is obtained by exploiting Latent Semantic Indexing as a preprocessing stage. Simulation results on subsets of the 20-Newsgroups text corpus in various settings are provided.
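
The Latent Semantic Indexing preprocessing stage mentioned in the abstract is commonly computed as a truncated singular value decomposition of the term-document matrix. The sketch below illustrates only that generic step, not the paper's projection-based factorisation, whose details the abstract does not give. The function name `lsi_project`, the toy matrix `X`, and the choice `k=2` are assumptions made for illustration.

```python
import numpy as np

def lsi_project(term_doc, k):
    """Truncated SVD of a (terms x documents) matrix.

    Returns the top-k left singular vectors, the top-k singular values,
    and the k-dimensional document coordinates S_k @ V_k^T.
    """
    U, s, Vt = np.linalg.svd(term_doc, full_matrices=False)
    U_k = U[:, :k]                     # term-space basis of the latent space
    s_k = s[:k]                        # retained singular values
    docs_k = s_k[:, None] * Vt[:k, :]  # documents in latent coordinates (k x n_docs)
    return U_k, s_k, docs_k

# Hypothetical toy counts: 5 terms (rows) by 4 documents (columns).
X = np.array([
    [2, 1, 0, 0],
    [1, 2, 0, 0],
    [0, 0, 3, 1],
    [0, 0, 1, 2],
    [1, 0, 0, 1],
], dtype=float)

U_k, s_k, docs_k = lsi_project(X, k=2)

# Rank-k reconstruction of the original matrix from the latent representation.
X_approx = U_k @ docs_k
```

Any further feature extraction then operates on the small `k x n_docs` array `docs_k` rather than on the full term-document matrix, which is the general source of the efficiency gain the abstract attributes to LSI preprocessing.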