A Comparison of Dimensionality Reduction Techniques for Text Retrieval

  • Authors:
  • Vishwa Vinay; Ingemar J. Cox; Ken Wood; Natasa Milic-Frayling

  • Affiliations:
  • University College London, UK; University College London, UK; Microsoft Research Ltd., UK; Microsoft Research Ltd., UK

  • Venue:
  • ICMLA '05: Proceedings of the Fourth International Conference on Machine Learning and Applications
  • Year:
  • 2005

Abstract

The growth of digital information increases the need for better techniques to automatically store, organize and retrieve it. Much of this information is textual, and existing representation models struggle with the high dimensionality of the resulting feature space. Techniques such as Latent Semantic Indexing (LSI) address the problem of high dimensionality in information retrieval to some degree. However, promising alternatives such as Random Mapping (RM) have not yet been fully studied in this context. In this paper, we show that, despite the attention RM has received in other applications, for text retrieval it is outperformed not only by Principal Component Analysis (PCA) and Independent Component Analysis (ICA) but also by a simple noise reduction algorithm.
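To make the contrast between Random Mapping and PCA concrete, here is a minimal, dependency-free sketch of both projections used for cosine-similarity retrieval. It is an illustration under stated assumptions, not the authors' experimental setup: the abstract does not say which RM variant, term weighting, or number of dimensions was used, so the Gaussian random matrix with N(0, 1/k) entries, the raw term counts, the toy matrix `DOCS`, and the helper names (`random_mapping`, `pca`, `rank`) are all hypothetical. ICA and the paper's noise reduction algorithm are not shown, because the abstract does not describe them.

```python
import math
import random

# Hypothetical toy term-document matrix (rows = documents, columns = term
# weights); docs 0-1 share one topic, docs 2-3 another. Not data from the paper.
DOCS = [
    [2, 1, 0, 0, 0, 0],
    [1, 2, 1, 0, 0, 0],
    [0, 0, 0, 2, 1, 1],
    [0, 0, 1, 1, 2, 1],
]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def cosine(u, v):
    nu, nv = math.sqrt(dot(u, u)), math.sqrt(dot(v, v))
    return dot(u, v) / (nu * nv) if nu and nv else 0.0


def random_mapping(d, k, seed=0):
    """Random Mapping (assumed Gaussian variant): x -> xR, R of shape d x k.

    Entries of R are i.i.d. N(0, 1/k), so E[<xR, yR>] = <x, y>. The data are
    never inspected when building R, which is what makes RM cheap.
    """
    rng = random.Random(seed)
    R = [[rng.gauss(0.0, 1.0 / math.sqrt(k)) for _ in range(k)] for _ in range(d)]
    return lambda x: [sum(x[i] * R[i][j] for i in range(d)) for j in range(k)]


def pca(docs, k, iters=500, seed=0):
    """PCA: project mean-centered vectors onto the top-k covariance eigenvectors.

    Eigenvectors are found by power iteration with deflation, which is fine for
    a toy example; a real system would call an SVD routine instead.
    """
    n, d = len(docs), len(docs[0])
    mean = [sum(x[i] for x in docs) / n for i in range(d)]
    X = [[x[i] - mean[i] for i in range(d)] for x in docs]
    C = [[sum(r[a] * r[b] for r in X) / n for b in range(d)] for a in range(d)]
    rng = random.Random(seed)
    comps = []
    for _ in range(k):
        # Random unit start vector (almost surely not orthogonal to the target).
        v = [rng.gauss(0.0, 1.0) for _ in range(d)]
        norm = math.sqrt(dot(v, v))
        v = [t / norm for t in v]
        for _ in range(iters):
            w = [dot(row, v) for row in C]
            norm = math.sqrt(dot(w, w))
            if norm < 1e-12:  # remaining variance is (numerically) zero
                break
            v = [t / norm for t in w]
        lam = dot(v, [dot(row, v) for row in C])  # Rayleigh quotient
        comps.append(v)
        # Deflate so the next pass converges to the next eigenvector.
        C = [[C[a][b] - lam * v[a] * v[b] for b in range(d)] for a in range(d)]
    return lambda x: [dot([x[i] - mean[i] for i in range(d)], c) for c in comps]


def rank(project, docs, query):
    """Rank document indices by cosine similarity in the reduced space."""
    q = project(query)
    scores = [cosine(project(x), q) for x in docs]
    return sorted(range(len(docs)), key=lambda i: -scores[i])


if __name__ == "__main__":
    query = [1, 1, 0, 0, 0, 0]  # hypothetical query on the first topic
    print("PCA, k=2:", rank(pca(DOCS, 2), DOCS, query))
    print("RM,  k=2:", rank(random_mapping(6, 2), DOCS, query))
```

The design difference the sketch exposes is that PCA chooses its k directions from the data's covariance, while RM's directions are data-independent and only preserve inner products on average; with very small k the RM ranking depends on the random seed. This is offered as intuition only, not as an explanation of the paper's results.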