A Comparison of Dimensionality Reduction Techniques for Text Retrieval

Authors:
Vishwa Vinay;Ingemar J. Cox;Ken Wood;Natasa Milic-Frayling
Affiliations:
University College London, UK;University College London, UK;Microsoft Research Ltd., UK;Microsoft Research Ltd., UK
Venue:
ICMLA '05 Proceedings of the Fourth International Conference on Machine Learning and Applications
Year:
2005



Abstract

The growth of digital information increases the need for better techniques for automatically storing, organizing and retrieving it. Much of this information is textual in nature, and existing representation models struggle to deal with the high dimensionality of the resulting feature space. Techniques like Latent Semantic Indexing address, to some degree, the problem of high dimensionality in information retrieval. However, promising alternatives, like Random Mapping (RM), have yet to be studied thoroughly in this context. In this paper, we show that despite the attention RM has received in other applications, in the case of text retrieval it is outperformed not only by Principal Component Analysis (PCA) and Independent Component Analysis (ICA) but also by a simple noise reduction algorithm.
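
To make the comparison in the abstract concrete, here is a minimal sketch of two of the techniques it names, Random Mapping and PCA, applied to a document-term matrix and followed by cosine-similarity ranking. It is an illustration under stated assumptions, not the authors' experimental setup: the function names (`random_mapping`, `pca_projection`, `cosine_rank`), the Gaussian projection matrix, the synthetic Poisson count data, and the target dimension `k = 5` are all hypothetical choices made for this example.

```python
import numpy as np


def random_mapping(X, k, seed=0):
    """Project rows of X (documents x terms) onto k dimensions using a
    Gaussian random matrix. Assumed variant; other RM schemes exist."""
    rng = np.random.default_rng(seed)
    R = rng.standard_normal((X.shape[1], k)) / np.sqrt(k)
    return X @ R


def pca_projection(X, k):
    """Project rows of X onto the top-k principal directions via SVD."""
    Xc = X - X.mean(axis=0)  # center each term (column)
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:k].T


def cosine_rank(query_vec, doc_vecs):
    """Return document indices sorted by decreasing cosine similarity."""
    norms = np.linalg.norm(doc_vecs, axis=1) * np.linalg.norm(query_vec)
    sims = (doc_vecs @ query_vec) / np.where(norms == 0, 1.0, norms)
    return np.argsort(-sims)


# Synthetic stand-in for a document-term count matrix (20 docs, 200 terms).
rng = np.random.default_rng(42)
docs = rng.poisson(0.3, size=(20, 200)).astype(float)

k = 5  # hypothetical reduced dimensionality
Z_rm = random_mapping(docs, k)
Z_pca = pca_projection(docs, k)

# Use document 0 as the query in each reduced space.
rank_rm = cosine_rank(Z_rm[0], Z_rm)
rank_pca = cosine_rank(Z_pca[0], Z_pca)
print("RM  top-5:", rank_rm[:5])
print("PCA top-5:", rank_pca[:5])
```

Random Mapping needs no decomposition of the data, which is why it is attractive computationally; PCA fits its directions to the data at the cost of an SVD. Comparing the two rankings against relevance judgments on a real collection would be the natural next step, which this toy data cannot provide.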