Sprinkling: supervised latent semantic indexing

  • Authors:
  • Sutanu Chakraborti;Robert Lothian;Nirmalie Wiratunga;Stuart Watt

  • Affiliations:
  • School of Computing, The Robert Gordon University, Aberdeen, Scotland, UK;School of Computing, The Robert Gordon University, Aberdeen, Scotland, UK;School of Computing, The Robert Gordon University, Aberdeen, Scotland, UK;School of Computing, The Robert Gordon University, Aberdeen, Scotland, UK

  • Venue:
  • ECIR'06 Proceedings of the 28th European conference on Advances in Information Retrieval
  • Year:
  • 2006

Abstract

Latent Semantic Indexing (LSI) is an established dimensionality reduction technique for Information Retrieval applications. However, LSI-generated dimensions are not optimal in a classification setting, since LSI fails to exploit the class labels of training documents. We propose an approach that uses class information to influence LSI dimensions: class labels of training documents are encoded as new terms, which are appended to the documents. When LSI is carried out on the augmented term-document matrix, terms pertaining to the same class are pulled closer to each other. Evaluation over experimental data reveals significant improvement in classification accuracy over LSI. The results also compare favourably with those of Support Vector Machines.
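
The augmentation step described in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the function names (`sprinkle`, `truncated_svd_approx`, `sprinkled_lsi`), the number `k` of label terms added per class, the chosen `rank`, and the decision to drop the label rows after the decomposition are all assumptions made for illustration, and the toy matrix is invented.

```python
import numpy as np


def sprinkle(term_doc, labels, n_classes, k=1):
    """Append k indicator rows per class to a terms x documents matrix.

    Assumption: each class label is encoded as k artificial terms that
    occur once in every training document of that class.
    """
    labels = np.asarray(labels)
    # One indicator row per class: 1.0 where the document carries that label.
    indicators = (np.arange(n_classes)[:, None] == labels[None, :]).astype(float)
    # Repeat each class row k times so the label weighs more in the SVD.
    return np.vstack([term_doc, np.repeat(indicators, k, axis=0)])


def truncated_svd_approx(matrix, rank):
    """Return the best rank-`rank` approximation of `matrix` (plain LSI step)."""
    u, s, vt = np.linalg.svd(matrix, full_matrices=False)
    return (u[:, :rank] * s[:rank]) @ vt[:rank, :]


def sprinkled_lsi(term_doc, labels, n_classes, k=1, rank=2):
    """Run LSI on the augmented matrix, then keep only the original term rows."""
    n_terms = term_doc.shape[0]
    augmented = sprinkle(term_doc, labels, n_classes, k)
    approx = truncated_svd_approx(augmented, rank)
    # Assumption: the artificial label terms are discarded after the SVD.
    return approx[:n_terms, :]


# Toy example (invented data): 4 terms, 4 training documents, 2 classes.
A = np.array([
    [1.0, 0.0, 0.0, 0.0],
    [1.0, 1.0, 0.0, 0.0],
    [0.0, 0.0, 1.0, 1.0],
    [0.0, 1.0, 1.0, 0.0],
])
labels = [0, 0, 1, 1]
smoothed = sprinkled_lsi(A, labels, n_classes=2, k=3, rank=2)
print(smoothed.shape)
```

The resulting matrix has the same shape as the original term-document matrix, so it could be passed to whatever classifier would otherwise consume the plain LSI representation. How many label terms to add and how unlabeled test documents are represented are not specified in the abstract and are left open here.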