Multi-label informed latent semantic indexing

  • Authors:
  • Kai Yu;Shipeng Yu;Volker Tresp

  • Affiliations:
  • Siemens AG, Munich, Germany;University of Munich, Germany;Siemens AG, Munich, Germany

  • Venue:
  • Proceedings of the 28th annual international ACM SIGIR conference on Research and development in information retrieval
  • Year:
  • 2005

Quantified Score

Hi-index 0.00

Visualization

Abstract

Latent semantic indexing (LSI) is a well-known unsupervised approach for dimensionality reduction in information retrieval. However if the output information (i.e. category labels) is available, it is often beneficial to derive the indexing not only based on the inputs but also on the target values in the training data set. This is of particular importance in applications with multiple labels, in which each document can belong to several categories simultaneously. In this paper we introduce the multi-label informed latent semantic indexing (MLSI) algorithm which preserves the information of inputs and meanwhile captures the correlations between the multiple outputs. The recovered "latent semantics" thus incorporate the human-annotated category information and can be used to greatly improve the prediction accuracy. Empirical study based on two data sets, Reuters-21578 and RCV1, demonstrates very encouraging results.