Document classification with supervised latent feature selection

  • Authors:
  • Ondrej Hava;Miroslav Skrbek;Pavel Kordík

  • Affiliations:
  • Czech Technical University in Prague;Czech Technical University in Prague;Czech Technical University in Prague

  • Venue:
  • Proceedings of the 2nd International Conference on Web Intelligence, Mining and Semantics
  • Year:
  • 2012

Abstract

Classifying text documents into categories generally means dealing with the high dimensionality of a structured representation of the documents. To favor generality over accuracy of the classifier, some dimensionality reduction technique has to be applied. In this paper we present a classification algorithm that utilizes hidden structures of uncorrelated topics extracted from the training documents and from their known categories, which are not necessarily independent. The classifier can incorporate various methods of hidden feature selection. Three latent feature selection procedures are proposed and tested.
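
The abstract does not specify the algorithm, so the following is only a minimal sketch of the general idea, not the authors' method. It assumes that the uncorrelated latent topics come from an SVD of the centered document-term matrix, that a latent feature is scored by its summed squared correlation with the category indicators, and that classification is done by nearest centroid in the selected latent space. The function names, the scoring rule, and the toy data are all hypothetical.

```python
import numpy as np

def fit_latent_classifier(X, y, n_keep):
    """Hypothetical sketch: X is a (docs x terms) count matrix, y holds integer labels."""
    mean = X.mean(axis=0)
    # SVD of the centered matrix yields mutually uncorrelated latent features.
    U, s, Vt = np.linalg.svd(X - mean, full_matrices=False)
    classes = np.unique(y)
    # One indicator column per category, centered so correlations can be computed.
    Y = (y[:, None] == classes[None, :]).astype(float)
    Yc = Y - Y.mean(axis=0)
    # Columns of U are unit-norm, so U.T @ Yc / ||Yc|| gives correlations
    # between each latent feature and each category indicator.
    corr = (U.T @ Yc) / np.linalg.norm(Yc, axis=0)
    score = (corr ** 2).sum(axis=1)
    # Ignore numerically empty components (assumed tolerance).
    score[s <= 1e-10 * s[0]] = 0.0
    # Supervised selection: keep the latent features most related to the categories.
    keep = np.argsort(score)[::-1][:n_keep]
    W = Vt[keep].T
    Z = (X - mean) @ W
    centroids = np.array([Z[y == c].mean(axis=0) for c in classes])
    return {"mean": mean, "W": W, "classes": classes, "centroids": centroids}

def predict(model, X):
    """Assign each document to the category with the nearest centroid."""
    Z = (X - model["mean"]) @ model["W"]
    dists = np.linalg.norm(Z[:, None, :] - model["centroids"][None, :, :], axis=2)
    return model["classes"][dists.argmin(axis=1)]

# Toy data (made up): category 0 uses mostly terms 0-1, category 1 terms 2-3.
X_train = np.array([[3, 2, 0, 0], [2, 3, 0, 1], [4, 1, 0, 0],
                    [0, 0, 3, 2], [1, 0, 2, 3], [0, 0, 4, 1]], dtype=float)
y_train = np.array([0, 0, 0, 1, 1, 1])
model = fit_latent_classifier(X_train, y_train, n_keep=1)
pred = predict(model, np.array([[5, 4, 0, 0], [0, 1, 3, 3]], dtype=float))
```

Selecting latent features by their relation to the categories, rather than by explained variance alone, is what makes the selection supervised in this sketch; other scoring rules could be substituted in the same place.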