Document classification with supervised latent feature selection

Authors:
Ondrej Hava;Miroslav Skrbek;Pavel Kordík
Affiliations:
Czech Technical University in Prague;Czech Technical University in Prague;Czech Technical University in Prague
Venue:
Proceedings of the 2nd International Conference on Web Intelligence, Mining and Semantics
Year:
2012

Abstract

The classification of text documents into categories generally has to deal with the high dimensionality of a structured representation of the documents. To favor the generality of the classifier over its accuracy, some dimensionality reduction technique has to be applied. In this paper we present a classification algorithm that utilizes hidden structures of uncorrelated topics extracted from the training documents and their known categories, which are not necessarily independent. The classifier can incorporate various methods of hidden feature selection. Three latent feature selection procedures are proposed and tested.