Term-weighting approaches in automatic text retrieval
Information Processing and Management: an International Journal
Feature selection and feature extraction for text categorization
HLT '91 Proceedings of the workshop on Speech and Natural Language
Hi-index | 0.00 |
The classification of text documents to categories generally deals with large dimensionality of a structured representation of the documents. To favor generality over accuracy of the classifier some dimensionality reduction technique has to be applied. In the text we present classification algorithm that utilize hidden structures of uncorrelated topics extracted from training documents and their known categories not necessarily independent. The classifier is capable to include various methods of hidden feature selection. Three latent feature selection procedures are proposed and tested.