Documents represented with the bag-of-words approach often require dimensionality reduction before statistical classification methods can be applied effectively. Latent Semantic Indexing (LSI) is one such method, based on eigendecomposition of the covariance of the document-term matrix. Another commonly used approach is to select a small number of the most important features from the whole set according to some relevant criterion. This paper points out that LSI ignores discrimination while concentrating on representation. Furthermore, selection methods fail to produce a feature set that jointly optimizes class discrimination. As a remedy, we suggest supervised linear discriminative transforms, and report good classification results from applying these to the Reuters-21578 database.
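The contrast between representation and discrimination can be illustrated with a minimal sketch. It is not the paper's implementation: it assumes classical Fisher linear discriminant analysis as a stand-in for a supervised linear discriminative transform, uses a synthetic term-count matrix instead of Reuters-21578, and every name in it (lsi_transform, lda_transform, fisher_ratio, the reg parameter) is hypothetical. The LSI part uses the singular value decomposition of the uncentered document-term matrix, whose right singular vectors are the eigenvectors of X^T X.

```python
import numpy as np

def lsi_transform(X, k):
    """Unsupervised projection: top-k right singular vectors of X.
    These are the eigenvectors of X^T X; class labels are never used."""
    _, _, Vt = np.linalg.svd(X, full_matrices=False)
    return Vt[:k].T  # shape (n_terms, k)

def scatter_matrices(X, y):
    """Within-class (Sw) and between-class (Sb) scatter matrices."""
    mean_all = X.mean(axis=0)
    n_terms = X.shape[1]
    Sw = np.zeros((n_terms, n_terms))
    Sb = np.zeros((n_terms, n_terms))
    for c in np.unique(y):
        Xc = X[y == c]
        mean_c = Xc.mean(axis=0)
        centered = Xc - mean_c
        Sw += centered.T @ centered
        diff = (mean_c - mean_all)[:, None]
        Sb += len(Xc) * (diff @ diff.T)
    return Sw, Sb

def lda_transform(X, y, k, reg=1e-3):
    """Supervised projection (Fisher LDA, assumed here for illustration).
    Solves Sb w = lambda (Sw + reg*I) w by Cholesky whitening."""
    Sw, Sb = scatter_matrices(X, y)
    Sw_reg = Sw + reg * np.eye(Sw.shape[0])   # regularize so Sw is invertible
    L = np.linalg.cholesky(Sw_reg)            # Sw_reg = L @ L.T
    L_inv = np.linalg.inv(L)
    M = L_inv @ Sb @ L_inv.T
    M = (M + M.T) / 2.0                       # remove round-off asymmetry
    evals, evecs = np.linalg.eigh(M)          # eigenvalues in ascending order
    top = np.argsort(evals)[::-1][:k]
    return L_inv.T @ evecs[:, top]            # shape (n_terms, k)

def fisher_ratio(W, X, y, reg=1e-3):
    """Between-class over (regularized) within-class scatter
    along the first column of W."""
    Sw, Sb = scatter_matrices(X, y)
    w = W[:, 0]
    return (w @ Sb @ w) / (w @ (Sw + reg * np.eye(len(w))) @ w)

# Synthetic corpus (assumption): 4 frequent terms shared by both classes,
# plus 1 rare term whose rate differs between the classes.
rng = np.random.default_rng(0)
n_per_class = 40
common = rng.poisson(20, size=(2 * n_per_class, 4))
rare = np.concatenate([rng.poisson(1, n_per_class), rng.poisson(4, n_per_class)])
X = np.column_stack([common, rare]).astype(float)
y = np.repeat([0, 1], n_per_class)

W_lsi = lsi_transform(X, k=1)
W_lda = lda_transform(X, y, k=1)
print("Fisher ratio, LSI direction:", fisher_ratio(W_lsi, X, y))
print("Fisher ratio, LDA direction:", fisher_ratio(W_lda, X, y))
```

By construction the discriminant direction maximizes the regularized Fisher ratio, so it can never score below the LSI direction on this criterion. How large the gap is depends on the data; in this toy corpus the frequent shared terms dominate the leading LSI direction while the class signal sits in the rare term.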