Learning semantic relatedness from term discrimination information
Expert Systems with Applications: An International Journal
Dimensionality reduction (DR) through feature extraction (FE) is desirable for efficient and effective processing of text documents. Many text FE techniques produce features that are not readily interpretable and require super-linear computation time. In this paper, we present a fast supervised DR/FE technique, named FEDIP, that is motivated by the notion of relatedness of terms to topics or contexts. This relatedness is quantified by the discrimination information that a term provides for a topic in a labeled document collection. Features are constructed by pooling the discrimination information of highly related terms for each topic. FEDIP's time complexity is linear in the size of the vocabulary and the document collection. FEDIP is evaluated for document classification with SVM and naive Bayes classifiers on six text data sets. The results show that FEDIP produces low-dimensional feature spaces that yield higher classification accuracy than LDA and LSI. FEDIP is also significantly faster than the other techniques on our evaluation data sets.
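The abstract's core idea — score each term's discrimination information for each topic, then build one feature per topic by pooling the scores of a document's highly related terms — can be sketched as follows. This is a minimal illustration, not FEDIP's published formulation: the smoothed document-frequency ratio used as the discrimination measure, the `threshold` parameter, and the sum-pooling are all assumptions chosen for simplicity.

```python
from collections import Counter, defaultdict

def discrimination_info(docs, labels):
    """Score each (term, topic) pair by how strongly the term discriminates
    the topic from the rest of the collection. A smoothed document-frequency
    ratio is used here as a stand-in for FEDIP's measure (assumption)."""
    topics = sorted(set(labels))
    term_topic = defaultdict(Counter)          # topic -> term document counts
    topic_docs = Counter(labels)               # number of docs per topic
    for doc, y in zip(docs, labels):
        for t in set(doc):
            term_topic[y][t] += 1
    vocab = {t for c in term_topic.values() for t in c}
    scores = {}
    for topic in topics:
        n_in = topic_docs[topic]
        n_out = len(docs) - n_in
        for t in vocab:
            df_in = term_topic[topic][t]
            df_out = sum(term_topic[y][t] for y in topics if y != topic)
            p_in = (df_in + 1) / (n_in + 2)    # Laplace-smoothed doc frequency
            p_out = (df_out + 1) / (n_out + 2)
            scores[(t, topic)] = p_in / p_out
    return topics, scores

def extract_features(doc, topics, scores, threshold=1.5):
    """One feature per topic: pool (sum) the weighted discrimination scores
    of the document's terms that are highly related to that topic."""
    counts = Counter(doc)
    feats = []
    for topic in topics:
        pooled = sum(counts[t] * scores[(t, topic)]
                     for t in counts
                     if scores.get((t, topic), 0.0) > threshold)
        feats.append(pooled)
    return feats

# Toy labeled collection; the resulting feature space has one dimension per topic.
docs = [["ball", "goal", "team"], ["team", "score", "goal"],
        ["stock", "market", "trade"], ["market", "price", "stock"]]
labels = ["sport", "sport", "finance", "finance"]
topics, scores = discrimination_info(docs, labels)
f = extract_features(["goal", "team", "price"], topics, scores)
```

Note how the pass over the collection visits each document's terms once and each (term, topic) pair once, which is what makes this style of feature construction linear in the vocabulary and collection size.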